Request Queuing
Queue Your API Requests — Not Just Rate Limit Them
Rate limiting tells you when to slow down. A queue tells you when it's actually safe to proceed. When your system sends more requests than a resource can handle, you need ordered, coordinated waiting — not just rejection.
Rate Limiting vs Queuing
A rate limiter rejects or errors when you exceed the limit. A queue holds the request and activates it when capacity opens up. For API calls, you almost always want the queue — you want the work done, just not all at once. Dropping requests means retrying them, handling errors, and often degrading user experience. Queuing means the work completes, just slightly later.
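The difference is easy to see in miniature. Below is a minimal asyncio sketch (not RateQueue code; all names are illustrative) where a token queue stands in for capacity slots: the rate-limiter style fails fast when no slot is free, while the queue style waits for one, so the same burst of five requests against a capacity of two loses three requests in the first case and completes all five in the second.

```python
import asyncio

async def main():
    slots = asyncio.Queue(maxsize=2)     # capacity: 2 concurrent calls
    for _ in range(2):
        slots.put_nowait(None)           # pre-fill with free-slot tokens

    completed, rejected = 0, 0

    async def rate_limited_call():
        nonlocal completed, rejected
        try:
            slots.get_nowait()           # rate-limiter style: no waiting allowed
        except asyncio.QueueEmpty:
            rejected += 1                # over capacity -> request is dropped
            return
        await asyncio.sleep(0.01)        # simulated API call
        completed += 1
        slots.put_nowait(None)

    async def queued_call():
        nonlocal completed
        token = await slots.get()        # queue style: wait in line for a slot
        await asyncio.sleep(0.01)
        completed += 1
        slots.put_nowait(token)

    # Burst of 5 against capacity 2, rate-limiter style: 3 are dropped.
    await asyncio.gather(*(rate_limited_call() for _ in range(5)))
    rejecting = (completed, rejected)

    # Same burst, queue style: every request completes, just slightly later.
    completed, rejected = 0, 0
    await asyncio.gather(*(queued_call() for _ in range(5)))
    return rejecting, (completed, rejected)

print(asyncio.run(main()))  # ((2, 3), (5, 0))
```

The dropped requests are exactly the ones you would otherwise have to wrap in retry and error-handling logic.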
How RateQueue's Queue Works
Call acquire — if capacity is available, proceed immediately. If not, wait in line. When the slot opens, activation happens automatically. The calling code just resumes.
```python
import ratequeue.aio as rq

# High-priority user request
async with rq.acquire("openai", load=2000, priority=100, api_key=RATEQUEUE_API_KEY):
    await call_openai(user_prompt)

# Lower-priority background job
async with rq.acquire("openai", load=5000, priority=10, api_key=RATEQUEUE_API_KEY):
    await call_openai(batch_prompt)
```

When the resource is near capacity, priority-100 requests skip ahead of priority-10 requests. The batch job waits. The user doesn't.
Queue Lanes for Fair Access
If one service floods the queue, it can starve others even with priority. Lanes keep traffic classes isolated — each lane gets a fair share of the queue regardless of how many requests any single caller submits.
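One way to picture the fairness guarantee is round-robin dispatch across per-lane queues. This is an illustration of the idea, not RateQueue's actual scheduler; the class and method names are hypothetical.

```python
from collections import OrderedDict, deque

class LanedQueue:
    """Round-robin across lanes: each non-empty lane gets one turn per
    cycle, so a flooding caller cannot starve the others."""

    def __init__(self):
        self._lanes: "OrderedDict[str, deque]" = OrderedDict()

    def submit(self, lane: str, job) -> None:
        self._lanes.setdefault(lane, deque()).append(job)

    def next_job(self):
        # Serve the first lane that has work, then rotate it to the back.
        for lane in list(self._lanes):
            jobs = self._lanes[lane]
            if jobs:
                self._lanes.move_to_end(lane)
                return jobs.popleft()
        return None

q = LanedQueue()
for i in range(3):
    q.submit("service-a", f"a{i}")   # service-a floods the queue
q.submit("service-b", "b0")          # service-b submits one request

order = [q.next_job() for _ in range(4)]
print(order)  # ['a0', 'b0', 'a1', 'a2'] -- b0 is served second, not last
```

Without lanes, service-b's request would sit behind all three of service-a's; with round-robin lanes it waits behind at most one.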
```typescript
import { ratequeue } from "@ratequeue/sdk";

// Service A requests go in lane "service-a"
await ratequeue.acquire(
  "shared-api",
  { lane: "service-a", apiKey: process.env.RATEQUEUE_API_KEY! },
  async () => { await callSharedApi(); }
);
```

Real-Time Activation
No polling required. When your request's slot opens, activation fires immediately: the suspended context resumes and your API call proceeds. Your worker isn't burning CPU in a loop — it's waiting efficiently. From your code's perspective, the wait is just a brief pause before execution continues.
No polling loop
Workers suspend efficiently and resume on activation. No wasted CPU.
Instant feedback
Activation happens as soon as capacity opens — not at the next poll interval.
Queue position
Know where your request stands in the queue while waiting.
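The suspend-and-resume pattern above can be sketched with a plain asyncio.Event, where setting the event stands in for the activation signal (the surrounding names are illustrative). The waiter consumes no CPU while suspended, and resume latency is driven by the activation itself rather than a poll interval.

```python
import asyncio
import time

async def waiter(activated: asyncio.Event) -> float:
    # Suspends here -- no polling loop, no CPU burned while waiting.
    await activated.wait()
    return time.monotonic()

async def main() -> float:
    activated = asyncio.Event()
    task = asyncio.create_task(waiter(activated))
    await asyncio.sleep(0.05)   # capacity is busy for ~50 ms
    fired = time.monotonic()
    activated.set()             # slot opens: waiter resumes immediately
    resumed = await task
    return resumed - fired      # gap between activation and resume

delay = asyncio.run(main())
print(f"resumed {delay * 1000:.1f} ms after activation")
```

With a polling design, the resume gap averages half the poll interval; with event-driven activation it is just scheduler latency, typically well under a millisecond.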
Queue first, send when ready
Sign up free and replace your retry logic with a queue that actually works. One resource, no infrastructure.