Request Queuing
Queue Your API Requests — Not Just Rate Limit Them
Rate limiting tells you when to slow down. A queue tells you when it's actually safe to proceed. When your system sends more requests than a resource can handle, you need ordered, coordinated waiting — not just rejection.
Rate Limiting vs Queuing
A rate limiter rejects or errors when you exceed the limit. A queue holds the request and activates it when capacity opens up. For API calls, you almost always want the queue — you want the work done, just not all at once. Dropping requests means retrying them, handling errors, and often degrading user experience. Queuing means the work completes, just slightly later.
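The difference is easy to see in miniature. Below is a minimal asyncio sketch (not RateQueue code; all names are illustrative) where a token queue stands in for capacity slots: the rate-limiter style fails fast when no slot is free, while the queue style waits for one, so the same burst of five requests against a capacity of two loses three requests in the first case and completes all five in the second.

```python
import asyncio

async def main():
    slots = asyncio.Queue(maxsize=2)     # capacity: 2 concurrent calls
    for _ in range(2):
        slots.put_nowait(None)           # pre-fill with free-slot tokens

    completed, rejected = 0, 0

    async def rate_limited_call():
        nonlocal completed, rejected
        try:
            slots.get_nowait()           # rate-limiter style: no waiting allowed
        except asyncio.QueueEmpty:
            rejected += 1                # over capacity -> request is dropped
            return
        await asyncio.sleep(0.01)        # simulated API call
        completed += 1
        slots.put_nowait(None)

    async def queued_call():
        nonlocal completed
        token = await slots.get()        # queue style: wait in line for a slot
        await asyncio.sleep(0.01)
        completed += 1
        slots.put_nowait(token)

    # Burst of 5 against capacity 2, rate-limiter style: 3 are dropped.
    await asyncio.gather(*(rate_limited_call() for _ in range(5)))
    rejecting = (completed, rejected)

    # Same burst, queue style: every request completes, just slightly later.
    completed, rejected = 0, 0
    await asyncio.gather(*(queued_call() for _ in range(5)))
    return rejecting, (completed, rejected)

print(asyncio.run(main()))  # ((2, 3), (5, 0))
```

The dropped requests are exactly the ones you would otherwise have to wrap in retry and error-handling logic.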
How RateQueue's Queue Works
Call acquire — if capacity is available, proceed immediately. If not, wait in line. When the slot opens, activation happens automatically. The calling code just resumes.
```python
import ratequeue.aio as rq

# High-priority user request
async with rq.acquire("openai", load=2000, priority=100, api_key=RATEQUEUE_API_KEY):
    await call_openai(user_prompt)

# Lower-priority background job
async with rq.acquire("openai", load=5000, priority=10, api_key=RATEQUEUE_API_KEY):
    await call_openai(batch_prompt)
```

When the resource is near capacity, priority-100 requests skip ahead of priority-10 requests. The batch job waits. The user doesn't.
Queue Lanes for Fair Access
If one service floods the queue, it can starve others even with priority. Lanes keep traffic classes isolated — each lane gets a fair share of the queue regardless of how many requests any single caller submits.
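One way to picture the fairness guarantee is round-robin dispatch across per-lane queues. This is an illustration of the idea, not RateQueue's actual scheduler; the class and method names are hypothetical.

```python
from collections import OrderedDict, deque

class LanedQueue:
    """Round-robin across lanes: each non-empty lane gets one turn per
    cycle, so a flooding caller cannot starve the others."""

    def __init__(self):
        self._lanes: "OrderedDict[str, deque]" = OrderedDict()

    def submit(self, lane: str, job) -> None:
        self._lanes.setdefault(lane, deque()).append(job)

    def next_job(self):
        # Serve the first lane that has work, then rotate it to the back.
        for lane in list(self._lanes):
            jobs = self._lanes[lane]
            if jobs:
                self._lanes.move_to_end(lane)
                return jobs.popleft()
        return None

q = LanedQueue()
for i in range(3):
    q.submit("service-a", f"a{i}")   # service-a floods the queue
q.submit("service-b", "b0")          # service-b submits one request

order = [q.next_job() for _ in range(4)]
print(order)  # ['a0', 'b0', 'a1', 'a2'] -- b0 is served second, not last
```

Without lanes, service-b's request would sit behind all three of service-a's; with round-robin lanes it waits behind at most one.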
```typescript
import { ratequeue } from "@ratequeue/sdk";

// Service A requests go in lane "service-a"
await ratequeue.acquire(
  "shared-api",
  { lane: "service-a", apiKey: process.env.RATEQUEUE_API_KEY! },
  async () => { await callSharedApi(); }
);
```

Real-Time Activation
No polling required. When your request's slot opens, activation fires immediately: the suspended context resumes and your API call proceeds. Your worker isn't burning CPU in a loop — it's waiting efficiently. From your code's perspective, the wait is just a brief pause before execution continues.
No polling loop
Workers suspend efficiently and resume on activation. No wasted CPU.
Instant feedback
Activation happens as soon as capacity opens — not at the next poll interval.
Queue position
Know where your request stands in the queue while waiting.
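The suspend-and-resume pattern above can be sketched with a plain asyncio.Event, where setting the event stands in for the activation signal (the surrounding names are illustrative). The waiter consumes no CPU while suspended, and resume latency is driven by the activation itself rather than a poll interval.

```python
import asyncio
import time

async def waiter(activated: asyncio.Event) -> float:
    # Suspends here -- no polling loop, no CPU burned while waiting.
    await activated.wait()
    return time.monotonic()

async def main() -> float:
    activated = asyncio.Event()
    task = asyncio.create_task(waiter(activated))
    await asyncio.sleep(0.05)   # capacity is busy for ~50 ms
    fired = time.monotonic()
    activated.set()             # slot opens: waiter resumes immediately
    resumed = await task
    return resumed - fired      # gap between activation and resume

delay = asyncio.run(main())
print(f"resumed {delay * 1000:.1f} ms after activation")
```

With a polling design, the resume gap averages half the poll interval; with event-driven activation it is just scheduler latency, typically well under a millisecond.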
Queue first, send when ready
Sign up free and replace your retry logic with a queue that actually works. One resource, no infrastructure.