API Rate Limiting
Control outbound calls to third-party APIs from all of your workers and services in one place, instead of letting each caller guess when capacity is available.
RateQueue is a fully managed API for queueing and rate limiting your outbound work across distributed systems, with limits, priorities, reserved capacity, and real-time feedback.
```python
import ratequeue.aio as rq

async with rq.acquire(
    "claude-sonnet-4-6",
    load=20_000,
    priority=10,
    api_key=RATEQUEUE_API_KEY,
):
    await llm_call()
```

API rate limiting, LLM quota management, traffic throttling, distributed locking, request prioritization, and reserved capacity for shared outbound work.
Coordinate request, token, and concurrency budgets for shared LLM providers without routing traffic through an external gateway or rewriting your existing integration.
Spread bursty traffic over time so downstream systems receive predictable throughput instead of spikes, retries, and noisy failure cascades.
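The smoothing idea works like a token bucket: capacity refills at a steady rate, and each request spends tokens before it may proceed, so bursts drain down to a predictable pace. Below is a minimal single-process sketch of that mechanism; the class name and parameters are illustrative, not RateQueue's API, which coordinates this across distributed callers:

```python
import time

class TokenBucket:
    """Pace bursty callers: tokens refill continuously at a fixed rate,
    and a request proceeds only if it can take its cost in tokens."""

    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = burst
        self.last = time.monotonic()

    def try_acquire(self, cost=1):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(rate_per_sec=5, burst=10)
print(bucket.try_acquire())  # True: the bucket starts full
```

The `cost` argument is what lets a limiter count real request load (for example, tokens or payload size) instead of treating every call as 1.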
Treat a capacity-1 resource as a managed lock so only one actor can work on a shared dependency at a time, with queueing instead of race conditions.
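A capacity-1 resource behaves like a queueing lock: waiters line up instead of racing, and exactly one actor holds the shared dependency at a time. Here is a local sketch of that pattern using a plain `asyncio` semaphore in one process; the managed version applies the same idea across machines:

```python
import asyncio

# A capacity-1 resource as a queueing lock: callers wait their turn
# instead of racing, so critical sections never interleave.
lock = asyncio.Semaphore(1)
log = []

async def migrate(worker_id):
    async with lock:                 # waiters queue instead of failing
        log.append(f"start:{worker_id}")
        await asyncio.sleep(0.01)    # work on the shared dependency
        log.append(f"end:{worker_id}")

async def main():
    await asyncio.gather(*(migrate(i) for i in range(3)))

asyncio.run(main())
print(log)  # start/end pairs never interleave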
Serve urgent traffic before background work when not all requests can be treated equally, so interactive or revenue-critical flows stay ahead of batch jobs.
Hold part of a resource back for specific traffic classes so critical work is always protected even when lower-priority traffic is busy consuming the general pool.
Distributed queueing, outbound API rate limiting, concurrency control, prioritization, SDK integration, and real-time resource management in one managed control plane.
Requests wait for safe execution instead of racing each other or failing into retries.
Use fixed or sliding windows to enforce exactly how fast a resource can be consumed.
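A sliding window can be sketched as a rolling log of timestamps: events older than the window fall off, and a request is admitted only while the log holds fewer than the limit. This is a local illustration of the concept, not RateQueue's API:

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `limit` events within any rolling `window_sec` span."""

    def __init__(self, limit, window_sec):
        self.limit = limit
        self.window = window_sec
        self.events = deque()

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the rolling window.
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()
        if len(self.events) < self.limit:
            self.events.append(now)
            return True
        return False

lim = SlidingWindowLimiter(limit=2, window_sec=60)
print(lim.allow(now=0.0), lim.allow(now=1.0), lim.allow(now=2.0))  # True True False
print(lim.allow(now=61.0))  # True: the first events aged out
```

A fixed window is the simpler variant, resetting its counter at interval boundaries; the sliding form avoids the burst that fixed windows permit at each boundary.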
Cap how many requests can be active at once across your whole distributed system.
Combine several rules on the same resource instead of forcing a single simplistic limit.
Count each request equally or use real request load like tokens, payload size, or work units.
Give critical traffic faster access when not all work has the same importance.
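Priority scheduling can be modeled as a heap keyed by (priority, arrival order): urgent work jumps the line, while equal-priority work stays FIFO. A hypothetical local sketch with made-up request names:

```python
import heapq
import itertools

# Lower priority number runs first; a monotonic counter keeps
# FIFO order among requests with equal priority.
counter = itertools.count()
queue = []

def submit(priority, name):
    heapq.heappush(queue, (priority, next(counter), name))

def next_request():
    return heapq.heappop(queue)[2]

submit(50, "batch-report")
submit(10, "checkout-call")   # urgent, interactive traffic
submit(50, "batch-backfill")

print(next_request())  # checkout-call
print(next_request())  # batch-report (FIFO among equal priority)
```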
Protect part of a resource for important traffic classes so they are not crowded out.
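Reservation can be sketched as two pools: a general pool anyone may use, and a reserved slice that only the protected traffic class can draw from once the general pool is spent. The names and semantics below are illustrative, not the managed product's behavior:

```python
class ReservedCapacity:
    """Split `total` slots into a general pool plus a slice reserved
    for one traffic class, so that class is never crowded out."""

    def __init__(self, total, reserved, reserved_class):
        self.general = total - reserved
        self.reserved = reserved
        self.reserved_class = reserved_class

    def try_acquire(self, traffic_class):
        # Any class may consume the general pool first.
        if self.general > 0:
            self.general -= 1
            return True
        # Only the protected class may dip into the reserve.
        if traffic_class == self.reserved_class and self.reserved > 0:
            self.reserved -= 1
            return True
        return False

cap = ReservedCapacity(total=3, reserved=1, reserved_class="interactive")
print(cap.try_acquire("batch"), cap.try_acquire("batch"))  # True True
print(cap.try_acquire("batch"))                            # False: general pool spent
print(cap.try_acquire("interactive"))                      # True: the reserve protected it
```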
Separate traffic classes so one service or workload cannot clog the entire queue.
Choose strict allocation or strict-per-lane allocation depending on how you want fairness enforced.
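Per-lane allocation is essentially round-robin over lanes: each service gets its own queue, and the scheduler cycles across them so a backlog in one lane cannot starve the others. A toy sketch with assumed lane names:

```python
from collections import deque

# Round-robin across per-service lanes so one busy workload
# cannot clog the queue for everyone else.
lanes = {"svc-a": deque(["a1", "a2", "a3"]), "svc-b": deque(["b1"])}
order = deque(lanes)

def next_fair():
    # Visit each lane at most once per call, skipping empty lanes.
    for _ in range(len(order)):
        lane = order[0]
        order.rotate(-1)
        if lanes[lane]:
            return lanes[lane].popleft()
    return None

print([next_fair() for _ in range(4)])  # ['a1', 'b1', 'a2', 'a3']
```

Strict allocation would instead drain requests in pure priority or arrival order, regardless of which lane they came from.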
Poll for state or switch to real-time activation feedback when you need immediate updates.
Run resources close to the workloads that need them and choose where capacity should live.
Built as managed infrastructure so the control plane remains reliable under real production traffic.
Update limits, capacity, reservations, and routing behavior without redeploying application logic.
Use SDKs when you want the cleanest integration, or call the raw API directly when you need full control.
Create resources, inspect queue state, and monitor active requests from a dedicated dashboard.
Throttle bursty traffic with granular windows when you need smoother request pacing than per-minute limits allow.
Free and paid pricing for managed queueing, outbound API rate limiting, concurrency control, and protected capacity across shared resources.
Free
For getting started with one real resource and a simple production flow.
Paid
For teams coordinating more resources, more limits, and more important traffic classes.
Create a resource, get an API key, and start controlling shared outbound traffic without building your own coordination layer.