Globally Distributed
Rate Limiting That Keeps Up With Your Global Infrastructure
Rate limiting adds a coordination step before every API call. If that coordination step involves a round trip to a control plane on the other side of the world, the latency tax adds up. RateQueue lets you run resources close to your workloads — so the coordination is fast, and your infrastructure can deploy anywhere.
The Latency Cost of Centralized Coordination
A rate limiter sitting in us-east-1 adds ~150ms to every API call from eu-west-1. For a background job, that's tolerable. For user-facing inference at scale, that's a real cost — compounded across every request your system makes. And if your workloads are spread across regions, a single-region control plane is both a latency problem and a reliability risk.
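To make the tax concrete, here is a back-of-the-envelope comparison. The 150 ms and 1 ms figures are illustrative round-trip times, and the 20-call chain is an assumed workload, not a measurement:

```python
# Rough coordination overhead for a chain of sequential API calls.
CROSS_REGION_RTT_MS = 150  # e.g. eu-west-1 worker -> us-east-1 control plane
SAME_REGION_RTT_MS = 1     # worker and rate limiter in the same region

calls = 20  # sequential API calls behind one user-facing request

centralized_overhead_ms = calls * CROSS_REGION_RTT_MS
regional_overhead_ms = calls * SAME_REGION_RTT_MS

print(centralized_overhead_ms, regional_overhead_ms)  # 3000 20
```

Under these assumptions, the same request chain pays 3 seconds of pure coordination latency against a centralized limiter, versus 20 ms against a regional one.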
Run Resources Where Your Workloads Are
RateQueue lets you choose where each resource is hosted. A resource serving European workloads runs in Europe. A resource for US-east jobs runs in US-east. Sub-millisecond coordination for workloads in the same region.
import ratequeue.aio as rq

# Resource "openai-eu" is hosted in eu-west-1.
# Workers in Europe get <1ms coordination latency.
async with rq.acquire("openai-eu", load=2000, api_key=RATEQUEUE_API_KEY):
    response = await openai_client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
    )

Resource naming and placement are entirely in your hands. One resource per region, or one global resource — your call.
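One way to keep that naming convention mechanical is to derive the resource name from the worker's deployment region. This is a sketch, not a RateQueue requirement; `resource_for_region` and the `DEPLOY_REGION` environment variable are assumptions for illustration:

```python
import os

def resource_for_region(service, region=None):
    # Hypothetical convention: one resource per region, e.g. "openai-eu-west-1".
    # DEPLOY_REGION is an assumed variable set by your deploy tooling.
    region = region or os.environ.get("DEPLOY_REGION", "eu-west-1")
    return f"{service}-{region}"

# A worker deployed to eu-west-1 coordinates against the EU-hosted resource:
resource = resource_for_region("openai", "eu-west-1")  # "openai-eu-west-1"

# At call time, pass the derived name to rq.acquire as in the snippet above:
#   async with rq.acquire(resource, load=2000, api_key=RATEQUEUE_API_KEY):
#       ...
```

Because every worker derives the name the same way, the same deployment artifact runs in any region and automatically coordinates against its local resource.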
High Availability by Design
RateQueue is managed infrastructure built for production traffic. No single point of failure, no maintenance windows, no Redis cluster to provision and babysit. The control plane is available globally with redundancy at the infrastructure level.
No Redis to manage
Fully managed infrastructure. No cluster to provision, scale, or recover when it goes down.
Instant updates
Resource limit changes propagate immediately — no restarts or deploys needed across your fleet.
Built-in monitoring
Queue depth, throughput, and utilization visible in the dashboard without building your own tooling.
For Multi-Region Architectures
If your system is already multi-region, your rate limiting should be too. Two approaches depending on your quota model:
One resource per region
Each region has its own limit. Workers coordinate locally with minimal latency. Use when your API provider gives per-region quotas.
One global resource
All regions share a single limit and coordinate against one resource. Best for lower-throughput use cases with a global quota.
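The choice between the two models can live in one small function, so worker code stays identical either way. A sketch; the function name, the flag, and the resource names are illustrative assumptions, not part of the RateQueue API:

```python
def resource_name(service, region, per_region_quotas):
    # Per-region quotas: each region coordinates locally against "<service>-<region>".
    # Global quota: every region shares the single "<service>-global" resource.
    if per_region_quotas:
        return f"{service}-{region}"
    return f"{service}-global"

resource_name("openai", "eu-west-1", per_region_quotas=True)   # "openai-eu-west-1"
resource_name("openai", "us-east-1", per_region_quotas=False)  # "openai-global"
```

The returned name is what you pass to `rq.acquire`; switching quota models is then a one-flag change rather than a fleet-wide code change.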
Deploy globally without the coordination overhead
Start free — upgrade for global availability options and regional resource placement.