Concurrency Control

Concurrency Limiting Across Your Entire System

Some APIs don't care about requests per minute — they cap how many can be active simultaneously. A database might handle 20 concurrent queries fine but fall over at 50. A webhook endpoint might enforce a max of 5 concurrent calls. A concurrency limit is a different kind of constraint, and it needs different enforcement.

Rate Limits vs Concurrency Limits

A rate limit controls requests over time — 10 requests per minute means you can send no more than 10 in any 60-second window. A concurrency limit controls requests at the same time — 5 concurrent means no more than 5 active simultaneously, regardless of how long each takes.

You might need both on the same resource. A database that allows 100 queries per minute but only 10 concurrent connections needs both constraints enforced together.
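The interplay between the two is easy to quantify: at steady state, a concurrency cap of N slots with an average request duration of d seconds bounds throughput at N/d requests per second, and your effective throughput is the lower of the two limits. A quick back-of-the-envelope helper (illustrative only; the numbers are examples, not RateQueue defaults):

```python
def effective_throughput(rate_per_min, max_concurrent, avg_duration_s):
    """Upper bound on requests/min when both limits apply (steady-state model)."""
    # Concurrency cap: at most max_concurrent in flight, each holding a slot
    # for avg_duration_s seconds -> max_concurrent / avg_duration_s req/s.
    concurrency_bound = max_concurrent / avg_duration_s * 60
    return min(rate_per_min, concurrency_bound)

# 100 req/min allowed, but 10 concurrent slots with 12 s queries caps you at 50/min
print(effective_throughput(100, 10, 12.0))  # 50.0

# With fast 3 s queries, the rate limit is the binding constraint instead
print(effective_throughput(100, 10, 3.0))   # 100
```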

Why Per-Worker Concurrency Checks Don't Work

If you have 10 workers and allow 5 concurrent per worker, you're actually allowing 50 concurrent total. The concurrency limit must be enforced globally — shared across all workers — not locally per process. RateQueue enforces it centrally.
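You can see the multiplication effect in a single-process sketch that uses asyncio.Semaphore as a stand-in for the limit check (a local analogue for illustration, not RateQueue's implementation):

```python
import asyncio

async def peak_in_flight(num_workers, per_worker_limit, shared):
    """Simulate num_workers processes that each launch per_worker_limit
    requests. With shared=False every worker checks only its own local
    semaphore; with shared=True all workers go through one global one."""
    active = peak = 0
    global_sem = asyncio.Semaphore(per_worker_limit)

    async def request(sem):
        nonlocal active, peak
        async with sem:
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0)  # yield so other requests can start
            active -= 1

    coros = []
    for _ in range(num_workers):
        sem = global_sem if shared else asyncio.Semaphore(per_worker_limit)
        coros += [request(sem) for _ in range(per_worker_limit)]
    await asyncio.gather(*coros)
    return peak

# "5 concurrent per worker" across 10 workers really means 50 in flight;
# one shared limit keeps it at 5.
print(asyncio.run(peak_in_flight(10, 5, shared=False)))  # 50
print(asyncio.run(peak_in_flight(10, 5, shared=True)))   # 5
```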

import ratequeue.aio as rq

# Max 10 concurrent requests, globally enforced across all workers
async def run_query(sql):
    async with rq.acquire("postgres-pool", api_key=RATEQUEUE_API_KEY):
        result = await db.execute_query(sql)
        # request slot released automatically on context exit
        return result

When the context exits — whether normally or on exception — the concurrency slot is released and the next queued request is activated.
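The release-on-exception guarantee is the same one any well-behaved async context manager provides. A local sketch with asyncio.Semaphore standing in for a concurrency slot shows the behavior:

```python
import asyncio

async def release_on_exception_demo():
    sem = asyncio.Semaphore(1)  # a single concurrency slot
    try:
        async with sem:                      # slot acquired
            raise RuntimeError("query failed")
    except RuntimeError:
        pass
    # The slot was released despite the exception, so this acquire
    # succeeds immediately instead of deadlocking.
    async with sem:
        return True

print(asyncio.run(release_on_exception_demo()))  # True
```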

Combine Rate and Concurrency on One Resource

Define a resource with both a rate limit and a concurrency limit — both are enforced simultaneously. No need to manage two separate systems or write custom logic to coordinate between them.

Rate limit: 100 requests per minute. Controls throughput over time.

Concurrency limit: 10 active at once. Controls load at any given moment.
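To make the combined behavior concrete, here is a minimal single-process sketch of a limiter that checks both constraints on every acquire. The class name, sliding-window bookkeeping, and injectable clock are illustrative choices, not RateQueue internals:

```python
import time
from collections import deque

class CombinedLimiter:
    """Local sketch: enforce a rate limit and a concurrency limit together."""

    def __init__(self, rate, per_seconds, max_concurrent, clock=time.monotonic):
        self.rate = rate                    # allowed starts per window
        self.per_seconds = per_seconds      # window length
        self.max_concurrent = max_concurrent
        self.clock = clock
        self.starts = deque()               # timestamps of recent request starts
        self.active = 0                     # requests currently in flight

    def try_acquire(self):
        now = self.clock()
        # Drop start times that fell out of the rolling window
        while self.starts and now - self.starts[0] >= self.per_seconds:
            self.starts.popleft()
        if len(self.starts) >= self.rate:       # rate limit hit
            return False
        if self.active >= self.max_concurrent:  # concurrency limit hit
            return False
        self.starts.append(now)
        self.active += 1
        return True

    def release(self):
        self.active -= 1

lim = CombinedLimiter(rate=100, per_seconds=60, max_concurrent=10)
print(lim.try_acquire())  # True: both limits have headroom
```

Either check alone would admit the request; only passing both does. A hosted enforcer applies the same two checks, but against counters shared by every worker.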

Distributed Connection Pool Management

Concurrency limiting is essentially a distributed connection pool manager. When a request finishes and its slot is released, the next queued request activates. This pattern works for anything with a finite active-connection budget: database pools, external service concurrency caps, file handle limits, or any resource where "how many are in flight right now" is the binding constraint.

Unlike a per-process connection pool, this one is shared across all your workers and managed for you. No pooling library configuration, no tuning pool sizes per service instance.
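The release-a-slot, activate-the-next-waiter mechanic can be sketched locally with asyncio.Semaphore, whose waiters wake in FIFO order. This is a single-process analogue of the distributed pool, not RateQueue's implementation; the names and hold times are made up for the demo:

```python
import asyncio

async def fifo_pool_demo():
    sem = asyncio.Semaphore(2)  # a "pool" of 2 active slots
    order = []

    async def request(name, hold_s):
        async with sem:
            order.append(name)          # record when each request activates
            await asyncio.sleep(hold_s)  # simulate holding the slot

    # a and b grab the two slots immediately; c and d queue up.
    # b finishes first, freeing a slot for c; c finishes, freeing one for d.
    await asyncio.gather(
        request("a", 0.05), request("b", 0.01),
        request("c", 0), request("d", 0),
    )
    return order

print(asyncio.run(fifo_pool_demo()))
```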

Enforce concurrency globally, not per-worker

Sign up free, create a resource with a concurrency limit, and wrap the calls you want to control. Works across any number of workers.