Multiple Limits

One Resource, Multiple Constraints

Real-world APIs rarely have just one limit. OpenAI enforces requests per minute and tokens per minute. Your database has a query rate and a connection limit. Enforcing only one means you'll still blow the other. RateQueue lets you stack multiple limits on the same resource — all enforced simultaneously.

Why Single-Dimension Limiters Break

You limit your OpenAI calls to 60 RPM. You never exceed the request rate. But your requests average 4,000 tokens each — putting you at 240k TPM against a 150k TPM limit. The rate limiter was correct about requests and useless about tokens. You still get 429s.

The same pattern appears with databases: a query rate limiter doesn't prevent connection pool exhaustion. A concurrency limiter doesn't prevent throughput bursts. You need both, enforced together.
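The arithmetic behind the OpenAI example above is worth checking directly. This sketch uses the numbers from that scenario and nothing RateQueue-specific:

```python
# A request-rate-only limiter implies a tokens-per-minute figure
# it never checks. Numbers from the example above.
requests_per_minute = 60
avg_tokens_per_request = 4_000
tpm_limit = 150_000

effective_tpm = requests_per_minute * avg_tokens_per_request
print(effective_tpm)              # 240000
print(effective_tpm > tpm_limit)  # True -- 429s despite a compliant request rate
```

Staying under the token limit at this request rate would require dropping to 150,000 / 4,000 = 37 requests per minute, which is exactly the kind of cross-limit math a stacked limiter does for you.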

Stack Rate + Concurrency + Load Limits Together

A realistic resource definition for an OpenAI integration: 500 requests/minute (RPM), 100,000 tokens/minute (TPM), and 20 concurrent connections — all three enforced simultaneously. A request waits if any of the three would be exceeded.

import ratequeue.aio as rq

# Limits are configured on the dashboard.
# RateQueue enforces all active limits before granting the slot.
async with rq.acquire(
    "openai-gpt4",
    load=token_estimate,   # contributes to TPM limit
    api_key=RATEQUEUE_API_KEY
):
    response = await openai_client.chat.completions.create(
        model="gpt-4o",
        messages=messages
    )
# Slot released: the concurrency count drops immediately;
# the RPM and TPM windows keep their counts until they expire

The load parameter feeds the token-based limit. The request automatically counts against the RPM limit. The active concurrency count is tracked by the slot itself.
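One way to produce the token_estimate passed as load is a character-count heuristic. The estimate_tokens helper below is our own sketch, not part of RateQueue; the ~4-characters-per-token ratio is a rough rule of thumb, and a real tokenizer (e.g. tiktoken) gives a tighter estimate:

```python
def estimate_tokens(messages):
    """Rough token estimate: ~4 characters per token (heuristic, not exact).

    Good enough to feed a TPM limit conservatively; swap in a real
    tokenizer if your budget is tight.
    """
    chars = sum(len(m["content"]) for m in messages)
    return max(1, chars // 4)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the plot of Hamlet in two sentences."},
]
token_estimate = estimate_tokens(messages)
print(token_estimate)
```

Overestimating slightly is the safe direction: the worst case is waiting a little longer for a slot, not tripping the provider's 429.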

Fixed vs. Sliding Windows

Both window types are available per limit — mix and match on the same resource.

Fixed window

100 requests per 60-second bucket. Resets on the clock. Simple, but allows bursts at window boundaries.

Sliding window

100 requests in any rolling 60-second period. Smoother throughput — no burst spike at window reset.
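The difference between the two window types comes down to what happens at a boundary. RateQueue enforces this server-side; the pure-Python sketch below only illustrates the sliding-window semantics, using the 100-per-60-seconds limit from above:

```python
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `limit` events in any rolling `window`-second period.

    Illustration of the semantics only, not RateQueue's implementation.
    """
    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.events = deque()  # timestamps of accepted events

    def allow(self, now):
        # Evict timestamps that have aged out of the rolling window.
        while self.events and now - self.events[0] >= self.window:
            self.events.popleft()
        if len(self.events) < self.limit:
            self.events.append(now)
            return True
        return False

limiter = SlidingWindowLimiter(limit=100, window=60)
# 100 requests at t=59 fill the window...
accepted = sum(limiter.allow(59) for _ in range(150))
# ...so a burst at t=60 -- where a fixed bucket would have just reset --
# is still rejected, because t=59 is inside the rolling 60 s window.
burst = sum(limiter.allow(60) for _ in range(100))
print(accepted, burst)  # 100 0
```

A fixed window would have accepted the whole t=60 burst, allowing up to 200 requests in two seconds across the boundary; the sliding window caps any 60-second span at 100.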

Granular Sub-Second Windows

For smoother pacing, configure limits in sub-second windows. 10 requests per second instead of 600 per minute — same throughput ceiling, but requests are spread evenly rather than bursting at the minute boundary.
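The throughput ceiling is identical either way; what changes is the worst-case burst a fixed window permits. A quick check with the numbers above:

```python
# Same per-minute ceiling, very different worst-case burst.
per_minute = 600
per_second = per_minute // 60  # 10

# Fixed 60 s window: nothing stops all 600 requests landing in the
# first second of the bucket.
worst_burst_60s_window = per_minute
# Fixed 1 s window: at most 10 requests can land in any given second.
worst_burst_1s_window = per_second

print(worst_burst_60s_window, worst_burst_1s_window)  # 600 10
```

Sub-second windows trade a small amount of bookkeeping for a 60x tighter bound on instantaneous load, which is usually what the upstream service actually cares about.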

Enforce all your API constraints at once

Start free with one resource and one limit. Upgrade to stack multiple constraints on the same resource.