Request Prioritization

Your Critical Traffic Shouldn't Wait Behind Batch Jobs

When your system is near its API rate limit, not all requests are equal. A user waiting for a response in your UI is more time-sensitive than a background enrichment job that can finish in the next minute. Priority gives you control over who gets served first.

The Classic Priority Inversion Problem

A batch job fires 100 requests and fills the queue. A user clicks a button — that request goes to the back of the queue behind 100 batch requests. The user waits 20 seconds or times out entirely. The batch job wasn't doing anything wrong, but it crowded out interactive traffic. Without priority, a first-in-first-out queue will always have this problem.
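The failure mode is easy to reproduce with nothing but a plain FIFO queue. This toy sketch (standard-library only, not the RateQueue scheduler) shows the user's request landing dead last behind the batch flood:

```python
from collections import deque

# A plain FIFO queue: whoever enqueues first is served first.
queue = deque()

# A batch job floods the queue with 100 requests...
for i in range(100):
    queue.append(("batch", i))

# ...then a user clicks a button.
queue.append(("user", 0))

# The user's request sits behind every batch request.
position = list(queue).index(("user", 0))
print(position)  # 100 requests are served before the user's
```

At 200 ms per request, position 100 is a 20-second wait, which is exactly the timeout scenario described above.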

Priority in Practice

Priority is a numeric value: higher numbers are served first when capacity is limited, and requests with the same priority are served in arrival order. The user's request at priority 100 skips ahead of the background job at priority 1.

import os

import ratequeue.aio as rq

RATEQUEUE_API_KEY = os.environ["RATEQUEUE_API_KEY"]

# User-facing request — serve this first
async with rq.acquire("openai", load=2000, priority=100, api_key=RATEQUEUE_API_KEY):
    response = await generate_user_response(prompt)

# Background enrichment job — can wait
async with rq.acquire("openai", load=5000, priority=1, api_key=RATEQUEUE_API_KEY):
    await enrich_record(record_id)
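The ordering rule (higher priority first, arrival order within a priority) can be modeled with a heap keyed on negated priority plus an arrival counter. This is a sketch of the semantics, not RateQueue's internal scheduler:

```python
import heapq
import itertools

counter = itertools.count()  # arrival order breaks ties within a priority
heap = []

def submit(name, priority):
    # heapq is a min-heap, so negate priority to serve higher values first
    heapq.heappush(heap, (-priority, next(counter), name))

submit("batch-1", 1)
submit("batch-2", 1)
submit("user", 100)   # arrives third, but outranks everything queued
submit("batch-3", 1)

served = [heapq.heappop(heap)[2] for _ in range(len(heap))]
print(served)  # ['user', 'batch-1', 'batch-2', 'batch-3']
```

Note that the user's request arrived third yet is served first, while the three batch requests keep their relative order.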

Reserved Capacity as Hard Guarantees

Priority is best-effort — it orders the queue, but if the queue is deep, even high-priority requests may wait. Reserved capacity is a hard guarantee: hold back N units of capacity exclusively for critical traffic. No batch job can consume that reserved portion, no matter how many requests it submits.

import { ratequeue } from "@ratequeue/sdk";

// Reserve 20 of your 100 TPM budget for user-facing requests
// Configure on the resource dashboard, then:
await ratequeue.acquire(
  "openai-gpt4",
  { priority: 100, lane: "user-facing", apiKey: process.env.RATEQUEUE_API_KEY! },
  async () => { await generateResponse(prompt); }
);

The reserved capacity is configured on the resource — not in code. Change the reservation on the dashboard, and it takes effect immediately.

Lanes for Traffic Isolation

Lanes and priority work together. A lane gives a traffic class its own section of the queue — batch jobs in the "batch" lane, user requests in the "user-facing" lane. Combined with reserved capacity, a flood of batch requests literally cannot reach the capacity reserved for user traffic. The two classes don't compete at all.
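The isolation property can be sketched with per-lane queues and a scheduler that drains the user-facing lane first, a simplified stand-in for RateQueue's lane mechanics:

```python
from collections import deque

# Each lane gets its own queue, so queue depth in one lane
# never delays requests waiting in another.
lanes = {"batch": deque(), "user-facing": deque()}

for i in range(100):
    lanes["batch"].append(f"batch-{i}")
lanes["user-facing"].append("user-click")

def next_request():
    # Serve the user-facing lane first whenever it has work.
    for lane in ("user-facing", "batch"):
        if lanes[lane]:
            return lanes[lane].popleft()
    return None

first = next_request()
print(first)  # 'user-click', despite 100 queued batch requests
```

Because the user's request never enters the batch lane's queue, batch depth is irrelevant to its wait time.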

Priority: soft ordering. Higher values are served first when competing for the same capacity.

Reserved capacity + lanes: hard isolation. Critical traffic has dedicated headroom that other lanes cannot consume.

Give critical traffic the priority it deserves

Reserved capacity and priority ordering are available on the paid plan. Start free to validate the integration, then upgrade when you need the controls.