Rate limiting
Two layers protect the API: a per-IP DDoS guard that fronts every request, and a per-organization/per-endpoint TPS guard that enforces the contract you're paying for. Both surface state via standard headers; read them and your client throttles itself.
The two-tier model
| Layer | Scope | Limit | Code on overflow |
|---|---|---|---|
| DDoS guard | Per source IP | 35 000 req/min | RATE_DDOS_EXCEEDED |
| TPS guard | Per organization × per endpoint | Endpoint-specific (e.g. ping is 20 req/min) | RATE_TPS_EXCEEDED |
The DDoS guard runs first: it's an infrastructure floor, not a customer SLA. Hitting it usually means a misconfigured aggregator IP or a clear abuse pattern; the threshold is generous enough that legitimate traffic doesn't get close. The TPS guard is the one you'll see in normal operation, and each endpoint declares its own quota in the reference.
The two guards count different things. The IP tier counts every request from a source IP, whichever endpoint it targets; the TPS tier counts an organization's requests to one endpoint at a time, whichever IP they come from. Rotating endpoints does not multiply the IP budget; rotating IPs does not multiply the TPS budget.
Sliding window, not fixed buckets
Both guards use a 2-bucket weighted sliding window (the technique Cloudflare and Stripe use). The intuition:
- The current 60-second bucket counts at full weight.
- The previous 60-second bucket counts at a weight that decreases linearly from 1.0 (start of current bucket) to 0.0 (end of current bucket).
- Effective count = current + previous × weight.
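With a fixed bucket, a client could send a full quota of requests at second 59 and another full quota at second 0 of the next bucket: 2× the limit in a 2-second window. The sliding window catches that, because at the start of a new bucket the previous bucket still counts at full weight.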
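The weighting described above can be sketched in a few lines. This is a minimal illustration, not the server's implementation: the function names, the `elapsed_s` parameter, and the choice to admit a request only while the weighted count is strictly below the limit are all assumptions made for the example.

```python
def effective_count(current: int, previous: int, elapsed_s: float,
                    window_s: float = 60.0) -> float:
    """Weighted request count for a 2-bucket sliding window.

    elapsed_s is how far we are into the current bucket, in seconds.
    """
    # The previous bucket's weight falls linearly from 1.0 to 0.0
    # as the current bucket progresses.
    weight = 1.0 - (elapsed_s / window_s)
    weight = max(0.0, min(1.0, weight))  # clamp against clock skew
    return current + previous * weight


def allow(current: int, previous: int, elapsed_s: float, limit: int,
          window_s: float = 60.0) -> bool:
    """Assumed admission rule: accept while the weighted count is below the limit."""
    return effective_count(current, previous, elapsed_s, window_s) < limit
```

With a limit of 20, a client that used its full quota in the previous bucket is still blocked at second 0 of the new one (`allow(0, 20, 0.0, 20)` is false), which is exactly the burst a fixed bucket would let through.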
Response headers
v6 follows the IETF draft "RateLimit header fields for HTTP" (draft-ietf-httpapi-ratelimit-headers). Every TPS-guarded response carries:
RateLimit-Limit: 20
RateLimit-Remaining: 18
RateLimit-Reset: 31
RateLimit-Policy: 20;w=60;name="endpoint"
| Header | Meaning |
|---|---|
| RateLimit-Limit | Quota for the active window. |
| RateLimit-Remaining | Requests left in the active window. |
| RateLimit-Reset | Seconds until the active window rolls over. |
| RateLimit-Policy | Declarative quota in a parseable form. 20;w=60 = 20 requests in a 60-second sliding window. |
On a 429, you also get Retry-After in seconds; wait at least that long before retrying.
Recommended client strategy
Three rules cover most cases:
- Preemptive throttling. Read RateLimit-Remaining on every response. When it drops below 20% of RateLimit-Limit, slow your request rate so you reach the next reset with room to spare.
- Honor Retry-After. When you get a 429, sleep at least Retry-After seconds before the next attempt. Do not retry immediately, or you'll just spike the counter again.
- Exponential backoff with jitter. If retries keep failing, add an exponential delay (e.g. 2^n × 100ms) with random jitter (e.g. ±50%). Jitter prevents thousands of clients from synchronizing on the same retry second.
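The preemptive-throttling rule can be sketched as a small helper that turns the response headers into a pause before the next request. This is one possible pacing policy, not a prescribed one: the function name `pacing_delay`, the `threshold` parameter, and the choice to spread the remaining requests evenly over the time left until reset are assumptions for illustration.

```python
def pacing_delay(headers, threshold: float = 0.2) -> float:
    """Seconds to pause before the next request, from RateLimit-* headers.

    `headers` is any mapping of header name to string value. A plain dict
    is case-sensitive; a case-insensitive mapping such as httpx.Headers
    is more forgiving.
    """
    try:
        limit = int(headers["RateLimit-Limit"])
        remaining = int(headers["RateLimit-Remaining"])
        reset = int(headers["RateLimit-Reset"])
    except (KeyError, ValueError):
        return 0.0  # headers missing or malformed: no pacing information

    if limit <= 0 or remaining >= limit * threshold:
        return 0.0  # plenty of budget left, no need to slow down
    if remaining <= 0:
        return float(reset)  # budget exhausted: wait for the window to roll over
    # Spread the remaining requests evenly across the time until reset.
    return reset / remaining
```

For example, with a limit of 20, 2 requests remaining, and 30 seconds until reset, the helper suggests pausing 15 seconds between requests.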
Reference implementation (Python)
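import random
import time

import httpx


def call_with_backoff(client: httpx.Client, request: httpx.Request, max_retries: int = 5) -> httpx.Response:
    """Send request, retrying on HTTP 429 for up to max_retries attempts in total."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    for attempt in range(max_retries):
        r = client.send(request)
        # Return on anything other than 429, or once attempts are exhausted
        # (the caller decides what to do with a final 429).
        if r.status_code != 429 or attempt == max_retries - 1:
            return r
        # Retry-After is given in seconds; fall back to 1 if it is absent.
        wait = int(r.headers.get("Retry-After", "1"))
        # Exponential backoff capped at Retry-After * 4, plus up to 50% jitter.
        # Jitter is only added, so the delay never drops below Retry-After.
        delay = min(wait * (2 ** attempt), wait * 4)
        delay += random.uniform(0, delay * 0.5)
        time.sleep(delay)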
SDKs auto-throttle
Official SDKs read RateLimit-Policy at runtime and pace requests internally so the application code doesn't have to. If you're writing your own client, parsing the policy is straightforward:
// RateLimit-Policy: 20;w=60;name="endpoint"
// → limit 20, window 60s, named "endpoint"
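A minimal sketch of such a parser, assuming the header always has the single-policy shape shown above (a leading quota followed by `;`-separated `key=value` parameters). The function name and the returned dictionary keys are hypothetical, and the sketch does not handle several comma-separated policies or semicolons inside quoted strings.

```python
def parse_rate_limit_policy(value: str) -> dict:
    """Parse a value like '20;w=60;name="endpoint"' into its parts."""
    parts = [p.strip() for p in value.split(";")]
    policy = {"limit": int(parts[0]), "window": None, "name": None}
    for part in parts[1:]:
        key, sep, raw = part.partition("=")
        if not sep:
            continue  # parameter without a value: ignore it
        key = key.strip()
        raw = raw.strip().strip('"')  # drop optional surrounding quotes
        if key == "w":
            policy["window"] = int(raw)  # window length in seconds
        elif key == "name":
            policy["name"] = raw
    return policy
```

Called on the header above, it yields a limit of 20, a window of 60 seconds, and the name "endpoint", which is enough to drive a simple client-side pacer.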
What you see when a guard fires
Both guards return HTTP 429 with the standard envelope. The error.code tells you which one:
RateLimit-Limit: 20
RateLimit-Remaining: 0
RateLimit-Reset: 42
Retry-After: 42

{
  "success": false,
  "error": {
    "code": "RATE_TPS_EXCEEDED",
    "message": "You have exceeded the allowed request rate for this endpoint."
  }
}
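A client can branch on error.code to tell the two guards apart, for instance to alert on a DDoS-guard hit while quietly backing off on a TPS one. The sketch below assumes the envelope shape shown above; the function name `which_guard` and the labels it returns are hypothetical.

```python
import json

# Error codes taken from the two-tier table; the labels are illustrative.
GUARD_BY_CODE = {
    "RATE_DDOS_EXCEEDED": "ddos",
    "RATE_TPS_EXCEEDED": "tps",
}


def which_guard(status_code: int, body_text: str):
    """Return 'ddos', 'tps', or 'unknown' for a 429; None for any other status."""
    if status_code != 429:
        return None
    try:
        payload = json.loads(body_text)
    except ValueError:
        return "unknown"  # body is not valid JSON
    if not isinstance(payload, dict):
        return "unknown"
    error = payload.get("error")
    code = error.get("code") if isinstance(error, dict) else None
    return GUARD_BY_CODE.get(code, "unknown")
```

Treating unrecognized bodies as "unknown" rather than raising keeps the retry path alive even if an intermediary, such as a proxy, returns its own 429 page.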
Asking for a higher limit
If your traffic pattern legitimately exceeds the per-endpoint TPS (bulk imports, marketing campaigns, batch onboarding), talk to your organization manager. We can raise the per-organization limit for a specific endpoint without touching the DDoS floor. Bring:
- Expected peak QPS and total daily volume.
- Whether the spike is one-off (a launch) or sustained.
- How you back off when limited (so we know the limit increase won't just shift the problem).