Rate limiting

Two layers protect the API: a per-IP DDoS guard that fronts every request, and a per-organization/per-endpoint TPS guard that enforces the contract you're paying for. Both surface state via standard headers; read them and your client can throttle itself before it is ever rejected.

The two-tier model

Layer	Scope	Limit	Code on overflow
DDoS guard	Per source IP	35 000 req/min	`RATE_DDOS_EXCEEDED`
TPS guard	Per organization × per endpoint	Endpoint-specific (e.g. ping is 20 req/min)	`RATE_TPS_EXCEEDED`

The DDoS guard runs first: it's an infrastructure floor, not a customer SLA. Hitting it usually means a misconfigured aggregator IP or a clear abuse pattern; the threshold is generous enough that legitimate traffic doesn't get close. The TPS guard is the one you'll see in normal operation, and each endpoint declares its own quota in the reference.

The two guards count different things. The DDoS guard counts every request from a source IP, whichever endpoint it targets. The TPS guard counts an organization's requests to one endpoint, whichever IP they come from. Rotating endpoints does not multiply the IP budget; rotating IPs does not multiply the TPS budget.

Sliding window, not fixed buckets

Both guards use a 2-bucket weighted sliding window (the approximation Cloudflare has described for its own rate limiter). The intuition:

The current 60-second bucket counts at full weight.
The previous 60-second bucket counts at a weight that decreases linearly from 1.0 (start of current bucket) to 0.0 (end of current bucket).
Effective count = current + previous × weight.

With a fixed bucket, a client could send a full quota of requests at second 59 and another full quota at second 0 of the next bucket: 2× the limit in a 2-second window. The sliding window catches that, because at second 0 the previous bucket still counts at full weight.
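The weighting above can be sketched in a few lines. This is an illustration of the formula only, not the server's code: the function names, the `elapsed` parameter, and the admission rule in `allows` are assumptions made for the example.

```python
def effective_count(current: int, previous: int, elapsed: float, window: float = 60.0) -> float:
    """Weighted request count for a 2-bucket sliding window.

    elapsed is the number of seconds since the current bucket started.
    """
    if not 0 <= elapsed <= window:
        raise ValueError("elapsed must lie within the current bucket")
    weight = 1.0 - elapsed / window  # 1.0 at bucket start, 0.0 at bucket end
    return current + previous * weight


def allows(current: int, previous: int, elapsed: float, limit: int, window: float = 60.0) -> bool:
    """Assumed admission rule: one more request must still fit under the limit."""
    return effective_count(current, previous, elapsed, window) + 1 <= limit
```

With a limit of 20, a client that spent its whole quota in the previous bucket is still blocked at second 0 of the next one (`allows(0, 20, 0.0, 20)` is false) and regains half its budget by second 30.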

Response headers

v6 follows the IETF draft "RateLimit header fields for HTTP" (draft-ietf-httpapi-ratelimit-headers). Every TPS-guarded response carries:

RateLimit-Limit:     20
RateLimit-Remaining: 18
RateLimit-Reset:     31
RateLimit-Policy:    20;w=60;name="endpoint"

Header	Meaning
`RateLimit-Limit`	Quota for the active window.
`RateLimit-Remaining`	Requests left in the active window.
`RateLimit-Reset`	Seconds until the active window rolls over.
`RateLimit-Policy`	Declarative quota in a parseable form. `20;w=60` = 20 requests in a 60-second sliding window.

On a 429, you also get Retry-After in seconds; wait at least that long before retrying.
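If you are writing your own client, one way to read these headers is sketched below. `RateLimitState` and `read_rate_limit` are hypothetical names, not part of any SDK. The sketch assumes `Retry-After` arrives as an integer number of seconds, as described above, and that the mapping you pass in is either case-insensitive (as `httpx` response headers are) or uses the exact casing shown.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitState:
    limit: int
    remaining: int
    reset_seconds: int
    retry_after: Optional[int] = None  # only expected on a 429


def read_rate_limit(headers: Mapping[str, str]) -> Optional[RateLimitState]:
    """Return the rate-limit state, or None if the headers are absent or malformed."""
    try:
        state = RateLimitState(
            limit=int(headers["RateLimit-Limit"]),
            remaining=int(headers["RateLimit-Remaining"]),
            reset_seconds=int(headers["RateLimit-Reset"]),
        )
    except (KeyError, ValueError):
        return None
    retry_after = headers.get("Retry-After")
    if retry_after is not None and retry_after.strip().isdigit():
        state.retry_after = int(retry_after)
    return state
```

Returning `None` instead of raising keeps the caller's hot path simple: a response without rate-limit headers just means there is nothing to pace against.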

Recommended client strategy

Three rules cover most cases:

Preemptive throttling. Read RateLimit-Remaining on every response. When it drops below 20% of RateLimit-Limit, slow your request rate so you reach the next reset with room to spare.
Honor Retry-After. When you get a 429, sleep at least Retry-After seconds before the next attempt. Do not retry immediately, or you'll just spike the counter again.
Exponential backoff with jitter. If retries keep failing, add an exponential delay (e.g. 2^n × 100ms) with random jitter (e.g. ±50%). Jitter prevents thousands of clients from synchronizing on the same retry second.

Reference implementation (Python)

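import random
import time

import httpx

def call_with_backoff(client: httpx.Client, request: httpx.Request, max_retries: int = 5) -> httpx.Response:
    """Send request, retrying on 429 without ever waiting less than Retry-After."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    for attempt in range(max_retries):
        r = client.send(request)
        # Return on anything but a 429, and on the last attempt without a
        # pointless final sleep; the caller decides what to do after exhaustion.
        if r.status_code != 429 or attempt == max_retries - 1:
            return r
        try:
            wait = max(1, int(r.headers.get("Retry-After", "1")))
        except ValueError:
            wait = 1  # Retry-After was not an integer number of seconds
        # exponential backoff capped at Retry-After * 4, plus up to 50% jitter
        delay = min(wait * (2 ** attempt), wait * 4)
        delay += random.uniform(0, delay * 0.5)
        time.sleep(delay)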
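call_with_backoff covers the reactive rules. The preemptive rule can be sketched separately as a pure function that turns the three RateLimit values into a pause. `pacing_delay` is a hypothetical helper, and spreading the remaining budget evenly until the reset is only one reasonable policy, chosen here for simplicity.

```python
def pacing_delay(limit: int, remaining: int, reset_seconds: float, threshold: float = 0.2) -> float:
    """Seconds to pause before the next request.

    At or above the threshold, send immediately. Below it, spread the
    remaining requests evenly over the time left until the reset.
    """
    if limit <= 0 or remaining >= limit * threshold:
        return 0.0
    if remaining <= 0:
        return float(reset_seconds)  # budget spent: wait for the window to roll over
    return reset_seconds / remaining
```

Call it after each response and sleep for the returned value. With a limit of 20, 3 requests left, and 30 seconds to the reset, it suggests one request every 10 seconds.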

SDKs auto-throttle

Official SDKs read RateLimit-Policy at runtime and pace requests internally so the application code doesn't have to. If you're writing your own client, parsing the policy is straightforward:

// RateLimit-Policy: 20;w=60;name="endpoint"
// → limit 20, window 60s, named "endpoint"
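A minimal parser for that format might look like the sketch below. `parse_rate_limit_policy` is a hypothetical name. The sketch assumes the header carries a single policy and that quoted values contain no semicolons; it is not a full structured-field parser.

```python
from typing import Dict, Tuple


def parse_rate_limit_policy(value: str) -> Tuple[int, Dict[str, str]]:
    """Split a single policy item such as '20;w=60;name="endpoint"'.

    Returns (limit, params), where params maps names to unquoted string values.
    """
    head, *rest = [part.strip() for part in value.split(";")]
    limit = int(head)
    params: Dict[str, str] = {}
    for part in rest:
        if not part:
            continue  # tolerate a trailing semicolon
        key, _, raw = part.partition("=")
        params[key.strip()] = raw.strip().strip('"')
    return limit, params
```

For the header above this yields a limit of 20 and the parameters `w` and `name`; the window length in seconds is then `int(params["w"])`.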

What you see when a guard fires

Both guards return HTTP 429 with the standard envelope. The error.code tells you which one:

429 Too Many Requests

RateLimit-Limit:     20
RateLimit-Remaining: 0
RateLimit-Reset:     42
Retry-After:         42

{
  "success": false,
  "error": {
    "code":    "RATE_TPS_EXCEEDED",
    "message": "You have exceeded the allowed request rate for this endpoint."
  }
}
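Telling the two guards apart in client code comes down to reading error.code from the decoded body. The sketch below assumes the envelope shape shown above; `guard_that_fired` and its return labels are hypothetical.

```python
from typing import Any, Dict, Optional


def guard_that_fired(status_code: int, body: Dict[str, Any]) -> Optional[str]:
    """Return 'ddos', 'tps', or None if this is not a known rate-limit error."""
    if status_code != 429:
        return None
    error = body.get("error")
    code = error.get("code") if isinstance(error, dict) else None
    return {"RATE_DDOS_EXCEEDED": "ddos", "RATE_TPS_EXCEEDED": "tps"}.get(code)
```

The distinction matters for what you do next: a TPS rejection calls for pacing the affected endpoint, while a DDoS rejection usually points at a shared or misconfigured source IP.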

Asking for a higher limit

If your traffic pattern legitimately exceeds the per-endpoint TPS (bulk imports, marketing campaigns, batch onboarding), talk to your organization manager. We can raise the per-organization limit for a specific endpoint without touching the DDoS floor. Bring:

Expected peak QPS and total daily volume.
Whether the spike is one-off (a launch) or sustained.
How you back off when limited (so we know the limit increase won't just shift the problem).