Rate Limits

Request limits and quotas

Request limits are applied to ensure fair use of the API.

Default Limits

Limit Type            Value
Per-Minute Requests   60 requests/minute
Daily Requests        10,000 requests/day
Maximum Tokens        Model-dependent
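
To see how the two request limits interact, a small arithmetic sketch can help. It uses only the default values listed above; the function names are illustrative, and it assumes requests are sent at an even pace, since the source does not say whether the per-minute window is fixed or sliding.

```python
PER_MINUTE_LIMIT = 60   # default per-minute request limit
DAILY_LIMIT = 10_000    # default daily request limit


def min_interval_seconds(per_minute: int) -> float:
    """Smallest gap between evenly paced requests that stays within the limit."""
    return 60.0 / per_minute


def minutes_to_exhaust_daily(per_minute: int, daily: int) -> float:
    """Minutes of continuous sending at the per-minute limit before the daily limit is hit."""
    return daily / per_minute


print(min_interval_seconds(PER_MINUTE_LIMIT))
print(round(minutes_to_exhaust_daily(PER_MINUTE_LIMIT, DAILY_LIMIT), 1))
```

At the default values this works out to one request per second, and a client sending continuously at that pace would use up the daily limit in under three hours, so long-running batch jobs may need a slower pace than the per-minute limit alone suggests.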

Rate Limit Headers

Rate limit information is returned as headers in every API response:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1706000060
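
A client can read these headers to decide when to pause. The sketch below assumes that X-RateLimit-Reset is a Unix timestamp in seconds (the sample value 1706000060 is consistent with that, but the source does not state it). RateLimitStatus and parse_rate_limit_headers are illustrative names, not part of the API.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class RateLimitStatus:
    limit: int        # X-RateLimit-Limit
    remaining: int    # X-RateLimit-Remaining
    reset_epoch: int  # X-RateLimit-Reset, assumed to be Unix seconds

    def seconds_until_reset(self, now: float) -> float:
        """Time left until the window resets; never negative."""
        return max(0.0, float(self.reset_epoch) - now)


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitStatus:
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return RateLimitStatus(
        limit=int(lowered["x-ratelimit-limit"]),
        remaining=int(lowered["x-ratelimit-remaining"]),
        reset_epoch=int(lowered["x-ratelimit-reset"]),
    )
```

Passing the current time in as an argument, rather than calling time.time() inside the method, keeps the calculation easy to test.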

Per-Model Token Limits

For partner project and customer API keys, separate daily/monthly token and request limits can be defined for each model. These limits are configured from the Usage Analysis modal in the partner portal.

Limit Type       Description
Daily Token      Total token usage within 24 hours for a specific model. Reset: every night at 00:00.
Monthly Token    Total tokens within the month for a specific model. Reset: the 1st of the month at 00:00.
Daily Requests   Total request count within 24 hours for a specific model.
Model Blocking   The use of a specific model can be blocked entirely.

Model pattern examples:

Priority Order:

Exact match > wildcard > general (*). If a limit is defined for a specific model, that limit is applied instead of any wildcard or general limit.
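
The priority order can be sketched as a small lookup function. This is an illustration, not the portal's actual matching logic: it assumes glob-style patterns where * matches any run of characters, the model names in the usage example are made up, and the tie-break between several matching wildcards (longest pattern wins) is an assumption the source does not cover.

```python
from fnmatch import fnmatchcase
from typing import Dict, Optional


def resolve_limit(model: str, rules: Dict[str, int]) -> Optional[int]:
    """Pick the limit for `model`: exact match > wildcard > general (*)."""
    # 1. An exact match always wins.
    if model in rules:
        return rules[model]
    # 2. Wildcard patterns other than the bare "*".
    matching = [
        pattern for pattern in rules
        if pattern != "*" and "*" in pattern and fnmatchcase(model, pattern)
    ]
    if matching:
        # Assumption: when several wildcards match, prefer the longest pattern.
        return rules[max(matching, key=len)]
    # 3. Fall back to the general rule, if one exists.
    return rules.get("*")


# Hypothetical model names and daily token limits.
rules = {"alpha-large": 1000, "alpha-*": 5000, "*": 20000}
print(resolve_limit("alpha-large", rules))  # exact match
print(resolve_limit("alpha-small", rules))  # wildcard match
print(resolve_limit("beta", rules))         # general rule
```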

Managing Limits

Tips:

  • Apply exponential backoff when you get a 429 error
  • Spread requests over time for batch operations
  • Track the remaining limit from the headers
  • If the same error keeps occurring for certain models, a per-model limit may be defined; ask your administrator
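
The first tip, exponential backoff on a 429 response, might look like the minimal sketch below. The send callable stands in for whatever HTTP client is in use and is assumed to return a (status, body) pair; the base delay, the cap, and the use of full jitter (a random wait between zero and the current delay) are illustrative choices, not values from this documentation.

```python
import random
import time
from typing import Callable, Iterator, Tuple


def backoff_delays(max_retries: int, base: float = 1.0, cap: float = 60.0) -> Iterator[float]:
    """Yield base * 2**attempt seconds for each retry, capped at `cap`."""
    for attempt in range(max_retries):
        yield min(cap, base * (2 ** attempt))


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `send`, retrying with exponential backoff while it returns 429."""
    status, body = send()
    for delay in backoff_delays(max_retries):
        if status != 429:
            break
        # Full jitter: wait a random time up to the current delay so that
        # many clients do not all retry at the same moment.
        sleep(random.uniform(0, delay))
        status, body = send()
    return status, body
```

If the last retry still returns 429, the function returns that response rather than raising, so the caller decides whether to wait for the reset time or give up.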