Rate Limits
Request limits and quotas
Request limits are applied to ensure fair use of the API.
Default Limits
| Limit Type | Value |
|---|---|
| Per-Minute Requests | 60 requests/minute |
| Daily Requests | 10,000 requests/day |
| Maximum Tokens | Model-dependent |
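The two request limits interact, and a little arithmetic shows how. The sketch below uses only the values from the table above and assumes requests are spaced evenly; how the service actually windows its counters (fixed or sliding) is not stated here.

```python
PER_MINUTE_LIMIT = 60      # requests per minute, from the table above
DAILY_LIMIT = 10_000       # requests per day, from the table above
MINUTES_PER_DAY = 24 * 60

# Even spacing between requests that corresponds to the per-minute limit.
min_interval_s = 60 / PER_MINUTE_LIMIT

# Minutes until the daily quota is used up when sending at the per-minute maximum.
minutes_to_exhaust = DAILY_LIMIT / PER_MINUTE_LIMIT

# Highest steady rate that can be sustained for a full 24 hours.
sustained_per_minute = DAILY_LIMIT / MINUTES_PER_DAY

print(f"minimum interval: {min_interval_s:.1f} s")
print(f"daily quota lasts: {minutes_to_exhaust:.1f} min at full rate")
print(f"sustainable rate: {sustained_per_minute:.2f} requests/min")
```

In other words, a client running at the full 60 requests/minute would use up the daily quota in under three hours, while a job that must run all day needs to stay near 7 requests/minute.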
Rate Limit Headers
Rate limit information is returned as headers in every API response:
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1706000060
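A client can read these headers after each response to decide whether to slow down. The following is a minimal sketch, not an official client: `parse_rate_limit_headers` is a hypothetical helper, and it assumes that `X-RateLimit-Reset` is a Unix timestamp in seconds, which the example value suggests but this page does not state.

```python
from datetime import datetime, timezone

def parse_rate_limit_headers(headers):
    """Return (limit, remaining, reset_at) from a response header mapping."""
    limit = int(headers["X-RateLimit-Limit"])
    remaining = int(headers["X-RateLimit-Remaining"])
    # Assumption: the reset value is a Unix timestamp in seconds (UTC).
    reset_at = datetime.fromtimestamp(int(headers["X-RateLimit-Reset"]), tz=timezone.utc)
    return limit, remaining, reset_at

# Header values copied from the example above.
example = {
    "X-RateLimit-Limit": "60",
    "X-RateLimit-Remaining": "45",
    "X-RateLimit-Reset": "1706000060",
}
limit, remaining, reset_at = parse_rate_limit_headers(example)
print(limit, remaining, reset_at.isoformat())
```

A plain `dict` is case-sensitive, so the keys must match exactly. HTTP header names are case-insensitive, and most HTTP client libraries expose them through a case-insensitive mapping.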
Per-Model Token Limits
For partner project and customer API keys, separate daily/monthly token and request limits can be defined for each model. These limits are configured from the Usage Analysis modal in the partner portal.
| Limit Type | Description |
|---|---|
| Daily Token | Total token usage within 24 hours for a specific model. Reset: every night at 00:00. |
| Monthly Token | Total tokens within the month for a specific model. Reset: the 1st of the month at 00:00. |
| Daily Requests | Total request count within 24 hours for a specific model. |
| Model Blocking | The use of a specific model can be blocked entirely. |
Model pattern examples:
- gpt-4o: exact match, only this model
- google/*: wildcard, all of Google's models
- *: all (a general limit for all models)
Priority Order:
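Exact match > wildcard > general (*). The most specific matching pattern applies: a limit defined for the exact model overrides a wildcard limit, and a wildcard limit overrides the general limit.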
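The priority order can be illustrated with a short lookup. This is a sketch of the rule as described here, not the service's implementation: `resolve_limit`, the limit values, and the model name `google/gemini-pro` are hypothetical, and the tie-break between several matching wildcards (longest prefix wins) is an assumption.

```python
def resolve_limit(model, limits):
    """Pick the limit for `model` using exact > wildcard > general ("*")."""
    # 1. Exact match takes priority.
    if model in limits:
        return limits[model]
    # 2. Wildcard patterns such as "google/*": compare the prefix before "*".
    prefix_matches = [
        pattern for pattern in limits
        if pattern != "*" and pattern.endswith("*") and model.startswith(pattern[:-1])
    ]
    if prefix_matches:
        # Assumption: when several wildcards match, the longest prefix wins.
        return limits[max(prefix_matches, key=len)]
    # 3. Fall back to the general limit, or None if no limit is defined.
    return limits.get("*")

# Hypothetical daily token limits keyed by model pattern.
limits = {"gpt-4o": 100_000, "google/*": 500_000, "*": 1_000_000}

exact = resolve_limit("gpt-4o", limits)
wildcard = resolve_limit("google/gemini-pro", limits)
general = resolve_limit("some-other-model", limits)
print(exact, wildcard, general)
```

Here `gpt-4o` picks up its own limit, the Google model falls under `google/*`, and any other model gets the general limit.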
Managing Limits
Tips:
- Apply exponential backoff when you get a 429 error
- Spread requests over time for batch operations
- Track the remaining limit from the headers
- If you get the same error across many models, a per-model limit may be defined; ask your administrator
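The first tip can be sketched as a small retry wrapper. This is one common way to implement exponential backoff, not a prescribed client. `RateLimitError` and `call_with_backoff` are hypothetical names; the caller is assumed to raise `RateLimitError` when the API answers with HTTP 429, and the retry count and delays are illustrative defaults.

```python
import random
import time

class RateLimitError(Exception):
    """Hypothetical exception the caller raises on an HTTP 429 response."""

def call_with_backoff(send, max_retries=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call `send()` and retry with exponential backoff while it raises RateLimitError."""
    for attempt in range(max_retries + 1):
        try:
            return send()
        except RateLimitError:
            if attempt == max_retries:
                raise  # Retries exhausted: surface the error to the caller.
            # The base delay doubles each attempt (1 s, 2 s, 4 s, ...), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter exists so the wrapper can be tested without real waiting. If the reset header turns out to be a Unix timestamp, waiting until that time is a reasonable alternative to a computed delay.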