Rate Limits

Request limits and quotas

Request limits are applied to ensure fair use of the API.

Default Limits

Limit Type	Value
Per-Minute Requests	60 requests/minute
Daily Requests	10,000 requests/day
Maximum Tokens	Model-dependent
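
As an illustration of what the per-minute default means for a client, the sketch below spaces calls evenly so that no more than 60 start per minute. The Pacer class and its parameters are hypothetical names introduced for this example, not part of the API, and the sketch assumes that evenly spaced requests are enough to stay under the limit.

```python
import time


class Pacer:
    """Spaces calls so that at most `per_minute` calls start per minute."""

    def __init__(self, per_minute=60, clock=time.monotonic, sleep=time.sleep):
        # Minimum gap between two consecutive calls, in seconds.
        self.interval = 60.0 / per_minute
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed, then reserve the next slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval
```

Calling pacer.wait() before each request in a batch loop keeps the requests roughly one second apart at the default limit of 60 requests/minute.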

Rate Limit Headers

Rate limit information is returned as headers in every API response:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1706000060
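
A minimal sketch of reading these headers on the client side follows. It assumes the reset value is a Unix timestamp in seconds, which the example value suggests but the text above does not state; the function name parse_rate_limit_headers is made up for this example.

```python
from datetime import datetime, timezone


def parse_rate_limit_headers(headers):
    """Extract limit, remaining count, and reset time from response headers."""
    # Header names may arrive in any letter case, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    limit = int(lowered["x-ratelimit-limit"])
    remaining = int(lowered["x-ratelimit-remaining"])
    # Assumption: the reset value is a Unix timestamp in seconds (UTC).
    reset_at = datetime.fromtimestamp(int(lowered["x-ratelimit-reset"]), tz=timezone.utc)
    return limit, remaining, reset_at


# The header values shown above, as a plain dictionary.
example_headers = {
    "X-RateLimit-Limit": "60",
    "X-RateLimit-Remaining": "45",
    "X-RateLimit-Reset": "1706000060",
}
limit, remaining, reset_at = parse_rate_limit_headers(example_headers)
print(limit, remaining, reset_at.isoformat())
```

A client could compare remaining against zero and, when it is exhausted, wait until reset_at before sending the next request.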

Per-Model Token Limits

For partner project API keys and customer API keys, separate token limits (daily and monthly) and request limits can be defined for each model. These limits are configured in the Usage Analysis modal in the partner portal.

Limit Type	Description
Daily Token	Total token usage within 24 hours for a specific model. Reset: every night at 00:00.
Monthly Token	Total tokens within the month for a specific model. Reset: the 1st of the month at 00:00.
Daily Requests	Total request count within 24 hours for a specific model.
Model Blocking	The use of a specific model can be blocked entirely.

Model pattern examples:

gpt-4o — exact match, only this model
google/* — wildcard, all of Google's models
* — all (a general limit for all models)

Priority Order:

Exact match > wildcard > general (*). If a limit is defined for a model by exact name, that limit is used, even when a wildcard or general pattern also matches the model.
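
The priority order can be sketched as a small lookup. This is an illustration only, not the service's actual implementation: the function name resolve_limit, the limit values, and the model name google/example-model are hypothetical, and the rule for choosing between several matching wildcards (longest pattern wins) is an assumption the text above does not specify.

```python
from fnmatch import fnmatchcase


def resolve_limit(model, limits):
    """Return the limit for `model` using exact > wildcard > general priority."""
    # 1. An exact match always wins.
    if model in limits:
        return limits[model]
    # 2. Wildcard patterns such as "google/*", excluding the general "*".
    matching = [
        pattern
        for pattern in limits
        if "*" in pattern and pattern != "*" and fnmatchcase(model, pattern)
    ]
    if matching:
        # Assumption: if several wildcards match, prefer the longest pattern.
        return limits[max(matching, key=len)]
    # 3. Fall back to the general limit, if one is defined.
    return limits.get("*")


# Hypothetical daily token limits keyed by model pattern.
daily_token_limits = {"gpt-4o": 100000, "google/*": 50000, "*": 20000}
print(resolve_limit("gpt-4o", daily_token_limits))
print(resolve_limit("google/example-model", daily_token_limits))
print(resolve_limit("some-other-model", daily_token_limits))
```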

Managing Limits

Tips:

Apply exponential backoff when you get a 429 error
Spread requests over time for batch operations
Track the remaining limit from the headers
If you get the same error across many models, a per-model limit may be in effect; ask your administrator
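
The first tip can be sketched as a retry wrapper. This is a minimal illustration under stated assumptions: send_request is a hypothetical callable that performs one API call and returns a (status_code, body) pair, and the retry count, delays, and jitter are example values rather than recommendations from the API.

```python
import random
import time


def call_with_backoff(send_request, max_retries=5, base_delay=1.0,
                      max_delay=60.0, sleep=time.sleep):
    """Call send_request() and retry with exponential backoff on status 429."""
    for attempt in range(max_retries + 1):
        status, body = send_request()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break
        # The delay doubles on every attempt: base, 2*base, 4*base, ...
        # and is capped at max_delay.
        delay = min(max_delay, base_delay * (2 ** attempt))
        # Up to 10% random jitter (an addition for this sketch) so that
        # many clients do not all retry at the same instant.
        sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("still rate limited after %d retries" % max_retries)
```

With the default values, the waits between attempts are roughly 1, 2, 4, 8, and 16 seconds. If the X-RateLimit-Reset header is available, waiting until that time is an alternative to a computed delay.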