Current Rate Limits

API usage is limited by concurrency (i.e., the number of in-flight requests). Below are the current rate limits for each model.
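Because the limit applies to in-flight requests rather than requests per minute, a client can stay within it by capping how many calls it has open at once. The following is a minimal sketch of that idea using a semaphore. The names `run_with_concurrency_limit`, `fake_request`, and `InFlightTracker` are hypothetical and not part of any SDK, and `fake_request` only simulates a call with a short sleep; the value passed as `max_in_flight` should be taken from the limit published for the model in use.

```python
import asyncio


class InFlightTracker:
    """Records how many simulated requests are open at the same time."""

    def __init__(self):
        self.current = 0
        self.peak = 0


async def fake_request(tracker, i):
    # Stand-in for a real API call (assumption: replace with your own client code).
    tracker.current += 1
    tracker.peak = max(tracker.peak, tracker.current)
    await asyncio.sleep(0.01)
    tracker.current -= 1
    return i


async def run_with_concurrency_limit(factories, max_in_flight):
    """Run coroutine factories with at most max_in_flight of them in flight."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def guarded(factory):
        # Wait for a free slot, then hold it for the duration of the request.
        async with semaphore:
            return await factory()

    # gather() returns results in the same order as the input factories.
    return await asyncio.gather(*(guarded(f) for f in factories))


if __name__ == "__main__":
    tracker = InFlightTracker()
    jobs = [lambda i=i: fake_request(tracker, i) for i in range(10)]
    results = asyncio.run(run_with_concurrency_limit(jobs, max_in_flight=3))
    print(results, "peak in flight:", tracker.peak)
```

A semaphore is used here because it bounds simultaneous work directly, which matches a concurrency-based limit more closely than a fixed delay between requests would.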


Explanation of Rate Limits

To ensure stable access to GLM-4-Flash during the free trial, requests with context lengths over 8K will be throttled to 1% of the standard concurrency limit.
