The concurrency limits on this page apply only to API users who are billed against their account balance. GLM Coding users should refer to their package benefits instead.
Current Rate Limits
API usage is limited by concurrency, i.e., the number of requests in flight at the same time. Limits are set per model.
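Because the limit counts in-flight requests rather than requests per minute, one way to stay under it on the client side is to cap how many calls run at once. The following is a minimal sketch, not an official client: `call_model` is a hypothetical stand-in for a real API call, and `MAX_IN_FLIGHT` is a placeholder value that should be replaced with the limit that applies to your model.

```python
import asyncio

MAX_IN_FLIGHT = 5  # hypothetical value; use the limit that applies to your model


async def call_model(prompt: str) -> str:
    """Hypothetical stand-in for a real API call; replace with your client."""
    await asyncio.sleep(0.01)  # simulates network latency
    return f"response to {prompt!r}"


async def run_bounded(prompts, limit=MAX_IN_FLIGHT):
    """Run call_model over prompts with at most `limit` requests in flight.

    Returns the results in input order and the peak number of
    simultaneous requests that was actually observed.
    """
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def worker(prompt):
        nonlocal in_flight, peak
        async with semaphore:  # waits here while `limit` calls are running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await call_model(prompt)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(worker(p) for p in prompts))
    return results, peak


if __name__ == "__main__":
    results, peak = asyncio.run(run_bounded([f"prompt {i}" for i in range(20)]))
    print(len(results), "responses; peak in-flight requests:", peak)
```

A semaphore is used here because it bounds simultaneous work directly, which matches how the limit is defined. A requests-per-second throttle would not guarantee the same thing when individual requests are slow.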
Explanation of Rate Limits
To ensure stable access to GLM-4-Flash during the free trial, requests with context lengths over 8K will be throttled to 1% of the standard concurrency limit.
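As a worked illustration of the rule above, the sketch below computes the concurrency available to a request. It rests on several assumptions that the page does not state: "8K" is taken to mean 8,192 tokens, the standard limit of 200 in the usage example is a made-up number, and how a fractional result is rounded is not specified, so the function returns the unrounded value.

```python
LONG_CONTEXT_THRESHOLD = 8 * 1024  # assumption: "8K" means 8,192 tokens
LONG_CONTEXT_FRACTION = 100        # throttled to 1/100 (1%) of the standard limit


def effective_concurrency(standard_limit: int, context_length: int) -> float:
    """Return the concurrency available for a request of the given context length.

    Requests strictly over the threshold get 1% of the standard limit;
    all other requests get the standard limit unchanged.
    """
    if context_length > LONG_CONTEXT_THRESHOLD:
        return standard_limit / LONG_CONTEXT_FRACTION
    return standard_limit


if __name__ == "__main__":
    # Hypothetical standard limit of 200 concurrent requests.
    print(effective_concurrency(200, 4_000))   # short context: standard limit
    print(effective_concurrency(200, 10_000))  # long context: 1% of standard
```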