Chinese AI firm Zhipu announced on January 21 that it will temporarily restrict sales of its paid GLM Coding Plan after the rollout of GLM‑4.7 drew a surge of users that strained the company's compute resources. The firm said some users experienced concurrency throttling and slower model responses during weekday peak hours (15:00–18:00), prompting an immediate capacity expansion and a short‑term cap on new sales.
From January 23 at 10:00, Zhipu will limit daily new sales of the Coding Plan to 20% of the current daily volume, with the quota refreshed each day at 10:00; existing automatic renewals will continue unaffected. The company framed the measure as a way to “prioritise our old friends,” signalling an explicit effort to shield long‑standing developers and paying users from service degradation while backend upgrades proceed.
Zhipu also said it will step up detection and suppression of malicious traffic that unfairly consumes compute resources. At the same time, the firm reiterated plans to accelerate both compute expansion and model development, promising an improved GLM Coding Plan “soon.” It did not say when the temporary sales cap will be lifted.
The announcement is notable for two linked reasons: it underlines how quickly demand for developer‑focused large language models is growing in China, and it highlights a persistent bottleneck in AI deployment, namely access to inference compute. High demand for capabilities such as code generation and programming assistance is common across the industry, but when capacity is limited, providers must choose between selling more access and preserving quality for existing customers.
For developers and enterprises dependent on Zhipu's stack, the move is a short‑term inconvenience and a reminder of the operational risk of relying on a single provider. For Zhipu, the decision trades potential near‑term revenue for user retention and reputational management; by shielding existing customers from congestion, the company aims to prevent the churn and public complaints that could undermine a newly popular product.
Strategically, the episode is a microcosm of the wider market dynamic: demand is outpacing the ability to supply low‑latency inference at scale, pushing AI firms toward heavier investment in data‑centre capacity, partnerships with cloud and chip vendors, and engineering work on inference efficiency. How Zhipu manages this bottleneck will influence its standing among developers and its ability to monetise GLM models as competition intensifies both inside China and globally.
