Zhipu AI has announced a temporary, steep reduction in daily sales of its GLM Coding Plan after the rollout of GLM‑4.7 produced a sharp increase in user activity. Beginning January 23 at 10:00, the company will limit new daily sales to 20% of current levels and refresh that limited quota each day at 10:00, while leaving existing automatic renewals untouched. Zhipu cited intermittent concurrency errors and slow responses during weekday peak hours (15:00–18:00) as the immediate reason for the move, and said the cap will remain in place until further notice.
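Zhipu has not said how the cap is enforced, but the announced behavior (a fixed share of normal daily volume, refreshed at 10:00, with renewals exempt) is essentially a daily quota. The sketch below is purely illustrative, assuming a hypothetical baseline sales figure and a simple in-memory counter; it is not based on Zhipu's systems.

```python
from datetime import datetime, timedelta, time

RESET_AT = time(10, 0)          # quota refreshes daily at 10:00, per the announcement
BASELINE_DAILY_SALES = 10_000   # hypothetical figure; Zhipu has not disclosed actual volumes
CAP_RATIO = 0.20                # new daily sales limited to 20% of current levels

class DailySalesQuota:
    """In-memory sketch of a sales cap that resets once a day at a fixed time."""

    def __init__(self) -> None:
        self.window_start = self._current_window_start(datetime.now())
        self.sold = 0

    @staticmethod
    def _current_window_start(now: datetime) -> datetime:
        start = datetime.combine(now.date(), RESET_AT)
        # Before 10:00 we are still inside the previous day's quota window.
        return start if now >= start else start - timedelta(days=1)

    def try_sell(self, now: datetime | None = None) -> bool:
        now = now or datetime.now()
        window = self._current_window_start(now)
        if window != self.window_start:   # a new window began at 10:00, reset the counter
            self.window_start, self.sold = window, 0
        if self.sold >= BASELINE_DAILY_SALES * CAP_RATIO:
            return False                   # today's reduced quota is exhausted
        self.sold += 1
        return True
```

Existing automatic renewals would simply bypass a check like this, which is consistent with the company's promise that current subscribers are unaffected.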
The decision underscores the heavy compute demands of modern, code‑focused large language models and the operational strain they can place on vendors both large and small. GLM, Zhipu’s family of models, has been positioned as a domestic alternative to Western offerings, and GLM‑4.7 appears to have attracted a wave of new users that outpaced the firm’s capacity planning. Concurrency throttling and slower inference at peak times are common symptoms when model adoption grows faster than infrastructure upgrades can match.
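To see why overloaded inference capacity surfaces as "concurrency errors" rather than uniform slowness, consider how a serving frontend typically gates requests. The following is an illustrative admission-control sketch, not Zhipu's implementation; the slot count, wait budget, and error message are assumptions.

```python
import asyncio

MAX_CONCURRENT = 8      # hypothetical per-replica inference slots
MAX_QUEUE_WAIT = 2.0    # seconds a request may wait for a slot before being rejected

_slots = asyncio.Semaphore(MAX_CONCURRENT)

async def handle_request(prompt: str) -> str:
    """Admit a request only if an inference slot frees up within the wait budget."""
    try:
        await asyncio.wait_for(_slots.acquire(), timeout=MAX_QUEUE_WAIT)
    except asyncio.TimeoutError:
        # What a user experiences as an intermittent "concurrency error" at peak hours.
        raise RuntimeError("server busy: concurrency limit reached")
    try:
        await asyncio.sleep(0.5)           # stand-in for actual model inference
        return f"completion for: {prompt}"
    finally:
        _slots.release()
```

When demand exceeds the available slots, requests either queue (users see slow responses) or time out and get rejected (users see errors), which matches the symptoms Zhipu described during the 15:00–18:00 peak.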
For customers and developers, the cap sends a mixed signal: it protects service quality for existing subscribers but limits market access for new users and could slow onboarding. By prioritizing renewals and established customers, Zhipu is choosing short‑term stability over open growth, a pragmatic step that also exposes near‑term limits in its AI infrastructure or cloud provisioning strategy. Competitors and cloud providers will watch closely for signs of whether this is a temporary squeeze or a longer‑term capacity gap.
The episode is instructive about the economics of AI in China. Running large inference fleets remains expensive and technically complex, whether hosted on domestic clouds or self‑managed hardware. How Zhipu responds—by expanding compute, introducing tiered access, raising prices, or partnering for capacity—will shape developer sentiment and commercial traction for GLM‑branded models in a crowded Chinese AI market. Observers should track the duration of the cap, any changes to pricing or SLAs, and whether rivals seek to poach frustrated users.
