Zhipu Limits Sales of GLM Coding Plan to Protect Long‑standing Users After GLM‑4.7 Demand Spike

Zhipu has limited daily sales of its GLM Coding Plan to 20% of previous volumes as a temporary measure after GLM‑4.7 triggered heavy demand that strained compute resources and slowed model responses during peak hours. The cap, beginning Jan 23 and refreshed daily, aims to protect existing users while Zhipu expands capacity and tightens control over malicious traffic.


Key Takeaways

  1. Zhipu will cap new daily sales of the GLM Coding Plan at 20% of current levels starting Jan 23, refreshed daily at 10:00; auto‑renewals remain unaffected.
  2. A surge in users after the launch of GLM‑4.7 caused concurrency throttling and slower responses during weekday peak hours (15:00–18:00).
  3. Zhipu has started capacity expansion and will step up enforcement against malicious accounts that consume compute unfairly.
  4. The company plans further model and infrastructure upgrades and says an improved GLM Coding Plan will be released in due course.
  5. The move highlights a broader industry challenge: demand for AI developer tooling outstrips available inference compute, forcing trade‑offs between growth and service quality.

Editor's Desk

Strategic Analysis

This episode exposes compute scarcity as an operational choke point that will shape the competitive landscape for providers of paid LLM services. By prioritising existing customers, Zhipu is making a calculated trade‑off: sacrificing short‑term sales to preserve user experience and brand credibility. In the medium term, expect incumbent and emerging firms to pursue three levers: vertical integration with cloud and chip suppliers, aggressive optimisation of inference efficiency, and more explicit access tiers (premium or reserved capacity for loyal customers). For enterprise buyers and developers, reliance on a single provider is increasingly risky; multi‑provider strategies and contractual SLAs will become more valuable. For investors, the demand spike is a bullish sign of product‑market fit, but sustained monetisation will depend on tangible progress in scaling infrastructure and lowering per‑call compute costs.


Chinese AI firm Zhipu announced on January 21 that it will temporarily restrict sales of its paid GLM Coding Plan after a surge in users following the rollout of GLM‑4.7 strained the company’s compute resources. The firm said some users experienced concurrency throttling and slower model responses during weekday peak hours (15:00–18:00), prompting an immediate capacity expansion and a short‑term sales cap.

From January 23, Zhipu will limit daily new sales of the Coding Plan to 20% of the current daily volume, with the quota refreshed each day at 10:00; existing automatic renewals will continue unaffected. The company framed the measure as a way to "prioritise our old friends," signalling an explicit effort to protect long‑standing developers and paying users from service degradation while backend upgrades proceed.
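Mechanically, the cap behaves like a fixed daily quota that resets at a set time. The sketch below is purely illustrative; Zhipu has not published its implementation, and the DailySalesQuota class, baseline figure and cap ratio are hypothetical. It shows how a 20% cap with a 10:00 refresh could be enforced:

    from datetime import datetime, time, timedelta

    class DailySalesQuota:
        """Illustrative daily sales cap: allow cap_ratio of a baseline
        volume, with the allowance refreshed at a fixed local time each
        day. Hypothetical sketch only, not Zhipu's actual implementation."""

        def __init__(self, baseline_daily_sales, cap_ratio=0.2,
                     refresh_at=time(10, 0)):
            self.limit = int(baseline_daily_sales * cap_ratio)
            self.refresh_at = refresh_at
            self.sold = 0
            self.window_start = self._window_start(datetime.now())

        def _window_start(self, now):
            # Most recent refresh moment: today's 10:00, or yesterday's
            # if we have not yet reached 10:00 today.
            start = now.replace(hour=self.refresh_at.hour,
                                minute=self.refresh_at.minute,
                                second=0, microsecond=0)
            return start if now >= start else start - timedelta(days=1)

        def try_sell(self, now=None):
            # Auto-renewals would bypass this check entirely.
            now = now or datetime.now()
            window = self._window_start(now)
            if window != self.window_start:  # crossed 10:00, reset quota
                self.window_start, self.sold = window, 0
            if self.sold >= self.limit:
                return False  # quota exhausted until the next refresh
            self.sold += 1
            return True

    # Example: a baseline of 1,000 plans/day yields a cap of 200 new
    # plans per 10:00-to-10:00 window.
    quota = DailySalesQuota(baseline_daily_sales=1_000)
    print(quota.try_sell())  # True for the first 200 calls in a window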

Zhipu also said it will intensify detection and suppression of malicious traffic that unfairly consumes compute resources. At the same time, the firm reiterated plans to accelerate both compute expansion and model development, promising an improved GLM Coding Plan "soon." The company did not give a date for lifting the temporary sales cap.

The announcement is notable for two linked reasons: it underscores how quickly demand for developer‑focused large language models is growing in China, and it highlights access to inference compute as a persistent bottleneck in AI deployment. High demand for capabilities such as code generation and programming assistance is common across the industry, but when capacity is limited, providers must choose between selling more access and preserving quality for existing customers.

For developers and enterprises dependent on Zhipu’s stack, the move is a short‑term inconvenience and a reminder of operational risk in relying on a single provider. For Zhipu, the decision trades potential near‑term revenue for user retention and reputational management; by shielding existing customers from congestion the company seeks to prevent churn and public complaints that could undermine a newly popular product.

Strategically, the episode is a microcosm of the wider market dynamic: demand is outpacing the ability to supply low‑latency inference at scale, pushing AI firms toward heavier investment in data‑centre capacity, partnerships with cloud and chip vendors, and engineering work on inference efficiency. How Zhipu manages this bottleneck will influence its standing among developers and its ability to monetise GLM models as competition intensifies both inside China and globally.
