China’s AI Models Overtake US Usage — Rewriting the Hardware Playbook and Roiling Markets

OpenRouter data shows Chinese AI models surpassed US counterparts in weekly token usage in February 2026, driven by multiple efficient architectures and low costs. Markets split: Chinese cloud, data‑centre and power stocks surged while Nvidia’s valuation plunged, signalling a potential reallocation of AI value away from high‑end GPUs toward models and low‑cost infrastructure.


Key Takeaways

  • OpenRouter data: Chinese models exceeded US models in weekly token usage in February 2026 and occupy four of the top five global slots.
  • Chinese models often use Mixture‑of‑Experts and other efficiency techniques that sharply reduce per‑token compute and memory needs.
  • Per‑token pricing for several Chinese models is an order of magnitude lower than some Western counterparts; lower electricity costs in western China further improve the economics.
  • Markets reacted divergently: A‑share and Hong Kong cloud, data‑centre and power stocks rallied while Nvidia and other chipmakers fell amid concerns over sustainable GPU demand.
  • The shift suggests AI value may migrate from hardware vendors toward efficient model providers and low‑cost compute operators, prompting strategic responses from Western firms and regulators.

Editor's Desk

Strategic Analysis

This episode marks a practical pivot in the AI ecosystem: performance no longer maps cleanly to sheer GPU scale. Chinese model makers have combined architectural innovations, competitive pricing and favourable energy economics to win global developer adoption, which is now translating into demand for domestic compute services. The longer‑term outcome depends on several variables — continued model quality, cross‑border access to cloud infrastructure, supply‑chain constraints for advanced chips, and geopolitical policy. If the trend endures, it will reshape investment priorities in AI, incentivise software‑centric optimisation across the industry, and intensify competition over where the economic rents from AI inference accrue.

China Daily Brief Editorial

A striking market divergence played out this week as Chinese AI models overtook their US counterparts in global weekly token usage, triggering a rally in domestic cloud, data‑centre and power stocks while one of Silicon Valley’s most prized companies, Nvidia, lost nearly $260bn in market value in a single session.

Data from OpenRouter, an API‑aggregation platform used predominantly by overseas developers, shows Chinese models’ weekly token calls rising from 4.12 trillion in early February to 5.16 trillion by the third week of the month, overtaking US models and accounting for four of the five most‑used models globally. That surge has been framed not as a one‑off hit but as a clustered rise: multiple Chinese models, many open‑source or hybrid architectures, are attracting international developers with much lower per‑token costs.

The market reaction was immediate and local. On 27 February A‑share and Hong Kong cloud, compute‑rental and data‑centre names surged, with multiple stocks hitting daily limits. The rally contrasted with a sharp sell‑off in US chipmakers even after Nvidia beat earnings expectations, as investors shifted their focus from short‑term revenue beats to longer‑term questions about the durability and distribution of AI compute demand.

At the heart of the shift is a technical and economic argument: several Chinese large models increasingly use Mixture‑of‑Experts (MoE) and other efficiency techniques that activate only parts of a network for each request. MoE models can cut memory and energy needs dramatically — one set of industry metrics suggests up to 60% lower VRAM and up to 19x throughput improvements in inference — weakening the previously straightforward equation that more tokens require proportionally more high‑end GPUs.
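The arithmetic behind sparse activation is worth making concrete. The sketch below uses entirely hypothetical figures (the expert count, top‑k routing and parameter split are illustrative assumptions, not specifications of any model named in this article) to show why a top‑k MoE model touches only a small fraction of its parameters per token:

```python
# Illustrative arithmetic (hypothetical figures): how sparse Mixture-of-Experts
# routing reduces per-token compute relative to a dense model of the same size.

def moe_active_params(total_expert_params: float, num_experts: int,
                      top_k: int, shared_params: float) -> float:
    """Parameters actually activated per token under top-k expert routing."""
    per_expert = total_expert_params / num_experts
    return shared_params + top_k * per_expert

# Hypothetical 600B-parameter model: 560B split across 64 experts, 40B shared.
total_expert, shared = 560e9, 40e9
active = moe_active_params(total_expert, num_experts=64, top_k=2,
                           shared_params=shared)
dense_equivalent = total_expert + shared

print(f"active per token: {active / 1e9:.1f}B of {dense_equivalent / 1e9:.0f}B "
      f"({active / dense_equivalent:.1%})")
# → active per token: 57.5B of 600B (9.6%)
```

Under these assumed numbers, each token exercises under a tenth of the network, which is the mechanism behind claims of large VRAM and throughput gains: compute per token scales with activated parameters, not total parameters.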

Cost differentials amplify the effect. Benchmark prices posted on OpenRouter show some Chinese models charging roughly $0.30 per million tokens for inputs versus $5 per million for certain Western counterparts. Combined with lower electricity costs in parts of western China, where renewable power can be significantly cheaper than in Europe or North America, the unit economics of serving global developer demand from Chinese infrastructure has become compelling.
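A back‑of‑the‑envelope comparison shows how quickly that price gap compounds. The monthly token volume below is a hypothetical workload chosen for illustration; only the two per‑million prices come from the OpenRouter figures cited above:

```python
# Unit-economics sketch using the per-million-token prices cited in the text.
# The monthly volume is a hypothetical workload, not a figure from the article.

PRICE_CN = 0.30   # USD per million input tokens (some Chinese models)
PRICE_US = 5.00   # USD per million input tokens (some Western counterparts)

monthly_tokens_m = 10_000  # hypothetical: 10 billion input tokens per month

cost_cn = PRICE_CN * monthly_tokens_m
cost_us = PRICE_US * monthly_tokens_m
print(f"Chinese model: ${cost_cn:,.0f}/mo  "
      f"Western model: ${cost_us:,.0f}/mo  "
      f"price ratio: {PRICE_US / PRICE_CN:.1f}x")
# → Chinese model: $3,000/mo  Western model: $50,000/mo  price ratio: 16.7x
```

At this assumed volume the same workload differs by $47,000 a month on input tokens alone, which is the kind of gap that drives developers to "vote with their feet".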

The implications are strategic. If token demand can scale without a linear increase in top‑tier GPU purchases, the biggest beneficiaries may be model providers and low‑cost infrastructure operators rather than GPU vendors. Chinese compute vendors, cloud providers and data‑centre operators stand to capture more of the value chain, especially if developers and start‑ups continue to “vote with their feet” for lower‑cost, high‑throughput models.

That does not mean Nvidia or Western cloud providers are obsolete. High‑end accelerators remain essential for many training workloads and for certain inference tasks that require dense activation of parameters. But the market is beginning to price a bifurcation: model architectures and software engineering can substitute for raw GPU scale in many production scenarios, and commercial success will depend on a mix of cost, latency, regulatory access and model capability.

For investors and policymakers the key questions are how sustainable this redistribution of demand will be and what responses it will provoke. Western hyperscalers and chipmakers may accelerate software optimisations, diversify into alternative accelerators, or seek deeper partnerships with efficient model providers. Governments will watch closely: export controls, cross‑border data flows and energy policy will shape who ultimately benefits from the next phase of AI commercialisation.

