A striking market divergence played out this week as Chinese AI models overtook their US counterparts in global weekly token usage, triggering a rally in domestic cloud, data‑centre and power stocks while one of Silicon Valley's most prized companies, Nvidia, lost nearly $260bn in market value in a single session.
Data from OpenRouter, an API‑aggregation platform used predominantly by overseas developers, shows Chinese models' weekly token calls rising from 4.12 trillion in early February to 5.16 trillion by the third week of the month, overtaking US models and accounting for four of the five most‑used models globally. That surge has been framed not as a one‑off spike but as a clustered rise: multiple Chinese models, many of them open‑source or built on hybrid architectures, are attracting international developers with much lower per‑token costs.
The market reaction was immediate and concentrated in China. On 27 February, A‑share and Hong Kong cloud, compute‑rental and data‑centre names surged, with multiple stocks hitting their daily limits. The rally contrasted with a sharp sell‑off in US chipmakers despite Nvidia's earnings beat, as investors shifted their focus from short‑term revenue surprises to longer‑term questions about the durability and distribution of AI compute demand.
At the heart of the shift is a technical and economic argument: several Chinese large models increasingly use Mixture‑of‑Experts (MoE) and other efficiency techniques that activate only parts of a network for each request. MoE models can cut memory and energy needs dramatically — one set of industry metrics suggests up to 60% lower VRAM and up to 19x throughput improvements in inference — weakening the previously straightforward equation that more tokens require proportionally more high‑end GPUs.
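The sparse‑activation idea behind MoE can be illustrated with a toy routing layer. The sketch below is purely illustrative and not drawn from any specific Chinese model: a gate scores eight experts per token but runs only the top two, so most expert weights sit idle (and need not occupy fast GPU memory) for any single request.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy MoE layer: 8 experts, top-2 routing per token.
NUM_EXPERTS, TOP_K, D = 8, 2, 16

expert_weights = [rng.standard_normal((D, D)) for _ in range(NUM_EXPERTS)]
gate_weights = rng.standard_normal((D, NUM_EXPERTS))

def moe_forward(x):
    """Route one token vector x through its top-k experts only."""
    logits = x @ gate_weights
    top = np.argsort(logits)[-TOP_K:]          # indices of the chosen experts
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                       # softmax over chosen experts only
    # A dense layer would sum over all NUM_EXPERTS matmuls;
    # here only TOP_K of them actually execute.
    out = sum(p * (x @ expert_weights[i]) for p, i in zip(probs, top))
    return out, top

y, used = moe_forward(rng.standard_normal(D))
print(f"experts activated: {len(used)} of {NUM_EXPERTS}")
```

With top‑2 routing over eight experts, six of the eight expert matrices are never touched for this token, which is the mechanism behind the memory and throughput claims cited above.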
Cost differentials amplify the effect. Benchmark prices posted on OpenRouter show some Chinese models charging roughly $0.30 per million input tokens versus $5 per million for certain Western counterparts. Combined with lower electricity costs in parts of western China, where renewable power can be significantly cheaper than in Europe or North America, the unit economics of serving global developer demand from Chinese infrastructure have become compelling.
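The arithmetic is simple but stark. The sketch below uses the two per‑million prices quoted above; the monthly token volume is a hypothetical workload chosen for illustration, not a figure from the article.

```python
# Back-of-envelope comparison of the OpenRouter input prices cited above:
# ~$0.30 vs ~$5 per million input tokens.
PRICE_CN = 0.30   # USD per million input tokens (cheaper Chinese model)
PRICE_US = 5.00   # USD per million input tokens (Western counterpart)

def monthly_cost(price_per_million, tokens_per_month):
    """Dollar cost of a month's input tokens at a given per-million price."""
    return price_per_million * tokens_per_month / 1_000_000

# Hypothetical workload: 10bn input tokens per month.
TOKENS = 10_000_000_000
cheap = monthly_cost(PRICE_CN, TOKENS)
dear = monthly_cost(PRICE_US, TOKENS)
print(f"${cheap:,.0f} vs ${dear:,.0f} per month ({dear / cheap:.1f}x)")
```

At those list prices the gap is roughly 16.7x regardless of volume, which is the margin developers are said to be "voting with their feet" over.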
The implications are strategic. If token demand can scale without a linear increase in top‑tier GPU purchases, the biggest beneficiaries may be model providers and low‑cost infrastructure operators rather than GPU vendors. Chinese compute vendors, cloud providers and data‑centre operators stand to capture more of the value chain, especially if developers and start‑ups continue to “vote with their feet” for lower‑cost, high‑throughput models.
That does not mean Nvidia or Western cloud providers are obsolete. High‑end accelerators remain essential for many training workloads and for certain inference tasks that require dense activation of parameters. But the market is beginning to price a bifurcation: model architectures and software engineering can substitute for raw GPU scale in many production scenarios, and commercial success will depend on a mix of cost, latency, regulatory access and model capability.
For investors and policymakers the key questions are how sustainable this redistribution of demand will be and what responses it will provoke. Western hyperscalers and chipmakers may accelerate software optimisations, diversify into alternative accelerators, or seek deeper partnerships with efficient model providers. Governments will watch closely: export controls, cross‑border data flows and energy policy will shape who ultimately benefits from the next phase of AI commercialisation.
