A surge in AI usage, driven by chatbots, autonomous agents and a new wave of consumer promotions, has sent token consumption and compute demand into overdrive, exposing bottlenecks in memory, GPU supply and data-centre power. Large internet platforms and emerging desktop agents have accelerated the shift from search to conversational and always-on AI, multiplying inference workloads and pushing companies to rethink where and how they deploy hardware.
The jump in tokens is dramatic and measurable. By late 2025, ByteDance’s “Doubao” model reportedly consumed more than 50 trillion tokens per day, a tenfold year‑on‑year rise, and Google disclosed monthly token processing on the order of 1.3 quadrillion — roughly 43 trillion a day. JPMorgan has noted that heavy promotional spending by tech giants is not just marketing: it is training users to run continuous conversational agents, which in turn multiplies downstream inference volumes and token burn rates.
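The conversions behind those figures are simple arithmetic; as a rough sanity check, the short Python sketch below assumes a 30-day month and takes the reported totals at face value.

```python
# Back-of-envelope conversions of the token figures quoted above.
# Assumes a 30-day month; the inputs are the reported totals, the
# derived numbers are illustrative only.

GOOGLE_MONTHLY_TOKENS = 1.3e15   # ~1.3 quadrillion tokens per month (reported)
DOUBAO_DAILY_TOKENS = 50e12      # >50 trillion tokens per day (reported)

google_daily = GOOGLE_MONTHLY_TOKENS / 30
print(f"Google, implied daily tokens: {google_daily / 1e12:.1f} trillion")  # ~43.3 trillion

# A tenfold year-on-year rise implies a prior-year baseline of roughly 5 trillion/day.
doubao_prior_year = DOUBAO_DAILY_TOKENS / 10
print(f"Doubao, implied prior-year daily tokens: {doubao_prior_year / 1e12:.1f} trillion")
```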
That growth has a price. Memory and storage demand has rebounded sharply; high-bandwidth memory (HBM) is a chokepoint for server makers, and GPU prices have risen accordingly. Hardware inflation is feeding a fast-growing market for rented AI compute. Firms that lease capacity report explosive expansion: one Chinese provider said its fleet scaled from about 2,000 GPU cards in early 2025 to over 10,000 today, and plans to reach 50,000 cards in 2026 as clients increasingly prefer renting capacity to owning hardware in a volatile market.
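A minimal sketch of the rent-versus-buy arithmetic driving that preference follows. Every figure in it (card price, hosting cost, rental rate, utilisation) is a hypothetical placeholder rather than a number from the providers above; the point is only that a multi-year break-even horizon plus volatile hardware prices makes ownership risky.

```python
# Hypothetical rent-vs-buy break-even for a single GPU card.
# None of these numbers come from the article; they only illustrate the
# economics that push clients toward rented capacity.

CARD_PRICE = 30_000.0        # purchase price per card, USD (hypothetical)
HOSTING_PER_YEAR = 4_000.0   # power, cooling and space per card-year (hypothetical)
RENTAL_PER_HOUR = 2.50       # rented-capacity price per card-hour (hypothetical)
UTILISATION = 0.60           # fraction of hours the card is actually busy

HOURS_PER_YEAR = 24 * 365

def annual_rental_cost() -> float:
    """Cost of renting only the hours actually used."""
    return RENTAL_PER_HOUR * HOURS_PER_YEAR * UTILISATION

def breakeven_years() -> float:
    """Years of use before owning beats renting, ignoring resale value."""
    return CARD_PRICE / (annual_rental_cost() - HOSTING_PER_YEAR)

print(f"Annual rental cost: ${annual_rental_cost():,.0f}")
print(f"Break-even horizon if buying: {breakeven_years():.1f} years")
# If prices fall or faster chips ship inside that horizon, the owned card
# becomes stranded capital, which is exactly the risk rental shifts to the provider.
```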
Cloud providers are responding with tighter supply and higher prices. Amazon raised EC2 machine-learning capacity prices in January 2026 by around 15%, and Google Cloud announced price adjustments across AI and compute services effective May 2026. Investors and chipmakers are doubling down on infrastructure: Nvidia committed $2 billion to CoreWeave to accelerate the US company's plan to add more than 5 GW of AI compute capacity by 2030, a validation of the high margins and strategic importance of GPU-centric cloud infrastructure.
Rising power density and sustainability targets are tilting design choices toward liquid cooling. Chinese policy now places firm PUE (power usage effectiveness) thresholds on new and refurbished data centres — new large centres must meet PUE ≤ 1.25, renovated centres ≤ 1.5, with advanced facilities targeting ~1.1. Traditional air‑cooled racks typically run PUEs above 1.5; immersion and other liquid‑cooling solutions can push overall PUE down into the 1.1–1.2 range, making them an attractive route to meet both performance and regulatory requirements.
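PUE is simply total facility power divided by the power that reaches IT equipment, so each threshold translates directly into how much overhead (cooling, power conversion, lighting) a site is allowed per unit of compute. The short illustration below uses the PUE values quoted above and a hypothetical 10 MW IT load.

```python
# PUE = total facility power / IT equipment power.
# The 10 MW IT load is a hypothetical example; the PUE values are the
# figures quoted above for air cooling, the new-build threshold and immersion.

IT_LOAD_MW = 10.0  # hypothetical IT (compute) load

def overhead_mw(pue: float, it_load_mw: float = IT_LOAD_MW) -> float:
    """Non-IT power (cooling, conversion, lighting) implied by a given PUE."""
    return pue * it_load_mw - it_load_mw

for label, pue in [("air-cooled (typical)", 1.5),
                   ("new-build threshold", 1.25),
                   ("immersion cooling", 1.1)]:
    print(f"{label:22s} PUE {pue:.2f} -> {overhead_mw(pue):.1f} MW of overhead")
```

At the quoted figures, moving from air cooling to immersion cuts non-IT overhead roughly fivefold for the same compute load, which is why liquid cooling doubles as a compliance strategy.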
Immersion cooling is consolidating around a few coolant chemistries: fluorinated fluids, silicone (organosilicon) oils and synthetic hydrocarbons each have niches. Chinese materials companies and system integrators are piloting silicone‑based immersion plants; one commercial project in Hangzhou uses a silicone coolant to support racks with up to 210 kW density, demonstrating both reliability and economics at scale. Major GPU users abroad already deploy similar fluids — Nvidia, for example, has used Dow’s silicone coolants in high‑power installations.
The market is also shifting from training to inference. Early AI investment emphasised training rigs equipped with H100- or H200-class GPUs; today's consumer and enterprise applications are creating sustained, distributed inference demand that is better met by more cost-efficient chips and by edge deployment. Chinese providers are therefore building nationwide edge nodes around population centres where usage intensity, and hence token burn, is highest, arguing that decentralised inference can be cheaper, faster and less tied to the limits of central data-centre power.
Yet the transition is imperfect. Many so-called “intelligent compute centres” are retrofits of older facilities; higher per-cabinet power draw often clashes with the legacy power and space provisioning of those sites, producing wasted capacity or stranded investments. As compute becomes more capital-intensive and power-constrained, the winners will be those that align cooling, materials supply, chip procurement and regulatory compliance into scalable, localised service models.
For global observers, the story matters because it maps where value and risk are concentrating in the AI supply chain. Makers of liquid-cooling chemistries, system integrators and edge-oriented infrastructure providers gain scope for rapid growth, while chip and memory scarcity will continue to shape pricing and geopolitics. Expect compute to remain both a bottleneck and a battleground as firms race to satisfy expanding inference demand without outrunning power grids, budgets or decarbonisation mandates.
