When Memory Rules: How HBM Is Rewriting the Economics of AI Chips

The AI chip competition has pivoted from raw compute to memory capacity and bandwidth as HBM and advanced packaging now dominate costs and performance requirements. Persistent HBM shortages and soaring prices favour cloud buyers who prioritise memory-rich GPUs and push chipmakers toward software and system optimisations to reduce memory demand.

Pile of disassembled circuit boards on a blue surface. Perfect for tech themes.

Key Takeaways

  • 1HBM and CoWoS packaging now account for the largest portion of high-end GPU card costs, often surpassing the GPU die itself.
  • 2Large models and long-context agent workloads have driven a sharp rise in demand for both model-weight storage and large KV/activation caches.
  • 3Major memory manufacturers have shifted capacity toward HBM and high-performance DDR, tightening supply for consumer DRAM and GDDR.
  • 4New high-end fabs will ease but not quickly reverse HBM shortages; data‑centre demand will be prioritised over consumer markets.
  • 5Chinese AI-chip firms can gain advantage through software/system engineering (quantisation, expert-loading, KV optimisations) to lower memory needs.

Editor's
Desk

Strategic Analysis

The ‘memory-first’ moment reshapes incentives across the AI stack. In the near term, cloud providers that secure HBM-rich inventory will have a competitive edge for agent and long-context applications, and incumbents with supply relationships will see their market power reinforced. For China, the most pragmatic route is not an immediate, capital-intensive race to match global HBM capacity but a two-pronged strategy: accelerate domestic advanced packaging and DRAM capabilities where feasible, while mobilising engineering talent to squeeze memory efficiency from models and runtimes. Over the medium term, pressure on HBM supply will stimulate architectural diversity — sparse and mixture-of-experts models, specialised memory-centric accelerators, and software-first deployment pipelines — any of which could level the playing field for smaller chipmakers and cloud operators that can innovate at the system level.

China Daily Brief Editorial
Strategic Insight
China Daily Brief

The AI hardware race has quietly flipped. Where cloud buyers once compared GPU teraflops, procurement conversations in 2026 increasingly revolve around how much high-bandwidth memory (HBM) a card carries and how wide its interface is — a shift industry insiders sum up as “buy memory, get the core for free.” Engineers and supply-chain managers now say HBM and advanced packaging (CoWoS) can account for the single largest line items on a high-end accelerator bill of materials, eclipsing the GPU die itself.

That change is a direct consequence of how large models are applied today. Since the ChatGPT breakthrough of 2023, model sizes ballooned and deployment patterns evolved: beyond pure training, inference workloads driven by agents and long-context use cases have become mainstream. These applications increase two distinct forms of memory demand — persistent storage for model weights and volatile, low-latency caches (KV Cache and activations) sized to support sessions measured in hundreds of thousands to a million tokens.

HBM is attractive because it integrates stacked DRAM very close to the compute die with wide buses, dramatically raising effective bandwidth and reducing latency compared with traditional off-package memory. But that performance comes at a price. Advanced 2.5D/3D packaging such as CoWoS introduces its own costs and yield challenges. One vendor example cited in this reporting shows HBM and CoWoS together making up roughly 80% of the manufacturing cost of a high-end inference/training card, with the GPU logic chip representing a surprisingly small share.

On the supply side, manufacturers have responded by reallocating DRAM capacity. Samsung, SK Hynix and Micron have diverted significant wafer starts toward HBM and high-performance DDR5 to meet cloud and AI demand, reducing output for consumer GDDR and commodity DRAM. The market effect is visible: memory spot prices surged from mid‑2025 and remained elevated into 2026, and consumer graphics cards and mobile devices are already feeling the pinch as foundries prioritise high-margin HBM production.

Short of an abrupt demand reversal, additional fab capacity — even the upcoming Samsung P4L and SK Hynix M15x lines intended for HBM and high-end DDR — will primarily temper further price escalation rather than deliver steep discounts. Data-centre buyers and AI cloud operators will be first in line for new HBM wafers, while consumer markets are likely to see relief sooner but only if vendors reallocate capacity back to commodity parts. For Chinese AI-chip makers the imbalance presents both a challenge and an opening: with HBM scarce and costly, Chinese teams are doubling down on system-level and software engineering — low-bit quantisation, expert-selection loading, KV-cache optimisation and other deployment tricks — to reduce memory footprints and improve effective throughput.

The strategic implications are broad. Cloud economics have shifted: providers may prefer more memory-rich instances even if raw compute per dollar is lower, while chipmakers are pushed to rethink designs around memory capacity and bandwidth rather than peak FLOPS. The concentration of HBM supply with a handful of vendors creates geopolitical and commercial risk, increasing incentives to pursue alternative architectures, memory-efficient models, and domestic upstream investments in packaging and DRAM for actors looking to secure their AI roadmaps.

Share Article

Related Articles

📰
No related articles found