At the pre-dawn keynote of GTC 2026, Nvidia chief executive Jensen Huang laid down an audacious financial marker: by 2027, Nvidia’s flagship compute chips will generate $1 trillion in revenue. The announcement was backed by a broad product roll‑out — a new CPU, dedicated inference racks, optical interconnects, enterprise agent software and even a space compute module — that reframes Nvidia as a platform builder rather than a mere GPU vendor.
The threads that tied the show together were not individual product specs but a single architectural thesis: the old model of passive data storage and generic compute is giving way to “token factories” that produce AI inference outputs. Huang described a world in which AI nodes are optimised to manufacture tokens — the discrete units of generated text or decisions — and where engineering focus shifts to squeezing down the cost per token through tighter hardware–software co‑design.
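The “cost per token” framing lends itself to back-of-envelope arithmetic. The sketch below amortises hardware cost and electricity over a rack’s lifetime output; every figure in it (capex, power draw, electricity price, sustained token rate) is an illustrative assumption, not a number from the keynote.

```python
# Back-of-envelope cost-per-token model for a "token factory".
# All inputs are hypothetical; none come from Nvidia's announcement.

def cost_per_million_tokens(capex_usd: float, lifetime_years: float,
                            power_kw: float, usd_per_kwh: float,
                            tokens_per_second: float) -> float:
    """Amortised hardware cost plus electricity, per million tokens."""
    lifetime_seconds = lifetime_years * 365 * 24 * 3600
    total_tokens = tokens_per_second * lifetime_seconds
    # Energy bill: kW * hours * $/kWh over the whole lifetime.
    energy_cost = power_kw * (lifetime_seconds / 3600) * usd_per_kwh
    return (capex_usd + energy_cost) / total_tokens * 1e6

# Hypothetical rack: $3M capex, 5-year life, 120 kW draw,
# $0.08/kWh, 500k tokens/s sustained.
baseline = cost_per_million_tokens(3_000_000, 5, 120, 0.08, 500_000)
# More tokens per watt at the same power budget means proportionally
# more lifetime output for the same total cost.
improved = cost_per_million_tokens(3_000_000, 5, 120, 0.08, 500_000 * 35)
print(f"${baseline:.4f} vs ${improved:.4f} per million tokens")
```

The point of the exercise is structural, not the specific numbers: once capex and power are fixed, cost per token falls in direct proportion to throughput, which is why co-design that raises tokens-per-watt dominates the economics.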
That logic explains the Groq 3 LPX inference rack and Nvidia’s push into specialised processors. Rather than relying solely on general‑purpose GPUs for every stage of model inference, Nvidia now proposes splitting workloads: GPUs handle the compute-heavy prefill of large models, while racks equipped with 256 LPUs, 128GB of on‑chip SRAM and 640TB/s of aggregate bandwidth handle low‑latency token decoding. Nvidia claims a 35‑fold improvement in throughput per watt for these inference tasks, a leap that, if realised at scale, would materially lower operating costs for generative AI services.
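The split makes sense because decoding is typically memory-bandwidth-bound: each generated token streams the model’s active weights once, so per-stream decode throughput is roughly bandwidth divided by bytes per token. The sketch below applies that rule of thumb; the comparison GPU bandwidth and the FP16 precision are assumptions for illustration, with only the 640TB/s figure taken from the announcement.

```python
# Rule-of-thumb decode ceiling: tokens/s ~= memory bandwidth / bytes
# streamed per token. Illustrative only; the 8 TB/s GPU figure and
# FP16 weights are assumptions, not Nvidia specifications.

def decode_tokens_per_second(bandwidth_tb_s: float,
                             active_params_billions: float,
                             bytes_per_param: int = 2) -> float:
    """Upper bound on single-stream decode rate for a bandwidth-bound model."""
    bytes_per_token = active_params_billions * 1e9 * bytes_per_param
    return bandwidth_tb_s * 1e12 / bytes_per_token

# A GPU-class part with ~8 TB/s of HBM versus a rack pooling 640 TB/s
# of on-chip SRAM, serving a model with 12B active parameters at FP16:
gpu_rate = decode_tokens_per_second(8, 12)
rack_rate = decode_tokens_per_second(640, 12)
print(f"{gpu_rate:,.0f} vs {rack_rate:,.0f} tokens/s ceiling per stream")
```

Real systems batch many streams and reuse cached weights, so actual gains differ, but the model shows why decode hardware is judged by bandwidth per watt rather than raw FLOPS.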
Huang’s stack is more than chips. Nvidia unveiled the Vera CPU tuned for agent workloads, the BlueField‑4 STX storage architecture and Spectrum‑6 SPX optical interconnects using co‑packaged optics (CPO). The company argues these elements double compute efficiency, accelerate core processing by 50 percent, improve optical power efficiency five‑fold and raise network reliability — in effect pre‑building a high‑bandwidth highway for the massive, persistent contexts that enterprise agents require.
Software and models close the loop. Nvidia introduced NemoClaw, an enterprise agent infrastructure promising one‑command deployments with built‑in internal networking and sandboxing for sensitive data. To feed that infrastructure it released Nemotron 3 Super, an open 120‑billion‑parameter model that activates 12 billion parameters during agent execution, claiming five‑fold throughput gains. Graphics and rendering advances such as DLSS 5 were pitched as further proof that Nvidia intends to set de facto standards across the compute, networking and application layers.
The company also unveiled Space‑1’s Vera Rubin space compute module, signalling ambitions to extend compute beyond terrestrial data centres into low Earth orbit. That move speaks to latency, coverage and resilience strategies for future distributed AI services, and suggests Nvidia is thinking about compute availability as a global, multi‑layered utility.
Huang’s $1 trillion projection is as much strategic messaging as financial forecast. To reach that figure Nvidia must not only sell chips but persuade cloud providers, enterprises and governments to redesign infrastructure and workflows around its stack. The path forward is plausible given surging demand for generative AI, but it is neither inevitable nor uncontested: supply‑chain limits, rival platforms from hyperscalers and Chinese suppliers, and regulatory scrutiny of monopoly control over critical AI infrastructure are real constraints.
