# Inference
Latest news and articles about Inference
Total: 6 articles found

The Great Recalibration: Why the GPU’s Hegemony in AI is Finally Cracking
As AI shifts from the training phase to mass deployment, the industry is moving away from GPU-centricity toward system-level efficiency. The resurgence of the CPU, driven by the needs of inference and AI agents, is fundamentally changing the architecture of data centers and the competitive landscape for hardware giants like Intel, AMD, and Nvidia.

Shanghai’s Silicon Stealth: Zhongzixing Prepares to Tape Out AI-Native 'NEU' Chip
Shanghai startup Zhongzixing Technology plans a Q4 tape-out for its NEU AI chip, claiming massive performance and energy efficiency gains over traditional GPUs. Led by veterans from Intel and Nvidia, the firm represents China's push for specialized, high-performance AI silicon.

The Token Squeeze: Tesla’s New Silicon and Tencent’s Price Hikes Signal AI's Move to the Edge
Tesla's AI5 chip tape-out and Tencent Cloud's price hikes signal a strategic shift in the AI industry from cloud-based training to edge-side inference and high-volume token consumption. This transition is driving a revaluation of the AI supply chain, moving from raw hardware rental to sophisticated 'Agent-as-a-Service' business models.

The H100’s Second Act: Why NVIDIA’s Legacy Silicon is Seeing a 40% Rental Surge
NVIDIA’s four-year-old H100 GPUs are seeing a 40% price surge in the rental market due to a massive spike in demand for video generation and multi-agent AI systems. Despite the launch of newer Blackwell chips, supply remains critically tight, forcing AI giants to lock in long-term contracts for legacy silicon.

The Inference Pivot: China Surpasses U.S. in Weekly AI Token Consumption
China has overtaken the United States in weekly AI token consumption, reaching 4.69 trillion tokens and claiming the top three most-called models globally. Projections from J.P. Morgan suggest a massive 370-fold growth in Chinese AI inference by 2030, signaling a definitive shift toward large-scale industrial application.

Huang’s GTC Playbook: NVIDIA Repackages AI as Token Factories — Hardware, Agents and a $1tn Inference Bet
At GTC, Huang declared a structural shift from training to inference, unveiling a hardware and software roadmap (Vera Rubin systems, Groq LPU integration, Kyber racks, and OpenClaw/NemoClaw agent frameworks) that he says could create at least $1 trillion in revenue by 2027. The announcements reframe AI as a token-generation business that will reshape data centre design, software stacks and corporate IT strategy.