# AI inference
Latest news and articles about AI inference
Total: 7 articles found

Why Jensen Huang Is Betting Nvidia Will Turn AI Chips Into a $1 Trillion Business — and Why It’s Not a Done Deal
At GTC 2026, Jensen Huang forecast that Nvidia’s Blackwell and Rubin GPU families will generate at least $1 trillion of cumulative revenue by the end of 2027, excluding CPUs and rack systems. His case rests on visible hyperscaler bookings, a structural shift from training to inference demand, and a platform strategy of selling full data‑centre systems; but tight timelines, packaging bottlenecks and rising competition from AMD and hyperscalers’ custom chips pose significant risks.

Amazon Taps Cerebras for Cloud Inference Push, Taking Aim at Nvidia’s Dominance
AWS will deploy Cerebras inference chips alongside its Trainium3 processors in a new service aimed at faster, cheaper AI inference for chatbots and coding tools. The move reflects a market shift from GPU‑heavy training towards specialised, lower‑latency inference hardware and intensifies competition with Nvidia’s GPU ecosystem.

DeepSeek’s DualPath Promises to Halve AI Inference Costs — But Questions Remain
DeepSeek has introduced DualPath, an inference architecture it says can double efficiency and lower the compute cost of running large AI models. The move reflects a broader industry shift toward software and architectural optimisations that could reduce reliance on cutting‑edge chips, but real‑world validation and integration challenges remain.

Nvidia Targets the ‘Inference’ Bottleneck with a New Generation of AI Chips
Nvidia is designing a new class of chips optimized for AI inference, prioritizing latency, throughput and energy efficiency for real‑time model serving. The move aims to lower the cost of running large models at scale and to strengthen Nvidia’s position across the AI value chain, while intensifying competitive and geopolitical pressures in the semiconductor industry.

Nvidia’s $20bn Bet on ‘Extreme’ Inference Chips Signals a Shift from Training to Cheap, High‑Throughput AI
Nvidia’s roughly $20 billion acquisition of Groq’s technology and team marks a strategic bet that AI’s commercial future lies in low‑cost, high‑throughput inference rather than giant training clusters. Meanwhile, Chinese startups and spin‑outs are racing to produce specialized inference chips of their own, aiming to slash per‑token costs and capture regional markets as AI applications scale rapidly.

Alibaba Unveils Qwen3‑Max‑Thinking, a Trillion‑Parameter Inference Model Aimed at Beating Western Rivals
Alibaba has released Qwen3‑Max‑Thinking, a trillion‑parameter inference model it says surpasses leading Western models on multiple benchmarks, with stronger agent tool‑calling and reduced hallucinations. The company is opening trials via PC and web clients, positioning the model for broad commercial use, though independent verification of its claims remains outstanding.

vLLM Team's Inferact Secures $150m Seed at $800m Valuation, Signalling Fresh Bet on AI Inference Infrastructure
Inferact, founded by the creators of open‑source vLLM, raised $150 million in a seed round at an $800 million valuation led by Andreessen Horowitz and Lightspeed. The deal signals strong investor conviction in companies that can commercialize efficient LLM inference, but Inferact will face competition from cloud providers and specialized rivals as it seeks to translate open‑source credibility into enterprise revenue.