# AI inference

Latest news and articles about AI inference

Total: 7 articles found

Technology

Why Jensen Huang Is Betting Nvidia Will Turn AI Chips Into a $1 Trillion Business — and Why It’s Not a Done Deal

At GTC 2026 Jensen Huang forecast that Nvidia’s Blackwell and Rubin GPU families will generate at least $1 trillion of cumulative revenue by the end of 2027, excluding CPUs and rack systems. His case rests on visible hyperscaler bookings, a structural shift from training to inference demand, and a platform strategy selling full data‑centre systems; but tight timelines, packaging bottlenecks and rising competition from AMD and hyperscaler custom chips pose significant risks.

NeMo · 18 March 2026, 12:31
#Nvidia · #Jensen Huang · #AI inference
Technology

Amazon Taps Cerebras for Cloud Inference Push, Taking Aim at Nvidia’s Dominance

AWS will deploy Cerebras inference chips alongside its Trainium3 processors in a new service aimed at faster, cheaper AI inference for chatbots and coding tools. The move reflects a market shift from GPU‑heavy training towards specialised, lower‑latency inference hardware and intensifies competition with Nvidia’s GPU ecosystem.

NeTe · 13 March 2026, 22:38
#Amazon · #Cerebras · #AWS
Technology

DeepSeek’s DualPath Promises to Halve AI Inference Costs — But Questions Remain

DeepSeek has introduced DualPath, an inference architecture it says can double efficiency and halve the compute cost of running large AI models. The move reflects a broader industry shift toward software and architectural optimisations that could reduce reliance on cutting‑edge chips, but real‑world validation and integration challenges remain.

NeTe · 28 February 2026, 17:37
#DeepSeek · #DualPath · #AI inference
Technology

Nvidia Targets the ‘Inference’ Bottleneck with a New Generation of AI Chips

Nvidia is designing a new class of chips optimised for AI inference, prioritising latency, throughput and energy efficiency for real‑time model serving. The move aims to lower the cost of running large models at scale and strengthens Nvidia’s position across the AI value chain, while intensifying competitive and geopolitical pressures in the semiconductor industry.

NeTe · 28 February 2026, 05:47
#Nvidia · #AI inference · #chips
Technology

Nvidia’s $20bn Bet on ‘Extreme’ Inference Chips Signals a Shift from Training to Cheap, High‑Throughput AI

Nvidia’s roughly $20 billion acquisition of Groq’s technology and team marks a strategic bet that AI’s commercial future lies in low‑cost, high‑throughput inference rather than giant training clusters. Meanwhile, Chinese startups and spin‑outs are racing to produce specialised inference chips of their own, aiming to slash per‑token costs and capture regional markets as AI applications scale rapidly.

NeTe · 1 February 2026, 02:30
#Nvidia · #Groq · #AI inference
Technology

Alibaba Unveils Qwen3‑Max‑Thinking, a Trillion‑Parameter Inference Model Aimed at Beating Western Rivals

Alibaba has released Qwen3‑Max‑Thinking, a trillion‑parameter inference model it says surpasses leading Western models on multiple benchmarks, with stronger agent tool‑calling and fewer hallucinations. The company is opening trials on PC and web, positioning the model for broad commercial use, though independent verification of its claims is still outstanding.

NeTe · 26 January 2026, 16:10
#Alibaba · #Qwen3‑Max‑Thinking · #large language model
Technology

vLLM Team’s Inferact Secures $150m Seed at $800m Valuation, Signalling a Fresh Bet on AI Inference Infrastructure

Inferact, founded by the creators of open‑source vLLM, raised $150 million in a seed round at an $800 million valuation led by Andreessen Horowitz and Lightspeed. The deal signals strong investor conviction in companies that can commercialize efficient LLM inference, but Inferact will face competition from cloud providers and specialized rivals as it seeks to translate open‑source credibility into enterprise revenue.

NeTe · 23 January 2026, 10:20
#Inferact · #vLLM · #Andreessen Horowitz