Amazon Web Services has struck a strategic partnership with Cerebras Systems to integrate the latter’s AI inference chips into AWS data centres, joining them with Amazon’s in‑house Trainium3 processors and a custom networking layer. The collaboration is designed to accelerate conversational AI, code assistants and other latency‑sensitive applications by splitting the inference pipeline between two specialised chips.
Cerebras, valued at roughly $23.1 billion, has positioned itself as an architectural alternative to Nvidia’s GPUs and earlier this year signed a sizeable chip‑supply agreement with OpenAI. Under the AWS plan, Cerebras will handle the decoding phase of inference, the step in which a trained model generates human‑readable answers, while Trainium3 will manage the pre‑processing, or tokenisation, stage. Amazon and Cerebras have not disclosed the financial terms or the scale of the deployment, though Amazon expects the service to be available in the second half of the year.
The deal underscores a broader shift in the AI market from training‑centric compute to inference‑focused workloads. Large‑scale model training has long been dominated by GPUs because of their raw throughput, but companies running conversational agents now prioritise lower latency and cost‑efficient inference as user bases swell into the millions. AWS’s approach, which pairs its custom Trainium chips with Cerebras hardware and optimises the network fabric between them, is explicitly pitched as a price‑performance alternative to GPU‑centric stacks.
The timing is notable: Nvidia, the incumbent supplier to most cloud providers, is preparing its own hybrid approaches. Reports indicate Nvidia will outline plans to combine its GPUs with chipsets from startups such as Groq, a company Nvidia acquired late last year in a large deal. AWS, for its part, emphasises that Trainium3 is close to handling production‑grade workloads and that its stack will deliver favourable economics versus off‑the‑shelf GPUs, though independent benchmarks will be required to substantiate those claims.
For AWS, the partnership is both competitive and strategic: it reduces reliance on a single supplier in a market where chip availability, pricing and performance are increasingly strategic assets. For Cerebras, access to AWS’s global customer base means rapid commercial scale and a simpler route for enterprises and startups to trial an alternative inference architecture "with a single click", as Cerebras CEO Andrew Feldman put it.
The cumulative effect may be a more heterogeneous cloud compute landscape. If cloud providers pair domain‑specific inference silicon with existing GPU ecosystems, customers will have richer choices for balancing latency, throughput and cost. Nvidia’s ecosystem advantages — software stacks, developer familiarity and broad market share — remain formidable, but Amazon’s tie‑up with Cerebras signals that competitors believe there is fertile ground to erode GPU exclusivity in the inference era.
