AWS plans to deploy Cerebras' Wafer-Scale Engine chip for AI inference functions; AWS will still offer slower, cheaper computing using its Trainium processors

Amazon Web Services says the partnership will allow it to offer lightning-fast inference computing

Wall Street Journal 2026-03-13

Discussion

@sethwinterroth Seth Winterroth on x
Cerebras just landed AWS. That's @OpenAI and @awscloud in the span of 3 months. The AI inference stack is restructuring in real time and @cerebras is winning. https://www.wsj.com/...
@bgurley Bill Gurley on x
That's big. Really big. Whole wafer big.
@tbu12345678 on x
legit hysterical that Cerebras got a presser from AWS before $AMD
@awscloud on x
We're teaming up with @cerebras to build the fastest possible inference. Coming soon to Amazon Bedrock, we're delivering inference performance an order of magnitude faster than what's available today by connecting AWS Trainium3 for compute-intensive prefill with Cerebras CS-3 [vi…
@ericvishria Eric Vishria on x
Breaking up prefill (processing the prompt) and decode (generating the response) has been theorized for a while as they have different compute requirements. Now we have the silicon to do it - AWS Trainium for prefill, and Cerebras for decode. Super fast AND cost effective.
@andrewdfeldman Andrew Feldman on x
Today Cerebras announced that @awscloud will be deploying Cerebras CS-3s in their data centers. Together, Cerebras and AWS will be delivering the fastest inference solution in the world. It has been an extraordinary 30 days for Cerebras. In February, we announced that we would [i…
@awsnewsroom on x
AWS and @cerebras are bringing dramatically faster AI inference to customers through Amazon Bedrock. The solution splits inference into two stages: AWS Trainium3 for prompt processing and Cerebras CS-3 for output generation. AWS will be the first and exclusive cloud provider to […
