Julisha News
Amazon taps Cerebras wafer-scale chips to turbocharge AI models on AWS

Mar 15, 2026 · 4 mins read

    Amazon Web Services said Friday it will put processors from Cerebras inside its data centers under a multiyear partnership focused on AI inference.

    The deal gives Amazon a new way to speed up how AI models answer prompts, write code, and handle live user requests. AWS said it will use Cerebras technology, including the Wafer-Scale Engine, for inference tasks.

    The companies did not share the financial terms. The setup is planned for Amazon Bedrock inside AWS data centers, putting the partnership right inside one of Amazon’s main AI products.

    AWS said the system will combine Amazon Trainium-powered servers, Cerebras CS-3 systems, and Amazon’s Elastic Fabric Adapter networking.

    Later this year, AWS also plans to offer leading open-source large language models and Amazon Nova on Cerebras hardware. David Brown, vice president of Compute and ML Services at AWS, said speed is still a major problem in AI inference, especially for real-time coding help and interactive apps.

Brown said, “Inference is where AI delivers real value to customers, but speed remains a critical bottleneck for demanding workloads like real-time coding assistance and interactive applications.”

    Amazon splits prefill and decode across separate chips

    AWS said the design uses a method called inference disaggregation. That means splitting AI inference into two parts. The first part is prompt processing, also called prefill. The second part is output generation, also called decode.

AWS said the two jobs behave very differently. Prefill is parallel, compute heavy, and needs only moderate memory bandwidth. Decode is serial, lighter on compute, and far more dependent on memory bandwidth. Decode also takes up most of the time in a typical request, because every output token has to be produced one by one.

    That is why AWS is assigning different hardware to each stage. Trainium will handle prefill. Cerebras CS-3 will handle decode.

    AWS said low-latency, high-bandwidth EFA networking will connect both sides so the system can work as one service while each processor focuses on a separate task.
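The split described above can be sketched in a few lines of Python. This is a toy illustration of the disaggregation idea only, not an AWS or Cerebras API: the function and variable names are invented, and the "cache" stand-in plays the role of the KV cache that a real system would ship across the interconnect between the two stages.

```python
# Toy sketch of inference disaggregation (illustration only, not an AWS API).
# Prefill processes the whole prompt in one parallel pass; decode then
# generates output tokens one at a time, each step depending on the last.

def prefill(prompt_tokens):
    """Prompt processing: compute-heavy but parallel across tokens.
    Returns a 'KV cache' stand-in summarizing the prompt."""
    return {"cache": list(prompt_tokens)}  # one pass over all tokens at once

def decode(cache, max_new_tokens):
    """Output generation: serial; each token depends on the previous one,
    which is why this stage is bound by memory bandwidth, not raw compute."""
    out = []
    for _ in range(max_new_tokens):
        nxt = len(cache["cache"]) + len(out)  # stand-in for sampling a token
        out.append(nxt)
    return out

def disaggregated_inference(prompt_tokens, max_new_tokens):
    # Stage 1 runs on the compute-optimized side (Trainium in the article);
    # stage 2 runs on the bandwidth-optimized side (Cerebras CS-3 in the
    # article). The cache is what crosses the interconnect (EFA) between them.
    cache = prefill(prompt_tokens)
    return decode(cache, max_new_tokens)

print(disaggregated_inference([10, 20, 30], 4))  # [3, 4, 5, 6]
```

The point of the sketch is the shape of the workload: `prefill` touches every prompt token in one batch-friendly pass, while `decode` is an unavoidable loop, so assigning each stage to hardware tuned for its bottleneck can pay off even with the cost of moving the cache between processors.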

Brown added, “What we’re building with Cerebras solves that: by splitting the inference workload across Trainium and CS-3, and connecting them with Amazon’s Elastic Fabric Adapter, each system does what it’s best at. The result will be inference that’s an order of magnitude faster and higher performance than what’s available today.”

    AWS also said the service will run on the AWS Nitro System, which is the base layer for its cloud infrastructure.

    That means Cerebras CS-3 systems and Trainium-powered instances are expected to operate with the same security, isolation, and consistency that AWS customers already use.

    Amazon pushes Trainium harder as Nvidia faces another threat

    The announcement also gives Amazon another opening to push Trainium against chips from Nvidia, AMD, and other big chip companies. AWS describes Trainium as its in-house AI chip built for scalable performance and cost efficiency across training and inference.

    AWS said two major AI labs are already committed to it. Anthropic has named AWS its primary training partner and uses Trainium to train and deploy models. OpenAI will consume 2 gigawatts of Trainium capacity through AWS infrastructure for Stateful Runtime Environment, frontier models, and other advanced workloads.

    AWS added that Trainium3 has seen strong adoption since its recent release, with customers across industries committing major capacity.

    Cerebras is handling the decode side of the setup. AWS said CS-3 is dedicated to decoding acceleration, which gives it more room for fast output tokens. Cerebras says CS-3 is the world’s fastest AI inference system and delivers thousands of times greater memory bandwidth than the fastest GPU.

    The company said reasoning models now make up a larger share of inference work and generate more tokens per request as they work through problems. Cerebras also said OpenAI, Cognition, Mistral, and others use its systems for demanding workloads, especially agentic coding.

    Andrew Feldman, founder and chief executive of Cerebras Systems, said, “Partnering with AWS to build a disaggregated inference solution will bring the fastest inference to a global customer base.”

Feldman added, “Every enterprise around the world will be able to benefit from blisteringly fast inference within their existing AWS environment.”

    The deal adds more pressure on Nvidia, which in December signed a $20 billion licensing agreement with Groq and plans next week to unveil a new inference system using Groq technology.
