Red Hat expands partnership with Amazon Web Services for enhanced enterprise-scale generative artificial intelligence

Matt Hicks, President and Chief Executive Officer

Red Hat has announced an expanded collaboration with Amazon Web Services (AWS) to enhance enterprise-grade generative AI capabilities on AWS. The partnership will integrate Red Hat AI with AWS’s purpose-built AI chips, including Inferentia2 and Trainium3, aiming to provide organizations with more flexibility and efficiency in running high-performance AI inference workloads at scale.

According to the companies, this effort addresses the growing need for scalable AI inference solutions as organizations increasingly adopt generative AI technologies. A forecast by IDC suggests that by 2027, 40% of organizations will use custom silicon such as ARM processors or chips designed specifically for AI and machine learning tasks. This trend reflects a demand for improved processing power, cost efficiency, and specialized computing.

Key components of the collaboration include enabling Red Hat’s AI Inference Server—powered by vLLM—to operate on AWS Inferentia2 and Trainium3 chips. This aims to create a unified inference layer capable of supporting various generative AI models while delivering 30-40% better price performance compared to current GPU-based Amazon EC2 instances.

Additionally, Red Hat has worked with AWS to develop an AWS Neuron operator for its OpenShift platforms. This provides customers using Red Hat OpenShift, OpenShift AI, or OpenShift Service on AWS with a streamlined way to run their AI workloads using AWS accelerators. The collaboration also includes easier access to high-capacity accelerators through support for AWS chips and new automation tools like the amazon.ai Certified Ansible Collection for orchestrating AI services on AWS.

The companies are further contributing upstream by optimizing an AWS AI chip plugin integrated into vLLM. As the leading commercial contributor to vLLM, Red Hat is focused on enhancing both inference and training capabilities for users of these open source frameworks.

Joe Fernandes, vice president and general manager of Red Hat’s AI Business Unit said: “By enabling our enterprise-grade Red Hat AI Inference Server, built on the innovative vLLM framework, with AWS AI chips, we’re empowering organizations to deploy and scale AI workloads with enhanced efficiency and flexibility. Building on Red Hat’s open source heritage, this collaboration aims to make generative AI more accessible and cost-effective across hybrid cloud environments.”

Colin Brace, vice president at Annapurna Labs (AWS), added: “Enterprises demand solutions that deliver exceptional performance, cost efficiency, and operational choice for mission-critical AI workloads. AWS designed its Trainium and Inferentia chips to make high-performance AI inference and training more accessible and cost-effective. Our collaboration with Red Hat provides customers with a supported path to deploying generative AI at scale, combining the flexibility of open source with AWS infrastructure and purpose-built AI accelerators to accelerate time-to-value from pilot to production.”

Jean-François Gamache from CAE commented: “Modernizing our critical applications with Red Hat OpenShift Service on AWS marks a significant milestone in our digital transformation. This platform supports our developers in focusing on high-value initiatives – driving product innovation and accelerating AI integration across our solutions. Red Hat OpenShift provides the flexibility and scalability that enable us to deliver real impact, from actionable insights through live virtual coaching to significantly reducing cycle times for user-reported issues.”

Anurag Agrawal of Techaisle stated: “As AI inference costs escalate, enterprises are prioritizing efficiency alongside performance. This collaboration exemplifies Red Hat’s ‘any model, any hardware’ strategy by combining its open hybrid cloud platform with the distinct economic advantages of AWS Trainium and Inferentia. It empowers CIOs to operationalize generative AI at scale, shifting from cost-intensive experimentation to sustainable, governed production.”

Availability is staggered: the AWS Neuron community operator can be accessed now via the Red Hat OpenShift OperatorHub for customers using OpenShift or Red Hat OpenShift Service on AWS, while developer preview support for running Red Hat’s Inference Server on these dedicated chips is anticipated in January 2026.

More information about this partnership can be found at industry events such as AWS re:Invent 2025 booth #839 or through resources like Red Hat in the AWS Marketplace, trial sign-ups for Red Hat’s Inference Server, detailed product pages about Red Hat AI, or overviews explaining the benefits of AI inference.
