Red Hat launches llm-d community for scalable generative AI inference
Matt Hicks, President and Chief Executive Officer, Red Hat

Red Hat has announced the launch of llm-d, an open source project aimed at enhancing generative AI inference at scale. The initiative is supported by founding contributors CoreWeave, Google Cloud, IBM Research, and NVIDIA, along with partners such as AMD, Cisco, Hugging Face, Intel, Lambda, and Mistral AI. University supporters include the University of California, Berkeley and the University of Chicago.

The llm-d project leverages Kubernetes architecture and vLLM-based distributed inference to provide scalable AI inference across hybrid cloud environments. Brian Stevens, Red Hat’s senior vice president and AI CTO, stated that “the launch of the llm-d community…marks a pivotal moment in addressing the need for scalable gen AI inference.”
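The llm-d project's own deployment manifests are not shown in the announcement, but as an illustration of the building blocks it names, a stock vLLM server can already be run on Kubernetes along these lines (a minimal sketch; the resource names, replica count, and model are illustrative assumptions, not llm-d's actual configuration):

```yaml
# Hypothetical sketch: a vLLM OpenAI-compatible server as a Kubernetes Deployment.
# All names, the model choice, and the replica count are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-inference
spec:
  replicas: 2                          # scale out inference pods horizontally
  selector:
    matchLabels:
      app: vllm-inference
  template:
    metadata:
      labels:
        app: vllm-inference
    spec:
      containers:
      - name: vllm
        image: vllm/vllm-openai:latest # vLLM's serving image
        args: ["--model", "mistralai/Mistral-7B-Instruct-v0.2"]
        ports:
        - containerPort: 8000          # OpenAI-compatible HTTP API
        resources:
          limits:
            nvidia.com/gpu: 1          # one accelerator per replica
```

llm-d layers distributed-inference concerns (request routing, KV-cache sharing across pods, and accelerator-aware scheduling) on top of primitives like these rather than replacing them.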

Ramine Roane from AMD expressed pride in contributing high-performance GPUs to the project. Shannon McFarland from Cisco noted that “llm-d empowers developers to programmatically integrate and scale generative AI inference.” Chen Goldberg from CoreWeave highlighted their commitment to making powerful AI infrastructure more accessible.

Mark Lohmeyer from Google Cloud emphasized efficient AI inference as organizations deploy AI at scale. Jeff Boudier from Hugging Face mentioned support for diverse models available through their platform. Priya Nagpurkar from IBM Research stressed efficiency and scale as key focuses for enterprise AI solutions.

Bill Pearson from Intel described llm-d as a key inflection point for driving AI transformation at scale. Eve Callicoat from Lambda praised llm-d’s ability to make state-of-the-art inference accessible and efficient. Ujval Kapasi from NVIDIA highlighted collaboration to drive innovation in generative AI.

Ion Stoica from the University of California, Berkeley recognized Red Hat’s efforts in building upon vLLM’s success. Junchen Jiang from the University of Chicago expressed excitement about llm-d leveraging LMCache for distributed KV cache optimizations.

The launch reflects Red Hat’s vision for deploying any model on any accelerator across any cloud environment without exorbitant costs. The company aims to establish vLLM as a standard for gen AI inference within hybrid cloud infrastructures.
