Red Hat has announced the launch of llm-d, an open source project aimed at scaling generative AI inference. The initiative is backed by founding contributors CoreWeave, Google Cloud, IBM Research, and NVIDIA, along with partners including AMD, Cisco, Hugging Face, Intel, Lambda, and Mistral AI. University supporters include the University of California, Berkeley, and the University of Chicago.
The llm-d project pairs a Kubernetes-native architecture with vLLM-based distributed inference to deliver scalable generative AI serving across hybrid cloud environments. Brian Stevens, Red Hat’s senior vice president and AI CTO, stated that “the launch of the llm-d community…marks a pivotal moment in addressing the need for scalable gen AI inference.”
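For context, vLLM is the open source inference engine at the core of llm-d. A minimal sketch of its offline Python API follows; the model checkpoint named here is illustrative, and llm-d layers Kubernetes-based scheduling and routing on top of engines like this rather than exposing this API directly:

```python
from vllm import LLM, SamplingParams

# Load a model into the vLLM engine; the checkpoint name is illustrative.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")

# Sampling settings for generation.
params = SamplingParams(temperature=0.7, max_tokens=128)

# vLLM schedules requests with continuous batching and PagedAttention
# to keep accelerator memory utilization high during generation.
outputs = llm.generate(["What is distributed inference?"], params)
for output in outputs:
    print(output.outputs[0].text)
```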
Ramine Roane from AMD expressed pride in contributing high-performance GPUs to the project. Shannon McFarland from Cisco noted that “llm-d empowers developers to programmatically integrate and scale generative AI inference.” Chen Goldberg from CoreWeave highlighted their commitment to making powerful AI infrastructure more accessible.
Mark Lohmeyer from Google Cloud emphasized efficient AI inference as organizations deploy AI at scale. Jeff Boudier from Hugging Face mentioned support for diverse models available through their platform. Priya Nagpurkar from IBM Research stressed efficiency and scale as key focuses for enterprise AI solutions.
Bill Pearson from Intel described llm-d as a key inflection point for driving AI transformation at scale. Eve Callicoat from Lambda praised llm-d’s ability to make state-of-the-art inference accessible and efficient. Ujval Kapasi from NVIDIA highlighted collaboration to drive innovation in generative AI.
Ion Stoica from the University of California, Berkeley recognized Red Hat’s efforts in building on vLLM’s success. Junchen Jiang from the University of Chicago expressed excitement about llm-d leveraging LMCache for distributed key-value (KV) cache optimizations.
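At a high level, a KV cache stores the per-token attention key/value tensors computed during prefill so that requests sharing a prompt prefix do not recompute them. The toy sketch below illustrates the idea of prefix-keyed cache lookup; it is a conceptual illustration only, not LMCache’s actual API:

```python
# Conceptual illustration of prefix-keyed KV cache reuse; not LMCache's API.
from typing import Dict, Optional, Tuple

class PrefixKVCache:
    """Maps a token-ID prefix to its (mock) key/value tensors for reuse."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[int, ...], str] = {}

    def put(self, prefix: Tuple[int, ...], kv: str) -> None:
        self._store[prefix] = kv

    def longest_hit(self, tokens: Tuple[int, ...]) -> Tuple[Tuple[int, ...], Optional[str]]:
        # Find the longest cached prefix of the request's tokens,
        # so only the uncached suffix needs a fresh prefill pass.
        for end in range(len(tokens), 0, -1):
            kv = self._store.get(tokens[:end])
            if kv is not None:
                return tokens[:end], kv
        return (), None

cache = PrefixKVCache()
cache.put((1, 2, 3), "kv-for-[1,2,3]")
prefix, kv = cache.longest_hit((1, 2, 3, 4, 5))
print(prefix, kv)  # (1, 2, 3) kv-for-[1,2,3] -> only tokens 4 and 5 need prefill
```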
The launch reflects Red Hat’s vision for deploying any model on any accelerator across any cloud environment without exorbitant costs. The company aims to establish vLLM as a standard for gen AI inference within hybrid cloud infrastructures.