UHG
Search
Close this search box.

NVIDIA and Hugging Face Offers Inference-as-a-Service with 5x Token Efficiency for AI Models

The service enables developers to quickly prototype using open-source AI models available on the Hugging Face Hub and deploy them effectively.

Share

NVIDIA Hugging Face

Open Source platform Hugging Face is offering developers Inference-as-a-Service that will be powered by NVIDIA’s NIM. The new service provides 5x better token efficiency for AI models and allows immediate access to NIM microservices running on NVIDIA DGX Cloud. 

The new inference-as-a-service was announced at the ongoing SIGGRAPH 2024, a premier conference and exhibition on computer graphics and interactive techniques happening at Denver, Colorado. The new service will facilitate developers to deploy powerful LLMs such as Llama 2, Mistral AI models and many more with optimisation from NVIDIA NIM microservices. Hugging Face Enterprise Hub users can access serverless inference for increased flexibility and minimal infrastructure overhead with NVIDIA NIM.

When accessed as a NIM, large models such as the 70-billion-parameter version of Llama 3, will deliver up to 5x higher throughput when compared with off-the-shelf deployment on NVIDIA H100 Tensor Core GPU-powered systems.

The new inference service supports Train on DGX Cloud, an AI training service that is already available on Hugging Face.

The Omnipresent NVIDIA NIM 

NVIDIA NIM is a set of AI microservices, including NVIDIA AI foundation models and open-source community models, that has been optimised for inference with standard APIs. It improves token processing efficiency and enhances the NVIDIA DGX Cloud infrastructure, accelerating AI applications. This setup provides faster, more robust results. 

The NVIDIA DGX Cloud platform is tailored for generative AI, offering developers reliable, accelerated computing infrastructure for faster production readiness. It supports AI development from prototyping to production without requiring long-term commitments. 

Hugging Face Dominates

The new announcement banks on an existing partnership between both tech companies and is only going to foster the developer community further. Interestingly, Hugging Face recently announced its profitability with a 220-member team. They also released SmolLM, a series of small language models

📣 Want to advertise in AIM? Book here

Picture of Vandana Nair

Vandana Nair

As a rare blend of engineering, MBA, and journalism degree, Vandana Nair brings a unique combination of technical know-how, business acumen, and storytelling skills to the table. Her insatiable curiosity for all things startups, businesses, and AI technologies ensures that there's always a fresh and insightful perspective to her reporting.
Related Posts
19th - 23rd Aug 2024
Generative AI Crash Course for Non-Techies
Upcoming Large format Conference
Sep 25-27, 2024 | 📍 Bangalore, India
Download the easiest way to
stay informed

Subscribe to The Belamy: Our Weekly Newsletter

Biggest AI stories, delivered to your inbox every week.

Flagship Events

Rising 2024 | DE&I in Tech Summit
April 4 and 5, 2024 | 📍 Hilton Convention Center, Manyata Tech Park, Bangalore
Data Engineering Summit 2024
May 30 and 31, 2024 | 📍 Bangalore, India
MachineCon USA 2024
26 July 2024 | 583 Park Avenue, New York
MachineCon GCC Summit 2024
June 28 2024 | 📍Bangalore, India
Cypher USA 2024
Nov 21-22 2024 | 📍Santa Clara Convention Center, California, USA
Cypher India 2024
September 25-27, 2024 | 📍Bangalore, India
discord-icon
AI Forum for India
Our Discord Community for AI Ecosystem, In collaboration with NVIDIA.