
NVIDIA Rides High on InfiniBands

“The vast majority of the dedicated large scale AI factories standardise on InfiniBand,” said Jensen Huang during NVIDIA’s Q3 earnings call


Illustration by Nikhil Kumar

NVIDIA's latest Q3 earnings reflect the tech giant's continued growth. The company reported revenue of $18.12 billion, a 206% increase YoY and a 34% increase over the previous quarter. It attributed this growth to the continued ramp of the NVIDIA HGX platform along with end-to-end networking via InfiniBand.
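To put the growth figures in perspective, the baselines they imply can be worked backwards from the reported revenue. The sketch below is only arithmetic on the numbers quoted above; the helper name `implied_base` is made up for illustration, and the results are rounded estimates, not reported figures.

```python
def implied_base(current: float, growth_pct: float) -> float:
    """Return the earlier value that, grown by growth_pct percent, gives current."""
    return current / (1 + growth_pct / 100)


q3_revenue_bn = 18.12  # reported Q3 revenue, in billions of dollars

# A 206% YoY increase means revenue is 3.06 times the year-ago figure.
year_ago_bn = implied_base(q3_revenue_bn, 206)

# A 34% sequential increase means revenue is 1.34 times the previous quarter.
prev_quarter_bn = implied_base(q3_revenue_bn, 34)

print(f"Implied year-ago quarter: ${year_ago_bn:.2f} billion")
print(f"Implied previous quarter: ${prev_quarter_bn:.2f} billion")
```

In other words, a 206% increase is roughly a tripling of revenue, not a doubling.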

NVIDIA called out the contribution of networking, which has now exceeded a $10 billion annualised revenue run rate, nearly tripling from the previous year. The company attributed this to rising demand for InfiniBand, which grew fivefold YoY.

A Complete Architecture

InfiniBand is considered critical for gaining the scale and performance needed for training LLMs. Combined with NVIDIA HGX, it forms the foundational architecture for AI supercomputers and data centre infrastructures. InfiniBand is commonly used in supercomputing environments for interconnecting servers. Its biggest advantage is low-latency, high-bandwidth communication, which is crucial for parallel processing tasks. NVIDIA's Quantum InfiniBand switches are said to meet the demands of extreme-size datasets and ultra-fast processing of high-resolution simulations with lower cost and complexity.

A few months ago, NVIDIA reached breakthrough performance with its leading H100 chip. The tests were run on 3,584 H100 GPUs connected with InfiniBand, which allowed the GPUs to deliver performance both standalone and at scale, proving the chip's prowess when combined with high-performance networking.

InfiniBand: The Preferred Choice

Speaking about the future of InfiniBand, Jensen Huang said that the vast majority of the dedicated large-scale AI factories standardise on InfiniBand, and that this is not only because of data rate and latency: “the way traffic moves around the network” is also important. He also called it a “computing fabric.”

Comparing it to Ethernet, Huang spoke of a huge difference between the two. On a $2 billion investment in AI factory infrastructure, a variance of 20 or 30% in overall effectiveness translates into hundreds of millions of dollars of value over the four to five years the infrastructure is in use.
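The scale of that claim is easy to check. The sketch below assumes, purely for illustration, that a difference in overall effectiveness maps linearly onto the value of the investment; that mapping and the function name `value_at_stake` are assumptions, not something stated in the earnings call.

```python
def value_at_stake(investment_usd: int, effectiveness_gap_pct: int) -> float:
    """Value tied to an effectiveness gap, assuming a simple linear mapping."""
    return investment_usd * effectiveness_gap_pct / 100


investment_usd = 2_000_000_000  # the $2 billion infrastructure figure cited above

for gap_pct in (20, 30):
    at_stake = value_at_stake(investment_usd, gap_pct)
    print(f"{gap_pct}% gap: ${at_stake / 1e6:,.0f} million")
```

Even at the low end of the range, the gap runs to hundreds of millions of dollars, which is the basis of the argument for paying attention to the network fabric.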

Huang calls InfiniBand’s value proposition “undeniable for AI factories.” However, Ethernet is not ruled out. While InfiniBand is used for cases that require high bandwidth with low latency, Ethernet finds applicability in other scenarios.

Ethernet, a widely used general-purpose networking technology for wired local area networks (LANs), is suitable for a broad range of applications and is more geared towards connecting terminal devices. However, its capabilities do not match those of InfiniBand.

Interestingly, NVIDIA also offers gateway appliances connecting InfiniBand data centres to Ethernet-based infrastructures and storage. NVIDIA will also release Spectrum-X in Q1 next year, an Ethernet offering that is said to achieve 1.6x higher networking performance when compared to other available Ethernet technologies. 

In terms of comparable functionality, Intel’s Omni-Path Architecture (OPA) was designed for high-speed data transfer and low-latency communication in HPC environments. It was released in 2016 but discontinued in 2019. Cisco, on the other hand, has Ethernet-based switches but nothing in the HPC space.

An Integrated Expansion

With both GPU and networking offerings available, enterprises can now build their whole architectural framework from NVIDIA products. In addition to speaking about its partnerships with Reliance, Infosys and Tata, NVIDIA mentioned its collaborations with organisations adopting InfiniBand for their AI compute needs.

In the earnings call, NVIDIA spoke about its partnership with Scaleway, a French private cloud provider that will build its regional AI cloud on NVIDIA H100, InfiniBand and AI Enterprise software to power AI advancements across Europe.

Furthermore, Jülich, a German supercomputing centre, announced plans to build its next-gen AI supercomputer using close to 24,000 Grace Hopper Superchips and Quantum-2 InfiniBand, which would make it the world’s most powerful AI supercomputer, with over 90 exaflops of AI performance.

Interestingly, Microsoft Azure uses over 29,000 miles of InfiniBand cabling. Microsoft uses InfiniBand-enabled HB- and N-series virtual machines to deliver cost-efficient HPC.

By bundling networking and GPUs, NVIDIA is boosting its growth and its standing in the supercomputer market. Given the lack of alternatives to NVIDIA’s InfiniBand, the company’s dominance looks set to grow further, ultimately making it indispensable for companies that need both GPUs and networking.


Vandana Nair

With a rare blend of engineering, MBA, and journalism degrees, Vandana Nair brings a unique combination of technical know-how, business acumen, and storytelling skills to the table. Her insatiable curiosity for all things startups, businesses, and AI technologies ensures that there's always a fresh and insightful perspective to her reporting.