
NVIDIA Rolls Out HelpSteer2 Dataset to Align LLMs

"High-quality preference data is crucial for aligning AI systems with human values, but existing datasets are often proprietary or of inconsistent quality," said Zhilin Wang, senior research scientist at NVIDIA.


NVIDIA has released HelpSteer2, an open-source dataset designed to train state-of-the-art reward models for aligning LLMs with human preferences. Released under the permissive CC-BY-4.0 license, the dataset contains 10,681 prompt-response pairs, each annotated on a Likert scale across five attributes by a pool of more than 1,000 US-based annotators.
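To make the annotation scheme concrete, here is a minimal sketch of what one annotated record might look like. The five attribute names and the 0 to 4 score range are taken from the public dataset card rather than from this article, and the `AnnotatedResponse` class is purely illustrative.

```python
from dataclasses import dataclass

# Attribute names and the 0-4 range are assumptions based on the public
# dataset card; the article only says "five attributes on a Likert scale".
ATTRIBUTES = ("helpfulness", "correctness", "coherence", "complexity", "verbosity")
LIKERT_MIN, LIKERT_MAX = 0, 4


@dataclass
class AnnotatedResponse:
    """Hypothetical container for one prompt-response pair and its scores."""
    prompt: str
    response: str
    scores: dict  # attribute name -> integer Likert score

    def __post_init__(self):
        # Every attribute must be present and inside the Likert range.
        missing = set(ATTRIBUTES) - set(self.scores)
        if missing:
            raise ValueError(f"missing attributes: {sorted(missing)}")
        for name in ATTRIBUTES:
            value = self.scores[name]
            if not (isinstance(value, int) and LIKERT_MIN <= value <= LIKERT_MAX):
                raise ValueError(f"{name}={value!r} is outside {LIKERT_MIN}..{LIKERT_MAX}")


example = AnnotatedResponse(
    prompt="Explain what a reward model is.",
    response="A reward model scores candidate responses ...",
    scores={"helpfulness": 3, "correctness": 4, "coherence": 4,
            "complexity": 2, "verbosity": 1},
)
```

Validating the range at construction time catches malformed annotations before they reach training.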

Read the full paper here. 

The HelpSteer2 dataset achieves a state-of-the-art 92.0% accuracy on RewardBench’s primary dataset when used to train a reward model with NVIDIA’s 340B Nemotron-4 base model, outperforming all other open and proprietary models as of June 12, 2024. 
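As a rough illustration of what such an accuracy figure measures: a reward model is counted as correct on a pair when it scores the human-preferred response above the rejected one. The sketch below computes that fraction on made-up scores. It ignores how RewardBench groups and weights its categories, and `pairwise_accuracy` is a hypothetical helper, not part of any benchmark code.

```python
def pairwise_accuracy(pairs):
    """Fraction of (chosen, rejected) reward pairs ranked correctly.

    `pairs` is an iterable of (reward_for_chosen, reward_for_rejected).
    Ties count as incorrect in this sketch.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one pair")
    correct = sum(1 for chosen, rejected in pairs if chosen > rejected)
    return correct / len(pairs)


# Made-up reward scores for four preference pairs.
toy_scores = [(2.1, 0.3), (1.0, 1.4), (0.9, -0.2), (3.3, 3.0)]
print(pairwise_accuracy(toy_scores))  # 0.75 (three of four pairs ranked correctly)
```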

The dataset is highly data-efficient: it requires only about 10,000 response pairs, compared with the millions used in other preference datasets, which significantly reduces computational costs.

Reward models trained on HelpSteer2 can effectively align large language models such as Llama 3 70B to match or exceed the performance of models such as Llama 3 70B Instruct and GPT-4 on major alignment metrics. The release also introduces SteerLM 2.0, a novel model alignment approach that leverages multi-attribute reward predictions to train LLMs on complex, multi-requirement instructions.
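One simple way to use multi-attribute predictions is to collapse them into a single score with a weighted sum, so that two candidate responses can be ranked. The sketch below shows only that idea. It is not the SteerLM 2.0 algorithm, and the weights, attribute names and predicted values are hypothetical.

```python
# Hypothetical weights: reward helpfulness and correctness, mildly reward
# coherence, ignore complexity, and penalize verbosity.
HYPOTHETICAL_WEIGHTS = {
    "helpfulness": 0.5,
    "correctness": 0.5,
    "coherence": 0.25,
    "complexity": 0.0,
    "verbosity": -0.25,
}


def scalar_reward(predicted, weights=HYPOTHETICAL_WEIGHTS):
    """Weighted sum of per-attribute reward predictions."""
    return sum(weight * predicted[name] for name, weight in weights.items())


# Made-up per-attribute predictions for two candidate responses.
response_a = {"helpfulness": 3.5, "correctness": 4.0, "coherence": 4.0,
              "complexity": 1.5, "verbosity": 1.0}
response_b = {"helpfulness": 2.0, "correctness": 2.5, "coherence": 3.5,
              "complexity": 2.0, "verbosity": 3.0}

# Pick the candidate with the higher combined score.
preferred = max((response_a, response_b), key=scalar_reward)
```

Keeping the attributes separate until the final step means the weighting can be changed without retraining the model that predicts them.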

“High-quality preference data is crucial for aligning AI systems with human values, but existing datasets are often proprietary or of inconsistent quality,” said Zhilin Wang, senior research scientist at NVIDIA. “HelpSteer2 provides an open, permissively licensed alternative for both commercial and academic use.”

The HelpSteer2 dataset is available on the Hugging Face hub, and the code is open-sourced on NVIDIA’s NeMo-Aligner GitHub repository. 
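For readers who want to inspect the data, the following is a minimal sketch using the Hugging Face `datasets` library. The repository id `nvidia/HelpSteer2`, the `train` split and the `helpfulness` column are assumptions based on the public dataset card, not details given in this article. The download needs network access.

```python
def load_helpsteer2(split="train"):
    """Download one split from the Hugging Face hub (needs network access).

    The repository id is an assumption taken from the public dataset card.
    """
    from datasets import load_dataset  # third-party: pip install datasets
    return load_dataset("nvidia/HelpSteer2", split=split)


def mean_attribute(rows, name):
    """Average one annotated attribute over an iterable of row dicts."""
    values = [row[name] for row in rows]
    if not values:
        raise ValueError("no rows to average")
    return sum(values) / len(values)


def main():
    train = load_helpsteer2("train")
    print(len(train), "rows")
    print("mean helpfulness:", mean_attribute(train, "helpfulness"))
```

Calling `main()` triggers the download. `mean_attribute` works on any list of dictionaries, so it can be tried offline on hand-written rows first.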

Sentiment Analysis Datasets

HelpSteer2 trains and guides models to behave in ways that people prefer. Beyond preference data, many sentiment analysis datasets have applications in various fields, helping enterprises accurately understand and learn from their customers.

Examples include Amazon product data, the Multi-Domain Sentiment Dataset, and Sentiment140.


Gopika Raj

With a Master's degree in Journalism & Mass Communication, Gopika Raj infuses her technical writing with a distinctive flair. Intrigued by advancements in AI technology and its future prospects, her writing offers a fresh perspective in the tech domain, captivating readers along the way.