
NVIDIA’s New Model ChatQA-2 Rivals GPT-4 in Long Context and RAG Tasks

The Llama3-ChatQA-2-70B model can process contexts up to 128,000 tokens, matching GPT-4-Turbo's capacity.


Researchers at NVIDIA have developed Llama3-ChatQA-2-70B, a new large language model that rivals GPT-4-Turbo in handling long contexts up to 128,000 tokens and excels in retrieval-augmented generation (RAG) tasks. 

The model, based on Meta’s Llama3, demonstrates competitive performance across various benchmarks, including long-context understanding, medium-length tasks, and short-context evaluations.


Llama3-ChatQA-2-70B processes contexts of up to 128,000 tokens, matching GPT-4-Turbo's capacity. It outperforms GPT-4-Turbo on RAG tasks and delivers competitive results on long-context benchmarks that extend beyond 100,000 tokens.

Additionally, the model performs strongly on medium-length tasks within 32,000 tokens and maintains effectiveness on short-context tasks within 4,000 tokens.
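For readers who want to experiment with a long-context checkpoint like this, below is a minimal sketch of how such a model would typically be queried through Hugging Face transformers. The checkpoint ID, file handling, and prompt are illustrative assumptions, not NVIDIA's documented usage; check the official release for the exact details.

```python
# Minimal usage sketch (assumed): querying a long-context chat model
# through Hugging Face transformers. The checkpoint ID below is an
# assumption based on NVIDIA's naming; verify it against the release.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Llama3-ChatQA-2-70B"  # assumed ID, check before use

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # 70B weights need multiple GPUs
    device_map="auto",            # shard across available devices
)

long_document = "..."  # paste or load a document of up to ~128K tokens

prompt = f"{long_document}\n\nQuestion: What are the key findings?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, not the echoed prompt.
answer = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(answer)
```

The practical appeal of a 128K window is visible in the prompt above: an entire report or codebase can be placed before the question, rather than a 4K-token excerpt.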

The researchers employed a two-step approach to extend Llama3-70B’s context window from 8,000 to 128,000 tokens. This involved continued pre-training on a mix of SlimPajama data with upsampled long sequences, followed by a three-stage instruction tuning process.
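The paper's exact data recipe is in the technical report, but the "upsampled long sequences" idea can be illustrated with a short sketch: when assembling the continued-pretraining pool, documents above a length threshold are repeated so that 128K-token training batches actually contain long-range structure. The threshold and repeat factor below are assumptions chosen for illustration, not the paper's values.

```python
# Sketch of long-sequence upsampling for continued pretraining.
# LONG_THRESHOLD and UPSAMPLE_FACTOR are illustrative assumptions.
import random

LONG_THRESHOLD = 8_000   # tokens; assumed cutoff for a "long" document
UPSAMPLE_FACTOR = 4      # assumed repeat factor for long documents

def build_training_pool(documents, count_tokens):
    """Repeat long documents so they appear more often in the pool."""
    pool = []
    for doc in documents:
        repeats = UPSAMPLE_FACTOR if count_tokens(doc) >= LONG_THRESHOLD else 1
        pool.extend([doc] * repeats)
    random.shuffle(pool)
    return pool

# Toy usage: whitespace splitting stands in for a real tokenizer.
docs = ["short doc"] * 100 + [("long " * 10_000).strip()] * 5
pool = build_training_pool(docs, lambda d: len(d.split()))
```

Without some form of upsampling, long documents are so rare in web-scale corpora like SlimPajama that the model would see few training examples exercising the extended window.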

Evaluation results show that Llama3-ChatQA-2-70B outperforms many existing state-of-the-art models, including GPT-4-Turbo-2024-04-09, on the InfiniteBench long-context tasks. The model achieved an average score of 34.11, compared to GPT-4-Turbo’s 33.16.

For medium-length tasks within 32,000 tokens, Llama3-ChatQA-2-70B scored 47.37, surpassing some competitors but falling short of GPT-4-Turbo’s 51.93. On short-context tasks, the model achieved an average score of 54.81, outperforming GPT-4-Turbo and Qwen2-72B-Instruct.

The study also compared RAG and long-context solutions, finding that RAG outperforms full long-context solutions for tasks beyond 100,000 tokens. This suggests that even state-of-the-art long-context models may struggle to effectively understand and reason over such extensive inputs.
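To make that comparison concrete, here is a minimal sketch of the RAG side of the trade-off: rather than feeding the model 100,000-plus tokens directly, the corpus is split into chunks, the top-k most relevant chunks are retrieved, and only those go into the prompt. The embed() function is a placeholder for any sentence-embedding model, and the chunking and k are assumptions, not the study's configuration.

```python
# Sketch of top-k retrieval for RAG. embed() is a placeholder;
# swap in a real sentence-embedding encoder.
import numpy as np

def embed(texts):
    """Placeholder embedding function returning random vectors."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), 384))

def retrieve_top_k(chunks, question, k=5):
    chunk_vecs = embed(chunks)
    q_vec = embed([question])[0]
    # Cosine similarity between the question and every chunk.
    sims = chunk_vecs @ q_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q_vec)
    )
    top = np.argsort(sims)[::-1][:k]
    return [chunks[i] for i in top]

chunks = [f"chunk {i} ..." for i in range(1000)]  # pieces of a long corpus
context = "\n".join(retrieve_top_k(chunks, "What are the key findings?"))
prompt = f"{context}\n\nQuestion: What are the key findings?"
```

The retrieval step keeps the prompt short regardless of corpus size, which is why RAG can beat a full long-context pass when the input grows beyond what the model can reliably reason over.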

This development represents a significant step forward in open-source language models, bringing them closer to the capabilities of proprietary models like GPT-4. The researchers have provided detailed technical recipes and evaluation benchmarks, contributing to the reproducibility and advancement of long-context language models in the open-source community.

