Google Introduces IndicGenBench to Benchmark Indic LLMs Across 29 Languages

A benchmark to help in evaluating the generative capabilities of Indic LLMs, IndicGenBench is part of a slew of India-centric updates released during Google I/O Bengaluru 2024.

As part of several initiatives that Google has taken up in India to improve Indic LLM capabilities, Google Pay vice president and GM Ambarish Kenghe announced the launch of IndicGenBench.

IndicGenBench, which evaluates the generative capabilities of Indic LLMs, was among the updates released during Google I/O Bengaluru 2024. Kenghe said the benchmark covers 29 languages, including several Indian languages that do not currently have benchmarks.

Speaking to AIM, Google Cloud director of customer engineering and field CTO Subram Natarajan said, “In India, there are two main areas of focus: Addressing language-related issues, while the second involves large-scale transformations across various industries, be it in customer engagement or addressing the broader needs of the Indian population.”

To address language-related issues, Kenghe announced the open sourcing of DeepMind’s Composition to Augment Language Models (CALM), which allows developers to combine specialised language models with Google’s Gemma models. The research on CALM was carried out by the Google DeepMind and Google Research teams in India, and the paper was released earlier this year.

“Let’s say you’re building a coding assistant that can converse in English. Now, by composing a Kannada specialist model with CALM, you may be able to offer coding assistance to Kannada users as well,” explained Kenghe.

This focus on Indic language LLMs comes as DeepMind expands Project Vaani, a collaborative effort between Google and the Indian Institute of Science (IISc), under which over 14,000 hours of speech data in 58 languages have been made accessible to developers. The data was collected from over 80,000 speakers in 80 districts across the country.

As previously covered by AIM, this is being open-sourced as part of MeitY’s flagship AI initiative, Bhashini. These capabilities are set to expand soon, as Bhashini has also launched an initiative called Bhasha Daan to help crowdsource voice and text data in multiple Indian languages.

Donna Eva

Donna is a technology journalist at AIM, hoping to explore AI and its implications in local communities, as well as its intersections with the space, defence, education and civil sectors.