Indian IT giant Tech Mahindra is working on an indigenous Large Language Model (LLM) that will be able to converse in many Indic languages, most notably Hindi.
Called Project Indus, the model will initially support 40 Indic languages, with more languages native to the country to be added later.
Tech Mahindra head CP Gurnani recently took to Twitter to ask speakers of these languages to contribute expressions, vocabulary, and conversations to the project.
Building an LLM requires large datasets, and the scarcity of Indic-language data is a challenge. The IT giant's approach is similar to that of Bhashini, a project launched by Prime Minister Narendra Modi to build datasets for Indic languages.
Speakers of languages such as Dogri (Jammu & Kashmir), Kinnauri, Kangri, and Chambeali (Himachal Pradesh), Garhwali, Kumaoni, and Jaunsari (Uttarakhand), and Bhojpuri, Maithili, and Magahi (Bihar), among others, can contribute to the project.
Earlier, responding to a tweet from OpenAI CEO Sam Altman, Gurnani confirmed that Tech Mahindra was building an LLM specifically for India.