Kyutai, a French non-profit AI research laboratory, has introduced Moshi, a real-time, natively multimodal foundation model. This open-source project features a voice-enabled AI assistant with capabilities that rival OpenAI’s GPT-4o and Google’s Project Astra.
Moshi, developed by a team of just eight researchers in six months, can understand and express 70 different emotions and styles, speak with various accents, and handle two audio streams simultaneously, allowing it to listen and talk at the same time.
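The two-stream design means incoming and outgoing audio are processed concurrently rather than turn by turn. Below is a minimal sketch of that kind of full-duplex loop using the sounddevice library; the `respond` function is a hypothetical stand-in for the model, not Moshi’s real API.

```python
# Minimal full-duplex audio loop: microphone input and speaker output
# are handled in the same callback, so "listening" and "talking" happen
# concurrently. Illustrative only; not Moshi's actual inference code.
import numpy as np
import sounddevice as sd

def respond(frame: np.ndarray) -> np.ndarray:
    # Hypothetical placeholder: a real system would feed `frame` to the
    # model and return its next chunk of synthesized speech.
    return np.zeros_like(frame)

def callback(indata, outdata, frames, time, status):
    # Input (what the user is saying) arrives while output (what the
    # assistant is saying) is being written, i.e. two simultaneous streams.
    outdata[:] = respond(indata)

with sd.Stream(samplerate=24_000, channels=1, callback=callback):
    sd.sleep(5_000)  # run the duplex loop for five seconds
```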
Built on the Helium 7B language model, Moshi is trained jointly on text and audio, and its inference stack is optimised for CUDA, Metal, and CPU backends, with support for 4-bit and 8-bit quantization.
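Quantization is what makes the consumer-hardware claim below plausible: storing weights in 8 (or 4) bits instead of 32 roughly quarters (or eighths) the memory footprint. As a minimal sketch, here is what symmetric 8-bit weight quantization involves, written in generic NumPy; it illustrates the technique, not Moshi’s actual code.

```python
# Symmetric per-tensor int8 quantization: a generic illustration.
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Map float32 weights onto int8 with a single per-tensor scale."""
    scale = np.abs(w).max() / 127.0  # largest magnitude maps to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights at inference time."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
print(np.abs(w - dequantize(q, scale)).max())  # small reconstruction error
```

4-bit variants work the same way but pack two values per byte, trading a larger reconstruction error for half the memory again.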
Key features of Moshi include:
- Real-time interaction with end-to-end latency of 200 milliseconds (see the latency-budget sketch after this list)
- Ability to run on consumer-grade hardware, including MacBooks
- Support for multiple backends (CUDA, Metal, CPU)
- Watermarking to detect AI-generated audio (in progress)
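As a back-of-the-envelope check on the 200-millisecond figure, the budget below decomposes end-to-end latency into its usual components; every number is an illustrative assumption, not a published Moshi measurement.

```python
# Illustrative latency budget; all values are assumptions.
codec_frame_ms = 80   # assumed duration of one audio-codec frame
model_step_ms = 40    # assumed model compute per frame
transport_ms = 80     # assumed capture + playback + network overhead
total_ms = codec_frame_ms + model_step_ms + transport_ms
print(total_ms)  # 200 -> consistent with the quoted end-to-end latency
```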
Kyutai chief Patrick Pérez said Moshi has the potential to revolutionize human-machine communication: “Moshi thinks while it talks”.
Kyutai plans to release the full model, including the inference codebase, the 7B model, the audio codec, and the optimised stack.
Founded in November 2023 with €300 million in backing from donors including French billionaire Xavier Niel, Kyutai aims to contribute to open research in AI and foster ecosystem development.
The lab’s approach challenges major AI companies like OpenAI, which have faced criticism for delaying releases due to safety concerns. Notably, OpenAI has been withholding the release of its video generation model Sora, as well as the Voice Engine and voice mode features of GPT-4o.
Moshi contributes to France’s increasing influence in the AI sector, alongside other French-origin projects such as Hugging Face and Mistral.