Last updated July 23, 2024

India’s Beatoven.ai Shows the World How AI Music Generation is Done Right

Within a year, Beatoven.ai amassed more than 100,000 data samples, all of them proprietary.


Illustration by Nikhil Kumar

Published on July 22, 2024

by Mohit Pandey

AI music generation is a tricky business. Amid copyright claims and the need to compensate artists fairly, gaining revenue and popularity is an uphill task for AI startups such as Suno.ai or Udio AI.

However, Beatoven.ai, an Indian AI music startup, has gotten the hang of it in the most ethical and responsible way possible.

One of the most important reasons for that is that its co-founder and CEO, Mansoor Rahimat Khan, is a professional sitar player himself and comes from a family of musicians going back seven generations. “I was very fascinated by this field of music tech,” he said.

Khan told AIM that he started his journey at IIT Bombay, where he realised that he wanted to combine his passion for music and technology, even though there were not many opportunities in India.

Beatoven.ai is part of the JioGenNext 2024, Google for Startups Accelerator, and AWS ML Elevate 2023 programs. Khan said that the team applied to many accelerator programs because they realised they needed a lot of compute to fulfil the goal of building an AI music generator.

The company raised $1.3 million in its pre-series A round led by Entrepreneur First and Capital 2B, with a total funding of $2.42 million.

After switching jobs several times, Khan met Siddharth Bhardwaj and, building on their shared passion for music and tech, the two founded Beatoven.ai in 2021. “After coming back from Georgia Tech, I got involved in the startup ecosystem, and started working with ToneTag, an audio tech startup funded by Amazon,” said Khan.

Everyone Needs Background Music in their Life

The co-founders found that the biggest market was in generating soundtracks for indie game developers, agencies, and production houses. “But when we look at the nitty gritty of the industry, copyrights are a very scary thing. We thought that generative AI could be a solution to this.” Khan said that the idea was to figure out how users could give simple prompts and generate audio.

The initial idea was to create a simple consumer-focused UI where users could select a genre, mood, and duration to generate a soundtrack. But that was before the LLM era had started, and NLP wasn’t good enough for such tasks. “We started in 2021 before the LLM era, and our venture capital came from Entrepreneur First. We raised a million dollars in 2021 and quickly built our technology from scratch.”

As with every other AI company, the biggest challenge was collecting data. “You either partnered with the labels that charged huge licensing fees or scraped [data]. That was the only other option. But if you did that, you would be sued,” said Khan.

All of the Tech

This is where Beatoven.ai has the edge over other products in the market. Khan and his team started contacting small artists, and gradually bigger ones, to form partnerships and source their own data. The company had a head start as no one was talking about this field back then. Within a year, it amassed more than 100,000 data samples, all of them proprietary.

During the initial days, Beatoven.ai did not use Transformers. Khan said that this is one of the reasons the quality was not that great. Later, when Diffusion models came into the picture, the team realised that they were the way forward for AI-based music generation.

The company started by using different models for different purposes, including the ChatGPT API from OpenAI. The Beatoven.ai platform also uses CLAP (Contrastive Language-Audio Pretraining), a model that learns a shared embedding space for text and audio.
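To make the CLAP idea concrete: models of this kind place text and audio in one embedding space, so a prompt and a clip can be compared with cosine similarity. The sketch below is only an illustration of that comparison. The three-dimensional vectors and the `best_prompt` helper are made up for the example and stand in for real model outputs; nothing here reflects how Beatoven.ai actually uses CLAP.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_prompt(audio_vec: List[float], prompt_vecs: Dict[str, List[float]]) -> str:
    """Return the prompt whose (hypothetical) embedding is closest to the audio embedding."""
    return max(prompt_vecs, key=lambda p: cosine(audio_vec, prompt_vecs[p]))


# Toy embeddings, invented for illustration; a real model would produce
# vectors with hundreds of dimensions.
audio_embedding = [0.9, 0.1, 0.0]
prompt_embeddings = {
    "calm sitar": [1.0, 0.0, 0.0],
    "upbeat hip-hop": [0.0, 1.0, 0.0],
}
closest = best_prompt(audio_embedding, prompt_embeddings)
```

With these toy numbers, the clip sits nearest the "calm sitar" prompt, which is the sort of text-to-audio matching a shared embedding space allows.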

Apart from this, the company uses latent diffusion models like Stability AI’s Stable Audio, VAE models, and AudioLLM for different tasks, such as generating the individual instruments within a piece of music. The company then uses an Ensemble model to mix these individual audio tracks together.
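The article does not say how the Ensemble model does its mixing. As a rough picture of what combining per-instrument tracks involves, here is a deliberately naive sketch: it sums mono tracks sample by sample with a gain for each, then scales the result down if it would clip. The `mix_stems` function, the gain values, and the sample data are all hypothetical, and a learned mixing model would be far more sophisticated than this.

```python
from typing import Dict, List


def mix_stems(
    stems: Dict[str, List[float]],
    gains: Dict[str, float],
    peak: float = 1.0,
) -> List[float]:
    """Sum mono stems (same sample rate assumed) and keep the result within +/- peak."""
    if not stems:
        return []
    # Shorter stems are treated as silent once they end.
    length = max(len(samples) for samples in stems.values())
    mixed = [0.0] * length
    for name, samples in stems.items():
        gain = gains.get(name, 1.0)  # stems without an explicit gain pass through unchanged
        for i, value in enumerate(samples):
            mixed[i] += gain * value
    # Scale the whole mix down only if it exceeds the allowed peak.
    loudest = max((abs(v) for v in mixed), default=0.0)
    if loudest > peak:
        scale = peak / loudest
        mixed = [v * scale for v in mixed]
    return mixed


# Made-up sample values standing in for generated instrument tracks.
tracks = {"sitar": [0.5, -0.5, 0.25], "drums": [1.0, 1.0]}
mix = mix_stems(tracks, gains={"drums": 0.5})
```

Keeping instruments as separate tracks until this last step is what makes it possible to use a different model for each one.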

For inference, the company uses CPUs (instead of GPUs), which keeps it fast and optimised, while reducing costs.

Trained Fairly

Khan admitted that the audio files generated by Suno.ai have superior quality right now, but they also use Diffusion models, which makes them a little slow. “The quality is significantly better from where we started, but it’s not quite there yet.” Khan added that currently the speed is high because the company uses different models for different tasks.

To expand its data further, Beatoven.ai started partnering with outlets such as Rolling Stone and packaged the effort as a creator fund. In January 2023, it announced a $50,000 fund for indie music as part of the Humans of Beatoven.ai program to expand its catalogue.

This brought Beatoven.ai a lot of attention, and many artists wanted to partner with the team. Khan said that the company aims to do more licensing deals to expand its music library. “When it comes to Indian labels though, they are not yet open to licensing deals,” said Khan.

Beatoven.ai’s model is certified as Fairly Trained and also certified by AI for Music as an ethically trained AI model.

Apart from music generation, Beatoven.ai is launching Augment, similar to ElevenLabs’s voice generation model. This would allow agencies to connect to Beatoven.ai’s API and train on their own data to make remixes of their own music. For the demo, Khan showed how a simple sitar tune could be turned into a hip-hop remix.

“You can just use your existing content and create new songs. That’s the idea,” he said.

Currently, Beatoven.ai is also testing a video-to-audio model using Google’s Gemini, where users can upload a video and the model would understand the context and generate music based on that. Khan showed a demo to AIM where the model could also be guided using text prompts for better quality audio generation.

Not Everyone is a Musician

Khan envisions that in the near future, companies such as Spotify or YouTube will start open sourcing their data and offering APIs to make the AI music industry a little more open.

Meanwhile, speaking with AIM, Udio’s co-founder Andrew Sanchez said, “It’s enabling for people who are just up and coming, who don’t yet have big professional careers, the resources, time or money to really invest in making a career. It’s enabling a whole new set of creators.” This would make everyone a musician.

When it comes to Beatoven.ai, Khan said that he aims to head in a more B2B direction as building a direct consumer app does not make sense. “I don’t believe everybody wants to create music,” added Khan, saying that not everyone in the world is learning music. That is why the company is currently focused only on background music without vocals.
