Gladio Announces Audio Transcription API built on OpenAI Whisper

Gladio’s audio transcription API is built on OpenAI’s Whisper-Large-v2 and reports a Word Error Rate (WER) as low as 1%

Jean-Louis Queguiner, founder of the AI deployment company Gladio, has announced the alpha release of the company’s audio transcription API. Built on OpenAI’s Whisper-Large-v2, the speech-to-text API can reportedly transcribe a one-hour file in 10 seconds with a Word Error Rate (WER) as low as 1%. The company claims this makes it at least five times more accurate than other products on the market. It believes the release will widen the scope of audio intelligence and broaden future AI applications through plug-and-play APIs.

Whisper is a pre-trained model for automatic speech recognition (ASR), proposed by Alec Radford and colleagues at OpenAI. The models were trained on 680,000 hours of labelled audio data. The large-v2 model was trained for 2.5 times more epochs than the original large model, with added regularization, for improved performance. Whisper generates human-readable transcriptions, meaning the ASR system outputs commas, periods, hyphens and other punctuation marks. The result is high-quality transcriptions with a low Word Error Rate (WER).
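For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the system output, divided by the number of words in the reference. The sketch below is a minimal illustration of that standard definition, not Gladio's evaluation code. The function name `word_error_rate` and the sample sentences are made up for this example, and the article does not say what text normalization was applied before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: words are split on whitespace with no normalization,
    so casing and punctuation differences count as errors.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the quick brown fox jumps over the lazy dog"
    hyp = "the quick brown fox jumped over a lazy dog"
    print(f"{word_error_rate(ref, hyp):.3f}")  # 2 errors / 9 words -> 0.222

    # Without normalization, punctuation and casing alone produce errors.
    print(f"{word_error_rate('Hello, world.', 'hello world'):.3f}")  # -> 1.000
```

The second call shows why reported WER figures depend on how punctuation and casing are handled: a 1% figure is only comparable across providers if the same normalization is applied to all of them.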

Drawing on the latest NLP and deep learning research, the alpha API relies on neural network optimization, which the company says makes inference around 60 times faster than comparable providers on the market. Gladio is currently working on 250 models to create a “holistic intelligence solution” that can perform more than 45 tasks, including translation, summarization, gender detection and sentiment analysis.

Inference speed was the other parameter benchmarked, with the inference speeds of other STT providers serving as the baseline. At a 16 kHz sampling rate and 16-bit encoding, alpha processed one hour of audio in both mono and stereo configurations, and the results were compared with those of other models performing the same task under the same parameters.
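To give a sense of scale for these test parameters, the short sketch below works out how much raw audio one hour represents at 16 kHz and 16 bits, and what the article's one-hour-in-10-seconds figure implies as a multiple of real time. It assumes uncompressed PCM, which the article does not confirm, and the helper names `pcm_size_bytes` and `speed_factor` are made up for this illustration.

```python
SAMPLE_RATE_HZ = 16_000   # 16 kHz, as stated in the benchmark setup
BITS_PER_SAMPLE = 16      # 16-bit encoding
DURATION_S = 60 * 60      # one hour of audio


def pcm_size_bytes(duration_s: int, channels: int,
                   sample_rate_hz: int = SAMPLE_RATE_HZ,
                   bits_per_sample: int = BITS_PER_SAMPLE) -> int:
    """Size of uncompressed PCM audio (assumption: no compression or container overhead)."""
    return duration_s * sample_rate_hz * channels * bits_per_sample // 8


def speed_factor(audio_s: float, processing_s: float) -> float:
    """How many seconds of audio are processed per second of compute time."""
    return audio_s / processing_s


mono = pcm_size_bytes(DURATION_S, channels=1)
stereo = pcm_size_bytes(DURATION_S, channels=2)
factor = speed_factor(DURATION_S, 10)  # one hour transcribed in 10 seconds

if __name__ == "__main__":
    print(f"mono:   {mono / 1e6:.1f} MB")     # mono:   115.2 MB
    print(f"stereo: {stereo / 1e6:.1f} MB")   # stereo: 230.4 MB
    print(f"speed:  {factor:.0f}x real time") # speed:  360x real time
```

Note that this 360x figure is relative to real-time playback, whereas the "around 60 times" claim in the article is relative to other providers, so the two numbers measure different things.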

Source: Twitter

The company also believes that “democratizing access” to AI should not only be about cost; it should also be about simplifying the tools used.


Vandana Nair

With a rare blend of engineering, MBA, and journalism degrees, Vandana Nair brings a unique combination of technical know-how, business acumen, and storytelling skills to the table. Her insatiable curiosity for all things startups, businesses, and AI technologies ensures that there's always a fresh and insightful perspective to her reporting.