AI-as-a-service provider Assembly AI has released a new speech recognition model called Universal-1. Trained on more than 12.5 million hours of multilingual audio data, the model delivers strong speech-to-text accuracy across English, Spanish, French and German, the company says. It also claims Universal-1 reduces hallucinations by 30% on speech data and by 90% on ambient noise compared to OpenAI’s Whisper Large-v3 model.
In a blog post, the company describes Universal-1 as “another milestone in our mission to provide accurate, faithful and robust speech-to-text capabilities for multiple languages, helping our customers and developers worldwide build various Speech AI applications.” Along with a better understanding of four major languages, the model can code-switch, transcribing multiple languages within a single audio file.
Universal-1 also supports improved timestamp estimation, which is important when working with audio and video editing and conversation analytics. Assembly AI claims the new model’s timestamp accuracy is 13% better than that of its predecessor, Conformer-2. As a result, speaker diarization is better too: concatenated minimum-permutation word error rate (cpWER) improves by 14% and speaker count estimation accuracy by 71%.
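For readers unfamiliar with cpWER, the sketch below shows one common way the metric is defined: each speaker's utterances are concatenated, every possible mapping between reference and hypothesis speakers is scored by word-level edit distance, and the lowest total is divided by the number of reference words. The article does not describe Assembly AI's exact scoring setup, so the function names, the input format and the padding of missing speakers with empty transcripts are assumptions made for illustration.

```python
from itertools import permutations


def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance between two word lists."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, 1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cpwer(reference, hypothesis):
    """cpWER for dicts mapping a speaker label to a list of utterance strings.

    Speaker labels need not match between the two dicts; the best
    assignment is found by trying every permutation, which is only
    practical for a handful of speakers.
    """
    ref_streams = [" ".join(utts).split() for utts in reference.values()]
    hyp_streams = [" ".join(utts).split() for utts in hypothesis.values()]

    # Assumption: pad the shorter side with empty transcripts so that a
    # missed or extra speaker is counted entirely as errors.
    n = max(len(ref_streams), len(hyp_streams))
    ref_streams += [[] for _ in range(n - len(ref_streams))]
    hyp_streams += [[] for _ in range(n - len(hyp_streams))]

    total_ref_words = sum(len(stream) for stream in ref_streams)
    if total_ref_words == 0:
        raise ValueError("reference contains no words")

    best_errors = min(
        sum(edit_distance(r, h) for r, h in zip(ref_streams, perm))
        for perm in permutations(hyp_streams)
    )
    return best_errors / total_ref_words


if __name__ == "__main__":
    reference = {"A": ["hello there", "how are you"], "B": ["fine thanks"]}
    hypothesis = {"spk0": ["fine thanks"], "spk1": ["hello there how is you"]}
    print(f"cpWER: {cpwer(reference, hypothesis):.3f}")
```

Because the metric penalizes words attributed to the wrong speaker as well as misrecognized words, better timestamps and diarization can lower cpWER even when the recognized words stay the same.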
Finally, parallel inference has been made more efficient, reducing the turnaround time for long audio files. Universal-1 is said to process them five times faster than Whisper Large-v3. Assembly AI compared the two models’ processing speeds on Nvidia Tesla T4 machines with 16GB of VRAM. With a batch size of 64, Universal-1 took 21 seconds to transcribe 1 hour of audio. Whisper Large-v3, using a much smaller batch size of 24, took 107 seconds to accomplish the same task.
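The "five times faster" figure can be checked against the reported timings with a few lines of arithmetic. The sketch below uses only the numbers quoted above (21 seconds and 107 seconds for 1 hour of audio); the function and constant names are illustrative, not part of any Assembly AI tooling.

```python
AUDIO_SECONDS = 60 * 60          # 1 hour of audio
UNIVERSAL_1_SECONDS = 21         # reported, batch size 64
WHISPER_LARGE_V3_SECONDS = 107   # reported, batch size 24


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


def real_time_factor(processing_seconds, audio_seconds):
    """Processing time per second of audio; lower is faster."""
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    ratio = speedup(WHISPER_LARGE_V3_SECONDS, UNIVERSAL_1_SECONDS)
    print(f"Speedup over Whisper Large-v3: {ratio:.1f}x")
    for name, seconds in [
        ("Universal-1", UNIVERSAL_1_SECONDS),
        ("Whisper Large-v3", WHISPER_LARGE_V3_SECONDS),
    ]:
        rtf = real_time_factor(seconds, AUDIO_SECONDS)
        print(f"{name}: real-time factor {rtf:.4f}")
```

The ratio works out to roughly 5.1, consistent with the claim. Note that the two runs used different batch sizes, so the comparison reflects each model's throughput on that hardware under its own configuration rather than identical settings.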
Better speech-to-text models let AI notetakers generate more accurate, hallucination-free notes, identify action items and sort out metadata such as proper nouns, who’s speaking and timing information. They should also help creator tools that incorporate AI-powered video editing workflows, telehealth platforms that automate clinical note entry and claims submission processes where accuracy is important, and more.
The Universal-1 model is available through Assembly AI’s API.
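As a rough illustration of what API access looks like, the sketch below builds (but does not send) a transcription request using only Python's standard library. It assumes the company's public REST endpoint at `https://api.assemblyai.com/v2/transcript`, an API key passed in the `authorization` header and a JSON body with an `audio_url` field; none of these details come from the article, so check them against the official API documentation. The article also does not say whether Universal-1 must be selected explicitly, so no model parameter is shown. The helper name, key and audio URL are placeholders.

```python
import json
import urllib.request

# Assumed endpoint; verify against Assembly AI's API documentation.
TRANSCRIPT_ENDPOINT = "https://api.assemblyai.com/v2/transcript"


def build_transcript_request(api_key, audio_url):
    """Build a POST request that asks the service to transcribe audio_url.

    Hypothetical helper for illustration; it only constructs the request.
    """
    body = json.dumps({"audio_url": audio_url}).encode("utf-8")
    request = urllib.request.Request(TRANSCRIPT_ENDPOINT, data=body, method="POST")
    request.add_header("authorization", api_key)
    request.add_header("content-type", "application/json")
    return request


if __name__ == "__main__":
    req = build_transcript_request("YOUR_API_KEY", "https://example.com/meeting.mp3")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # Sending requires a valid key:
    # with urllib.request.urlopen(req) as response:
    #     print(json.load(response))
```

Transcription services of this kind typically process audio asynchronously, so a real client would submit the job and then poll for the finished transcript.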
The post Assembly AI claims its new Universal-1 model has 30% fewer hallucinations than Whisper appeared first on Venture Beat.