After launching tools for text-to-speech and speech-to-speech synthesis, AI voice startup ElevenLabs is moving to the next target. The two-year-old startup founded by former Google and Palantir employees today announced the launch of a new text-to-sound AI offering called Sound Effects.
Available starting today on the ElevenLabs website, Sound Effects uses the startup’s in-house foundation model and allows creators to generate different types of audio samples by simply typing a description of their imagined sound.
The company first teased the tool in February with a post featuring Sora-generated clips that had been enhanced with AI sound effects.
ElevenLabs partnered with Shutterstock to bring this product to life and expects to see adoption from creators across domains who are looking to enhance their content with immersive soundscapes.
What to expect from ElevenLabs Sound Effects?
Currently, when creators want to add ambient noises to their content, such as social videos, games, movies and TV shows, they must either manually record them or buy or license audio files from different repositories on the internet.
The approach works, but you may not always find the audio you’re looking for from these sources, or have the budget to pay to record a new sound.
ElevenLabs’ new Sound Effects tool changes that, giving creators and production teams a way to get exactly what they want by simply typing it in plain, conversational English.
When a user enters a text prompt detailing the sound effect they are looking for, the model powering Sound Effects processes it and generates six unique audio samples to choose from.
The user can then listen to each of these, pick what works best for their project, and either download it or store it directly on ElevenLabs’ platform.
VentureBeat got early access to the offering and found it was able to generate clear outputs in about 30-40 seconds. However, in our tests, Sound Effects generated just four options, not six.
This included a range of audio samples, from standard ambient noises such as thunderstorms, doorbells and coins jingling to more complex ones like monkeys chattering, cars racing, people eating at a diner or a train coming to a halt.
Mati Staniszewski, CEO of ElevenLabs, told VentureBeat the tool can also go beyond few-second-long sounds to produce longer audio samples such as instrumental music and character voices.
“It can generate instrumental music tracks up to 22 seconds with prompts like guitar loop, jazz saxophone solo, and music techno loop,” Staniszewski explained. “The model can also create a variety of character voices using prompts like ‘woman singing dancing in the sand, we watched the daylight end’ or ‘an ogre saying “stay away puny human.”’ You can even chain together sounds with prompts like ‘A joyful elderly woman says I’m so proud of you and then laughs.’”
While the company has not shared specifics of the model powering these capabilities, it did note that the model is based on the company’s in-house research and has been fine-tuned on Shutterstock’s audio library of licensed tracks.
“The combined power of our rich and immersive library of tracks and this cutting-edge audio technology has enabled the creation of a true market first. We’re thrilled by the positive feedback from the early access community and look forward to seeing the wide array of projects they will create,” Aimee Egan, Chief Enterprise Officer at Shutterstock, said in a statement.
Goal to power creators worldwide
Since its inception two years ago, ElevenLabs has focused on developing and launching powerful AI audio capabilities.
The company first launched models for text-to-speech in different languages and then followed them up with a voice cloning product and AI Dubbing, a speech-to-speech conversion tool that allows users to translate audio and video into 29 different languages whilst preserving the original speaker’s voice and emotions.
With the launch of Sound Effects today, it is extending this work, equipping creators with more tools to produce high-quality content.
Staniszewski hopes creators across domains will be able to use Sound Effects, including film and television studios, video game developers, marketers and social media content creators.
However, he did not share the names of the enterprises that have been alpha-testing the product thus far.
Back in January, the company said it counts 41% of the Fortune 500 among its customers, including big names such as The Washington Post, Storytel and TheSoul Publishing.
As the next step, Staniszewski added, the company will launch a music generation model as well as a voiceover studio offering, which is currently in alpha. The timeline for both remains unclear at this stage.
Other companies in the AI speech, sound and music generation space include Google, Meta, Suno, Pika, MURF.AI, Play.ht and WellSaid Labs. According to Market US, the global market for such tools stood at $1.2 billion in 2022 and is estimated to reach nearly $5 billion in 2032, growing at a CAGR slightly above 15.40%.
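For readers who want to check how the Market US figures fit together, the short Python sketch below works through the compound annual growth rate (CAGR) arithmetic. It is only an illustration: the function names are hypothetical, and it assumes "nearly $5 billion" can be treated as exactly $5.0 billion and that the period runs ten years, from 2022 to 2032.

```python
def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate implied by two endpoint values."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value, rate, years):
    """Value reached after compounding start_value at rate for the given years."""
    return start_value * (1 + rate) ** years


# Figures quoted in the article, in billions of US dollars.
start_2022 = 1.2
end_2032 = 5.0   # assumption: "nearly $5 billion" treated as exactly 5.0
years = 2032 - 2022

rate = implied_cagr(start_2022, end_2032, years)
print(f"Implied CAGR from the endpoints: {rate:.2%}")

# Going the other way: compound the 2022 figure at the quoted 15.4% rate.
projected = project(start_2022, 0.154, years)
print(f"2032 market size at 15.4% CAGR: ${projected:.2f}B")
```

Both directions land in the same neighborhood, at a growth rate of roughly 15% a year and a 2032 market of about $5 billion, so the quoted endpoints and growth rate are broadly consistent with each other once rounding is taken into account.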
The post ElevenLabs moves beyond speech with AI-generated Sound Effects appeared first on Venture Beat.