Just days after a synthetic speech startup released its new AI voice platform, the tool is already being abused to create deepfake celebrity audio clips.
Speech AI startup ElevenLabs launched a beta version of its platform a few days ago, letting users create entirely new synthetic voices for text-to-speech audio. Already, and perhaps unsurprisingly, the internet has put the technology to nefarious use.
The company revealed on Twitter that it’s seen an “increasing number of voice cloning misuse cases,” and it’s thinking of ways to address the issue by “implementing additional safeguards.”
While the company did not elaborate on these "misuse cases," reports have surfaced of 4chan posts featuring generated voices made to sound like celebrities saying questionable things.
The clips have featured violent, racist, homophobic, and transphobic content. It remains unclear whether all of them were made with ElevenLabs' technology, but a 4chan post collecting the voice files included a link to the company's platform.
Currently, ElevenLabs is gathering feedback on how to prevent users from abusing its technology. Ideas thus far include adding more layers to its account verification to enable voice cloning, such as requiring users to enter their ID or payment information.
Other considerations include having users verify ownership of the voice they want to clone, such as by submitting a sample reading prompted text. The company is even considering pulling the tool from public use altogether and having users submit voice cloning requests for manual verification.
Although this may be the first time deepfake audio clips have become such a prevalent issue, advances in AI and machine learning led to a similar incident a few years ago: a rise in deepfake videos, specifically pornography that superimposed celebrities' faces onto existing explicit material.