I cloned my voice in seconds using a free AI app, and we really need to talk about speech synthesis


That voice you hear – even one you recognize – might not be real, and you may have no way of knowing. Voice synthesis is not a new phenomenon, but a growing number of freely available apps are putting this powerful voice-cloning capability in the hands of ordinary people, and the ramifications could be far-reaching and unstoppable.

A recent Consumer Reports study that looked at half a dozen such tools puts the risks in stark relief. Platforms like ElevenLabs, Speechify, and Resemble AI use powerful speech-synthesis models to analyze and recreate voices, sometimes with little to no safeguards in place. Some try: Descript, for example, asks for recorded voice consent before the system will recreate a voice signature. But others are not so careful.

I found an app called PlayKit from Play.ht that lets you clone a voice for free for three days and then charges $5.99 a week. The paywall is, in theory, something of a barrier against potential misuse – except that I was able to clone a voice without starting the trial.

Say, 'Too easy'

Voice cloning (Image credit: Shutterstock)

The app whisks you through setup and then presents some pre-made voice clones, including ones for President Donald Trump and Elon Musk (yes, you can make the President say things like, 'I think DEI should be supported and expanded around the world'). But at the top is a 'Clone a voice' option.

All I had to do was select a video from my photo library and upload it. Videos must be at least 30 seconds long (but no longer than a minute) and in English. The video could have featured anyone; if I had, say, filmed a clip of a George Clooney interview, I could have uploaded that (more on that later).

The system quickly analyzed the audio. The app doesn't tell you if this is being done locally or in the cloud, but I'll assume the latter, since such powerful models rarely work locally on a mobile device (see ChatGPT in Apple Intelligence). I saved my voice clone with my name so that I could select it again from the list of cloned voices.

When I want my clone to say something in my voice, I simply type in the text and hit a big Generate button. That process usually takes 10 to 15 seconds.
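
To make that flow concrete, here is a minimal sketch of what a cloud text-to-speech request of this kind generally looks like. To be clear, this is not PlayKit's or Play.ht's actual API; the endpoint, field names, and voice ID below are hypothetical stand-ins, assumed purely for illustration.

    # Hypothetical sketch of a cloud voice-clone TTS request.
    # The endpoint, payload fields, and voice ID are illustrative
    # assumptions, not PlayKit's or Play.ht's real API.
    import requests

    API_URL = "https://api.example-tts.com/v1/generate"  # hypothetical endpoint
    API_KEY = "YOUR_API_KEY"

    payload = {
        "voice_id": "cloned-voice-123",  # ID assigned when the clone was created
        "text": "Pick up some milk on the way home, please.",
        "language": "en",
    }

    resp = requests.post(
        API_URL,
        json=payload,
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=30,  # generation took 10 to 15 seconds in my testing
    )
    resp.raise_for_status()

    # The service returns the synthesized speech as an audio file.
    with open("cloned_voice.mp3", "wb") as f:
        f.write(resp.content)

The point is how little the client has to supply: a voice ID and a string of text. Everything hard happens server-side, which is exactly why these services scale so easily.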

The voices PlayKit generates, including mine, are eerily accurate. If I have one criticism, it's that the tone and emotion are a bit off. Cloned me sounds the same whether it's talking about what to pick up for dinner or saying it's been in a terrible car crash. Even exclamation points do not change the expression.

And yet, I could see people being fooled by this. Remember, anyone with access to 30 seconds of video of you speaking could effectively clone your voice and then use it as they wish. Sure, they'd have to eventually pay $5.99 a week to keep using it, but if someone is planning a financial scam, they might think it's worth it.

Platforms like this that do not require explicit permission for voice cloning are sure to proliferate, and my concern is that there are no safeguards or regulations in sight. Services like Descript, which require audio consent from the clone target, are outliers.

Voice cloning (Image credit: Shutterstock)

Play.ht claims that it protects people's voice rights. Here's an excerpt from its Ethical AI page:

Our platform values intellectual property rights and personal ownership. Users are permitted to clone only their own voices or those for which they have explicit permission. This strict policy is designed to prevent any potential copyright infringement and uphold a high standard of respect and responsibility.

It's a high-minded promise, but the reality is less reassuring: I recorded 30-second clips of famous movie monologues by Benedict Cumberbatch and Al Pacino and, in less than a minute, had usable voice clones of both actors.

Using PlayKit (Image credit: Future)

What's needed here is global AI regulation, but that requires agreement and cooperation at the government level, and right now that's not forthcoming. In 2023, then-President Joe Biden signed an Executive Order on AI that sought, in part, to offer some regulatory guidance (he followed up with another AI-related order early this year). The Trump administration is allergic to government regulation (and to any Biden executive order) and quickly revoked the 2023 order. The problem is that the administration has yet to propose anything to replace it. It seems the new plan is to hope that AI companies will be good digital citizens and at least try to do no harm.

Unfortunately, most of these companies are like weapons manufacturers. They're not harming people directly – no one who makes a voice cloner is calling your aging uncle and using your voice clone to convince him that he urgently needs to wire you thousands of dollars – but some of the people using their AI weapons are.

There's no easy solution for what I fear will become a voice-cloning crisis, but I would suggest that you no longer outright trust the voices you hear in videos, on the phone, or in voice messages. If you're in any doubt, contact the relevant person directly.

In the meantime, I hope that more voice platforms insist on recorded voice consent and/or documented permission before they allow users to clone anyone's voice.
