Voice Engine is impressive voice cloning technology, but are we ready for it? Not yet, according to its maker OpenAI, which has so far only allowed private testing with a small group of trusted partners.
OpenAI is sharing examples but not public access to its synthetic voice technology.
What makes Voice Engine different to other cloning applications is speed. From a 15-second sample, it can build a believable synthesised version of a real person speaking. All you do is write a phrase and it will convert that text into speech in the cloned voice.
OpenAI is testing the waters with Voice Engine and Sora. Both are world-changing platforms which must be used responsibly. By giving the world a preview of what they can do, OpenAI can take steps to allay fears or make changes before any wider release.
“We first developed Voice Engine in late 2022, and have used it to power the preset voices available in the text-to-speech API as well as ChatGPT Voice and Read Aloud,” OpenAI says on its blog.
“At the same time, we are taking a cautious and informed approach to a broader release due to the potential for synthetic voice misuse.”
We showed how it works during our Sky News Weekend Edition tech segment below.
OpenAI says it’s committed to developing “safe and broadly beneficial AI” and hopes to start a dialogue on the responsible deployment of synthetic voices. One of those responsible uses is providing reading assistance to non-readers and kids through natural-sounding animated voices. Education tech company Age of Learning uses Voice Engine and GPT-4 to create real-time, personalised responses to interact with students.
There are other worthy reasons for using voice cloning technology, but the bad reasons are very bad. Our voice is often used to verify who we are, so in the wrong hands this type of technology could be dangerous. In fact, OpenAI is encouraging the phasing out of voice-based authentication as a security measure.
“We recognise that generating speech that resembles people’s voices has serious risks, which are especially top of mind in an election year,” OpenAI says in its blog post.
“We are engaging with U.S. and international partners from across government, media, entertainment, education, civil society and beyond to ensure we are incorporating their feedback as we build.
“We believe that any broad deployment of synthetic voice technology should be accompanied by voice authentication experiences that verify that the original speaker is knowingly adding their voice to the service and a no-go voice list that detects and prevents the creation of voices that are too similar to prominent figures.”
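OpenAI hasn’t said how a no-go voice list would work under the hood, but the general idea can be sketched in a few lines of Python. This is a rough illustration only, and it assumes each voice has already been turned into a list of numbers (a “voice embedding”) by some speaker-recognition model. The names, the example numbers and the 0.85 threshold are all made up for the example and are not OpenAI’s.

```python
import math


def cosine_similarity(a, b):
    """Return how closely two voice embeddings point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero embedding cannot match anything
    return dot / (norm_a * norm_b)


def find_blocked_match(candidate, no_go_list, threshold=0.85):
    """Return the name of the closest protected voice at or above the threshold.

    Returns None when the candidate is not too similar to any listed voice.
    The threshold is a hypothetical value chosen for illustration.
    """
    best_name, best_score = None, threshold
    for name, embedding in no_go_list.items():
        score = cosine_similarity(candidate, embedding)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical embeddings; a real system would produce these from audio
# with a speaker-recognition model and use far more than three numbers.
NO_GO_LIST = {
    "prominent_figure_a": [0.9, 0.1, 0.3],
    "prominent_figure_b": [0.1, 0.8, 0.5],
}

too_close = find_blocked_match([0.88, 0.12, 0.31], NO_GO_LIST)
unrelated = find_blocked_match([0.0, 0.0, 1.0], NO_GO_LIST)
```

In this sketch, `too_close` comes back as the name of the protected voice the sample resembles, so the cloning request would be refused, while `unrelated` comes back as `None` and would be allowed through. Where to set the threshold is the hard part in practice: too low and ordinary voices get blocked, too high and a convincing imitation slips past.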
These are historic times. Let’s hope we look back and say, in a real or synthetic voice, “Gee, they handled that well.”