ChatGPT maker OpenAI continues to advance its artificial intelligence models to stand out in a competitive landscape. On Friday, the company announced a new AI model that can replicate a person's voice, generating natural-sounding synthetic speech from a single 15-second audio sample.
OpenAI announced the technology on X, sharing its blog post and writing,
We’re sharing our learnings from a small-scale preview of Voice Engine, a model which uses text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker. https://t.co/yLsfGaVtrZ
— OpenAI (@OpenAI) March 29, 2024
While this is a game-changing AI model, it poses serious risks. The model has not yet been released to the public, and OpenAI was careful to address the possibility of misuse.
It stated in the blog post,
We recognize that generating speech that resembles people’s voices has serious risks, which are especially top of mind in an election year.
We are engaging with U.S. and international partners from across government, media, entertainment, education, civil society and beyond to ensure we are incorporating their feedback as we build,
Notably, the company is exploring safeguards against deepfake misuse of the model, such as watermarking synthetic voices or introducing other protective measures to avert risks.
The newly announced Voice Engine can clone a voice with remarkable ease. All a person has to do is record a 15-second clip of their voice, and the model will generate synthetic but natural-sounding speech, reading aloud whatever text prompt the user types in. Interestingly, the text does not even need to be in the speaker's native language, as Voice Engine can produce speech in other languages.
Meanwhile, OpenAI product manager Jeff Harris told TechCrunch,
So far, it isn’t open sourced — we have it internally for now. We’re curious about making it publicly available, but obviously, that comes with added risks in terms of exposure and breaking it.