IBM's artificial intelligence learned to copy a human voice in 5 minutes

IT companies and large corporations around the world pay close attention to making computer voices sound natural. IBM's latest achievement in this area is a new algorithm based on artificial intelligence. After listening to a speaker's voice for five minutes, it can pronounce any text in that voice on its own.

According to IBM, the new AI algorithm can build dialogs in real time and adapt to different conversational styles and voice timbres. The company's specialists note that neural speech synthesis based on a modular architecture allowed them to "create a realistic computer voice."

The system consists of three components: a prosody feature predictor, an acoustic feature predictor and a neural vocoder. Together, the three components capture the speaker's style as accurately as possible and adjust the pitch and energy of the speech while taking acoustic distortions into account. According to the company, five minutes of listening to a speaker is enough to train the neural network.
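To make the three-stage data flow more concrete, here is a minimal Python sketch. It is not IBM's implementation: every name in it (`SpeakerStyle`, `predict_prosody`, `predict_acoustics`, `vocode`, `synthesize`) is hypothetical, and each learned model is replaced by a toy rule so the example runs with the standard library alone. It only illustrates how per-phoneme prosody (pitch, energy, duration) feeds per-frame acoustic features, which a vocoder then turns into a waveform.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SpeakerStyle:
    # Hypothetical per-speaker parameters; a real system would learn these from audio.
    base_pitch_hz: float
    energy_scale: float
    frames_per_phoneme: int


@dataclass
class Prosody:
    pitch_hz: List[float]  # one pitch value per phoneme
    energy: List[float]    # one energy value per phoneme
    durations: List[int]   # number of frames per phoneme


def predict_prosody(phonemes: List[str], style: SpeakerStyle) -> Prosody:
    # Toy stand-in for a learned prosody predictor: vowels get higher pitch and energy.
    vowels = set("aeiou")
    pitch = [style.base_pitch_hz * (1.1 if p in vowels else 1.0) for p in phonemes]
    energy = [style.energy_scale * (1.0 if p in vowels else 0.5) for p in phonemes]
    durations = [style.frames_per_phoneme for _ in phonemes]
    return Prosody(pitch, energy, durations)


def predict_acoustics(prosody: Prosody) -> List[Tuple[float, float]]:
    # Toy stand-in for an acoustic feature predictor: expand each phoneme's
    # prosody into `duration` frames of (pitch, energy) features.
    frames: List[Tuple[float, float]] = []
    for pitch, energy, dur in zip(prosody.pitch_hz, prosody.energy, prosody.durations):
        frames.extend([(pitch, energy)] * dur)
    return frames


def vocode(frames: List[Tuple[float, float]],
           sample_rate: int = 16000,
           samples_per_frame: int = 80) -> List[float]:
    # Toy stand-in for a neural vocoder: render each frame as a sine segment.
    # The phase carries over between frames to avoid clicks at frame boundaries.
    samples: List[float] = []
    phase = 0.0
    for pitch, energy in frames:
        step = 2 * math.pi * pitch / sample_rate
        for _ in range(samples_per_frame):
            samples.append(energy * math.sin(phase))
            phase += step
    return samples


def synthesize(text: str, style: SpeakerStyle) -> List[float]:
    # Crude front end (assumption): treat each letter as one "phoneme".
    phonemes = [c for c in text.lower() if c.isalpha()]
    prosody = predict_prosody(phonemes, style)
    frames = predict_acoustics(prosody)
    return vocode(frames)


if __name__ == "__main__":
    style = SpeakerStyle(base_pitch_hz=120.0, energy_scale=0.8, frames_per_phoneme=5)
    audio = synthesize("Hello", style)
    print(len(audio), "samples")
```

Because each stage consumes only the previous stage's output, one component (for example, the vocoder) can in principle be swapped or retrained without touching the others, which is the general appeal of a modular synthesis pipeline.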

Examples of the new speech synthesizer's output are available on the IBM Watson service website.

