The AI needs a recording of just three seconds to copy a voice, including its timbre and emotional coloring.
VALL-E is based on EnCodec, Meta's neural audio codec. The AI breaks the input recording into discrete components (tokens) and then generates new speech based on what it already “knows” about the sample.
VALL-E was trained on the LibriLight dataset, which contains 60,000 hours of English speech from more than 7,000 speakers.
Microsoft chose not to publish the VALL-E source code because the technology could be used for malicious purposes. The company also said that future projects will not be made public if they carry a potential risk of abuse.