Microsoft Research Asia has announced a new experimental AI tool called VASA-1. Given a still image of a person (or a drawing of one) and an existing audio file, it can generate a realistic talking face in real time. It animates facial expressions and head movements from a single still image and produces lip movements synchronized to speech or song. The researchers have uploaded numerous examples to the project page, and the results look good enough to fool people into thinking they’re real.
Although the samples’ lip and head movements may still look a bit robotic and out of sync on closer inspection, it is clear that the technology could be exploited to quickly and easily create deepfake videos of real people. The researchers themselves are aware of this risk, and until they are confident the technology will be used “responsibly and in accordance with proper regulations,” they have decided not to release an online demo, API, product, additional implementation details, or related offerings. However, they did not say whether they plan to introduce specific safeguards to prevent malicious actors from using it for illicit purposes, such as creating deepfake pornography or running misinformation campaigns.
Despite the potential for misuse, the researchers believe their technology has many benefits. They say it could not only increase equity in education but also improve accessibility for people with communication impairments, perhaps by giving them an avatar that can communicate on their behalf. It could also provide companionship and therapeutic support to those who need it, they say, hinting that VASA-1 could power programs that let people talk to AI characters.
According to a paper published alongside the announcement, VASA-1 was trained on the VoxCeleb2 dataset, which contains “more than 1 million utterances from 6,112 celebrities” extracted from YouTube videos. Although the tool was trained on real faces, it also works on artistic images like the Mona Lisa. Amusingly, the researchers paired one such image with an audio file of Anne Hathaway’s viral Lil Wayne-style rendition of “Paparazzi.” It’s a lot of fun and well worth a look, even if it leaves you wondering what technology like this might ultimately be used for.