Microsoft Research Asia has announced a new experimental AI tool called VASA-1. Given a still image of a person (or a drawing of one) and an existing audio file, it can generate a realistic talking face in real time. It animates facial expressions and head movements from a single still image and produces lip movements synchronized to speech or song. The researchers have uploaded numerous examples to the project page, and the results look good enough to fool people into thinking they’re real.
Although the samples’ lip and head movements may still look a bit robotic and out of sync on closer inspection, it is clear that the technology could be exploited to quickly and easily create deepfake videos of real people. The researchers themselves are aware of this risk, and until they are confident the technology will be used “responsibly and in accordance with proper regulations,” they have decided not to release an online demo, API, product, additional implementation details, or related offerings. However, they did not say whether they plan to introduce specific safeguards to prevent malicious actors from using it for illicit purposes, such as creating deepfake pornography or running misinformation campaigns.
Despite the potential for misuse, the researchers believe their technology has many benefits. They say it could not only increase equity in education but also improve accessibility for people with communication impairments, perhaps by giving them an avatar that can communicate on their behalf. It could also provide companionship and therapeutic support to those who need it, they say, hinting that VASA-1 could power programs that let people talk to AI characters.
According to a paper published alongside the announcement, VASA-1 was trained on the VoxCeleb2 dataset, which contains “more than 1 million utterances from 6,112 celebrities” extracted from YouTube videos. Although the tool was trained on real faces, it also works on artistic images like the Mona Lisa. Amusingly, the researchers paired one such image with an audio file of Anne Hathaway’s viral Lil Wayne-style rendition of “Paparazzi.” It’s a lot of fun and well worth a look, even if it leaves you wondering what technology like this might ultimately be used for.