Given a single portrait image, a voice audio clip, and an optional set of other control signals, our approach produces high-quality, lifelike talking-face videos at 512 × 512 resolution and up to 40 FPS. The method is versatile and robust, and the generated talking faces faithfully imitate human facial expressions and head movements, reaching a high level of realism and liveliness. (All photorealistic portrait images published in this paper are virtual, non-existent identities.) Credit: arXiv (2024). DOI: 10.48550/arxiv.2404.10667
A team of AI researchers at Microsoft Research Asia has developed an AI application that converts still images of people and audio tracks into animations that accurately depict individuals speaking or singing the audio tracks with appropriate facial expressions.
The team published a paper explaining how they created the app on the arXiv preprint server. Video samples are available on the research project page.
The research team set out to create animations in which a still image appears to speak or sing along with a provided audio track while displaying lifelike facial expressions. The result is VASA-1, an AI system that transforms still images, whether captured with a camera, drawn, or painted, into what the team describes as "exquisitely synchronized" animation.
The group demonstrated the effectiveness of the system by posting short video clips of their test results. In one, a cartoon version of the Mona Lisa performs a rap song; in another, a photograph of a woman is turned into a singing performance; in a third, a picture of a man delivers a speech.
In each animation, the facial expressions shift with the words to emphasize what is being said. The researchers note that although the videos appear real, closer inspection can reveal flaws and other evidence of artificial generation.
The research team achieved this result by training the system on thousands of images showing a wide variety of facial expressions. They note that it currently produces 512 × 512 pixel video at 45 frames per second in offline batch mode (40 FPS when streaming online), and that generating a clip took an average of two minutes on a desktop-grade Nvidia RTX 4090 GPU.
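To make those numbers concrete, here is a minimal, entirely hypothetical Python sketch of the input/output contract such a system exposes: one portrait image and an audio clip in, a stream of 512 × 512 frames out, with a rough throughput check against the reported 45 FPS figure. The function names and the placeholder no-op generator are assumptions for illustration only, not Microsoft's actual VASA-1 API.

```python
# Hypothetical sketch (not VASA-1's actual API): one portrait plus an
# audio clip in, a stream of 512x512 video frames out.
import time
import numpy as np

FRAME_SHAPE = (512, 512, 3)  # resolution reported by the researchers
TARGET_FPS = 45              # offline generation rate reported

def generate_frames(portrait: np.ndarray, audio: np.ndarray, n_frames: int):
    """Stand-in generator: a real model would condition each frame on the
    portrait identity and the audio window around that frame's timestamp."""
    for _ in range(n_frames):
        # Placeholder frame; a real system would run the generator here.
        yield np.zeros(FRAME_SHAPE, dtype=np.uint8)

portrait = np.zeros(FRAME_SHAPE, dtype=np.uint8)  # dummy portrait image
audio = np.zeros(16_000 * 10, dtype=np.float32)   # 10 s of 16 kHz audio
n_frames = TARGET_FPS * 10                        # frames for a 10 s clip

start = time.perf_counter()
frames = list(generate_frames(portrait, audio, n_frames))
elapsed = time.perf_counter() - start
print(f"{len(frames)} frames in {elapsed:.3f}s "
      f"({len(frames) / elapsed:.0f} FPS with a no-op generator)")
```

With a real generator in the loop, hitting the reported rate would mean producing each 512 × 512 frame in roughly 22 ms, which is the constraint the two-minute-per-video figure reflects.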
The researchers suggest that VASA-1 could be used to generate highly lifelike avatars for games and simulations. At the same time, they acknowledge its potential for abuse and are therefore not making the system available for general use.
More information:
Sicheng Xu et al, VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time, arXiv (2024). DOI: 10.48550/arxiv.2404.10667
Project page: www.microsoft.com/en-us/research/project/vasa-1/
Journal information: arXiv
© 2024 Science X Network