Microsoft Research Asia has unveiled an AI model that can generate horrifyingly realistic deepfake videos from a single still image and audio track. How can we trust what we see and hear online in the future?
Artificial intelligence systems have been blowing past major benchmarks for the past few years, and many people are already worried that they'll be put out to pasture prematurely, replaced by algorithms.
We've recently watched fairly limited smart gadgets transform into powerful everyday assistants and essential productivity tools. There are also models that can generate realistic sound effects for silent video clips, or create stunning footage from text prompts. Microsoft's VASA-1 framework looks to be another big step forward.
After training the model on footage of nearly 6,000 real-world talking faces from the VoxCeleb2 dataset, the technology can produce frighteningly lifelike videos in which newly animated subjects not only lip-sync accurately to a provided voice audio track, but also display a range of facial expressions and natural head movements, all from a single still headshot.
It's similar in many ways to the Audio2Video diffusion model from Alibaba's Institute for Intelligent Computing that appeared a few months ago, but even more photorealistic and accurate. VASA-1 can reportedly produce synced videos at 512×512 pixels and 40 frames per second with "negligible starting latency."
All of the reference photos used to demonstrate the project were AI-generated by StyleGAN2 or DALL-E, though one notable example shows off the framework's ability to step outside its training set: a rapping Mona Lisa!
The project page features many examples of talking and singing videos generated from still images and matched with audio tracks, but the tool also includes optional controls for setting facial dynamics such as emotion, as well as head pose, gaze direction, and distance from a virtual video camera. Powerful stuff.
"The emergence of AI-generated talking faces offers a window into a future where technology amplifies the richness of human-human and human-AI interactions," reads the introduction to a paper detailing the project. "Such technology holds the promise of enriching digital communication, increasing accessibility for those with communicative impairments, transforming education methods with interactive AI tutoring, and providing therapeutic support and social interaction in healthcare."
While all very commendable, the researchers also acknowledge the potential for misuse. Wading through mountains of online news every day already feels like an impossible battle to separate fact from outright fabrication; now imagine having tools at your disposal that can make it look like practically anyone is saying whatever you want them to.
The possibilities for realistic, convincing deception are many: playing a harmless prank on a relative with a FaceTime call from their favorite Hollywood actor or pop star, implicating an innocent person in a serious crime by posting an online confession, scamming someone out of money by impersonating their beloved grandchild in trouble, or having a prominent politician voice support for a controversial topic.
However, content produced by the VASA-1 model "still contains identifiable artifacts," and the researchers have no plans to make the platform publicly available "until we are certain that the technology will be used responsibly and in accordance with proper regulations."
A paper detailing the project is available on the arXiv preprint server.
Source: Microsoft Research