Microsoft's new VASA-1 AI framework generates hyper-realistic talking heads that can also sing

Microsoft Corp. has published a research paper introducing a new kind of artificial intelligence framework that lets you upload still photos, add audio samples, and create hyper-realistic talking heads that look and sound like the real thing.

The new framework, called VASA-1, takes a single portrait-style image and an audio file and combines them to create a short video of a talking head with realistic facial expressions and head movements. It can even make the head sing songs using the uploaded audio.

Microsoft said VASA-1 is just a research project at this point and that it is not making it available to others, but it has posted a number of strikingly realistic demo videos.

Nvidia Corp. and Runway AI Inc. have both released similar technology, but VASA-1 appears to be able to reduce mouth artifacts and create more realistic talking heads.

The company says the new framework is specifically designed to animate virtual characters, so all the people in the samples are synthetic, generated using OpenAI’s DALL-E image generation model. However, it clearly has the potential to go further: if it can animate AI-generated images, it should be just as easy to animate photos of real people.

In the demos, the talking heads look like real people who have been filmed, with smooth, natural movements. The lip-syncing is particularly good, and any unnatural movements are very hard to notice.

Equally impressive is that VASA-1 does not appear to require a traditional front-facing passport or portrait-style image to function. Examples include shots of heads facing slightly different directions. The model also offers advanced control using gaze direction, head distance, emotional expression, etc. as inputs to enhance realism.

Big possibilities and big risks

In terms of practical applications, one of the most obvious use cases is video games. VASA-1 could allow developers to create more realistic AI-generated characters with highly natural lip-syncing and facial expressions, increasing immersion. The technology could also be used to create avatars for social media videos, and perhaps even more realistic AI-generated actors or singers that look like they’re actually talking or singing, opening the door to AI-generated movies and music videos.

In addition to its ability to lip-sync talking heads to uploaded songs, VASA-1 can also handle non-photographic images, such as a painting: one example shows the Mona Lisa rapping the lyrics to “Paparazzi.”

Microsoft just dropped VASA-1.

This AI can make a single image sing or speak expressively from an audio reference. Similar to Alibaba’s EMO.

10 wild examples:

1. Mona Lisa rapping Paparazzi pic.twitter.com/LSGF3mMVnD

— Minchoi (@minchoi) April 18, 2024

However, just as there is potential for creativity, there is also clear potential for misuse of this technology. VASA-1 would make life much easier for anyone intent on creating deepfake videos. For example, someone could upload a photo of Donald Trump’s face along with a short audio clip of his voice and create a realistic video of him saying whatever they want.

The risk of exploitation is the reason Microsoft is so wary of the project. “Our research focuses on generating visual emotional skills for virtual AI avatars and aims for positive applications,” Microsoft researchers said. “It is not intended to create content that is used to mislead or deceive. However, like other related content generation techniques, it can still be exploited to impersonate humans.”

As such, the company said it has no plans to release online demos, products or additional implementation details at this time, adding that it would only consider doing so if it was certain the technology would be used responsibly.

Image: Microsoft

