Microsoft Corp. has published a research paper introducing a new artificial intelligence framework that can take a still photo and an audio sample and turn them into a hyper-realistic talking head that looks and sounds like the real thing.
The new framework, called VASA-1, takes a single portrait-style image and an audio file and combines them to create a short video of a realistic talking head, complete with facial expressions, head movements and even the ability to sing songs using the uploaded audio.
Microsoft said VASA-1 is just a research project at this point and that it is not making the framework available to others, but it has posted a number of demo videos with stunning realism.
Nvidia Corp. and Runway AI Inc. have both released similar technology, but VASA-1 appears to be able to reduce mouth artifacts and create more realistic talking heads.
The company says the new framework is specifically designed to animate virtual characters, so all of the people in the samples are synthetic, generated using OpenAI’s DALL-E image generation model. However, if it’s possible to animate AI-generated images, it should be just as easy to animate photos of real people, so the technology clearly has the potential to go even further.
In the demos, the talking heads look like real people who were filmed, with smooth, natural movements. The lip sync is particularly good, and any unnatural movements are very hard to spot.
Equally impressive is that VASA-1 does not appear to require a traditional front-facing, passport- or portrait-style image to function. The examples include shots of heads facing in slightly different directions. The model also offers advanced control, accepting gaze direction, head distance, emotional expression and other signals as inputs to enhance realism.
Big possibilities and big risks
In terms of practical applications, one of the most obvious use cases is video games. VASA-1 could allow developers to create more realistic AI-generated characters with highly natural lip-syncing and facial expressions, increasing immersion. The technology could also be used to create avatars for social media videos, and perhaps go even further to create more realistic AI-generated actors or singers that look like they’re actually talking or singing, which could extend its use to movies and music videos.
In addition to its ability to lip-sync talking heads almost perfectly with uploaded songs, VASA-1 can also handle artwork rather than photographs, such as the Mona Lisa, shown rapping “Paparazzi.”
Microsoft just dropped VASA-1.
This AI can make a single image sing or speak expressively from an audio reference. Similar to Alibaba’s EMO.
10 wild examples:
1. Mona Lisa rapping Paparazzi pic.twitter.com/LSGF3mMVnD
— Minchoi (@minchoi) April 18, 2024
However, just as there is potential for creativity, there is also clear potential for misuse of this technology. VASA-1 would make life much easier for anyone intent on creating deepfake videos. For example, someone could upload a photo of Donald Trump’s face along with a short audio clip of his voice and create a realistic video of him saying whatever they want.
That risk of misuse is the reason Microsoft is so cautious about the project. “Our research focuses on generating visual affective skills for virtual AI avatars, aiming for positive applications,” Microsoft’s researchers said. “It is not intended to create content that is used to mislead or deceive. However, like other related content generation techniques, it could still potentially be misused for impersonating humans.”
As such, the company said it has no plans to release online demos, products or additional implementation details at this time, adding that it would only consider doing so if it was certain the technology would be used responsibly.
Image: Microsoft