Microsoft researchers published a paper this week describing VASA-1, a new AI tool that can generate a convincing video of someone speaking using just a still image and an audio clip. Microsoft has no plans to make the tool available to the public anytime soon, but it’s pretty impressive. Well, it’s impressive if you don’t look too closely at the teeth. Let’s take a look at the chompers.
The VASA-1 model works by taking a still photo of a human face. In the examples published by Microsoft, the faces are AI-generated and belong to people who don’t actually exist. When the photo is combined with an audio file, the model generates a synchronized video that includes facial nuances and natural-looking movements.
Again, it’s all very impressive, as you can see in one of the videos Microsoft provided below. However, one area where VASA-1 seems to struggle is rendering teeth. Focus on the teeth and the video takes on a cartoonish quality, looking slightly animated in a way that doesn’t quite match the realism of everything else.
Slowing down the overall speed, as Gizmodo did in the GIF below, makes this video’s weird teeth even more apparent. (It can almost feel bad to pick apart someone’s appearance until you remember that the person depicted literally doesn’t exist.)
Another example video provided by Microsoft, shown below, has a similar cartoon-like quality to the teeth. The other features look very realistic, though, especially when you remember that the source material is just a still image and an audio file.
For some reason, the teeth are slightly less noticeable in the videos showing a man. That’s probably because the model shows the man not opening his mouth as wide when he speaks. But if you look closely, you can still sense that something isn’t right.
One of the more interesting points noted by the researchers is that their model can generate relatively high-quality videos very quickly, something other AI generators such as OpenAI’s Sora have reportedly struggled with. In fact, the paper reports a latency of only 0.17 seconds on a desktop PC with a single NVIDIA RTX 4090 GPU.
That’s fast enough to deliver video on the fly for a variety of applications, including real-time translation services.
“Our method not only provides high-quality videos with realistic face and head dynamics, but also supports online generation of 512×512 videos at up to 40 FPS with negligible start-up delay. It paves the way for real-time engagement with lifelike avatars that emulate human conversational behavior,” the new paper says.
The researchers are clearly aware of the dangers of this type of technology, which perhaps explains why Microsoft isn’t rushing to release it to the public. However, the researchers have also identified use cases that could benefit humanity.
“Benefits such as improving educational equity, increasing accessibility for individuals with communication difficulties, and providing companionship and therapeutic support for those in need make our research and other related pursuits important. We are committed to developing AI responsibly, with the goal of advancing human well-being,” the paper says.
“Given this situation, we do not plan to release any online demos, APIs, products, additional implementation details, or related offerings until we are certain that the technology is used responsibly and in accordance with appropriate regulations.”
That’s probably a good idea, considering the number of scams this kind of technology makes possible. After all, the 2024 U.S. presidential election is just seven months away. And the global threat of fascism won’t go away anytime soon. Humanity now feels truly powerless against AI-generated fakes, and big companies like Microsoft should do everything in their power to limit the potential damage before virtually everything on the internet becomes fake.