Microsoft recently introduced a new artificial intelligence (AI) model that can generate hyper-realistic videos of talking human faces. The AI image-to-video model, called VASA-1, can transform still photos of people’s faces into lifelike animations. According to the company, the videos created will have lip movements synchronized with the audio, making facial expressions and head movements look natural.
Recently, a video demonstrating the model’s capabilities went viral on social media, surprising people. One AI-generated clip shows Leonardo da Vinci’s iconic painting, the Mona Lisa, lip-syncing to Anne Hathaway’s rendition of ‘Paparazzi’.
“Microsoft just dropped VASA-1. This AI can make a single image sing or speak expressively from an audio reference. Similar to Alibaba’s EMO. 10 wild examples: 1. Mona Lisa rapping Paparazzi,” reads the caption of the thread shared by Min Choi.
Watch the video here:
Microsoft just dropped VASA-1.
This AI can expressively make a single image sing or speak from an audio reference. Similar to Alibaba’s EMO
10 wild examples:
1. Mona Lisa rapping paparazzi pic.twitter.com/LSGF3mMVnD
— Minchoi (@minchoi) April 18, 2024
The video went viral, with many users amused by the clip. One wrote: “I fell on the floor laughing when I saw the Mona Lisa clip,” while another commented: “Oh yeah, I wish da Vinci could have witnessed this.”
Others have also raised concerns about unethical use, particularly for the purpose of creating deepfakes.
A third wrote: “Creepy? Attractive? First, the potential for deepfakes has grown exponentially… but it also opens up some interesting creative possibilities.”
A fourth added: “Deepfake technology has just taken a scary leap forward and is more convincingly deceptive than we ever imagined.”
According to Microsoft, VASA is a framework for generating lifelike conversational faces for virtual characters with engaging visual affective skills (VAS).
“VASA-1 is not only capable of producing lip movements that are exquisitely synchronized with the audio, but it can also capture a large spectrum of facial nuances and natural head motions that contribute to the perception of authenticity and liveliness. The core innovations include a holistic facial dynamics and head movement generation model that works in a face latent space, and the development of such an expressive and disentangled face latent space using videos,” the company wrote.
“We do not plan to release any online demos, APIs, products, additional implementation details, or related products until we are certain that the technology will be used responsibly and in accordance with appropriate regulations,” Microsoft added.