Microsoft researchers say the AI model they developed allows avatars to have realistic conversations with subtle facial expressions.
Microsoft researchers have unveiled a new artificial intelligence tool that can create highly realistic human avatars, but gave no timeline for making it publicly available, citing concerns about facilitating deepfake content.
The AI model, known as VASA-1, for “visual affective skills,” can create an animated video of a person talking with synchronized lip movements from just a single image and a speech audio clip.
Disinformation researchers are concerned that applications leveraging AI to create “deepfake” photos, videos and audio clips could be widely abused in a crucial election year.
“We oppose any activity that creates misleading or harmful content about real people,” the authors of the VASA-1 report released this week by Microsoft Research Asia wrote.
“We are committed to developing AI responsibly, with the goal of promoting human well-being,” they said.
“We have no plans to release an online demo, API, product, additional implementation details, or any related offerings until we are certain that the technology will be used responsibly and in accordance with proper regulations.”
Microsoft researchers said the technology can capture a wide range of facial nuances and natural head movements.
“This paves the way for real-time engagement with lifelike avatars that emulate human conversational behavior,” the researchers said in a post.
According to Microsoft, VASA-1 can also handle artistic photos, singing audio and non-English speech.
The researchers touted the technology’s potential benefits, such as providing virtual tutors to students and therapeutic support to people in need.
“It is not our intent to create content to mislead or deceive,” they said.
Videos generated with VASA still contain “artifacts” that reveal they were created by AI, the post said.
Ben Werdmuller, head of technology at ProPublica, quipped on the social network Threads that he was waiting to hear about someone using the technology for the first time to stand in for themselves at a Zoom meeting.
“How did it go? Did anyone notice?” he asked.
In March, OpenAI, the developer of ChatGPT, announced a voice cloning tool called “Voice Engine” that can essentially duplicate someone’s voice based on a 15-second audio sample.
However, the company said it was “taking a cautious and informed approach toward broader release due to the potential for synthetic speech to be exploited.”
Earlier this year, a consultant working for a long-shot Democratic presidential candidate admitted he was behind robocalls impersonating Joe Biden that were sent to New Hampshire voters, saying he was trying to highlight the dangers of AI.
The call featured what sounded like Biden’s voice urging people not to vote in the state’s January primary, sparking alarm among experts who fear a flood of AI-powered deepfake disinformation in the 2024 White House race.