Microsoft researchers have unveiled a new artificial intelligence tool that can create highly realistic human avatars, but gave no timeline for its public release, citing concerns that it could be used to create deepfake content.
The AI model, known as VASA-1, which stands for "visual affective skills," can generate animated videos of a person talking, with synchronized lip movements, from just a single image and an audio clip of speech.
Disinformation researchers are concerned that applications that leverage AI to create “deepfake” photos, videos and audio clips could become widely exploited in a critical election year.
"We oppose any activity that creates misleading or harmful content about real people," wrote the authors of the VASA-1 report, released this week by Microsoft Research Asia.
“We are committed to developing AI responsibly, with the goal of promoting human well-being,” they said.
"We do not plan to release any online demos, APIs, products, additional implementation details, or related offerings until we are certain that the technology will be used responsibly and in accordance with appropriate regulations."
Microsoft researchers said the technology can capture a wide range of facial nuances and natural head movements.
“This paves the way for real-time engagement with lifelike avatars that emulate human conversational behavior,” the researchers said in a post.
According to Microsoft, VASA can handle artistic photos, songs, and non-English audio.
Researchers touted the technology's potential benefits, such as providing virtual tutors for students and therapeutic support for those in need.
“It is not our intent to create content to mislead or deceive,” they said.
Videos generated by VASA still contain "artifacts" that reveal they were created by AI, the post said.
"I'd be thrilled to hear about the first time someone uses this as a stand-in for a Zoom meeting," said Ben Werdmuller, head of technology at ProPublica.
“How was it? Did anyone notice?” he said on the social network Threads.
In March, OpenAI, the developer of ChatGPT, announced a voice cloning tool called “Voice Engine” that can essentially duplicate someone’s voice based on a 15-second audio sample.
But the company said it was “taking a cautious and informed approach toward broader release due to the potential for synthetic speech to be exploited.”
Earlier this year, a consultant working for a long-shot Democratic presidential hopeful admitted he was behind a robocall impersonating Joe Biden that was sent to voters in New Hampshire, saying he was trying to highlight the dangers of AI.
The call featured what sounded like Biden's voice urging people not to vote in the state's January primary, sparking alarm among experts who fear a flood of AI-powered deepfake disinformation in the 2024 White House race.