It’s great to have AI that can create videos, but what if you also want those videos to have sound? Google’s DeepMind team has announced video-to-audio (V2A) technology that can generate a soundtrack (music, sound effects, dialogue) from a combination of video pixels and text prompts.
This is news that may have soundtrack composers shifting uneasily in their seats – all the more so because, as well as working with AI video generation models, V2A can also be applied to existing footage, such as archival material or silent films.
The text prompt aspect is interesting: you can supply “positive prompts” that steer the generated audio toward sounds you want, and “negative prompts” that steer it away from sounds you want to avoid. As a result, a single video can yield a practically unlimited number of different soundtracks.
One example clip was generated from the prompt, “Drummer on stage at a concert surrounded by flashing lights and a cheering crowd.”
The system can also generate audio from video pixels alone, so a text prompt is entirely optional.
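V2A is not publicly available, so as a purely illustrative sketch, the Python snippet below shows how that prompt mechanics could be expressed in code. The `V2ARequest` structure, its field names, and the `describe` helper are all hypothetical assumptions for this example, not Google DeepMind’s actual interface.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical request structure -- V2A has no public API; these names
# are illustrative assumptions, not Google DeepMind's real interface.
@dataclass
class V2ARequest:
    video_path: str                         # footage to score (pixels are always used)
    positive_prompt: Optional[str] = None   # steer audio toward desired sounds
    negative_prompt: Optional[str] = None   # steer audio away from unwanted sounds

def describe(request: V2ARequest) -> str:
    """Summarize how a (hypothetical) generation call would be conditioned."""
    parts = [f"video pixels from {request.video_path}"]
    if request.positive_prompt:
        parts.append(f'toward "{request.positive_prompt}"')
    if request.negative_prompt:
        parts.append(f'away from "{request.negative_prompt}"')
    return "Generate audio conditioned on " + ", ".join(parts)

# Same clip, different soundtrack -- only the prompts change.
print(describe(V2ARequest("concert.mp4",
                          positive_prompt="energetic drum solo, cheering crowd",
                          negative_prompt="speech, narration")))
# Pixels only: no text prompt at all.
print(describe(V2ARequest("archival_silent_film.mp4")))
```

The point of the two calls is the one made above: the same video can be re-scored simply by changing (or omitting) the prompts.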
Google DeepMind acknowledges that V2A still has limitations: the quality of the audio output depends on the quality of the input video, and lip syncing for generated speech is not yet perfect. The team says it is conducting further research to address these issues.
For more information and further examples, see the Google DeepMind website.