Microsoft's New AI Can Make Photographs Sing and Talk — and It Already Has the Mona Lisa Lip-Syncing

1 week ago

Microsoft published a research paper this week highlighting a new AI model called VASA-1 that can transform a single image and audio clip of a person into a realistic video of them lip-syncing, complete with facial expressions, head movements, and all.

The AI model was trained on AI-generated images from generators like DALL·E-3, which the researchers then layered with audio clips. The results are images turned into videos of talking faces.

The researchers built on technology from competitors such as Runway and Nvidia, but say in the paper that their approach is higher quality, more realistic, and "significantly outperforms" existing methods.

The researchers said the model can take in audio of any length and generate a talking face that matches the clip.

The only non-AI-generated image the researchers experimented with was the Mona Lisa. They made the iconic painting lip-sync to Anne Hathaway's "Paparazzi," which starts with the lines "Yo I'm a paparazzi, I don't play no yahtzee."
A screenshot of the video mid-frame. Credit: Entrepreneur

The Mona Lisa was one example of a photo input that the AI model was not trained on but could manipulate anyway. The model could also transform artistic photos, take in singing audio, and handle speech in languages other than English.

The researchers emphasized that the model could work in real time, sharing a demo video that showed it instantly animating images with head movements and facial expressions.

Deepfakes are digitally altered media of a person that could spread misinformation or use someone's likeness without permission. They are a risk posed by advanced AI that can generate digital media from relatively few reference points.

Microsoft addressed that concern in the paper, with the researchers stating, "We are opposed to any behavior to create misleading or harmful contents of real persons, and are interested in applying our technique for advancing forgery detection."

The researchers stated that their method also has potentially positive applications, such as improving accessibility and enhancing education.

Google demoed a similar research project last month, showcasing an AI capable of taking a photo and creating a video from it that the person can then control with their voice. The AI was able to add head movements, blinks, and hand gestures.