Google dives into generative AI for new audio, video and images

The Google I/O developer conference introduced a slew of new capabilities, some rolling out to users today and others over the coming weeks and months. These include updates to how users can create video, images, and music with generative AI.

Introduction to Generative AI

The Google team has been busy improving its generative AI. Alongside additions such as AI in image search, the improvements include new video creation capabilities from a model called Veo, text-to-image updates in Imagen 3, and new capabilities in Google’s Music AI Sandbox.

First up is Veo, the new generative AI model that can help you create 1080p video using text, images, or voice commands. Veo can match the style of the video it creates to a photo you supply, and it can draw on a number of new tools to create the video you’re envisioning. For example, Veo understands terms like “time-lapse,” “tracking shot,” or “aerial shots,” so it can better produce the frames you want.

“With Veo, we’ve improved techniques for how the model learns to understand what’s in a video, renders high-definition images, simulates the physics of our world, and more. These insights will fuel advances in our AI research and enable us to build even more useful products that help people interact and communicate in new ways,” said Eli Collins, VP, Product Management, and Douglas Eck, Senior Research Director, in Google’s post announcing the new capabilities.