Google is announcing a major new family of generative AI models that it calls Gemini Omni. The first Omni model, Omni Flash, can generate AI videos using an assortment of different inputs: text, photos, videos, and audio. Down the line, though, Google envisions Omni as something that can “create anything from any input,” according to a blog post, hence the Omni name.
The company is positioning Omni Flash as a video version of something like its Nano Banana image generation model, which people have already used to generate more than 50 billion images since its introduction last year. For example, you’ll be able to ask Omni Flash to insert a likeness of you into videos, which doesn’t sound like anything I’d ever want to do. But Nicole Brichtova, who leads the product team that works on Omni, tells The Verge that Google has seen a lot of people insert their likeness into images with Nano Banana.
With Gemini Omni Flash, you’ll be able to generate clips with video and audio that are up to 10 seconds long, Dumitru Erhan, senior research director at Google DeepMind, tells The Verge. The company is working on making that longer.
Google already has a video generation model called Veo, but that’s a text-to-video generation model. Omni Flash, on the other hand, can use a video as the basis to help make another video. Omni Flash also has “a lot” more world knowledge than Veo because of Gemini’s training data, according to Koray Kavukcuoglu, CTO of Google DeepMind and chief AI architect at Google.
Gemini Omni Flash will be available starting Tuesday in the Gemini app, Google Flow, and YouTube Shorts.