Google’s Lumiere AI brings video closer to real than unreal

Google’s new video generation AI model, Lumiere, uses a diffusion architecture called Space-Time U-Net, or STUNet, which figures out where things are in a video (space) and how they move and change over time. Ars Technica reports that this method lets Lumiere create the video in one process instead of stringing together small still frames.

Lumiere starts by creating a base frame from the prompt. Then it uses the STUNet framework to predict where objects within that frame will move, generating additional frames that flow into one another for the appearance of seamless motion. Lumiere also produces 80 frames, compared to 25 frames from Stable Video Diffusion.

Admittedly, I’m more of a text reporter than a video person, but the sizzle reel Google published alongside a preprint scientific paper shows that AI video generation and editing tools have gone from the uncanny valley to almost realistic in just a few years. It also positions Google’s technology in a space already occupied by competitors like Runway, Stable Video Diffusion, and Meta’s Emu. Runway, one of the first mass-market text-to-video platforms, released Runway Gen-2 in March last year and has since started offering more realistic-looking videos, though its videos still have difficulty portraying motion.

Google was kind enough to put the clips and prompts on the Lumiere site, which allowed me to run the same prompts through Runway for comparison. Here are the results:

Yes, there is a touch of artificiality in some of the clips presented, especially if you look closely at skin textures or if the scene is more atmospheric. But look at that turtle! It actually moves through water like a turtle! It looks like a real turtle! I sent the Lumiere introduction video to a friend who is a professional video editor. While she pointed out that “you can obviously tell it’s not completely real,” she found it impressive enough that, had I not told her it was AI, she would have thought it was CGI. (She also said: “This is going to cost me my job, isn’t it?”)

Other models stitch together video from generated key frames where the movement has already happened (think of drawings in a flip book), while STUNet lets Lumiere focus on the movement itself, based on where the generated content should be at any given time in the video.
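To get an intuition for the difference, here is a loose conceptual sketch (not Google’s implementation — the function names and pooling factors are illustrative) of the core STUNet idea: a video is treated as one (time, height, width) volume and downsampled jointly in space *and* time, so the model always reasons about motion across frames rather than one still frame at a time.

```python
import numpy as np

def spacetime_downsample(video, t_factor=2, s_factor=2):
    """Average-pool a video jointly over time and space.

    video: array of shape (T, H, W); T, H, W must be divisible
    by their respective factors.
    """
    T, H, W = video.shape
    v = video.reshape(T // t_factor, t_factor,
                      H // s_factor, s_factor,
                      W // s_factor, s_factor)
    return v.mean(axis=(1, 3, 5))

def spacetime_upsample(video, t_factor=2, s_factor=2):
    """Nearest-neighbor upsample back to the original resolution."""
    return (video.repeat(t_factor, axis=0)
                 .repeat(s_factor, axis=1)
                 .repeat(s_factor, axis=2))

# A toy "video": 8 frames of 16x16 grayscale pixels.
clip = np.random.rand(8, 16, 16)

# In a space-time U-Net, the heavy processing happens on this
# compact (T/2, H/2, W/2) volume, where every operation sees
# several frames at once -- motion is part of the representation,
# not something interpolated between key frames afterwards.
compact = spacetime_downsample(clip)
restored = spacetime_upsample(compact)
print(compact.shape)   # (4, 8, 8)
print(restored.shape)  # (8, 16, 16)
```

A key-frame approach, by contrast, would process each frame’s (H, W) grid independently and only reconcile them across time afterwards, which is where flip-book-style motion artifacts tend to come from.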

Google hasn’t been a big player in the text-to-video category, but it has gradually released more advanced AI models and leaned toward a more multimodal focus. Its Gemini large language model will eventually bring image creation to Bard. Lumiere is not yet available for testing, but it shows Google’s ability to develop an AI video platform that is on par with commonly available AI video generators like Runway and Pika – and arguably a little better. And just a reminder, this was where Google was with AI video two years ago.

Google Imagen clip from 2022
Image: Google

In addition to text-to-video generation, Lumiere will also allow image-to-video generation; stylized generation, which lets users create videos in a specific style; cinemagraphs, which animate only a portion of a video; and inpainting, which masks out an area of the video to change its color or pattern.

However, Google’s Lumiere paper states that “there is a risk of abuse with our technology to create fake or harmful content,” and that it is important to develop and implement tools to detect biases and malicious use cases to ensure safe and fair usage. The paper’s authors did not explain how this could be achieved.
