
We already know that OpenAI’s chatbots can pass the bar exam without going to law school. Now, just in time for the Oscars, a new OpenAI app called Sora hopes to master cinema without going to film school. For now a research product, Sora is going out to a few select creators and several security experts who will red-team it for security threats. OpenAI plans to make it available to all wannabe auteurs at an unspecified date, but decided to preview it early.

Other companies, from giants like Google to startups like Runway, have already disclosed text-to-video AI projects. But OpenAI says Sora is distinguished by its outstanding photorealism (something I haven’t seen in its competitors) and by its ability to create longer clips than the short snippets other models typically produce, up to a minute. The researchers I spoke to wouldn’t say how long it takes to render all that video, but when pressed, they described it as more in the “going out for a burrito” ballpark than “taking a few days off.” If the hand-picked examples I’ve seen are to be believed, the effort is worth it.

OpenAI didn’t let me enter my own prompts, but it did share four instances of Sora’s power. (None reached the claimed one-minute mark; the longest was 17 seconds.) The first came from a detailed prompt that sounded like a frantic screenwriter’s setup: “The beautiful, snowy city of Tokyo is bustling. The camera moves down a bustling city street, following many people enjoying the beautiful snowy weather and shopping at nearby stalls. Beautiful sakura petals are blowing in the air along with snowflakes.”

AI-generated video made with OpenAI’s Sora.

Courtesy of OpenAI

The result is a convincing view of what is unmistakably Tokyo, in that magical moment when snowflakes and cherry blossoms coexist. A virtual camera, as if attached to a drone, follows a couple as they slowly stroll through the street scene. One of the passersby is wearing a mask. Cars rumble along the riverside road to their left, and to the right shoppers flit in and out of a row of small shops.

It’s not perfect. Only when you watch the clip a few times do you realize that the main characters (a couple strolling down a snow-covered sidewalk) would have faced a dilemma if the virtual camera had kept rolling. The sidewalk they occupy appears to dead-end; they would have had to step over a small guardrail onto a strange parallel walkway to their right. Despite this slight flaw, the Tokyo example is a mind-blowing exercise in world-building. Down the road, production designers will debate whether this is a powerful collaborator or a job killer. Also, the people in this video, who are generated entirely by a digital neural network, are not shown in close-up, and they do not do any emoting. But the Sora team says that in other cases they have had fake actors showing real emotions.

The other clips are also impressive, particularly one asking for an “animated scene of a short fluffy monster kneeling by a red candle,” along with some detailed stage directions (“wide eyes and an open mouth”) and a description of the desired vibe of the clip. Sora creates a Pixar-esque creature that seems to contain DNA from a Furby, a Gremlin, and Sulley in Monsters, Inc. I remember that when that latter movie came out, Pixar made a big deal about how difficult it was to create the extremely complex texture of a monster’s fur as the creature moved around. It took all the wizards at Pixar months to get it right. OpenAI’s new text-to-video machine … just did it.

“It learns about 3D geometry and consistency,” says Tim Brooks, a research scientist on the project, of the feat. “We didn’t build that in. It just emerged from looking at a lot of data.”

An AI-generated video created with the prompt, “The animated scene features a close-up of a short fluffy monster kneeling next to a melting red candle. The art style is 3D and realistic, with a focus on lighting and texture. The mood of the painting is one of wonder and curiosity, as the monster stares into the flame with wide eyes and an open mouth. Its pose and expression convey a sense of innocence and playfulness, as if it is exploring the world around it for the first time. The use of warm colors and dramatic lighting further enhances the relaxed atmosphere of the image.”

Courtesy of OpenAI

While the scenes are certainly impressive, Sora’s most surprising abilities are the ones it hasn’t been trained for. Powered by a version of the diffusion model used by OpenAI’s DALL-E 3 image generator as well as the transformer-based engine of GPT-4, Sora doesn’t just create videos that meet the demands of the prompts, but does so in a way that shows an emergent grasp of cinematic grammar.

This translates into a flair for storytelling. Of another video, created from the prompt “a beautifully rendered papercraft world of a coral reef, filled with colorful fish and sea creatures,” Bill Peebles, another researcher on the project, notes that Sora created a narrative thrust through its camera angles and timing. “There are actually multiple shot changes; they’re not stitched together, but generated all at once by the model,” he says. “We didn’t ask it to do it, it just happened automatically.”

AI-generated video created with the prompt “a beautifully rendered papercraft world of a coral reef, filled with colorful fish and sea creatures.”

Courtesy of OpenAI

In another example, which I didn’t see, Sora was prompted to give a tour of a zoo. “It started on a big sign with the name of the zoo, slowly panned down, and then made several shot changes to show the different animals that live in the zoo,” Peebles says. He adds that it took this cinematic approach without being expressly instructed to do so.

One feature in Sora that the OpenAI team hasn’t demonstrated, and may not release for some time, is the ability to create videos from a single image or a sequence of frames. “It’s going to be another great way to improve storytelling skills,” says Brooks. “You can draw exactly what you have in mind and then bring it to life.” OpenAI is aware that this feature also has the potential to create deepfakes and misinformation. “We’re going to be very careful about all the security implications of this,” Peebles adds.
