The world will never be the same after OpenAI revealed its new video creation model – Sora. Sora makes realistic videos – up to a minute long – from a short text prompt, and the results are absolutely stunning. Check out the segment above to see what I mean. Sora can generate complex scenes with multiple characters, specific types of motion, and fine details of the subjects and background. Sora not only understands the prompt; it understands how those things exist in the physical world.
Why is it historic?
See for yourself in the batch of clips released by OpenAI. Just watch and you’ll realise their significance. The videos are incredible and mark the start of a new branch of AI in which moving pictures will eventually become photo-real. In fact, although not perfect, this initial batch of Sora videos is good enough to fool many people. I know this because footage from driving and flying simulators already tricks punters on social media – and Sora is far better.
This type of artificial intelligence requires a lot of computing power to render a realistic video. Sora is a diffusion model: it generates a video by starting with what looks like static noise, then gradually transforms it by removing that noise over many steps.
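OpenAI hasn't published Sora's internals, but the basic diffusion loop described above can be sketched in a toy form. In this Python sketch, `toy_denoise`, the target pattern, and the step schedule are all illustrative stand-ins of my own, not Sora's actual method:

```python
import numpy as np

# Toy illustration of the diffusion idea: start from pure static noise
# and repeatedly remove a little noise each step. The "denoiser" here
# is a stand-in that nudges values toward a known target; a real model
# uses a learned neural network to predict the noise.

def toy_denoise(shape=(8, 16, 16), steps=50, seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical target "video": frames x height x width.
    target = np.linspace(0.0, 1.0, np.prod(shape)).reshape(shape)
    x = rng.normal(size=shape)  # begin as static noise
    for step in range(steps):
        predicted_noise = x - target              # stand-in for model output
        x = x - predicted_noise / (steps - step)  # remove a fraction of it
    return x, target

x, target = toy_denoise()
err = float(np.abs(x - target).max())
print(f"max error after denoising: {err:.2e}")
```

Each pass removes only a fraction of the estimated noise, which is why diffusion models need many steps – and so much computing power – to reach a clean result.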
It’s also very complicated, as it should be given its capabilities. If you want to dig deeper, OpenAI has a fascinating research page on its website.
Why Reveal it Now?
OpenAI has chosen to reveal Sora now to test the waters. It’s part excitement, part dread. OpenAI wants feedback from outsiders while giving the public an idea of what Sora can do.
It’s not available to the public yet because extensive safety checks need to be done first. Sora will be accessible to red teamers (testers) who will help assess harmful aspects and risks. Visual artists, designers, and filmmakers will also be asked to give feedback on how to improve Sora for creative professionals.
OpenAI’s website says, “The text classifier will check and reject text input prompts that are in violation of usage policies, like those that request extreme violence, sexual content, hateful imagery, celebrity likeness, or the IP of others.”
Given how realistic Sora’s videos already are, the technology will no doubt be abused by some people. OpenAI is building tools to help detect misleading content, such as a detection classifier that can tell when a video was generated by Sora.
OpenAI has also developed robust image classifiers that review the frames of every generated video to help ensure it adheres to usage policies before it’s shown to the user.
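OpenAI hasn't described how that review pipeline is implemented, but the idea – check every frame, and release the video only if all frames pass – can be sketched like this. `classify_frame` is a hypothetical stand-in for a real trained image classifier:

```python
# Hypothetical sketch of a frame-review pipeline: every frame is checked
# against usage policies before the video is shown to the user.

def classify_frame(frame):
    # Stand-in policy check: a real system would run an image classifier
    # over the frame's pixels. Here we just read a pre-set flag.
    return frame.get("unsafe", False)

def review_video(frames):
    """Return True only if every frame passes the policy check."""
    return all(not classify_frame(f) for f in frames)

video = [{"pixels": "..."}, {"pixels": "...", "unsafe": False}]
print(review_video(video))  # no frame is flagged, so the video passes
```

The key design choice is that a single flagged frame blocks the whole video, which is why the check runs before anything is shown to the user.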
So yes, it’s exciting and scary at the same time, because even if OpenAI does the right thing, others may not. This fundamentally changes what we can believe on screen, and while not perfect, it’s already very good.
To combat this uncertainty, OpenAI also plans to incorporate C2PA metadata to provide a clear record of where a video came from and how it has been altered.
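The core idea behind provenance metadata is a tamper-evident chain of records. This is not the real C2PA manifest format – just a simplified sketch of the chain-of-provenance concept, with hypothetical tool names:

```python
import hashlib
import json

# Toy provenance chain: each action record includes a hash of the
# previous record, so any tampering breaks the chain.

def add_record(chain, action, tool):
    prev_hash = chain[-1]["hash"] if chain else ""
    record = {"action": action, "tool": tool, "prev": prev_hash}
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    chain.append(record)
    return chain

def verify(chain):
    """Check that each record's 'prev' matches the prior record's hash."""
    for i, rec in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i else ""
        if rec["prev"] != expected_prev:
            return False
    return True

chain = []
add_record(chain, "created", "sora")          # hypothetical tool names
add_record(chain, "edited", "video-editor")
print(verify(chain))
```

Because every record commits to the one before it, a viewer can verify the whole history of a clip – which is exactly what a standard like C2PA aims to provide.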
So mark February 2024 down as the month when the world changed forever.