OpenAI, the maker of ChatGPT, took the digital world by storm by revealing Sora, a sophisticated AI model that creates stunning videos from written prompts. The company is currently making Sora available only to a select group of people, such as red teamers, visual artists, policymakers, and filmmakers, in order to obtain critical feedback and assess its potential risks and harms.

Sam Altman, the CEO of OpenAI, posted a message on X asking his followers to reply with text prompts that they would like to see as videos. Within a few minutes the responses poured in, and he showcased what Sora could accomplish by sharing the videos the AI model created. From a half duck, half dragon flying through the sunset to two golden retrievers podcasting on top of a mountain, Sora produced strikingly realistic videos that matched the prompts.

What is more fascinating about Sora is that it can generate quality videos even for complex scenes with multiple characters, rendering them in a way that reflects how those things exist in the real world. At first glance, it’s hard to tell whether a clip is a real video or AI-generated.

Safety Concerns

You may now wonder whether the realism of this AI model raises safety concerns, such as hateful content and misinformation. OpenAI says it is working with experts to critically analyze the risks before making Sora available to the public. The company also says it is building a tool to detect misleading content, and it plans to include C2PA metadata in generated videos.

In addition to identifying misleading content, the AI model will have a text classifier that checks and rejects prompts that violate usage policies. For example, prompts requesting sexual content, celebrity likeness, violent scenes, or hateful imagery will be rejected by the text classifier. OpenAI says Sora will be released only after it collects feedback from policymakers, educators, and artists.

Technology Used In Sora

Sora is a diffusion model that uses a transformer architecture similar to GPT models. Images and videos are represented as collections of small units of data known as patches. The diffusion transformer is trained on a wide range of visual data and can produce videos in different resolutions, durations, and aspect ratios.
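
To make the idea of patches more concrete, here is a minimal sketch in plain Python of how a clip could be cut into small blocks that span both space and time. It is only an illustration: OpenAI has not published Sora's code, and the function name `video_to_patches`, the patch sizes, and the toy video below are all assumptions made for this example.

```python
def video_to_patches(video, pt, ph, pw):
    """Cut a video into flat patches.

    `video` is a nested list indexed as video[frame][row][column].
    Each patch covers `pt` frames, `ph` rows, and `pw` columns.
    (Hypothetical helper for illustration, not Sora's actual code.)
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    if T % pt or H % ph or W % pw:
        raise ValueError("video dimensions must be divisible by the patch size")

    patches = []
    # Walk over the top-left-front corner of every patch.
    for t0 in range(0, T, pt):
        for y0 in range(0, H, ph):
            for x0 in range(0, W, pw):
                # Flatten the block of pixels into one list.
                patch = [
                    video[t][y][x]
                    for t in range(t0, t0 + pt)
                    for y in range(y0, y0 + ph)
                    for x in range(x0, x0 + pw)
                ]
                patches.append(patch)
    return patches


def make_toy_video(frames, height, width):
    """Build a toy video whose pixel values encode their own position."""
    return [
        [[t * 100 + y * 10 + x for x in range(width)] for y in range(height)]
        for t in range(frames)
    ]


short_clip = make_toy_video(frames=2, height=4, width=4)
long_clip = make_toy_video(frames=4, height=4, width=4)

short_patches = video_to_patches(short_clip, pt=1, ph=2, pw=2)
long_patches = video_to_patches(long_clip, pt=1, ph=2, pw=2)

print(len(short_patches), len(long_patches))  # 8 16
print(short_patches[0])                       # [0, 1, 10, 11]
```

A longer or larger clip simply yields more patches of the same size, which hints at why a patch-based model can handle different durations, resolutions, and aspect ratios with one architecture.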

The capabilities of Sora extend beyond creating videos from text prompts. The AI model can take an existing image, animate the contents in the image with accuracy, and generate a video. But that’s not all; Sora can extend an existing video or fill in missing elements and frames.

Netizens expressed excitement, fear, and mixed feelings upon hearing the news of Sora. One user replied to Sam Altman’s post with anxiety: “Sam, please don’t make me homeless.” Another user said, “The future is wild.” How do you feel about this new AI tech: excited or threatened? Let us know your thoughts in the comments!

