Sora 2 by OpenAI: The AI Video App Everyone’s Talking About

2025 has undeniably been an exceptional year in AI. We’ve seen tools emerge for every conceivable purpose (more of them than we need, or even want) in every field. AI video generation has produced some of the most sensational releases, and possibly the most sensational of all has been Sora. It’s become ubiquitous across internet discourse in all kinds of communities. I’d personally been very excited for this, after Google’s Veo 3 seriously whetted my appetite.

OpenAI rolled out the Sora app to all employees last week, and it’s now available on iOS. After signing up, users receive access invites, which can be shared with a limited number of friends. OpenAI’s idea is to “make sure you come in with your friends,” believing that content feels more genuine when created within real friend groups.

What It Is: The Sora Story So Far

Sora first surfaced on 9 December 2024, and almost a year later, Sora 2 was unveiled on 30 September 2025. This edition has been released as a website and as a standalone app, also called Sora. It crossed 164,000 downloads within two days and surpassed ChatGPT to become the #1 app on the App Store. So far it’s only available in the US and Canada, but OpenAI has expressed intent to bring an international release as soon as possible.

The Sora app has two fundamental kickers – it’s massively better than its predecessor at the mechanics and quality of video generation, and it comes with cameos, a feature that scans your face and voice in an initial capture during sign-up and then lets you and your friends “cast” you in any videos they create. The video creation process is simply typing out a prompt and waiting.
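
OpenAI has also said it plans to bring Sora 2 to its API. As a purely hypothetical sketch of what that “type a prompt and wait” flow could look like in Python – the videos endpoint, the `sora-2` model name, and the polling/download helpers below are all assumptions on my part, not a confirmed API surface:

```python
# Hypothetical sketch only: the videos endpoint, model name, and method
# signatures below are assumptions, not confirmed OpenAI API surface.
import time

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Kick off an asynchronous generation job from a text prompt.
job = client.videos.create(
    model="sora-2",
    prompt="A golden retriever skateboards down a rainy Tokyo street at dusk",
)

# Video generation takes a while, so poll until the job settles.
while job.status not in ("completed", "failed"):
    time.sleep(10)
    job = client.videos.retrieve(job.id)

if job.status == "completed":
    # Download the finished clip to disk (hypothetical helper).
    client.videos.download_content(job.id).write_to_file("clip.mp4")
```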

The Sora app takes a scan of the user’s face upon sign-up. (Image Source: Casey Neistat on YouTube)

The official release announcement expresses a vision of Sora as a close-knit social app that reinforces a sense of community, “at a time where all major platforms are moving away from the social graph.”

The app follows a markedly TikTok-style, creation-and-consumption layout, with the emphasis on creation. It centers cameos as the main driving force behind the flow of content, and the rest of the app is built around video creation and sharing.

In its present state, the app has no monetization features, but OpenAI has floated the possibility of letting users pay to generate more videos than the standard limits allow. A few days after release, the model also seems to have been “nerfed”, both in the number of generations allowed and in their quality. Users believe this was done to speed up video creation and to reduce compute stress on OpenAI’s GPUs, as per a discussion on the OpenAI subreddit.

There’s a clear consideration of the impact of Sora’s features, with concerns about doomscrolling, isolation and addiction outlined in the release document. It states that users will have complete control over who can publish drafts featuring their cameos, that the feed is geared heavily towards people you follow or interact with, and that the app does not optimize for time spent in the feed.

The OpenAI team has also gone into extensive detail on its feed philosophy, and reading through it gave me a sincere glimmer of hope for the future. How these ideals are implemented remains to be seen, but I’m thankful they’re headed down the right path.

Overall, OpenAI is aiming for its own Threads-like moment: a new social media app with a genuine twist. For now, this is a novelty OpenAI seems best positioned to deliver.

What Concerns Does It Raise (Besides the AI Slop)?

To put it simply, Sora 2 is very, very good at generating videos, and it has been built around incorporating users into the videos it generates. Despite the Meta hiring spree earlier this year, the demo videos extravagantly prove that there are still geniuses at OpenAI. The demos and the documents show a much better comprehension of real-world physics, and the introduction of cameos banks heavily on solid, consistent character reproduction across multiple generations. Sora 2’s realism is also significantly upgraded.

As Casey Neistat stressed in his latest YouTube video, Sora has abstracted away a massive process involving skill, equipment, expertise, distribution, and even other people – generating the next viral video is as simple as typing in the right prompt. Prompting is already a low-effort action, and this stretches what a user can accomplish with it to a new extreme. Videos up to a minute long can be created, and they are generated with audio that’s accurate and in perfect sync.

The videos also carry moving watermarks, like TikTok’s, to ensure easy detection when the videos aren’t obviously identifiable as AI. Beyond that, companies are scrambling to build stronger copyright protections, like copyright detection APIs and watermark verification systems. Looking at the videos people have generated and posted online, it’s easy to see why this is critical. Out of context, a few of the generations were good enough to have gotten past me unnoticed, had I not been consciously analyzing them as AI content. This is where the can of worms opens.
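
Sora’s outputs also embed C2PA provenance metadata alongside the visible watermark, per OpenAI’s launch materials. As a minimal sketch of what a provenance check might look like – assuming the open-source c2patool CLI from the Content Authenticity Initiative is installed on your PATH, and noting that its output format can vary by version:

```python
# Minimal sketch: inspect a downloaded video for C2PA provenance metadata.
# Assumes the open-source `c2patool` CLI is installed and on PATH; its
# exact JSON fields and exit behavior may vary by version.
import json
import subprocess
import sys


def read_c2pa_manifest(video_path: str) -> dict | None:
    """Return the file's C2PA manifest store as a dict, or None if absent."""
    result = subprocess.run(
        ["c2patool", video_path],  # prints the manifest store as JSON
        capture_output=True,
        text=True,
    )
    if result.returncode != 0 or not result.stdout.strip():
        return None  # no manifest found, or the file couldn't be read
    return json.loads(result.stdout)


if __name__ == "__main__":
    manifest = read_c2pa_manifest(sys.argv[1])
    if manifest is None:
        print("No C2PA provenance metadata found.")
    else:
        # A genuine Sora export should name OpenAI in its claim data.
        print(json.dumps(manifest, indent=2))
```

A check like this only works when the metadata survives re-encoding and re-uploads, which is exactly why platforms pair it with visible watermarks and server-side detection.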

Most obviously, there’s the entire copyright infringement/persona theft side of things. We’ve seen celebrities speak up against AI generations using their likenesses however the user pleases. Scarlett Johansson made news last year when she alleged that “Sky”, one of the voices for OpenAI’s GPT-4o voice mode, closely resembled her own; OpenAI subsequently removed “Sky” from the list of options in Voice Mode while denying it was an imitation.

Of late, social media platforms have been flooded with jarring, outrageous videos of all kinds of banal things. For instance, Ars Technica featured a compilation of such clips, including one of Michael Jackson performing a standup comedy routine in a kitchen.

Many of these videos feature celebrities, and dead or living, all celebrities are now readily available to be cast into videos for whatever twisted ideas you can think of typing into the prompt bar. Understandably, this isn’t sitting well with present-day actors or with the families of deceased celebrities. The weird contradiction is that while these real celebrities are accessible, copyrighted fictional characters are still restricted – although OpenAI has teased that fictional characters may be on the horizon.

Coming back to a point Casey Neistat made, it’s an enormous amount of capability being handed to everyone. At the same time, it’s frightening to think of the impact this abstraction could have on the content creation industry.

“How long do we have till we can just describe the YouTube video we want to see?”

Casey Neistat, Popular American YouTuber

“How many years away are we from Netflix? Where we can simply type in ‘Jason Bourne moon landing’ and watch that?”

It’s a powerful argument. The pace of development in AI video generation is rapid – in the span of a few years, we’ve gone from laughably bad, to entertaining, to outputs that are finally somewhat usable. I don’t think the future Casey talks of is very far away, and he vividly expressed the fear of “AI slop overwhelming and displacing genuine creativity”. After all, future advancements will only increase the amount (and quality) of content being created, published and consumed.

What Does The Future Hold?

The cycle we’re witnessing right now isn’t new in itself; every arena that sees such a revolutionary release goes through it, almost as a rule. Still, having seen the demo videos in the release document, I can’t bring myself to say this is just another iteration of an authorship crisis. This is truly impressive.

It happened when ChatGPT first came out and copywriters were losing their jobs left and right, with authors publicly resenting the creations of such LLMs. There were similar fears around the time the first good image generation models came out, and again recently, when we saw some excellent outputs and capabilities courtesy of Google’s Nano Banana. Photographers, editors and graphic design professionals were left reeling when demos showed people achieving results with simple prompts that would have taken professionals with years of experience far longer to achieve, and the unrest is understandable.

For a slightly depressing and scary view into just how good image generation has gotten, head over to communities on Reddit like r/ChatGPT and r/StableDiffusion. The capabilities of such models are impressive yet limited and, to a great extent, paywalled, which stops much of the population from using them altogether. Beyond being limited in their capabilities, these models’ outputs are not yet fully indistinguishable from authentic human content, and that’s what gives human creators an edge. For now.

To answer the opening question: what does the future hold? Uncertainty, anticipation, and all the highly specific, morbidly entertaining videos you can think of to watch in the meantime. Stay tuned.

Written by Dushyant