The discussion details the decision to launch Sora as a standalone social application rather than integrating it into the 'single-player' ChatGPT. This strategy was informed by internal prototypes that revealed the power of multiplayer, remix-style creation, leading to a product focused on collaborative and viral content.
The team compares the evolution of their video models to the GPT series, framing Sora 1 as a 'GPT-1 moment' and Sora 2 as a leap comparable to GPT-3.5. This advancement is marked by improved physical realism (e.g., shattering glass, gymnastics) and a massive reduction in inference cost, making the technology more accessible.
The team sees a significant future milestone for video models in the ability to perform long-duration simulations of complex physical and biological processes. Researchers at OpenAI are optimistic that these models will become essential tools for scientific discovery, and they predict the first major breakthroughs enabled by this technology by early 2028.
Despite Sora's consumer success, the team is intentionally small (around 40 people), which makes an API strategy crucial for exploring the model's full potential. By releasing the Sora API, OpenAI enables external developers and companies like Mattel to build novel applications, from toy prototyping to CAD visualization.
Sora is designed to lower the barrier to video creation, enabling users to participate in trends and express ideas without technical video editing skills. Features like 'Character Cameos' are central to this, allowing users to animate themselves, friends, or even inanimate objects, fostering a new type of user-generated content.