The conversation about AI video usually starts with Sora. It should start with Amit Jain. When OpenAI's text-to-video model was generating headlines in February 2024, Jain and his team at Luma AI were already shipping. Dream Machine landed in June that year - not as a demo, not as a waitlist, but as a product. Within weeks, it had millions of users. Within months, it had 25 million registered accounts.

That instinct - ship it, fix it, ship more - traces back to four years at Apple. Jain joined in 2017 as a Systems and Machine Learning Engineer, and landed in one of the most technically demanding corners of the company: the team figuring out how to put spatial sensing into consumer hardware. He led the engineering work that integrated the first LiDAR scanner into the iPhone 12 Pro in 2020. Then came Passthrough - the feature that lets Apple Vision Pro blend digital objects into the physical world in real time. Both required solving the same core problem: how do you make a machine understand space?

"Creative work has never lacked ambition - it's lacked execution capacity."
Amit Jain, CEO of Luma AI

That question followed him out of Apple. In September 2021, he co-founded Luma AI with Alex Yu - a UC Berkeley PhD student who walked away from his dissertation to build neural rendering systems - and Alberto Taiuti, an AR/VR engineer who'd previously worked alongside Jain at Apple, then at drone company Skydio. The three shared a conviction: 3D understanding was the missing layer in AI, and whoever built it first would rewrite how humans create.

The early Luma product was a 3D capture app. Point your iPhone at an object, and Luma's engine, built on Neural Radiance Fields (NeRF), would reconstruct it as a photorealistic 3D model. It was impressive in a demo-room sort of way - but Jain was never building a demo company.

The pivot to video came with a clear thesis: video is the richest training signal for world models. Not text, not images - video. Footage encodes physics, causality, motion, light, gravity. A model trained on petabytes of video isn't just learning pixels; it's learning how the world actually behaves. Dream Machine, Ray 3, and the Unified Intelligence architecture that followed were all steps toward that kind of world model.

Ray 3, launched in September 2025, is the product that shifted the conversation. Billed as the world's first reasoning video model, it generates native 16-bit HDR footage - a specification that matters because the output carries enough color depth and dynamic range to hold up under professional color grading, rather than just looking stylized. A sub-release, Ray 3.14 (yes, as in pi), added native 1080p generation at four times the speed and one-third the cost of its predecessor.

Then came the funding. In November 2025, Luma closed a $900 million Series C led by HUMAIN, an AI company backed by Saudi Arabia's Public Investment Fund. The round valued the company at $4 billion and came bundled with something unusual: a partnership to build Project Halo, a 2-gigawatt AI compute cluster in Saudi Arabia. That's not an ordinary data center - that's infrastructure at the scale of a power grid. For context, most hyperscalers measure their campuses in megawatts. Project Halo is measured in gigawatts.

"Intelligence shouldn't be fragmented by modality."
Amit Jain, on the Unified Intelligence architecture

The March 2026 launch of Luma Agents - built on the Uni-1 model, the first in the Unified Intelligence family - marked the sharpest statement yet of what Jain is actually building. Uni-1 doesn't just generate video. It processes text, images, video, audio, and spatial data through a single architecture, reasoning across all of them simultaneously. The pitch is straightforward: fragmented AI systems that specialize by modality are a workaround, not a solution. A system that genuinely understands the world has to process it the way the world actually exists - all at once.

Jain has been unusually direct about the implications for entertainment. "Hollywood is already dead if it continues on its current path," he said in a 2025 interview with Lowpass. "It stops by people trying out new ideas, and AI is the only way." That's a provocation designed to land - but it's backed by product decisions. Ray 3 Modify, launched in December 2025, is built specifically for hybrid human-AI filmmaking workflows, which makes the claim a production tool rather than a talking point.

Missouri Valley College, where Jain studied Mathematics and Computer Science, isn't a school that shows up in many Silicon Valley origin stories. It's a small liberal arts college in Marshall, Missouri - enrollment under 1,500. That path to a $4 billion company, through Apple's spatial computing labs and a lean founding team of three, is worth holding onto as a data point the next time someone insists pedigree is the prerequisite for frontier AI work.

What Jain has built - and is still building - is a bet on a specific theory of intelligence: that understanding comes from synthesis, not specialization. That a machine which can see, hear, and reason at once is not a gimmick but a fundamentally different kind of system. The next few years will test whether that theory is right. So far, 25 million registered users and a $900 million Series C suggest it's at least worth taking seriously.