The Man Who Made Data the New Moat
There is a sentence Alex Ratner has heard more times than he can count: "We have the model. We just don't have the data." He built a billion-dollar company around the fact that everyone keeps saying it and almost nobody knows what to do next.
Ratner's advisor at Stanford, Christopher Ré, described it to him in 2015 as an "afternoon project." Build a tool, Ré suggested, that lets researchers label training data without hand-annotating every single example. The two of them stood at a whiteboard. The math got complicated fast. The afternoon project took four and a half years - and became Snorkel, the open-source library that quietly rewired how the AI field thinks about data.
The key insight was almost perversely simple: instead of asking humans to label thousands of examples one by one, what if you could write rules - heuristics, knowledge bases, distant supervision - and let those rules do the labeling at scale? The labeled data would be noisy. Ratner's framework would clean it up statistically. The resulting model would still be good. Often, it would be very good.
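For readers who want to see the shape of the idea, here is a minimal sketch in plain Python. It is not Snorkel's API: the three rules, the spam/ham task, and the function names are all hypothetical, and a simple majority vote stands in for the statistical cleanup step, which in Snorkel is a learned model that weighs rules by their estimated accuracy.

```python
from collections import Counter

# Hypothetical label values for an illustrative spam/ham task.
ABSTAIN, HAM, SPAM = -1, 0, 1

# Each "labeling function" is a cheap heuristic that votes or abstains.
def lf_contains_link(text):
    return SPAM if "http://" in text or "https://" in text else ABSTAIN

def lf_mentions_prize(text):
    return SPAM if "prize" in text.lower() else ABSTAIN

def lf_short_reply(text):
    return HAM if len(text.split()) <= 4 else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_reply]

def majority_vote(text):
    """Combine noisy votes; abstain when no rule fires or the vote is tied.

    Assumption: majority vote is a simplified stand-in for a learned
    label model, used here only to keep the sketch self-contained.
    """
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting rules with equal support
    return ranked[0][0]

docs = [
    "Claim your prize at https://example.com now today friends",
    "Thanks, see you",
    "The quarterly report is attached for your review",
    "Win prize",
]
labels = [majority_vote(d) for d in docs]
```

The rules disagree and sometimes stay silent, which is the point: each one is cheap to write, and the combination step decides how much to trust them. Documents that end up with no label are simply left out of the training set.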
The vast majority of data has no labels - or, at least, no useful labels for your application.
- Alex Ratner
To test the idea, two researchers used Snorkel to label 20,000 documents in a single day. The same task, done by hand, would have taken more than ten weeks. That was the number that mattered. Not the theory. Not the whiteboard math. The ten weeks turned into one day.
Before Stanford, Ratner studied physics at Harvard - the kind of education that teaches you to strip a problem down to its governing equations. He graduated in 2011, went into consulting, and found himself writing scripts to dig through patent databases. "I was fascinated," he has said, "by all this human knowledge locked inside unstructured text." The fascination outlasted the consulting job. He went back to graduate school and joined Christopher Ré's lab, where it finally had something to work on.
The Snorkel paper, published at VLDB in 2018, earned a "Best Of" designation. Google deployed it internally under the name Snorkel DryBell. Apple, Intel, and U.S. government agencies followed. The research was working. The question was whether it could become a company.
The Dad Jokes Clause
When Ratner's second child was born, his team at Snorkel AI formalized something that had been happening informally. They granted him two dad jokes per day. Not one. Not unlimited. Two - a negotiated ceiling that reflects something real about how the company runs: with warmth, and with precision about things that matter.
Snorkel AI was incorporated in March 2019, the same year Ratner completed his PhD. He took the role of CEO - an unusual choice for an academic who had spent five years with his head in statistical learning theory. He describes his evolution into the role with characteristic directness: "You can have extremely kind, empathetic, friendly people who are also very hard-charging and type A." He was building toward a culture where both are true at once.
The timing was not obvious. In 2019, the dominant framing of AI progress was all about model architectures. Transformers were ascendant. BERT had just landed. The field was in a kind of model-centric rapture. Ratner was making a bet in the opposite direction: that the real bottleneck was not the model. It was the data. And that enterprises, in particular, would run headlong into that bottleneck as they tried to deploy AI on their own proprietary domains.
The buck has shifted from model development to data labeling and development.
- Alex Ratner
He was right, and the market moved toward him. In 2021, Snorkel AI raised $85 million at a $1 billion valuation - unicorn status, three years in. The round was co-led by Addition and BlackRock. The company had built a platform that enterprises actually used to deploy AI at scale, not just experiment with it.
In May 2025, Snorkel AI closed a $100 million Series D at a $1.3 billion valuation, with investors including Prosperity 7 Ventures, Greylock, Lightspeed, BNY, and QBE Ventures. Alongside the round, Ratner launched two new products: Snorkel Evaluate, for measuring AI model performance, and Snorkel Expert Data-as-a-Service, which pairs domain experts with Snorkel's programmatic platform to produce high-quality datasets for frontier LLM developers. The company was reporting $148 million in annual revenue.
The context matters: agentic AI had become the obsession of 2025. Every enterprise wanted AI agents that could take actions, not just answer questions. Ratner's read was precise: "We are seeing a surge of momentum around agentic AI, but specialized enterprise agents aren't ready for production in most settings." The gap between demo and deployment was, again, the data. Snorkel was positioned exactly at that gap.
Alongside running the company, Ratner holds an appointment as Affiliate Assistant Professor at the University of Washington's Paul G. Allen School of Computer Science and Engineering. He corrects people who overstate the title - "I'm not the professor yet" - with the kind of precision you'd expect from someone who spent five years in a research lab where words mean exactly what they say.
He got into programming as a child, drawn to it for two reasons he still articulates: the instant feedback loop, and the fact that you can build things without asking anyone for permission. The second part has aged into something like a philosophy. Snorkel itself is an infrastructure play built on the premise that enterprises shouldn't have to wait for months of manual labeling every time they want to build a new AI application. The permission slip, in other words, is the data bottleneck. Ratner built a company to eliminate it.