On a Tuesday in May 2025, an unflashy enterprise software company in Redwood City announced a $100 million Series D at a $1.3 billion valuation. There were no flying-car renderings. No founder in a black turtleneck promising to redefine consciousness. Just a press release about training data, signed by a former Stanford PhD named Alex Ratner, and a quiet line near the bottom that said seven of the top ten US banks already used the product.
This is Snorkel AI, right now. Profitable in markets where most AI startups are not. Mascot-obsessed in a category where most AI startups are joyless. And built on a thesis that almost nobody believed in 2019 - that the model is not the moat, the data is.
The problem they saw
If you have ever tried to fine-tune a language model on real enterprise data, you already know the punchline. The model is the easy part. The data is a swamp. Pages of contracts with redactions. Doctors' notes with abbreviations only the doctor remembers. Insurance claims that contradict themselves. Decades of institutional knowledge locked inside the heads of people who are about to retire.
For roughly a decade, the industry response was simple and brutal - throw humans at it. Hire thousands of annotators, ship them spreadsheets, and pray they agree. It worked, sort of, for image classification. It did not work for anything that required a domain expert. You cannot crowdsource a radiologist. You cannot offshore an underwriter.
The Snorkel founders, working out of Christopher Ré's lab at Stanford, watched the problem from an unusual angle. They asked: what if instead of paying people to label one example at a time, you wrote a small program that labeled a million examples imperfectly, then combined a lot of imperfect programs into a clean training signal? They called the technique weak supervision, and they published a paper about it in 2017.
The paper has since been cited thousands of times. You can argue it changed how a generation of ML researchers thought about training data. You could not, in 2017, argue it would become a company.
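To make the core idea concrete, here is a minimal Python sketch of the combining step. Several imperfect "labeling functions" each vote or abstain, and their votes are merged into one training label per example. Two caveats apply. First, the published Snorkel work fits a generative label model that estimates how accurate each function is, without ground truth. This sketch swaps in plain majority vote, which is a deliberately cruder stand-in. Second, the loan-default task, the keyword heuristics, and every function name below are hypothetical illustrations.

```python
from collections import Counter
from typing import Callable, List, Optional

# Hypothetical label space for illustration: does a text describe a loan default?
ABSTAIN = -1
NOT_DEFAULT = 0
DEFAULT = 1

LabelingFunction = Callable[[str], int]


def lf_mentions_default(text: str) -> int:
    # Keyword heuristic: noisy, but fires often.
    return DEFAULT if "default" in text.lower() else ABSTAIN


def lf_missed_payment(text: str) -> int:
    return DEFAULT if "missed payment" in text.lower() else ABSTAIN


def lf_paid_in_full(text: str) -> int:
    return NOT_DEFAULT if "paid in full" in text.lower() else ABSTAIN


def apply_lfs(texts: List[str], lfs: List[LabelingFunction]) -> List[List[int]]:
    """Label matrix: one row per example, one column per labeling function."""
    return [[lf(t) for lf in lfs] for t in texts]


def majority_vote(row: List[int]) -> Optional[int]:
    """Merge one row of votes. Returns None if every function abstained or the top votes tie."""
    votes = Counter(v for v in row if v != ABSTAIN)
    if not votes:
        return None
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: leave the example unlabeled rather than guess
    return ranked[0][0]


if __name__ == "__main__":
    texts = [
        "Borrower missed payment and is now in default.",
        "Loan was paid in full ahead of schedule.",
        "Quarterly statement attached.",
        "Default notice withdrawn; loan paid in full.",
    ]
    lfs = [lf_mentions_default, lf_missed_payment, lf_paid_in_full]
    print([majority_vote(row) for row in apply_lfs(texts, lfs)])  # [1, 0, None, None]
```

The last text shows why naive combination is fragile. A keyword rule and a "paid in full" rule disagree, and majority vote can only give up. A learned label model would instead weight each function by its estimated reliability.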
The founders' bet
Snorkel AI was founded in 2019 by Alex Ratner, Christopher Ré, Braden Hancock, Henry Ehrenberg, and Paroma Varma. Five people out of one Stanford lab, one open-source project, and a thesis that programmatic data labeling could become its own software category. Greylock wrote the first check. Lightspeed joined the next round. The Series C in 2021, led by Addition, took the company to $135M raised.
And then ChatGPT happened.
For most enterprise data-labeling companies, late 2022 was extinction-level. The pitch had been: we will help you build models from scratch. The new world said: do not build a model, rent one. Snorkel's good fortune - if you want to call it that - was that renting a generic LLM does not actually solve any enterprise problem. A model that knows everything about Wikipedia and nothing about your bank's credit policy is, for a bank, useless. To make it useful, you have to feed it your bank's data, labeled by your bank's experts, evaluated against your bank's standards. Which is what Snorkel does.
The Series D in May 2025 was, in a sense, an announcement that the bet had been right all along.
The founding crew
- Alex Ratner - CEO. Stanford PhD under Christopher Ré. Also an affiliate professor at the University of Washington.
- Christopher Ré - Co-founder, MacArthur Fellow, the academic anchor.
- Braden Hancock - Head of Technology. Also a Stanford PhD from the same lab.
- Henry Ehrenberg - Co-founder, infrastructure brain.
- Paroma Varma - Co-founder. Co-author of early weak supervision research from the lab.
Six years, one thesis
- 2016 - Snorkel open-source project launches at the Stanford AI Lab.
- 2017 - Foundational weak supervision paper drops. Citations begin to compound.
- 2019 - Snorkel AI incorporates. Greylock leads seed round.
- 2020 - Series A ($15M) with Lightspeed joining. Snorkel Flow gets its name.
- 2021 - Series B and Series C land in the same year. Total raised passes $135M.
- 2022-24 - Pivots from training-from-scratch to fine-tuning and evaluating foundation models. Quietly wins financial services and federal accounts.
- 2025 - $100M Series D at $1.3B valuation. Snorkel Evaluate and Expert Data-as-a-Service go GA. Accenture invests.
The product, more or less
Strip the marketing copy off the website and Snorkel sells three things. Snorkel Flow is the original platform. You upload your messy enterprise data, you write small Python functions that encode domain knowledge ("if the document mentions a maturity date and a coupon, it is probably a bond"), and the platform stitches those weak signals into a clean labeled dataset. You then use that dataset to fine-tune or evaluate a model. The whole loop, programmatically.
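The bond heuristic above can be written as an ordinary Python function. The next step is checking how often each function fires and how often it disagrees with the others. Below is a minimal, dependency-free sketch of that step. It is not Snorkel Flow's API. The open-source library at snorkel.org ships its own tooling for this, such as a `labeling_function` decorator and an `LFAnalysis` summary. Only the first heuristic comes from the text. The other two functions, the sample documents, and the `lf_stats` helper are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical label space: does a document describe a bond?
ABSTAIN, OTHER, BOND = -1, 0, 1


@dataclass
class Document:
    text: str


def lf_maturity_and_coupon(doc: Document) -> int:
    # The heuristic from the text: a maturity date plus a coupon suggests a bond.
    t = doc.text.lower()
    return BOND if "maturity date" in t and "coupon" in t else ABSTAIN


def lf_percent_coupon(doc: Document) -> int:
    # Hypothetical heuristic: a stated rate such as "4.25% coupon".
    return BOND if re.search(r"\d+(\.\d+)?%\s+coupon", doc.text.lower()) else ABSTAIN


def lf_equity_terms(doc: Document) -> int:
    # Hypothetical heuristic: per-share or dividend language points away from a bond.
    t = doc.text.lower()
    return OTHER if "per share" in t or "dividend" in t else ABSTAIN


LFS: List[Callable[[Document], int]] = [lf_maturity_and_coupon, lf_percent_coupon, lf_equity_terms]

DOCS = [
    Document("Maturity date: 2030-06-01. Pays a 4.25% coupon semiannually."),
    Document("Quarterly dividend of $0.50 per share declared."),
    Document("Preferred stock with a 5% coupon and a dividend preference."),
    Document("Board meeting minutes."),
]


def lf_stats(docs: List[Document], lfs: List[Callable[[Document], int]]) -> Dict[str, Dict[str, float]]:
    """Per-function coverage (share of docs it votes on) and conflict rate
    (share of docs where it votes and another function votes a different label)."""
    matrix = [[lf(d) for lf in lfs] for d in docs]
    n = len(docs)
    stats: Dict[str, Dict[str, float]] = {}
    for j, lf in enumerate(lfs):
        covered = [row for row in matrix if row[j] != ABSTAIN]
        conflicted = [
            row for row in covered
            if any(v not in (ABSTAIN, row[j]) for k, v in enumerate(row) if k != j)
        ]
        stats[lf.__name__] = {"coverage": len(covered) / n, "conflict": len(conflicted) / n}
    return stats


if __name__ == "__main__":
    for name, s in lf_stats(DOCS, LFS).items():
        print(f"{name}: coverage={s['coverage']:.2f} conflict={s['conflict']:.2f}")
```

The preferred-stock document is the interesting one. It trips both a bond heuristic and an equity heuristic. That kind of conflict is exactly what the combining step, or a domain expert, then has to resolve.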
Snorkel Evaluate, generally available since May 2025, does the same thing for benchmarking. Every enterprise eventually discovers that public LLM leaderboards say very little about its own work: a model that scores 92 on MMLU may score 41 on your actual workflow. Snorkel Evaluate lets you build the benchmark that actually matters - the one for your industry, your taxonomy, your edge cases.
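A custom benchmark matters because an aggregate score can hide failures concentrated in exactly the edge cases a business cares about. Slice-level scoring makes this visible. Below is a minimal sketch, assuming a hypothetical record that holds an expert label, a model prediction, and a slice tag. It illustrates the idea only and is not Snorkel Evaluate's interface. The slice names and data are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EvalExample:
    slice_name: str  # hypothetical tag for a domain-specific edge case
    expected: str    # label assigned by a domain expert
    predicted: str   # output of the model under evaluation


def accuracy_by_slice(examples: List[EvalExample]) -> Dict[str, float]:
    """Accuracy overall and per slice, so a weak slice is not averaged away."""
    totals: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for ex in examples:
        for key in ("overall", ex.slice_name):
            totals[key] += 1
            correct[key] += int(ex.expected == ex.predicted)
    return {key: correct[key] / totals[key] for key in totals}


# Illustrative data only: four routine claims, four with conflicting dates.
EXAMPLES = (
    [EvalExample("routine", "approve", "approve") for _ in range(4)]
    + [EvalExample("conflicting_dates", "escalate", "approve") for _ in range(3)]
    + [EvalExample("conflicting_dates", "escalate", "escalate")]
)

if __name__ == "__main__":
    print(accuracy_by_slice(EXAMPLES))
```

On this toy data, overall accuracy is 0.625, which looks mediocre but tolerable. The conflicting-dates slice, however, scores 0.25. That is the number a claims team would actually need to see.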
Snorkel Expert Data-as-a-Service is the newest and, in a way, the least Stanford-AI-Lab of the three. It is a network of subject matter experts - lawyers, doctors, financial analysts - paired with the platform. A white-glove offering for customers who do not have ten in-house radiologists with free time. Some have called this a return to services-led AI. The company would call it the only honest answer to what the highest-stakes customers actually want.
- Snorkel Flow - The original AI data development platform. Programmatic labeling, curation, fine-tuning data prep.
- Snorkel Evaluate - Custom benchmarks for LLMs, RAG, and agents on domain-specific tasks.
- Expert Data-as-a-Service - Vetted subject-matter experts plugged into the platform.
- Snorkel (OSS) - The original open-source weak supervision library. Still alive at snorkel.org.
The proof
It is one thing to claim that data is the moat. It is another to convince a bank's chief risk officer. Snorkel's customer list, or at least the part the company has been willing to publicize, reads like a curated answer to that question - BNY, Wayfair, Chubb, Experian, Intel, and, per the company's own marketing, seven of the top ten US banks. The US Air Force is in there too. So are several federal agencies the company will not name.
The Series D itself doubled as a vote of confidence. Addition led again. Greylock and Lightspeed re-upped. BNY, already a customer, joined as a strategic. Prosperity 7 Ventures, the venture arm of Saudi Aramco, took the round international. And then Accenture, in a separate announcement, made a strategic investment of its own and unveiled a financial services co-build with Snorkel.
The mission, said out loud
"Make AI data development as systematic and scalable as software engineering." It is a mission statement that sounds modest until you sit with it for a minute. Software engineering took fifty years to become a discipline. It got version control, code review, CI, package managers, type systems, the entire scaffolding of trust. Data work, by contrast, has remained roughly where carpentry was around 1840 - artisanal, expensive, and unreviewable.
Snorkel's argument is that AI cannot scale past hobby projects until data work catches up. Until "I labeled 10,000 examples last week" stops being a heroic story and starts being a git commit. The argument is not new. Andrew Ng has been pounding on the data-centric AI drum for years. What is novel is having a product that an underwriter can actually use, and a customer list that pays for it.
Why this matters tomorrow
If you believe the next five years of AI are about agents - systems that take actions inside specific industries, with specific consequences, under specific liability frameworks - then almost every interesting problem is a Snorkel problem. An agent that misclassifies a loan application costs money. An agent that misreads a medical chart costs lives. The thing standing between today's chatbots and tomorrow's reliable agents is not bigger models. It is better data, better evaluations, and better ways of capturing what the actual humans in the room already know.
This is the territory Snorkel has been quietly claiming. It is not glamorous. There is no superintelligence narrative attached. But the boring work is, increasingly, the only work that pays.
Back to that Tuesday in May
Six years after a Stanford research project became a company. Five years after the founders bet that the model would commoditize and the data would not. One year after the rest of the industry started saying the same thing out loud. The Redwood City office is still there. The mascot is still a cartoon snorkeler named Dr. Bubbles. The press release was, by Silicon Valley standards, almost embarrassingly grown-up.
And somewhere on the other side of that announcement, in a risk department at a bank you have heard of, a data scientist is writing a small Python function that will eventually label half a million documents. Without knowing it, she is contributing to a quiet rearrangement of where value lives in AI - away from the model, toward the data.
This is what Snorkel AI looks like in 2026. Not loud. Not unicorn-cute. Just, finally, right.