Henry Ehrenberg

Origin

The contrarian bet that paid off

Around 2016, most of the AI world was racing to build better models. Bigger architectures, more compute, flashier benchmarks. Henry Ehrenberg and a group of Stanford researchers asked a different question: what if the training data was the actual problem?

It wasn't a popular position. It wasn't even an obvious one. But inside Stanford's Hazy Research lab, under Professor Chris Ré, Ehrenberg and his collaborators were building something they called Snorkel - a system that let users train state-of-the-art machine learning models without labeling a single training example by hand. Instead, users wrote small functions expressing domain knowledge. The system learned to combine them into probabilistic labels at scale.

The paper that emerged from this work - "Snorkel: Rapid Training Data Creation with Weak Supervision" - became one of the most cited works in data-centric AI. It didn't just describe a technique. It identified a gap that the entire industry had been ignoring while staring at leaderboards.

Generative AI quality depends on the data used to tune and align models.

- Henry Ehrenberg, Snorkel AI

Ehrenberg came to that insight with an unusual toolkit. He grew up in Seattle, graduated from University Prep in 2011, then went east to Yale for applied mathematics before heading back west to Stanford. He did computational biology research at the Weizmann Institute in Israel. He interned in data science. Then, somewhere in the middle of a graduate program in computational and mathematical engineering, he found his problem.

Weak supervision wasn't a clever hack. It was a rethinking of where expertise lives. Instead of having experts manually annotate thousands of examples, what if you could capture their knowledge in code? Labeling functions, pattern rules, heuristics - things domain experts already think in. The Snorkel system turned that logic into a pipeline that produced training data 10 to 100 times faster than hand-labeling.
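To make the idea concrete, here is a minimal sketch of labeling functions in plain Python. It does not use the Snorkel library, and every name in it (the three rules, `apply_lfs`, `spam_probability`) is hypothetical, chosen for illustration. Each function encodes one heuristic and may abstain. The votes are then combined with a simple majority-style fraction, which is a deliberately crude stand-in for the learned model that Snorkel uses to weigh its labeling functions.

```python
from typing import Callable, List, Optional

# Label values; ABSTAIN means "this rule has no opinion on this example".
ABSTAIN, HAM, SPAM = -1, 0, 1

LabelingFunction = Callable[[str], int]


def lf_contains_link(text: str) -> int:
    """Heuristic: messages containing a URL are often spam."""
    return SPAM if "http://" in text or "https://" in text else ABSTAIN


def lf_mentions_prize(text: str) -> int:
    """Heuristic: the word 'prize' suggests spam."""
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_reply(text: str) -> int:
    """Heuristic: very short messages are usually genuine replies."""
    return HAM if len(text.split()) <= 4 else ABSTAIN


# Hypothetical rule set; a real project would have many more.
LFS: List[LabelingFunction] = [lf_contains_link, lf_mentions_prize, lf_short_reply]


def apply_lfs(text: str, lfs: List[LabelingFunction] = LFS) -> List[int]:
    """Run every labeling function on one example and collect the votes."""
    return [lf(text) for lf in lfs]


def spam_probability(votes: List[int]) -> Optional[float]:
    """Fraction of non-abstaining votes that say SPAM.

    Returns None when every rule abstains, i.e. the example stays unlabeled.
    This unweighted vote is an assumption made for brevity; it ignores
    rule accuracy and correlation between rules.
    """
    cast = [v for v in votes if v != ABSTAIN]
    if not cast:
        return None
    return sum(1 for v in cast if v == SPAM) / len(cast)
```

For the message "You won a prize", one rule votes spam and one votes ham, so `spam_probability(apply_lfs("You won a prize"))` returns 0.5. Disagreement between rules is exactly what produces a probabilistic label rather than a hard one.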

Key Insight

The original Snorkel research project began as an academic experiment. Five researchers turned it into a company. The company turned a paradigm shift into a $1.3B enterprise. The insight didn't change - only the scale.

What Snorkel AI Does

Building the data infrastructure layer for the AI era

10-100x

Faster Labeling

Programmatic weak supervision creates training data 10 to 100 times faster than hand-labeling - without sacrificing quality.

5 of 10

Top US Banks

Five of the ten largest US banks use Snorkel AI for mission-critical AI systems. Finance, fraud detection, document intelligence.

$338M

Total Raised

Backed by Greylock, Lightspeed, GV, In-Q-Tel, and others. The Series D closed in May 2025 at a $1.3B valuation.

Career

From the lab to the ledger

Before Snorkel AI had a name, it was a research problem. Ehrenberg was part of the original group at Stanford that developed the weak supervision framework, contributing to papers that would shape how the field thinks about data. His technical background - formal mathematics at Yale, computational engineering at Stanford, plus a stint doing representation learning at Facebook Applied AI - gave him range that pure ML researchers often lack.

At Facebook, Ehrenberg worked on the applied AI team's representation learning group, building systems that operated at the scale of one of the largest social platforms in the world. It was the kind of hands-on engineering exposure that most academic AI researchers never get. He brought that practical sensibility back to the startup.

When Snorkel AI incorporated in 2019, Ehrenberg took the Head of Engineering role alongside co-founders Alexander Ratner (CEO), Chris Ré, Paroma Varma, and Braden Hancock. The founding team was unusual: more academic firepower than most enterprise software companies see in a decade, wrapped in a company that had to ship products on real deadlines for Fortune 500 clients.

The bet that training data - not models or compute - would decide whether machine learning worked.

- Snorkel AI founding thesis

Snorkel's initial pitch was about training data for traditional ML models. Then generative AI arrived and changed everything - except the core thesis, which turned out to be more right than anyone expected. Large language models still needed fine-tuning data. They still needed evaluation datasets. They still needed programmatic pipelines for preference labeling and RLHF. The Snorkel approach scaled directly into the new paradigm.

By 2025, the company's client list included not just banks and insurers but Google, OpenAI, Anthropic, and Microsoft - the companies building the frontier models themselves. Snorkel AI became, in a sense, the picks-and-shovels business of the AI gold rush, the company that made everyone else's models better by solving the data problem no one else wanted to solve.

In May 2025, the company raised a $100M Series D at a $1.3B valuation. The round brought total funding to $338M. Ehrenberg's decade-long bet on data quality over model size had reached unicorn status.

Career Timeline

Key Milestones

2011: Graduates University Prep, Seattle

~2014: Research at Weizmann Institute of Science (computational biology)

2015: Joins Stanford AI Lab (Hazy Research under Prof. Chris Ré)

2016-17: Core developer on open-source Snorkel project at Stanford

2017: Co-authors landmark Snorkel paper (VLDB 2018)

~2018: Senior Research Scientist, Facebook Applied AI

2019: Co-founds Snorkel AI

2024: AWS Strategic Collaboration Agreement; LinkedIn Top Startup

2025: Series D - $100M at $1.3B valuation

$1.3B

Unicorn Valuation

2019

Year Founded

1,200+

Team Members

5

Co-founders from Stanford

Series D

Latest Funding Round

$338M

Total Raised

Engineering

The engineer in the room

Snorkel AI has three types of co-founders: the researchers who conceived the academic work, the operators who scaled the business, and the engineers who built what customers actually use. Ehrenberg sits squarely in that third category, serving as Head of Engineering and responsible for the technical strategy that keeps Snorkel's platform competitive in one of the fastest-moving markets in technology.

The engineering challenge at Snorkel isn't trivial. The company's platform needs to work for a compliance team at a bank, a clinical NLP team at a hospital system, a document intelligence team at an insurer, and a research lab training foundation models - each with different data schemas, security requirements, deployment constraints, and evaluation criteria. Building software that works across that range without becoming a mess of enterprise configuration switches is genuinely hard.

Ehrenberg's approach has been to stay close to the research side of the company while managing a production engineering organization. He has co-authored blog posts on enterprise GenAI evaluation, contributed to thinking about specialized evaluators as scalable proxies for subject matter experts, and maintained the academic rigor that set Snorkel apart from less principled competitors.

The Data-Centric AI Thesis

While the industry debates model architectures and parameter counts, Ehrenberg's team argues that the quality and structure of training data are the primary determinants of whether an AI system works in production. Snorkel's growth from research project to unicorn is the empirical evidence for that claim.

He spoke at the AI & Big Data Expo North America in 2024, bringing the company's perspective on where enterprise AI actually gets stuck - not at inference time, not at deployment, but at the training data pipeline. It's a message that resonates differently when it comes from someone who co-authored the research papers that started the conversation.

His GitHub handle is "henryre" - a small nod to his middle name, a tiny detail that hints at the person behind the LinkedIn profile. He joined Twitter in February 2013, well before Snorkel AI existed, and keeps a relatively quiet presence. The work speaks. His Google Scholar profile shows a researcher who went into industry without abandoning the rigor that comes with peer review.

In 2022, his alma mater University Prep in Seattle featured him in its alumni magazine as one of four "Creative Thinkers" among its graduates. It's the kind of recognition that matters more than a TechCrunch headline: the school that knew him before any of this is still proud enough to say so.

Education

Yale University

Bachelor of Science - Applied Mathematics

~2011-2015

Stanford University

Master's - Computational & Mathematical Engineering

~2015-2019 (Hazy Research Group)

University Prep, Seattle

High School

Class of 2011

Achievements

Co-founded Snorkel AI, which reached $1.3B valuation after $338M in total funding across 7+ rounds

Co-developed the Snorkel weak supervision system - one of the most cited works in data-centric AI research

Published peer-reviewed research at VLDB, ICML, ICLR, and NeurIPS on programmatic data labeling and augmentation

Built engineering infrastructure serving five of the top ten US banks and leading frontier AI labs

Angles

Five things worth knowing

His GitHub handle is "henryre" - a quiet reference to his middle name that most people searching for him in code repositories will overlook entirely.

He did computational biology research at the Weizmann Institute in Israel before pivoting to machine learning at Stanford - a path that is unusual even by the standards of unusual AI founder backstories.

His full legal name is Henry Kiss Ehrenberg. The middle name is a family name folded into the legal record, the kind of detail that distinguishes a LinkedIn profile search from a deep background check.

His high school - University Prep in Seattle - featured him in its alumni magazine as a "Creative Thinker" alongside three other innovative graduates. The school recognized him before the valuations did.

The original Snorkel paper was published in the Proceedings of the VLDB Endowment in 2017, was presented at VLDB 2018, and became one of the most influential papers in data-centric AI. It was a grad school research project. It became the thesis of a unicorn.
