Company Dossier · No. 042

Surge AI is the quietest billion-dollar company in AI.

No venture capital. No sales team. No conference booth. Just a small office on Fillmore Street where roughly 110 employees run the operation that produces the human-labeled data behind nearly every model you actually use.

RLHF · Data Annotation · San Francisco · Founded 2020 · Bootstrapped

The Factbox

Founded: 2020

HQ: San Francisco

Employees: ~110

Labelers: ~1,000,000

Revenue: ~$1.2B

Funding: Bootstrapped

Founder: Edwin Chen

Filed under: companies you have not heard of that make the AI you use possible.

By YesPress Desk · Filed: San Francisco · Read: 7 min

The company that won the AI gold rush by selling the picks.

It is a Tuesday morning in San Francisco, and somewhere in a Pacific Heights walk-up a copy editor is arguing with a chemistry PhD about whether a chatbot's answer is technically correct, merely plausible, or actively dangerous. They both work for Surge AI. Neither of them is famous. Both of them, in a small but real way, are training the model on your phone.

Surge AI does not sell a chatbot. It does not sell a vector database. It does not sell, frankly, anything that looks like a product on a startup landing page. What it sells is something stranger: the careful attention of people who happen to be good at reading. The world's biggest AI labs - OpenAI, Anthropic, Google, Meta, Microsoft - pay Surge very large amounts of money for it.

The cliché says data is the new oil. Surge AI's version is more accurate, and more annoying: data is the new editing. - The thesis, on a single line

The problem they saw before everyone else did

Long before "RLHF" appeared on conference slides, Edwin Chen was an ML engineer at Twitter trying to do something that, on paper, should have been easy: figure out whether tweets were positive, negative, or neutral. The model wasn't the problem. The labels were. The crowd-sourced annotators marking the training data didn't agree with each other, didn't always read carefully, and in some cases didn't speak the language at the level the task required. The model learned what they taught it, which was mostly noise.

Chen had done a stint at Facebook too, and the pattern repeated. Bigger models did not fix bad labels. Better architectures did not fix bad labels. The thing that fixed bad labels, it turned out, was paying attention. He left, founded Surge in 2020, and bet the company on a thesis that sounded almost too obvious to be a thesis: if you want a model to learn good behavior, you have to show it good behavior, and showing it good behavior requires humans who are actually good at the thing.

Everyone wanted to talk about the model. Edwin Chen wanted to talk about the homework. - The founder's bet, in one sentence

The bet: hire harder than the labs do.

Most of the data labeling industry runs on volume. Pay a global crowd a few cents a task, get a million labels by lunch, hope the law of large numbers averages out the mistakes. Surge built the opposite company. Annotators are screened, ranked and matched to tasks by expertise: lawyers grade legal answers, doctors grade medical answers, native speakers grade translations, writers grade prose. The platform - templates, WYSIWYG editors, a Python SDK, inter-annotator agreement checks, a managed service layer - exists mostly to keep the right humans in front of the right questions.
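For readers who want to see what an inter-annotator agreement check looks like in practice, here is a minimal sketch using Cohen's kappa, a standard statistic for two annotators labeling the same items. It is plain Python and is not Surge's SDK or method; the function name, the sentiment labels and the sample data are all hypothetical, chosen to echo the tweet-sentiment problem described earlier.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items.

    Kappa corrects raw agreement for the agreement expected by chance:
    1.0 is perfect agreement, 0.0 is no better than chance.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")

    n = len(labels_a)
    # Fraction of items where the two annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement, from each annotator's own label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label everywhere.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical sentiment labels for six tweets (illustrative data only).
annotator_1 = ["pos", "neg", "neu", "pos", "neg", "neu"]
annotator_2 = ["pos", "neg", "neu", "pos", "neu", "neg"]

kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

In this toy case the annotators agree on four of six items, but a third of that agreement would be expected by chance, so kappa comes out at about 0.5. A low score like that is the kind of signal that would send a task back for clearer guidelines or better-matched annotators.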

This is not a sexy bet. It is a slightly Victorian bet. It is the bet that the bottleneck in artificial intelligence is, embarrassingly, human attention, and that attention is something you have to recruit, pay, train and respect. It also turned out to be a very lucrative bet, because attention happens to be the input that frontier labs care most about and can least mass-produce internally.

How they grew without trying to

Surge AI did almost no marketing for its first four years. It refused outside investment. It declined most press requests. The customer list grew anyway, because in a small industry word travels: if your labels are better and your project managers reply to email, the labs that need you find you. By the time Inc. and TIME profiled the company in 2025, Surge was reportedly doing around $1.2 billion in annual revenue with about 110 people on payroll. The arithmetic on revenue-per-employee is the kind that makes investors break out spreadsheets at parties.

Every great AI company has a moat. Surge's moat is a hiring funnel. - An observation that becomes less flippant the longer you look at it

A milestone reel for a company that hates milestones.

Edwin Chen rarely tweets about Surge. The company does not run launch events. So most of the company's history has to be reconstructed from filings, customer leaks and the occasional reluctant interview.

// Surge AI · the unofficial timeline

2020: Edwin Chen founds Surge Labs in San Francisco. Bootstrapped. Stays that way.

2021-22: RLHF goes mainstream at the major labs. Surge becomes the quiet supplier of choice.

2023: DataAnnotation.tech grows into one of the largest contractor brands in the field.

2024: Revenue reported around $1.2B. Still no outside funding. Customer roster reads like an AI conference keynote.

2025: Edwin Chen lands on TIME100 AI. Investor talks reported with a16z, Warburg Pincus, TPG.

Revenue per employee, approximate

// Source: public reporting, 2024-2025. Approximate, rounded, illustrative.

Surge AI: ~$10.9M

OpenAI: ~$1.5M

Google: ~$1.6M
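The Surge figure can be checked from the two numbers reported in this piece: roughly $1.2 billion in annual revenue and about 110 employees. The sketch below does only that division; the function name is hypothetical, and the OpenAI and Google figures are left out because this article does not give the inputs behind them.

```python
def revenue_per_employee(annual_revenue_usd, employees):
    """Annual revenue divided by headcount, in dollars per employee."""
    if employees <= 0:
        raise ValueError("employees must be positive")
    return annual_revenue_usd / employees


# Inputs as reported in this article; both are approximate.
surge = revenue_per_employee(1.2e9, 110)
print(f"~${surge / 1e6:.1f}M per employee")  # prints "~$10.9M per employee"
```

Both inputs are rounded, so the result is best read as an order of magnitude, not a precise figure.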

What they actually make.

The product is a managed labeling platform with a long, slightly boring feature list - templates, editors, an SDK, dashboards, SLAs, 24/7 staffing. Underneath the feature list is the actual product, which is harder to ship: a community of annotators who have been recruited, vetted, ranked, trained and routed to the work they are best at. Layered on top is a services team that translates "make our model better at biology" into a labeling spec that a biologist can actually run.

Customers use Surge for the boring-sounding things and the existential things in roughly equal measure. RLHF feedback for next-generation language models. Custom evaluation sets for benchmarking. Red-team probes for safety reviews. Toxicity, hate speech, spam and content moderation labels. Multimodal annotations for image and audio data. Reinforcement learning environments where humans play the role of a much more careful reality.

RLHF: Rankings, rewrites and reward signals from people who have actually read the answer twice.

Evals: Custom benchmarks and post-training tests that go beyond public leaderboards.

Safety: Toxicity, hate speech and adversarial probes - the unfun homework of large model deployment.

Who pays for all this

Public reporting names OpenAI, Anthropic, Google, Microsoft and Meta among the customers. The implication, which Surge mostly does not bother to deny, is that almost every model with a name you know has been touched in some way by Surge labels. The labs, who normally enjoy a quiet form of mutual rivalry, share Surge the same way restaurants share a particularly good produce supplier: grudgingly, but reliably.

If you have ever asked a chatbot a strange question and gotten a calm, sensible answer, the chatbot did some of that work. A person at Surge AI did the rest. - The product, in plain language

The mission, stated plainly.

Surge AI's stated ambition is to build the human data infrastructure for AGI, which is the sort of phrase you would expect from a company with a more aggressive PR team. The version of the mission that comes through in interviews is less grand: build AI systems that actually understand humans by paying close attention to humans first, and treat dataset quality as a craft. Chen has been blunt that the alternative - chasing scale and ignoring quality - produces models optimized for engagement rather than understanding, which is the failure mode he watched up close at Twitter.

It is a slightly old-fashioned position. The modern AI conversation prefers the rhetoric of emergent capabilities, scaling laws, and superhuman benchmarks. Surge's bet is that whatever the long-run answer turns out to be, the next decade of progress will be paid for one careful human judgment at a time. The bet has been good so far. Whether it stays good is the open question, and Surge appears comfortable being the company quietly answering it while everyone else argues about model size.

Why it matters tomorrow

As models get bigger, the cost of bad data gets worse, not better. A noisy crowd-sourced label that merely confused a small classifier can quietly corrupt a frontier model's behavior in ways that are extremely hard to undo. The labs know this. It is why their evaluation budgets keep growing and why Surge keeps getting their calls. If you believe AI's hardest problems over the next few years are alignment, safety, factuality and reasoning, you also have to believe that high-quality human feedback is going to be one of the more strategic resources on the board. Surge has spent five years cornering it.

The boring thesis: AI's hardest problems are people problems. Surge AI hired for that. - The argument, compressed

Back to that Tuesday morning.

The copy editor and the chemistry PhD finish their argument. They mark up their answer, leave a note for the next reviewer, and move on to the next task in a queue that, in aggregate, will shape how a model talks to a teenager in Manila, a doctor in Berlin and an accountant in Atlanta. Neither of them will ever know which model, or which user, or whether the conversation went well. They are paid by the hour, occasionally by the rank, and not at all by the headline.

This is what Surge AI sells. Not the model. Not the chatbot. The thousands of small Tuesdays that make the model less wrong. It is a strange product to have built a billion-dollar company on. It is also, on reflection, exactly the product the AI industry should have been buying all along. The rest of the industry is figuring this out now. Surge AI got there in 2020, sat down with a spreadsheet, and quietly started hiring.