Labelbox

Who they are, today

A factory, not a feature

Walk into the Labelbox office in the Mission and the first thing you notice is what you don't see. No GPU cluster on a tour-friendly pedestal. No model leaderboard glowing on a wall. Labelbox doesn't make the AI. Labelbox makes the food the AI eats.

Across roughly 470 employees and a global network of vetted experts - PhDs, doctors, lawyers, coders, linguists - the company runs what its CEO calls a "data factory." Two thousand AI-powered expert interviews happen on a busy day. Top raters out-earn US tenure-track professors. The output ships to the frontier AI labs you read about in the morning newsletter, and to enterprises like Genentech, Chegg, ArcelorMittal, and Warner Bros., which would rather not be in the newsletter at all.

"Post-training is the new arms race." - Manu Sharma, CEO, on the Cognitive Revolution podcast

The problem they saw

Models are easy. Data is the bottleneck.

In 2018, the consensus was that AI's bottleneck was algorithms. Researchers fought over architectures. Founders pitched ever-bigger models. Then Manu Sharma, an aerospace engineer who had spent his early career at Planet Labs labeling satellite imagery and at DroneDeploy building drone apps, started saying something quietly contrarian: the algorithms were fine. The data was the disaster.

If you've ever tried to train a useful model, you already know the punchline. The exciting part - the architecture, the loss function - takes a week. The unsexy part - finding, cleaning, labeling, and re-labeling the data - takes a year. Sharma, joined by Brian Rieger and Dan Rasmuson, decided that someone should build the unsexy part properly. Naturally, no one wanted to fund it. Annotation, the term most VCs used at the time, was a dirty word.

"Founders need to be contrarian and right," Sharma later told First Round Review. "Being one without the other is just career-limiting."

"The algorithms were fine. The data was the disaster." - The premise that became Labelbox

The founders' bet

If everyone needs data, sell the picks and shovels

Three founders. One belief. They were not going to train models. They were not going to chase the latest scaling paper. They were going to build the boring tool everyone in the building eventually walked over to borrow - and then they would build the workforce behind it, too.

The first version of Labelbox was an annotation tool for computer-vision teams. Bounding boxes. Polygons. Segmentation masks. Useful. Defensible. A reasonable Series A pitch. Kleiner Perkins and First Round wrote the early checks. Gradient Ventures (the Google AI fund) and Andreessen Horowitz showed up next. Then B Capital. Then SoftBank, which in January 2022 led a $110M Series D that nudged Labelbox into unicorn territory.

And then ChatGPT happened. Everything Labelbox had built for bounding boxes had to learn to handle conversations. Reasoning chains. Multilingual code. Multimodal math. The company had two options: get out of the way, or rebuild the factory at scale. They picked the second.

Annotate

The original tool. Now multimodal: image, video, audio, text, PDFs. SAM2-powered auto-segmentation included.

Catalog

The data engine. Natural-language search and similarity search across up to a billion rows. Sync to AWS, GCP, Azure.

Model Foundry

Foundation models that label your data for you - then you check the work. The robots and the humans share a shift.

Alignerr

The global expert workforce. RLHF, evaluations, complex reasoning datasets. Where the $250K-a-year raters live.

Fig 2. Four products, one production line. Each does the part the others can't.

The product

An assembly line that thinks

The Labelbox platform looks, at first glance, like every other enterprise SaaS dashboard. Then you watch what happens to a piece of data inside it. Raw video comes in. Model Foundry takes a first pass with a foundation model - object detection, segmentation, transcription. The output flows into Catalog, where a curator finds the edge cases. Annotate ships those to a labeler. The labeler's work routes into automated QA. The hard cases get escalated to Alignerr's expert pool. The final dataset lands in Snowflake or Databricks or a cloud bucket, ready for training.
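For readers who think in code, the routing described above can be sketched in a few lines of Python. This is an illustration of the general pattern (model pre-label, human review of edge cases, automated QA, expert escalation), not Labelbox's actual API or implementation. Every name here, including `Item`, `run_pipeline`, and `CONFIDENCE_FLOOR`, is hypothetical, and the rule that confident pre-labels skip human review is a simplifying assumption.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical threshold and names: illustrative only, not Labelbox's API.
CONFIDENCE_FLOOR = 0.8  # below this, a model pre-label counts as an edge case


@dataclass
class Item:
    item_id: str
    model_label: str                    # first-pass label from a foundation model
    model_confidence: float             # model's confidence in that label, 0..1
    human_label: Optional[str] = None
    final_label: Optional[str] = None
    route: list = field(default_factory=list)  # stages the item passed through


def run_pipeline(items, labeler, expert):
    """Route each item through pre-label, curation, labeling, QA and escalation."""
    dataset = []
    for item in items:
        item.route.append("prelabel")
        if item.model_confidence >= CONFIDENCE_FLOOR:
            # Assumption: a confident pre-label is accepted without human review.
            item.final_label = item.model_label
        else:
            # Curation flags the edge case and sends it to a human labeler.
            item.route += ["curate", "annotate"]
            item.human_label = labeler(item)
            item.route.append("qa")
            if item.human_label == item.model_label:
                # Automated QA: human and model agree, so accept the label.
                item.final_label = item.human_label
            else:
                # Disagreement: escalate the hard case to an expert.
                item.route.append("expert")
                item.final_label = expert(item)
        item.route.append("export")
        dataset.append(item)
    return dataset


if __name__ == "__main__":
    items = [
        Item("clip-1", "car", 0.95),
        Item("clip-2", "truck", 0.55),
        Item("clip-3", "bus", 0.40),
    ]
    # Stand-in labeler and expert; the real ones are people, not lambdas.
    labeler = lambda item: "truck" if item.item_id == "clip-2" else "van"
    expert = lambda item: "minibus"
    for done in run_pipeline(items, labeler, expert):
        print(done.item_id, done.final_label, " > ".join(done.route))
```

The point of the sketch is the shape, not the thresholds: cheap automated passes handle the easy majority, and expensive human attention is spent only where the model and the first labeler disagree.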

None of this is glamorous. Most of it is, frankly, plumbing. Which is exactly the point. AI labs don't want plumbing. They want a tap they can turn on.

"Labelbox doesn't make the model. It makes everything the model would be terrible without." - The unauthorized one-line summary

Milestones

A short, partial history of an unglamorous company

2018: Founded by Manu Sharma, Brian Rieger and Dan Rasmuson. Seed round from Kleiner Perkins and First Round.
2019: Series A. Gradient Ventures (Google) joins. The annotation tool grows up.
2020: Series B led by Andreessen Horowitz. Catalog launches.
2021: Series C. Customer roster lengthens: Genentech, Warner Bros, Chegg, ArcelorMittal.
Jan 2022: $110M Series D led by SoftBank. Valuation crosses $1B.
2023: Model Foundry launches. Foundation models begin labeling for foundation models. Slightly recursive.
2024: Brand refresh ("Our new look"). Upcraft acquisition folds agentic AI into Alignerr.
2025: Q1 product spotlight: coding, audio, multilingual, multimodal expansions. The factory floor doubles in scope.

The proof

A roster, a number, a partnership list

You can argue with a pitch deck. You cannot argue with a customer list. Labelbox says it works with about 200 enterprise customers and roughly 80% of the leading US AI labs. The names that surface in public materials are reassuringly boring: a biotech, a media giant, a steelmaker, an education company. The names that don't surface in public materials are - if that 80% claim is right - precisely the ones you'd want.

By the numbers

Five small facts that explain the company

Funding: $189M
Customers: ~200
Employees: ~470
Top labs: 80%
Top rater pay: $250K+

Fig 3. Self-reported figures, but the kind a unicorn doesn't tend to invent.

The partnership shelf is similarly load-bearing. Google Cloud, AWS, Snowflake, Databricks - all the places enterprise training data already lives. Labelbox plugs in rather than asking you to relocate, which is the kind of decision that doesn't make a keynote slide but does make a renewal.

"200 customers. 80% of frontier labs. One quiet building on Treat Ave." - A reasonable summary of the moat

The mission

Make the data, mean it

Labelbox's stated mission is to build the infrastructure that turns raw information into the labeled, evaluated, high-quality training data that frontier AI models need. Said plainly: if a model gets smarter next year, someone, somewhere, made that possible by labeling 40,000 examples of "this is a correct chain of reasoning" and "this is not." Labelbox would like that someone to be on its platform.

The company's 2024 brand refresh used three words: craftsmanship, reliability, utility. Three words that would have been considered actively boring at any other moment in the AI cycle. In this one, they sound almost subversive.

Why it matters tomorrow

When the models stop scaling, the data has to

There is a growing argument inside frontier labs that scaling pretraining is running out of room. On that view, the next jump in model capability - the one that will make today's GPT and Claude look like flip phones - will come from post-training: from carefully designed reasoning data, expert evals, multi-turn dialogue, and human feedback at a scale and quality the public internet cannot supply. That is not just a Labelbox marketing claim. It is an argument that recurs across current AI research.

If post-training is the next arms race, the picks-and-shovels company in that race is the one with the experts, the tools, and the QA pipeline already running. Labelbox has spent seven years quietly building exactly that.

Back to the building on Treat Ave. No GPU on a pedestal. No model leaderboard on the wall. Just a long row of dashboards showing throughput, agreement scores, and how many expert hours shipped today. The next breakthrough you'll read about probably ran through one of them. And the company most likely to take credit will, true to form, say almost nothing.

FILED · YESPRESS DESK

Labelbox is the data factory behind the AI you've been hearing about.