A San Francisco company with delivery centers in Nairobi, Kampala and Bengaluru, labeling the data that trains the models you already use.
If you used a chatbot today, watched a model identify a stop sign, or trusted a search result to be roughly relevant, there is a non-trivial chance that somewhere upstream a Sama annotator drew a box around the thing that taught the algorithm what it was looking at.
Sama is what happens when a social enterprise grows up and discovers it has accidentally become critical infrastructure. The company employs more than 3,000 people across San Francisco, Nairobi, Kampala and Bengaluru. It supplies labeled data, model evaluation and red-teaming to the kinds of customers - Google, Walmart, NVIDIA, Ford, Microsoft - that do not put their vendor lists on a marketing page.
You will not see Sama in the AI hype cycle. The company sells the part of AI that does not demo well: human review, edge-case detection, evaluation rubrics, the boring brutal grind of teaching a model what is true. It is the part most of the industry would rather not think about. It is also the part the industry cannot ship without.
Sama sells the unglamorous half of artificial intelligence - and somehow makes it look like a moat. — Editor's note
Back when "AI" mostly meant "spam filters," Leila Janah noticed a market failure that the technology industry was, charitably, ignoring. There were hundreds of thousands of people in low-income communities with the literacy, focus and broadband to do digital piecework. There were billions of dollars in data-labeling work waiting to be done. The two were not meeting.
The conventional answer was crowdsourcing on the cheap - pennies per task, no benefits, no career, no quality SLA. Janah thought that was both ethically lazy and operationally wrong. Cheap labor produces cheap data. Cheap data produces models that fail in expensive ways.
The Sama bet was that you could pay full-time wages, train people for years, build a quality system around them, and outperform the gig market on accuracy by a wide enough margin that enterprise customers would happily pay for it. The bet, as it turned out, was correct.
"Give work, not aid." — Leila Janah, founder, 2008
Sama began life in 2008 as Samasource, a nonprofit. It hired in Nairobi and Kampala, taught people to annotate images, and quietly accumulated customer relationships with technology companies that needed clean training data more than they needed a marketing story.
By 2019, the model had outgrown its legal wrapper. Janah spun the operating company out as Sama, a for-profit B Corporation, with the original nonprofit - now the Leila Janah Foundation - retained as a major shareholder. It was an unusual structure. It was also the only structure that let the company scale to compete with conventional vendors while preserving the wage commitment that made the work worth doing.
Janah died unexpectedly in January 2020 at 37. Wendy Gonzalez, who had joined in 2015 from a career at EY and Capgemini, stepped in - first as interim CEO and then, before the year was out, as the permanent chief executive. The cultural reset that often follows a founder's death did not happen. The strategy hardened instead.
A small chronology of an unusually patient company.
The Sama product is two things superimposed: a managed annotation workforce and the platform that orchestrates it. The annotators handle image, video, 3D point cloud, LiDAR and text. The platform - SamaHub for collaboration, SamaIQ for quality insight, SamaAssure for the contractual quality guarantee - is what turns a workforce into something a Fortune 500 customer will sign a master services agreement with.
SamaAssure carries the headline number: a promised 98% first-batch acceptance rate, which the company will put in writing. In a category where vendors usually quote a confidence interval and pray, that is unusual. It is also a useful screen: if a competitor cannot say that number out loud, it probably should not be labeling your autonomous-vehicle data.
The newer product is Sama Red Team, shipped in April 2024. It pairs domain experts with proprietary algorithms to adversarially test generative AI models for bias, safety failures, privacy leaks and compliance gaps. The pitch is straightforward and a little dry: you are about to ship an LLM to millions of people. You should probably know how it breaks before they find out.
Approximate revenue mix by service line · Editor estimates from public statements
The category did not need another labeling tool. It needed someone willing to sign the SLA. — A buyer who is not allowed to be quoted by name
Seven product lines · One workforce · One quality SLA
Image, video, 3D point cloud and LiDAR annotation for computer vision and ML.
Dataset curation to surface edge cases and reduce bias before training begins.
Supervised fine-tuning, RLHF, prompt-response generation and LLM evaluation.
Adversarial testing for generative AI - bias, safety, privacy and compliance.
Human-in-the-loop insight engine pairing experts with proprietary algorithms.
Collaboration workspace for projects, sampling and customer reporting.
The 98% first-batch acceptance guarantee. In the contract, not the deck.
Sama raised a $70M Series B in November 2021, led by CDPQ - Quebec's pension giant - with First Ascent Ventures and Vistara Capital Partners. That is not a logo you see on a typical AI Series B. It is the logo you see when a category is graduating from venture novelty to long-duration infrastructure.
The customer list reads less like a startup pitch and more like the index of a regulatory filing: Google, Walmart, NVIDIA, Ford, Microsoft, General Motors. Sama claims a majority of the Fortune 50 has bought from it at some point. The dollar amounts are private; the inertia they imply is not.
The social receipts are similarly unsexy and similarly real. The company says it has helped lift more than 69,000 people out of poverty through living-wage employment in its delivery centers, and it has the B Corp certification and third-party audits to back that up. You do not have to find that mission moving to find it operationally useful: turnover at the annotator level is far lower than at gig competitors, which is why the quality numbers are what they are.
Pay people enough to stay, train them long enough to get good, and your accuracy curve takes care of itself. — The Sama theory of the case
Sama's mission is the same one Janah wrote down in 2008. The company's competitive advantage is that the mission turns out to be operationally useful: living wages produce stable workforces, stable workforces produce trained annotators, trained annotators produce 98% acceptance, and 98% acceptance gets you on the master vendor list at NVIDIA.
The cynical reading - that ethics is a marketing layer on top of a labor-arbitrage business - does not survive a look at the cost structure. Sama spends more on wages, training and benefits than the cheap-and-cheerful competitors. It charges enterprise prices because it ships enterprise quality. The mission is not the marketing. The mission is the unit economics.
The hard problems in AI for the next five years are not compute. They are evaluation, alignment, red-teaming, and the data discipline to keep generative models from confidently lying at scale. None of that can be fully automated. All of it is what Sama already does.
The companies shipping the loudest AI products are the same ones quietly buying more annotation, more evaluation, more red-teaming. Sama is the unglamorous infrastructure underneath the glamorous demos - the inspection layer between a model and the public. That position tends to compound.
And the part that does not show up in the analyst report: the workforce. As regulators and customers begin asking real questions about where AI training data comes from, who labeled it, and under what conditions, Sama's audit trail looks a lot less like a CSR slide and a lot more like a moat.
The next AI scandal will be a data scandal. Sama is selling the only insurance policy that exists. — Editorial
The room at 2017 Mission Street is still full of customers nobody talks about. The difference, sixteen years in, is that those customers are now the ones whose AI products you actually use. The annotator in Nairobi who drew a box around a pedestrian at 3am Pacific time has a stake in the model that decides not to hit one. The B Corp paperwork is no longer the unusual thing about Sama. The fact that it works at this scale is.
Sama set out to give work, not aid. It ended up doing both - and quietly, accidentally, building one of the most important pieces of infrastructure in the modern AI stack.