He pays strangers to play a diagnosis game on their phones. The high scores become medical-grade training data for the AI that doctors will lean on next.
Erik Duhaime, photographed for MIT News. The PhD who decided credentials were overrated and accuracy was measurable.
Erik Duhaime sells something doctors have spent a century assuming was impossible to outsource: judgment. At Centaur.ai, the Boston company he runs as cofounder and CEO, thousands of medical students, nurses, and physicians open an app, look at a skin lesion or a pathology slide or a snippet of heart audio, and call it. They earn cash for being right. The aggregate of their answers, weighted by who has proven accurate, becomes the labeled data that teaches a machine to read the same image.
The bet underneath the business is contrarian and specific. Most of medicine trusts the credential - the years of residency, the board certification, the title on the door. Duhaime trusts the scoreboard. His rule, repeated to anyone who asks how it works: trust opinions based on performance, not on years of experience or credentials. Continuously measure each person on cases with known answers, discard the opinions of people who are bad at the task, and intelligently pool the opinions of the people who are good. Do that, and the group beats the individual specialist.
He did not arrive at this from a hospital. He arrived from an evolutionary biology lab and a management PhD. Duhaime studied economics and biology at Brown, took an MPhil in Human Evolution at the University of Cambridge, then landed at the MIT Center for Collective Intelligence under Thomas Malone, the field's founding figure. His dissertation, finished in 2019, was a set of essays on collective intelligence and the future of work - how to aggregate the opinions of many experts into something smarter than any of them.
The clinical version of the idea came from his living room. His wife was a medical student, and Duhaime watched her sink hours into flashcard and quiz apps, drilling cases on her phone. The research on his desk had already shown that a crowd of students, properly pooled, could classify skin lesions more accurately than professional dermatologists. The two facts collided. What if all that studying was not just studying - what if it was productive labeling work for the people building medical AI?
The key is to trust opinions based on performance-based metrics - rather than years of experience or credentials.
Erik Duhaime, on the engine inside Centaur.aiSo in 2017 he built it. With cofounders Zach Rausnitz, a longtime friend from Brown, and Tom Gellatly, who had run the data-labeling team at the self-driving company Cruise, Duhaime launched Centaur Labs and a consumer app called DiagnosUs. The app turns diagnosis into a contest with real prize money. Roughly half its users are medical students; the rest are doctors and nurses. They compete, they learn, and quietly they generate something valuable: high-confidence labels on real biomedical images, audio, and video.
The name is the thesis. A centaur is half human, half machine, and in chess the term describes a human paired with a computer who together beat either alone. That is the whole company in one word. Duhaime is not trying to remove the human from the loop while the rest of the industry races to automate everyone out of it. He is trying to keep the right humans in it, scored and combined, as a feature rather than a bug.
The company moved fast through the usual proving grounds - the MIT Sandbox fund in 2017, MIT's delta v accelerator, then Y Combinator in 2018. In 2021, Matrix Partners led a $15M Series A. In October 2024, an oversubscribed $16M Series B led by SignalFire pushed total funding to roughly $35M, with Matrix, Susa Ventures, Samsung Next, and Alumni Ventures along for the round. The customer list reads like a who's-who of where bad data would be catastrophic: Microsoft, Memorial Sloan Kettering Cancer Center, Mass General Brigham, the pathology-AI company Paige.
That last point is the reason the whole thing matters now. As generative models flood into clinics, the failure mode is not abstract. Duhaime puts it flatly: in healthcare, AI hallucinations can cost lives. A model is only as trustworthy as the data it learned from and the evaluation that keeps it honest after deployment. Centaur.ai is selling the unglamorous, load-bearing layer underneath the hype - the human ground truth. The company that began as a graduate student's side observation about his wife's homework now supplies the answer key for some of the most consequential AI in medicine.
What makes Duhaime worth watching is the discipline of the underlying claim. He did not assert that crowds are wise - that is a bumper sticker, and often wrong. He found the precise conditions under which a crowd becomes wise, measured them, and turned the measurement into a product. The expertise he sells is not anyone's in particular. It is the structure that decides whose opinion to keep.
Slip in cases with known answers. Every labeler is scored in real time on accuracy they cannot see coming.
Opinions from people who are bad at the task get discarded. Credentials buy nothing here. The scoreboard decides.
Weight and combine the proven performers. The aggregate reaches better-than-expert quality, at scale.
The whole company started as a domestic observation: his wife, a med student, drilling cases on quiz apps. He looked at her homework and saw an AI labeling pipeline.
A centaur is half human, half machine. In chess it means a person plus a computer who together beat either alone. The company is its name.
Human evolution at Cambridge, then management science at MIT. He studies how groups think for a living, and built a business out of the answer.
DiagnosUs hands real money to users who diagnose accurately. Studying becomes a game, and the game becomes ground truth.
“He proved a crowd of med students can out-diagnose the specialist. Then he built a company on it.”