He trained an AI detector to 99% accuracy and called it a failure. Then he kept going - to 99.9%, to 99.99%. This is the engineer teaching the world to tell human from machine.
Most founders ship at 99% and call it a win. Max Spero looked at that last one percent - one wrong accusation in every hundred - and decided it was a moral problem, not a rounding error. So he and his cofounder kept grinding the number down until a false positive was a one-in-ten-thousand event. That single decision tells you almost everything about the man running Pangram Labs.
Pangram is an AI-text detector. Feed it writing and it tells you, with unusual confidence, whether a human or a language model produced it. Spero is its cofounder and CEO, based in New York, and in 2025 his small team became the quiet infrastructure behind a chunk of the news and information economy. When journalists field pitches through HARO and Qwoted, Pangram is the filter deciding what reads like a person and what reads like a bot. In December 2025 he shipped Pangram 3.0, which stopped giving a single verdict and started highlighting the exact sentences a machine touched.
The pitch is deceptively simple. The stakes are not. Spero frames detection as a question about the value of information itself - about whether a reader can still trust the words in front of them. He is not interested in catching students for the thrill of it. He is interested in keeping the line between human and machine writing legible, before it disappears.
There is a version of this company that launched eighteen months earlier. It would have been accurate enough to demo well, raise money, and accuse the occasional innocent writer. Spero rejected it. A detector that wrongly flags real human writing isn't a minor bug - it's a person told their honest work is a fraud. So the team treated every false positive as unacceptable and chased the error rate into the ground.
He talks about it the way a watchmaker talks about tolerances. "We trained our first model to 99% accuracy and decided that a 1% error rate wasn't good enough," he says. "We pushed to 99.9%, 99.99%." Each nine after the decimal point cuts the allowed error by a factor of ten and demands far more work than the one before. He paid that cost.
[Figure: bar chart of relative error rates. Bars scaled to illustrate relative error; teal = shipped target.]
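To make the arithmetic behind those nines concrete, here is a minimal sketch. It assumes, as the article's framing does, that the whole error rate lands on human writing as false flags; real detectors report false positives and false negatives separately. The function names and the 10,000-document batch are illustrative, not anything Pangram publishes.

```python
def one_in(nines: int) -> int:
    """Error odds for an accuracy written with `nines` nines.

    2 -> 99% (1 in 100), 3 -> 99.9% (1 in 1,000), 4 -> 99.99% (1 in 10,000).
    """
    if nines < 1:
        raise ValueError("nines must be at least 1")
    return 10 ** nines


def expected_false_flags(nines: int, human_documents: int) -> float:
    """Expected wrongly flagged documents, assuming every error is a false flag."""
    return human_documents / one_in(nines)


if __name__ == "__main__":
    batch = 10_000  # hypothetical batch of genuinely human-written documents
    for nines, label in [(2, "99%"), (3, "99.9%"), (4, "99.99%")]:
        flags = expected_false_flags(nines, batch)
        print(f"{label}: 1 in {one_in(nines):,} -> about {flags:g} false flags per {batch:,}")
```

Under that assumption, the same batch of 10,000 honest essays yields roughly a hundred false accusations at 99%, ten at 99.9%, and one at 99.99%.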
The origin story starts in a Stanford freshman dorm, where Spero met Bradley Emi. Both ended up studying machine learning and artificial intelligence - Spero took a B.S. in theoretical computer science and an M.S. in AI. Then they scattered into the industry the way ambitious engineers do.
Spero built and shipped machine learning at Google, Two Sigma, and Yelp. His last stop before founding a company was Nuro, the autonomous-vehicle outfit, where he led the active learning effort - teaching self-driving systems to find and learn from the data they were worst at. Emi, meanwhile, was on the computer vision team at Tesla Autopilot. They were both, in different garages, building AI for the real world.
Then GPT-4 arrived, and the question that had been abstract became urgent. "By the time GPT-4 came out, we realized that the wide availability of generative AI was exciting and it had the potential to do good," Spero says, "but it also had the potential to do harm, especially to do harm at scale." The two dormmates reunited in 2023 to build the antidote: a reliable way to tell human from machine, "in an unlimited number of use cases."
Pangram finds the documents hardest to classify, generates synthetic "mirrors" of them, and retrains against those, over and over. It studies the cases where it's weakest instead of the ones where it's already right - the same active-learning instinct Spero honed on self-driving cars.
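The loop described above (find the hardest cases, synthesize mirrors of them, retrain) can be sketched in a few lines. This is a toy under loud assumptions, not Pangram's pipeline: each document is reduced to one made-up number, the model is a one-feature logistic regression, and the hypothetical `make_mirror` just jitters that number, since the article does not say how real mirrors are generated.

```python
import math
import random
from typing import List, Tuple

Example = Tuple[float, int]  # (toy 1-D feature, label: 0 = human, 1 = AI)


def predict(w: float, b: float, x: float) -> float:
    """Probability of 'AI-written' under a 1-D logistic model."""
    z = max(-30.0, min(30.0, w * x + b))  # clamp to keep exp() and log() safe
    return 1.0 / (1.0 + math.exp(-z))


def loss(w: float, b: float, ex: Example) -> float:
    """Log loss on one labeled example; higher means harder to classify."""
    x, y = ex
    p = predict(w, b, x)
    return -math.log(p if y == 1 else 1.0 - p)


def train(data: List[Example], epochs: int = 200, lr: float = 0.1) -> Tuple[float, float]:
    """Fit weight and bias from scratch with plain stochastic gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, b, x) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


def hardest(w: float, b: float, pool: List[Example], k: int) -> List[Example]:
    """Return the k pool examples with the highest loss under the current model."""
    return sorted(pool, key=lambda ex: loss(w, b, ex), reverse=True)[:k]


def make_mirror(ex: Example, rng: random.Random) -> Example:
    """Hypothetical stand-in for a synthetic mirror: a jittered copy, same label."""
    x, y = ex
    return (x + rng.gauss(0.0, 0.05), y)


def hard_example_loop(train_set: List[Example], pool: List[Example],
                      rounds: int = 3, k: int = 2, seed: int = 0):
    """Train, mine the hardest pool examples, add mirrors of them, retrain."""
    rng = random.Random(seed)
    data = list(train_set)
    w, b = train(data)
    for _ in range(rounds):
        hard = hardest(w, b, pool, k)
        data.extend(make_mirror(ex, rng) for ex in hard)
        w, b = train(data)
    return w, b, data


if __name__ == "__main__":
    seed_set = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
    pool = [(-0.2, 0), (0.1, 1), (-1.5, 0), (1.8, 1), (0.3, 1)]
    w, b, data = hard_example_loop(seed_set, pool)
    print(f"{len(data)} training examples after mining; w={w:.2f}, b={b:.2f}")
```

The point of the sketch is the selection step: each round spends its new training data on the examples the current model gets most wrong, rather than on ones it already handles.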
The detector can do more than Pangram lets it. Spero treats extra capabilities as a temptation: "that information is a distraction from our mission, not in aid of it." A tool that knows less, used precisely, beats one that knows everything and confuses the user.
His nightmare isn't sci-fi. It's mediocrity by default: the fear that "people will come to accept" it and ask, "why do something great when AI can do it decently, for a fraction of the effort?" The detector is, in part, a defense of the human urge to do hard things well.
"Being able to see beyond and under the text has implications for reader trust in journalism, marketing, propaganda, deliberate disinformation, fraud, and the very value of information itself."
"What was needed was a reliably accurate way to differentiate human from AI writing in an unlimited number of use cases."
"AI can be a cheating tool or a learning tool. The difference is what you do next."
"I enjoy that the company has allowed me to meet so many new and interesting people across all kinds of career fields."