He builds AI you can audit, not just admire - and asks whether it will make us wise enough, fast enough.
Ask Elicit a research question and it goes to work the way a diligent graduate student would, if that student never slept and had read the entire scientific literature. It finds the relevant papers, pulls out the numbers, screens the junk, and hands back a summary with its receipts attached. Andreas Stuhlmüller runs the company that makes it, and he is unusually stubborn about one thing: you should be able to see how it got there.
"Elicit's mission is to scale up good reasoning."
That sentence sounds modest until you sit with it. Most of the AI industry is racing to make the biggest, most capable model and then trusting the answer that falls out. Stuhlmüller's bet runs the other direction. He wants systems built out of small, legible pieces - each step one a human could check - so that when the machine tells a scientist which drug trial to trust, the reasoning is on the table, not hidden inside billions of parameters.
Elicit serves hundreds of thousands of researchers and analysts every month, the people who make high-stakes calls in life sciences, health economics, and policy. For them, a confident-sounding but wrong summary is worse than no summary at all. So the product is engineered around a single discipline: show the work.
In February 2025, investors put real money behind the idea. Elicit raised a $22 million Series A led by Spark Capital and Footwork, at a reported $100 million valuation, with earlier backers Fifty Years, Basis Set, and Mythos joining. The pitch was not "we have the smartest model." It was "we have the most trusted one."
Trust is a strange thing to sell in a field addicted to demos. But Stuhlmüller has been circling this problem his entire career, long before it was fashionable, and long before it was a company.
Take a hard question and break it into smaller questions that are easy to answer and easy to check. Reassemble the pieces. The insight sounds almost too simple, which is usually a sign it is load-bearing.
Don't just grade the final answer. Watch each reasoning step and reward the ones that are sound. Get the method right and good answers follow - and you can trust them because you saw them assembled.
Prefer transparent, composable parts over a single inscrutable model. If you can understand each piece, you can audit the whole. That is the difference between a tool and an oracle.
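For readers who think in code, the three ideas above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Elicit's implementation: the `Step` record, the `answer_by_decomposition` function, and the trial numbers are all hypothetical, invented here to show sub-questions being answered, checked one at a time, and only then combined.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    """One inspectable unit of work: a sub-question, its answer, and its check result."""
    question: str
    answer: int
    passed_check: bool


# A sub-question is (text, solver, checker). All three are hypothetical stand-ins.
SubQuestion = Tuple[str, Callable[[], int], Callable[[int], bool]]


def answer_by_decomposition(
    subquestions: List[SubQuestion],
    combine: Callable[[List[int]], int],
) -> Tuple[int, List[Step]]:
    """Answer each sub-question, check each step, then reassemble the pieces."""
    trace: List[Step] = []
    for question, solve, check in subquestions:
        answer = solve()
        trace.append(Step(question, answer, check(answer)))

    # Supervise the process: refuse to combine if any single step failed its check.
    failed = [step.question for step in trace if not step.passed_check]
    if failed:
        raise ValueError(f"steps failed their checks: {failed}")

    return combine([step.answer for step in trace]), trace


# Toy usage with made-up numbers: total participants across two trials.
total, trace = answer_by_decomposition(
    [
        ("How many participants were in trial A?", lambda: 120, lambda n: n > 0),
        ("How many participants were in trial B?", lambda: 80, lambda n: n > 0),
    ],
    combine=sum,
)
```

The point of returning `trace` alongside the answer is that a person can read every intermediate step, which is the difference the passage draws between a tool and an oracle.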
"Will AI make us wise enough, fast enough?"
Stuhlmüller grew up in Germany and studied cognitive science at the University of Osnabrück, the discipline that treats the mind as something you can model rather than merely admire. That instinct - to build a working replica of thinking - never left him.
It carried him to MIT, where he spent six years earning a PhD under Josh Tenenbaum, one of the field's great model-builders. His obsession there was probabilistic programming: writing code that reasons under uncertainty the way a careful person does. In 2014 he built WebPPL, a probabilistic programming language that runs inside a browser, and co-authored an online book on how such languages are designed. He also built Forest, a public library of generative models, and wrote implementations of the Church language. This was years before "AI founder" was a job title anyone chased.
A postdoc at Stanford followed, working with Noah Goodman. Then, in 2018, instead of joining a big lab or cashing in, he did something quieter. He started a nonprofit.
Elicit did not start as a startup. It grew inside Ought, a nonprofit, as a bet that the safest path to powerful AI was to make its reasoning legible.
Ought existed to answer a research question: could you scale up open-ended human reasoning by handing pieces of it to machines, without losing the thread of why? Elicit was the experiment that worked. What began as a tool for exploring factored cognition turned out to be exactly what overwhelmed researchers needed. In 2023 it spun out as an independent public benefit corporation, a legal structure that keeps the mission on the books alongside the balance sheet.
Alongside him the whole way has been cofounder Jungwon Byun, who runs the operating side of the company. Stuhlmüller supplies the research spine; the two of them turned an alignment thesis into a product people pay for.
The loud version of the AI story is about capability - who has the biggest model this quarter. Stuhlmüller keeps pointing at a quieter contest running underneath it: the race between how powerful these systems get and how good our judgment becomes. If capability wins by a wide margin, we end up with tools we cannot check making decisions we cannot question.
His entire body of work is an argument that we can tip that race the other way. Improve epistemics - the quality of collective reasoning - and pioneer systems whose steps you can inspect, and AI becomes an amplifier of human wisdom rather than a substitute for it. Elicit is the commercial proof-of-concept: an AI you deploy precisely because you can see it working.
He publishes the research openly. "Factored Verification," a method for catching AI hallucinations by decomposing summaries and checking the parts, came out in 2023. "Iterated Decomposition" showed how supervising the reasoning process improves answers to hard science questions. These are not marketing white papers. They are the receipts for a worldview.
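The general shape of "decompose the summary and check the parts" can be sketched as follows. This is a loose illustration of the idea as described here, not the method from the paper: the function names are hypothetical, and `naive_supported` is a deliberately crude word-matching stand-in for whatever model-based check a real system would use.

```python
import re
from typing import Callable, List, Tuple


def split_into_claims(summary: str) -> List[str]:
    """Break a summary into sentence-level claims (a simplifying assumption)."""
    parts = re.split(r"(?<=[.!?])\s+", summary)
    return [part.strip() for part in parts if part.strip()]


def naive_supported(claim: str, source: str) -> bool:
    """Crude stand-in check: every number and longer word in the claim appears in the source."""
    source_words = set(re.findall(r"[a-z0-9]+", source.lower()))
    claim_words = re.findall(r"[a-z0-9]+", claim.lower())
    content_words = [w for w in claim_words if len(w) > 3 or w.isdigit()]
    return all(w in source_words for w in content_words)


def verify_summary(
    summary: str,
    source: str,
    supported: Callable[[str, str], bool] = naive_supported,
) -> List[Tuple[str, bool]]:
    """Check each claim separately so an unsupported one can be pinpointed."""
    return [(claim, supported(claim, source)) for claim in split_into_claims(summary)]


# Toy usage with invented text: the second claim changes "8 weeks" to "2 weeks".
source_text = "The trial enrolled 120 patients. Symptoms improved after 8 weeks."
summary_text = "The trial enrolled 120 patients. Symptoms improved after 2 weeks."
results = verify_summary(summary_text, source_text)
```

Checking claims one by one, rather than grading the summary as a whole, is what lets a reviewer see exactly which sentence lacks support in the source.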
Which is the through-line, if you want one. From a browser-based probabilistic language to a research assistant used by hundreds of thousands, Stuhlmüller has been building the same thing over and over: machines that reason in the open, where a person can follow along and, when necessary, disagree.
He writes his name with an umlaut - Stuhlmüller - which unpacks, roughly, to "chair-miller" in German.
His personal site still links to Church, WebPPL, and Forest, relics from a probabilistic-programming past he never fully left behind.
Elicit began life inside a nonprofit before becoming a venture-backed public benefit corporation - the mission predates the money.
He is an active writer on the AI Alignment Forum and LessWrong, thinking out loud in public about the systems he builds.