Two Friends, One Very Expensive Problem
Jonathan Chavez and Sebastian Crossa met during their first year of college in Mexico. Seven years later, both were deep in the machinery of AI infrastructure - Jonathan on the LLM observability team at Datadog, watching enterprises struggle to understand why their models kept failing; Sebastian as a founding engineer building email at Micro (backed by a16z) and before that at Atrato (YC W21).
They had seen the same problem from different angles: companies build an AI agent, ship it, and then have no reliable way to know why it is performing badly or how to fix it. The evaluation tooling either doesn't exist, requires a small army of data labelers, or produces static judges that degrade the moment your production data drifts from your test data.
Before applying to YC, they built a side project together - llm-stats.com, an LLM leaderboard that quickly grew to 60,000 monthly active users and a third of a million unique visitors. It was proof they could build things people actually wanted. ZeroEval is the serious version of that instinct applied to a much harder problem.
The companies that win the next decade of AI won't be those that build the best agents. They'll be the ones whose agents get better over time.
- ZeroEval founding thesis

That single sentence explains everything about the company's positioning. ZeroEval isn't trying to make the world's best AI model. It's building the feedback infrastructure that makes your model - whatever it is - incrementally smarter every day without you having to manually audit thousands of outputs.