Tagged Content
Everything on the platform tagged with llm-evaluation.
Ankur Goyal is the Founder & CEO of Braintrust, a San Francisco-based AI evaluation and observability platform that helps engineering teams ship reliable AI products. Previously VP of Engineering at SingleStore (MemSQL) and founder of Impira (acquired by Figma in 2022), Goyal brings over a decade of distributed systems and ML infrastructure experience to the challenge of making AI applications production-ready. Braintrust has raised $121M in total funding, including an $80M Series B at an $800M valuation in February 2026, backed by ICONIQ Capital, Andreessen Horowitz, and a roster of elite angels including Greg Brockman and Elad Gil.
Mohamed Elgendy is the Co-Founder and CEO of Kolena, a San Francisco-based AI testing and validation platform that raised $21M to help enterprises build reliable, trustworthy AI systems. An Egyptian-American technologist and author of the widely read 'Deep Learning for Vision Systems' (Manning Publications, 20,000+ copies sold), Elgendy cut his teeth building AI/ML organizations at Amazon, Twilio, Rakuten, and Synapse (acquired by Palantir) before founding Kolena in 2021. His mission: bring software engineering rigor - unit testing, regression analysis, scenario-level validation - to a field that has long relied on aggregate accuracy scores that mask real-world failures.

Shreya Shankar is a PhD candidate at UC Berkeley's EPIC Lab building AI-powered data systems that are reliable and cost-efficient. A Stanford-trained engineer who worked at Google Brain and Meta, she bridges academic research and industry practice through DocETL (an open-source LLM data processing system with 3.5K+ GitHub stars used by 30+ S&P 500 companies), an O'Reilly book on AI evals co-authored with Hamel Husain, and a Maven course that has reached 4,500+ professionals. She is on the CS faculty job market and gave a faculty candidate talk at Carnegie Mellon in March 2026.

ZeroEval is a New York-based AI startup from Y Combinator's Summer 2025 batch building an auto-optimizer for AI agents. Founded by Jonathan Chavez and Sebastian Crossa - two friends who met in college in Mexico - the platform captures every interaction your AI agent makes, scores quality with custom LLM judges, and automatically turns real production data into better prompts. The result: agents that get smarter after launch without manual intervention. Trusted by DoorDash, Datadog, Hugging Face, and Harvard Medical School, ZeroEval closes what the founders call 'the last mile reliability gap' in agentic AI.