MIT Rising Stars in EECS 2025 · DocETL: 3,500+ GitHub stars - deployed by 30+ S&P 500 companies · O'Reilly book "Evals for AI Engineers" out Spring 2026 · Maven's highest-grossing course ever - 4,500+ students · Faculty candidate talk at Carnegie Mellon - March 2026 · UIST 2024 Best Paper Honorable Mention · 25,000+ newsletter subscribers · SPADE deployed in LangSmith across 2,000+ pipelines
Shreya Shankar - ML Engineer and PhD Researcher
ML ENGINEER • RESEARCHER
UC Berkeley • EPIC Lab • PhD Candidate

Shreya
Shankar

The person making AI evals a real discipline - before she even finishes her PhD.

Former Google Brain and Meta ML engineer turned researcher, Shreya Shankar is building the infrastructure that makes AI-powered data systems actually work in production. Her tools are running inside 30+ S&P 500 companies. Her course changed how thousands of engineers think about evaluation. Her book is on O'Reilly shelves. She's still a PhD student.

ML Engineering Data Quality LLM Evals DocETL O'Reilly Author Open Source
3.5K+
DocETL GitHub Stars
30+
S&P 500 Deployments
4,500+
Course Students
25K+
Newsletter Subscribers
2K+
LangSmith Pipelines

The Engineer Who Refused to Pretend Production ML Was Fine

Shreya Shankar is not building the next chatbot. She is building the infrastructure that lets everyone else's AI systems stop lying to them. Her work sits at the uncomfortable intersection of data systems research and human-computer interaction - a combination unusual enough that she publishes top-tier papers in both VLDB and UIST in the same calendar year.

The core insight behind everything she does: production machine learning breaks in ways that are invisible, expensive, and preventable. Bad data does not announce itself. LLM outputs cannot be trusted at face value. Pipelines that worked last Tuesday silently drift by Thursday. Most engineering teams discover these failures when customers complain. Shreya's work makes the breakage visible before that happens.

Her flagship open-source project, DocETL, processes unstructured text at scale using LLMs and has been deployed in 4,100+ pipelines across more than 30 S&P 500 companies. It has also been adopted by public defenders in two California counties - which means research born in a Berkeley PhD program is now helping people keep their freedom. That is a fairly short trip from dissertation to courtroom.

Before any of this, she spent time as an ML engineer at Google Brain and Meta. Those jobs gave her a detailed education in what actually goes wrong in production - and a specific kind of frustration. The data quality problems she kept running into had no good tools. The evaluation workflows were ad hoc. The observability was thin. She went back to school not to escape industry problems but to attack them with more serious weapons.

"We can't just throw LLMs at problems and hope for the best. We need systematic ways to measure what's working."
- Shreya Shankar

At UC Berkeley's EPIC Lab, advised by Aditya Parameswaran, she has built a research program that is genuinely hard to classify. It is not pure database theory. It is not traditional ML. It sits in the space where data systems meet the humans who build and use them - a space that turns out to be very large and very full of unsolved problems.

Her paper "Operationalizing Machine Learning: An Interview Study" documented eighteen ML engineers talking honestly about what makes production ML succeed or fail. The answer, after all those interviews, was unglamorous: velocity, validation, and versioning. The fundamentals. The same things software engineers figured out decades ago, still being reinvented by ML teams who thought their field was different.

Tools That Escaped the Lab

01
DocETL
An LLM-powered system for analyzing unstructured text at scale. Agentic query rewriting. Automatic evaluation loops. Used by journalists, lawyers, doctors, policy analysts, and public defenders. 3,500+ GitHub stars. 4,100+ pipelines. Published at VLDB 2025 and SIGMOD 2026.
02
SPADE
Automatically synthesizes data quality assertions for LLM pipelines. Catches output errors before they propagate. Deployed inside LangSmith - LangChain's pipeline hub - running across 2,000+ real-world pipelines. Published at VLDB 2024.
03
EvalGen
A mixed-initiative tool that aligns LLM-generated evaluators with human preferences. Won Best Paper Honorable Mention at UIST 2024. Addresses the problem of who validates the validators - a question nobody had a good answer to before this paper.
04
AI Evals Course
Co-created with Hamel Husain on Maven. Became Maven's highest-grossing course ever. 4,500+ students from 500+ companies. Fifty or more students each from Google, Microsoft, OpenAI, Meta, Amazon. Turned AI evaluation from a vague idea into an engineering practice.
05
O'Reilly Book
"Evals for AI Engineers: Systematically Measuring and Improving AI Applications." Co-authored with Hamel Husain. Published Spring 2026. Covers error analysis, evaluation pipelines, LLM-as-a-judge. The textbook that will train the next generation of AI engineers.
06
Production ML Research
Built and deployed the work behind "Moving Fast With Broken Data" at Meta. Built ML observability systems. Published across VLDB, SIGMOD, UIST, CHI, CSCW - a publishing breadth that would be unusual for a seasoned professor, let alone a PhD student.
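The assertion-synthesis idea behind SPADE can be illustrated with a minimal sketch. These checks are hypothetical examples in the spirit of synthesized data quality assertions, not SPADE's actual API: each one inspects a single LLM output and reports pass or fail, so failures can be caught before they propagate downstream.

```python
# Hypothetical data-quality assertions for LLM pipeline outputs,
# sketched in the spirit of SPADE (not its real interface).

def assert_non_empty(output: str) -> bool:
    """Output should not be blank."""
    return bool(output.strip())

def assert_no_refusal(output: str) -> bool:
    """Output should not open with refusal boilerplate."""
    refusals = ("i cannot", "i can't", "as an ai")
    return not output.strip().lower().startswith(refusals)

def assert_max_length(output: str, limit: int = 500) -> bool:
    """Output should respect a length budget."""
    return len(output) <= limit

def run_assertions(output: str) -> dict:
    """Run every assertion and report pass/fail per check."""
    return {
        "non_empty": assert_non_empty(output),
        "no_refusal": assert_no_refusal(output),
        "max_length": assert_max_length(output),
    }
```

In a real pipeline, a report with any failing check would flag that output for review instead of letting it flow silently to users - the visibility her work keeps arguing for.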
"Data quality is not a one-time fix. It's an ongoing engineering discipline."
- Shreya Shankar

From Stanford to Berkeley, Via Production Trauma

She did the standard top-tier CS path: Stanford for undergrad and master's, concentrating in systems and artificial intelligence. While there, she helped run SHE++, Stanford's nonprofit for underrepresented minorities in tech - a sign of something that shows up consistently in her work, a concern for who gets access and who gets left out.

After Stanford, she went to industry. Google Brain. Meta. The jobs that CS graduates want. She was good at them. She was also watching, up close, how production ML actually fails. The data pipelines that quietly corrupt. The models that degrade on schedules nobody tracks. The evaluation workflows that amount to vibes and hope.

This is where a lot of people write blog posts about problems and move on. Shreya went back to school. She started her PhD at UC Berkeley in 2020 and began treating those industry frustrations as research questions.

The NDSEG Fellowship followed in 2022. So did papers. A lot of papers, published at venues that do not usually appear together on a CV. The combination - deep systems work plus HCI methods, interviews and observational studies alongside formal database research - is unusual. It is also, in retrospect, exactly right for the problem space. You cannot fix production ML without understanding both the technical systems and the humans operating them.

DocETL is the culmination of this approach so far. It is not just a processing system. It is a rethinking of how humans and LLMs collaborate to extract meaning from large unstructured document collections. The companion IDE, DocWrangler, puts that capability directly in researchers' and analysts' hands. The public defenders who picked it up were not thinking about system architecture. They needed something that worked.

The Papers That Moved the Field

"SPADE: Synthesizing Data Quality Assertions for Large Language Model Pipelines"
VLDB 2024 - with Haotian Li, Parth Asawa, Madelon Hulsebos, Eugene Wu, Aditya Parameswaran, et al.
Deployed in LangSmith across 2,000+ pipelines
"Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences"
UIST 2024
Best Paper Honorable Mention - introduced EvalGen tool
"DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing"
VLDB 2025, SIGMOD 2026
3,500+ GitHub stars - 30+ S&P 500 companies - 2 California county public defenders
"Operationalizing Machine Learning: An Interview Study"
CSCW 2024 - with Rolando Garcia, Joseph Hellerstein, Aditya Parameswaran
18 ML engineer interviews - identified the velocity/validation/versioning triad
"Moving Fast With Broken Data"
arXiv 2023 - with Labib Fawaz, Karl Gyllstrom, Aditya Parameswaran
Deployed at Meta in production
"Towards Observability for Production Machine Learning Pipelines"
VLDB 2023 - with Aditya Parameswaran
Introduced bolt-on ML observability architecture

How She Got Here

2015
Enrolled at Stanford University, Computer Science. Began concentrating in systems and AI.
2018
Helped run SHE++ at Stanford - a nonprofit empowering underrepresented minorities in technology.
2019
Completed B.S. and M.S. at Stanford. Worked as an ML engineer at Google Brain.
2020
ML/data engineer at Meta. Started PhD at UC Berkeley EPIC Lab under Aditya Parameswaran.
2022
Awarded NDSEG Fellowship and Bridgewater AI Labs Fellowship. First publications at VLDB.
2023
Published "Operationalizing ML" and "Moving Fast With Broken Data" - the latter deployed at Meta.
2024
SPADE at VLDB, EvalGen Best Paper Honorable Mention at UIST. AI Evals course on Maven becomes platform's highest-grossing ever.
2025
Named to MIT Rising Stars in EECS. DocETL at VLDB. Mailing list passes 25,000 subscribers.
2026
O'Reilly book published. Faculty candidate at Carnegie Mellon. Final PhD year. On the CS job market.

The Wins

2025
MIT Rising Stars EECS
Named among the most promising graduate researchers in electrical engineering and computer science
NDSEG
NDSEG Fellowship 2022
National Defense Science & Engineering Graduate Fellowship - awarded to top PhD researchers
UIST
Best Paper Honorable Mention
UIST 2024 for "Who Validates the Validators?" - top HCI venue recognition
#1
Maven's Top Course
AI Evals For Engineers became the highest-grossing course in Maven platform history
O'R
O'Reilly Author
"Evals for AI Engineers" - published by O'Reilly, the technical book standard-bearer, Spring 2026
BW
Bridgewater Fellowship
Bridgewater AI Labs Research Fellowship awarded for doctoral research excellence

The Problem Nobody Wanted to Admit Was Hard

Here is the uncomfortable truth Shreya Shankar's work is built on: most production AI systems are operating with broken data, inadequate evaluation, and no visibility into what is actually going wrong. The engineers building these systems know this. They mostly do not talk about it in public, because the gap between how AI is marketed and how it performs in production is embarrassing.

Shreya talks about it. That is part of what makes her unusual. Her papers have titles like "Moving Fast With Broken Data" and "Who Validates the Validators?" - titles that are polite but precise descriptions of industry-wide failures. The work does not moralize. It builds tools.

The AI evals course she built with Hamel Husain is perhaps the clearest expression of this approach. The problem is not that engineers do not want to evaluate their AI systems properly. The problem is they do not have a systematic way to do it. The course turned an art into a practice - something replicable, teachable, scalable. Fifty people from Google took it. Fifty from Microsoft. Fifty from OpenAI, the company that arguably started this whole boom. They all needed it.
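What "systematic" means in practice can be sketched in a few lines. This is a toy harness illustrating the general shape of an eval loop - labeled examples, an application under test, a judge, an aggregate score - with stand-in names invented for illustration, not code from the course or book:

```python
# Toy eval harness: run an app over labeled examples and report the
# pass rate. The judge here is a simple predicate; in practice it
# might be a human label or an LLM-as-a-judge call.
from typing import Callable

def evaluate(examples: list,
             app: Callable[[str], str],
             judge: Callable[[str, str], bool]) -> float:
    """Fraction of examples where judge(expected, app(input)) passes."""
    if not examples:
        return 0.0
    passed = sum(judge(expected, app(inp)) for inp, expected in examples)
    return passed / len(examples)

# Usage with a trivial "app" (uppercasing) judged by exact match.
examples = [("hi", "HI"), ("ok", "OK"), ("no", "NOPE")]
score = evaluate(examples, app=str.upper, judge=lambda exp, out: exp == out)
# score is 2/3: two outputs match, one does not.
```

The point is not the code but the discipline: a fixed dataset, an explicit judge, and a number that moves when the system changes - replicable, teachable, scalable.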

The O'Reilly book is the next step: taking that practice and encoding it as a reference text. When a field produces its O'Reilly book, it has arrived. AI evaluation arrived in 2026, co-authored by a PhD candidate who had been documenting the problem since before most people knew it existed.

She is heading toward academia. The Carnegie Mellon faculty talk in March 2026 was not an accident - it was an audition for the role that fits her work best. Her research is too practical for pure theory departments and too rigorous for pure engineering groups. A CS faculty position, with the freedom to build systems and run studies and publish across venues, is the right container for what she wants to do next.

"The gap between research and production ML is where the most interesting problems live."
- Shreya Shankar
Share This Profile