MIT Rising Stars in EECS 2025 · DocETL: 3,500+ GitHub stars - deployed by 30+ S&P 500 companies · O'Reilly book "Evals for AI Engineers" out Spring 2026 · Maven's highest-grossing course ever - 4,500+ students · Faculty candidate talk at Carnegie Mellon - March 2026 · UIST 2024 Best Paper Honorable Mention · 25,000+ newsletter subscribers · SPADE deployed in LangSmith across 2,000+ pipelines
Shreya Shankar - ML Engineer and PhD Researcher
ML ENGINEER • RESEARCHER
UC Berkeley • EPIC Lab • PhD Candidate

Shreya
Shankar

The person making AI evals a real discipline - before she even finishes her PhD.

Former Google Brain and Meta ML engineer turned researcher, Shreya Shankar is building the infrastructure that makes AI-powered data systems actually work in production. Her tools are running inside 30+ S&P 500 companies. Her course changed how thousands of engineers think about evaluation. Her book is on O'Reilly shelves. She's still a PhD student.

ML Engineering Data Quality LLM Evals DocETL O'Reilly Author Open Source
3.5K+
DocETL GitHub Stars
30+
S&P 500 Deployments
4,500+
Course Students
25K+
Newsletter Subscribers
2K+
LangSmith Pipelines

The Engineer Who Refused to Pretend Production ML Was Fine

Shreya Shankar is not building the next chatbot. She is building the infrastructure that lets everyone else's AI systems stop lying to them. Her work sits at the uncomfortable intersection of data systems research and human-computer interaction - a combination unusual enough that she publishes top-tier papers in both VLDB and UIST in the same calendar year.

The core insight behind everything she does: production machine learning breaks in ways that are invisible, expensive, and preventable. Bad data does not announce itself. LLM outputs cannot be trusted at face value. Pipelines that worked last Tuesday silently drift by Thursday. Most engineering teams discover these failures when customers complain. Shreya's work makes the breakage visible before that happens.

Her flagship open-source project, DocETL, processes unstructured text at scale using LLMs and has been deployed in 4,100+ pipelines across more than 30 S&P 500 companies. It has also been adopted by public defenders in two California counties - which means research born in a Berkeley PhD program is now helping people keep their freedom. That is a fairly short trip from dissertation to courtroom.

Before any of this, she spent time as an ML engineer at Google Brain and Meta. Those jobs gave her a detailed education in what actually goes wrong in production - and a specific kind of frustration. The data quality problems she kept running into had no good tools. The evaluation workflows were ad hoc. The observability was thin. She went back to school not to escape industry problems but to attack them with more serious weapons.

"We can't just throw LLMs at problems and hope for the best. We need systematic ways to measure what's working."
- Shreya Shankar

At UC Berkeley's EPIC Lab, advised by Aditya Parameswaran, she has built a research program that is genuinely hard to classify. It is not pure database theory. It is not traditional ML. It sits in the space where data systems meet the humans who build and use them - a space that turns out to be very large and very full of unsolved problems.

Her paper "Operationalizing Machine Learning: An Interview Study" documented eighteen ML engineers talking honestly about what makes production ML succeed or fail. The answer, after all those interviews, was unglamorous: velocity, validation, and versioning. The fundamentals. The same things software engineers figured out decades ago, still being reinvented by ML teams who thought their field was different.

Tools That Escaped the Lab

01
DocETL
An LLM-powered system for analyzing unstructured text at scale. Agentic query rewriting. Automatic evaluation loops. Used by journalists, lawyers, doctors, policy analysts, and public defenders. 3,500+ GitHub stars. 4,100+ pipelines. Published at VLDB 2025 and SIGMOD 2026.
02
SPADE
Automatically synthesizes data quality assertions for LLM pipelines. Catches output errors before they propagate. Deployed inside LangSmith - LangChain's pipeline hub - running across 2,000+ real-world pipelines. Published at VLDB 2024.
03
EvalGen
A mixed-initiative tool that aligns LLM-generated evaluators with human preferences. Won Best Paper Honorable Mention at UIST 2024. Addresses the problem of who validates the validators - a question nobody had a good answer to before this paper.
04
AI Evals Course
Co-created with Hamel Husain on Maven. Became Maven's highest-grossing course ever. 4,500+ students from 500+ companies. Fifty or more students each from Google, Microsoft, OpenAI, Meta, Amazon. Turned AI evaluation from a vague idea into an engineering practice.
05
O'Reilly Book
"Evals for AI Engineers: Systematically Measuring and Improving AI Applications." Co-authored with Hamel Husain. Published Spring 2026. Covers error analysis, evaluation pipelines, LLM-as-a-judge. The textbook that will train the next generation of AI engineers.
06
Production ML Research
Built and deployed the work behind "Moving Fast With Broken Data" at Meta. Built ML observability systems. Published across VLDB, SIGMOD, UIST, CHI, CSCW - a publishing breadth that would be unusual for a seasoned professor, let alone a PhD student.
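The assertion-synthesis idea behind SPADE can be illustrated with a minimal sketch. These checks are hypothetical examples in the spirit of synthesized data quality assertions, not SPADE's actual API: each one inspects a single LLM output and reports pass or fail, so failures can be caught before they propagate downstream.

```python
# Hypothetical data-quality assertions for LLM pipeline outputs,
# sketched in the spirit of SPADE (not its real interface).

def assert_non_empty(output: str) -> bool:
    """Output should not be blank."""
    return bool(output.strip())

def assert_no_refusal(output: str) -> bool:
    """Output should not open with refusal boilerplate."""
    refusals = ("i cannot", "i can't", "as an ai")
    return not output.strip().lower().startswith(refusals)

def assert_max_length(output: str, limit: int = 500) -> bool:
    """Output should respect a length budget."""
    return len(output) <= limit

def run_assertions(output: str) -> dict:
    """Run every assertion and report pass/fail per check."""
    return {
        "non_empty": assert_non_empty(output),
        "no_refusal": assert_no_refusal(output),
        "max_length": assert_max_length(output),
    }
```

In a real pipeline, a report with any failing check would flag that output for review instead of letting it flow silently to users - the visibility her work keeps arguing for.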
"Data quality is not a one-time fix. It's an ongoing engineering discipline."
- Shreya Shankar

From Stanford to Berkeley, Via Production Trauma

She did the standard top-tier CS path: Stanford for undergrad and master's, concentrating in systems and artificial intelligence. While there, she helped run SHE++, Stanford's nonprofit for underrepresented minorities in tech - a sign of something that shows up consistently in her work, a concern for who gets access and who gets left out.

After Stanford, she went to industry. Google Brain. Meta. The jobs that CS graduates want. She was good at them. She was also watching, up close, how production ML actually fails. The data pipelines that quietly corrupt. The models that degrade on schedules nobody tracks. The evaluation workflows that amount to vibes and hope.

This is where a lot of people write blog posts about problems and move on. Shreya went back to school. She started her PhD at UC Berkeley in 2020 and began treating those industry frustrations as research questions.

The NDSEG Fellowship followed in 2022. So did papers. A lot of papers, published at venues that do not usually appear together on a CV. The combination - deep systems work plus HCI methods, interviews and observational studies alongside formal database research - is unusual. It is also, in retrospect, exactly right for the problem space. You cannot fix production ML without understanding both the technical systems and the humans operating them.

DocETL is the culmination of this approach so far. It is not just a processing system. It is a rethinking of how humans and LLMs collaborate to extract meaning from large unstructured document collections. The companion IDE, DocWrangler, puts that capability directly in researchers' and analysts' hands. The public defenders who picked it up were not thinking about system architecture. They needed something that worked.

The Papers That Moved the Field

"SPADE: Synthesizing Data Quality Assertions for Large Language Model Pipelines"
VLDB 2024 - with Haotian Li, Parth Asawa, Madelon Hulsebos, Eugene Wu, Aditya Parameswaran, et al.
Deployed in LangSmith across 2,000+ pipelines
"Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences"
UIST 2024
Best Paper Honorable Mention - introduced EvalGen tool
"DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing"
VLDB 2025, SIGMOD 2026
3,500+ GitHub stars - 30+ S&P 500 companies - 2 California county public defenders
"Operationalizing Machine Learning: An Interview Study"
CSCW 2024 - with Rolando Garcia, Joseph Hellerstein, Aditya Parameswaran
18 ML engineer interviews - identified the velocity/validation/versioning triad
"Moving Fast With Broken Data"
arXiv 2023 - with Labib Fawaz, Karl Gyllstrom, Aditya Parameswaran
Deployed at Meta in production
"Towards Observability for Production Machine Learning Pipelines"
VLDB 2023 - with Aditya Parameswaran
Introduced bolt-on ML observability architecture

How She Got Here

2015
Enrolled at Stanford University, Computer Science. Began concentrating in systems and AI.
2018
Helped run SHE++ at Stanford - a nonprofit empowering underrepresented minorities in technology.
2019
Completed B.S. and M.S. at Stanford. Worked as an ML engineer at Google Brain.
2020
ML/data engineer at Meta. Started PhD at UC Berkeley EPIC Lab under Aditya Parameswaran.
2022
Awarded NDSEG Fellowship and Bridgewater AI Labs Fellowship. First publications at VLDB.
2023
Published "Operationalizing ML" and "Moving Fast With Broken Data" - the latter deployed at Meta.
2024
SPADE at VLDB, EvalGen Best Paper Honorable Mention at UIST. AI Evals course on Maven becomes platform's highest-grossing ever.
2025
Named to MIT Rising Stars in EECS. DocETL at VLDB. Mailing list passes 25,000 subscribers.
2026
O'Reilly book published. Faculty candidate at Carnegie Mellon. Final PhD year. On the CS job market.

The Wins

2025
MIT Rising Stars EECS
Named among the most promising graduate researchers in electrical engineering and computer science
NDSEG
NDSEG Fellowship 2022
National Defense Science & Engineering Graduate Fellowship - awarded to top PhD researchers
UIST
Best Paper Honorable Mention
UIST 2024 for "Who Validates the Validators?" - top HCI venue recognition
#1
Maven's Top Course
AI Evals For Engineers became the highest-grossing course in Maven platform history
O'R
O'Reilly Author
"Evals for AI Engineers" - published by O'Reilly, the technical book standard-bearer, Spring 2026
BW
Bridgewater Fellowship
Bridgewater AI Labs Research Fellowship awarded for doctoral research excellence

The Problem Nobody Wanted to Admit Was Hard

Here is the uncomfortable truth Shreya Shankar's work is built on: most production AI systems are operating with broken data, inadequate evaluation, and no visibility into what is actually going wrong. The engineers building these systems know this. They mostly do not talk about it in public, because the gap between how AI is marketed and how it performs in production is embarrassing.

Shreya talks about it. That is part of what makes her unusual. Her papers have titles like "Moving Fast With Broken Data" and "Who Validates the Validators?" - titles that are polite but precise descriptions of industry-wide failures. The work does not moralize. It builds tools.

The AI evals course she built with Hamel Husain is perhaps the clearest expression of this approach. The problem is not that engineers do not want to evaluate their AI systems properly. The problem is they do not have a systematic way to do it. The course turned an art into a practice - something replicable, teachable, scalable. Fifty people from Google took it. Fifty from Microsoft. Fifty from OpenAI, the company that arguably started this whole boom. They all needed it.
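What "systematic" means in practice can be sketched in a few lines. This is a toy harness illustrating the general shape of an eval loop - labeled examples, an application under test, a judge, an aggregate score - with stand-in names invented for illustration, not code from the course or book:

```python
# Toy eval harness: run an app over labeled examples and report the
# pass rate. The judge here is a simple predicate; in practice it
# might be a human label or an LLM-as-a-judge call.
from typing import Callable

def evaluate(examples: list,
             app: Callable[[str], str],
             judge: Callable[[str, str], bool]) -> float:
    """Fraction of examples where judge(expected, app(input)) passes."""
    if not examples:
        return 0.0
    passed = sum(judge(expected, app(inp)) for inp, expected in examples)
    return passed / len(examples)

# Usage with a trivial "app" (uppercasing) judged by exact match.
examples = [("hi", "HI"), ("ok", "OK"), ("no", "NOPE")]
score = evaluate(examples, app=str.upper, judge=lambda exp, out: exp == out)
# score is 2/3: two outputs match, one does not.
```

The point is not the code but the discipline: a fixed dataset, an explicit judge, and a number that moves when the system changes - replicable, teachable, scalable.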

The O'Reilly book is the next step: taking that practice and encoding it as a reference text. When a field produces its O'Reilly book, it has arrived. AI evaluation arrived in 2026, co-authored by a PhD candidate who had been documenting the problem since before most people knew it existed.

She is heading toward academia. The Carnegie Mellon faculty talk in March 2026 was not an accident - it was an audition for the role that fits her work best. Her research is too practical for pure theory departments and too rigorous for pure engineering groups. A CS faculty position, with the freedom to build systems and run studies and publish across venues, is the right container for what she wants to do next.

"The gap between research and production ML is where the most interesting problems live."
- Shreya Shankar
Share This Profile