Shreya Shankar is not building the next chatbot. She is building the infrastructure that lets everyone else's AI systems stop lying to them. Her work sits at the uncomfortable intersection of data systems research and human-computer interaction - a combination unusual enough that she has published top-tier papers in both VLDB and CHI in the same calendar year.
The core insight behind everything she does: production machine learning breaks in ways that are invisible, expensive, and preventable. Bad data does not announce itself. LLM outputs cannot be trusted at face value. Pipelines that worked last Tuesday silently drift by Thursday. Most engineering teams discover these failures when customers complain. Shreya's work makes the breakage visible before that happens.
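The principle is simple enough to sketch: put explicit checks between the model and everything downstream, so a failure shows up in a log before it shows up in a complaint. The snippet below is illustrative only - it is not DocETL's API or Shankar's tooling, and the field name, alert threshold, and window size are assumptions made for the example.

```python
# Illustrative sketch: gate LLM outputs behind explicit checks so failures
# surface in monitoring, not in customer reports. Not DocETL's actual API;
# the "summary" field, 10% threshold, and 100-item window are assumptions.
import json
from collections import deque

WINDOW = deque(maxlen=100)  # rolling record of recent pass/fail results


def validate(raw_output: str) -> bool:
    """Reject outputs that are not well-formed JSON with the expected field."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and "summary" in parsed


def record(raw_output: str) -> None:
    """Track validation results and flag drift as soon as it crosses a line."""
    WINDOW.append(validate(raw_output))
    failure_rate = 1 - sum(WINDOW) / len(WINDOW)
    if failure_rate > 0.10:  # breakage made visible before a customer notices
        print(f"ALERT: {failure_rate:.0%} of recent outputs failed validation")


record('{"summary": "ok"}')  # passes quietly
record('not even json')      # fails; the rolling failure rate trips the alert
```

Trivial as the check looks, it is the difference between a pipeline that drifts silently and one that announces its own decay.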
Her flagship open-source project, DocETL, processes unstructured text at scale using LLMs and has been deployed in 4,100+ pipelines across more than 30 S&P 500 companies. It has also been adopted by public defenders in two California counties - which means research born in a Berkeley PhD program is now helping people keep their freedom. That is a fairly short trip from dissertation to courtroom.
Before any of this, she spent time as an ML engineer at Google Brain and Meta. Those jobs gave her a detailed education in what actually goes wrong in production - and a specific kind of frustration. There were no good tools for the data quality problems she kept hitting. Evaluation workflows were ad hoc. Observability was thin. She went back to school not to escape industry problems but to attack them with more serious weapons.