The data problem hiding inside every drug pipeline
Most people who spend years in pharmaceutical research and watch four drugs earn FDA approval would call that a career highlight. Abhishek Jha called it a diagnostic. What he saw at Agios Pharmaceuticals wasn't just the triumph of good science - it was how much time, money, and talent got swallowed by fragmented, inconsistent, unstructured data before the science could even begin. In 2015, he left to fix it.
The company he co-founded with Swetabh Pathak (an IIT Delhi engineer he'd worked with at Agios) and Richard Kibbey (an MD/PhD professor at Yale) is called Elucidata. Its platform, Polly, is what Jha built to answer a question that most biotech founders avoid: why does AI keep failing in life sciences when it works so well everywhere else?
His answer: the data isn't ready. Not in a simple "clean it up" sense. In a deep structural way - incompatible formats, missing metadata, context stripped out during processing, samples that don't mean what anyone thinks they mean. Polly is the infrastructure layer that sits between raw biological data and the AI models that need to learn from it.
"Life doesn't let you do a control experiment." - Abhishek Jha
Jha came to this with credentials that span three worlds. An integrated Master's in Physical Chemistry from IIT Bombay. A PhD from the University of Chicago. A postdoctoral fellowship at MIT where he built computational models of proteins and systems-level models of the immune system. That background meant he understood both the biology and the math - and could see clearly where data science tools built for web platforms were being applied, poorly, to problems that needed something different.
"The AI paradigm that worked for tech will not carry over to life sciences," he has said plainly. "It's time for data-centric AI." That framing - putting data quality and context ahead of model sophistication - became Elucidata's organizing principle.
"Scientist turned founder, CEO of Elucidata, and someone who's spent the last decade thinking about what makes biomedical R&D actually work - and what quietly holds it back." - Abhishek Jha, self-description
What Polly actually does
Polly - The AI-Ready Data Platform
Named after immunologist Polly Matzinger - famous for bucking scientific orthodoxy with her "danger model" theory - the platform ingests, curates, and harmonizes multi-omics biological data from public and proprietary sources. It makes that data FAIR (Findable, Accessible, Interoperable, Reusable) and AI-ready for machine learning pipelines in drug discovery and clinical research.
Polly processes genomics, transcriptomics, proteomics, metabolomics, and clinical data - the full multi-omics stack. It uses LLM-powered metadata extraction and enrichment, AI-assisted cohort builders, and streamlined clinical trial data management. The customer outcomes Elucidata cites for academic core facilities: 80% faster data management, 2x research capacity, a 50% reduction in data queries, and 60% faster onboarding of new researchers.
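To make "harmonizing" metadata concrete, here is a minimal sketch of the general idea: map differently named fields from different sources onto one shared schema, then flag records that are missing required context. This is an illustration only, not Polly's implementation or API. The alias table, the required fields, and the `harmonize` function are all hypothetical.

```python
# Hypothetical alias table: source-specific field names -> one shared schema.
# These names are illustrative assumptions, not Elucidata's actual schema.
FIELD_ALIASES = {
    "tissue": "tissue",
    "tissue_type": "tissue",
    "organ": "tissue",
    "disease": "disease",
    "condition": "disease",
    "organism": "organism",
    "species": "organism",
}

# Hypothetical minimum context a sample needs before it is usable downstream.
REQUIRED_FIELDS = ("organism", "tissue", "disease")


def harmonize(record):
    """Return (harmonized_record, missing_fields) for one sample's metadata."""
    harmonized = {}
    for key, value in record.items():
        canonical = FIELD_ALIASES.get(key.strip().lower())
        if canonical is None or value is None:
            continue  # unknown field or empty value: skip it
        text = str(value).strip()
        if text:
            # Keep the first non-empty value seen for each canonical field.
            harmonized.setdefault(canonical, text.lower())
    missing = [field for field in REQUIRED_FIELDS if field not in harmonized]
    return harmonized, missing


# Two made-up records that describe the same kinds of things differently.
records = [
    {"Species": "Homo sapiens", "Tissue_Type": "Liver", "Condition": "NASH"},
    {"organism": "Mus musculus", "organ": " ", "disease": "control"},
]
results = [harmonize(record) for record in records]
```

The second record keeps its organism and disease but is flagged as missing a tissue annotation, which is the kind of gap that has to be surfaced before a model learns from the sample.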
By 2024, Elucidata had reached $22.2M ARR, with the platform sitting inside over 40 drug programs at IND stage or later - meaning drugs that have been cleared for human trials. That's not a data management tool. That's infrastructure inside the most consequential decisions in medicine.
A mile wide and inch deep - then the honest rethink
What sets Jha apart from many founders is his willingness to say, publicly and precisely, when a strategy failed. The original Elucidata thesis was broad: take all the biomedical data out there, clean it up, make it AI-ready. Jha later described this approach candidly: "a mile wide and inch deep... it was just not giving a good enough ROI for our customers."
The pivot was surgical. Instead of trying to make all biological data AI-ready, Elucidata sharpened its focus to a specific class of failures that traditional AI models handle badly: out-of-distribution (OOD) problems. In biology, the OOD problem is everywhere - a patient sample that doesn't match training distributions, a protein-folding edge case, a drug interaction pattern no one has seen before. Conventional ML breaks on these. That's where the interesting breakthroughs hide.
"There's always some unusual thing, right, which in some ways does not match a pattern." - Abhishek Jha, on the value of out-of-distribution signals in biotech
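For readers who want a concrete picture of what "out-of-distribution" means, the sketch below flags a sample whose measurements sit far outside the range seen in training. It uses a simple per-feature z-score check, chosen here for illustration. It is not Elucidata's method, and the data, function names, and threshold of 3.0 are all assumptions.

```python
from statistics import mean, stdev


def fit_reference(samples):
    """Per-feature mean and sample standard deviation of the training data."""
    columns = list(zip(*samples))
    return [mean(col) for col in columns], [stdev(col) for col in columns]


def ood_score(sample, means, stdevs):
    """Largest absolute z-score across features (assumes no stdev is zero)."""
    return max(abs(x - m) / s for x, m, s in zip(sample, means, stdevs))


def is_out_of_distribution(sample, means, stdevs, threshold=3.0):
    """Flag a sample if any feature lies more than `threshold` stdevs away."""
    return ood_score(sample, means, stdevs) > threshold


# Made-up training measurements: two features per sample.
training = [
    [5.0, 1.0],
    [5.2, 1.1],
    [4.8, 0.9],
    [5.1, 1.0],
    [4.9, 1.0],
]
means, stdevs = fit_reference(training)

typical = [5.0, 1.05]  # close to what the training data looks like
unusual = [9.0, 1.0]   # first feature is far outside the training range

typical_flag = is_out_of_distribution(typical, means, stdevs)
unusual_flag = is_out_of_distribution(unusual, means, stdevs)
```

A check like this only detects that a sample is unusual. The harder question, and the one the pivot is aimed at, is what a model should do with such a sample once it has been found.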
Jha has proposed what he calls T2D2 - the Turing Test for Drug Discovery - as a test of whether AI models in life sciences are genuinely useful or merely perform well on sanitized benchmarks. The concept pushes back against the common assumption that model size and training volume solve the problem. In drug discovery, the edge cases aren't noise. They're often the signal.
In January 2026, Elucidata launched AI Labs, a dedicated unit combining scientists, ML engineers, product leads, and designers focused on OOD intelligence for biomedical AGI. Jha wasn't positioning for a trend - he'd been building toward this for a decade.
Milestones that didn't come from marketing
From IIT Bombay to biomedical AGI
Jha on science, startups, and what AI gets wrong
"The AI paradigm that worked for tech will not carry over to life sciences. It's time for data-centric AI." - On why life sciences needs a different AI approach
"Startups are one of the best self-improvement processes." - On building Elucidata
"We took an approach that was a mile wide and inch deep... it was just not giving a good enough ROI for our customers." - On Elucidata's strategic pivot, admitting what wasn't working
"There's always some unusual thing which in some ways does not match a pattern - and that's often where the valuable signal is." - On out-of-distribution intelligence in biomedical research