Breaking: Elucidata launches AI Labs to tackle biomedical AGI · 2.5M+ biomolecular datasets curated on Polly · Fast Company names Elucidata #8 in Biotech, 2024 · Pfizer x Elucidata reveal novel metabolic networks in T cells · $23.5M total funding - Series A led by Eight Roads Ventures · 115 TB of ML-ready biomedical data live on Polly · Winner: National Cancer Institute AI-Readiness Challenge
YesPress · Company Profile

Elucidata cleans biology's mess.

A San Francisco TechBio company turning the dirtiest data in pharma into the fuel that runs modern AI - and quietly powering drug programs at Pfizer, Genentech and a hundred others.

Elucidata - data-centric platform for life sciences
EXHIBIT A · The benzene-ring logo. A chemistry pun that quietly became a brand.
The opening scene

A Tuesday in a pharma data lake

It is a Tuesday morning in a pharma data lake somewhere outside Boston. Twelve terabytes of gene-expression files sit in folders no one has opened since 2019. A machine-learning team is two months behind because half the metadata is in German and half is in Excel. Then, somebody on the Slack channel types one word: "Polly?" Within the hour, a curated, harmonized, model-ready dataset shows up in an S3 bucket. The team meets its sprint goal. The drug program does not slip. That, more or less, is the entire business model of Elucidata.

Elucidata sells the boring part of the AI revolution. While larger names fight over foundation models and benchmark leaderboards, this 170-person company has spent ten quiet years building the equivalent of a municipal water system for biomedical data. The pipes are not glamorous. They are, however, load-bearing.

Biology is the most data-rich science we have, and the worst at putting its data away. - Caption found taped to an Elucidata whiteboard, c. 2019
The problem they saw

Drug discovery is mostly janitoring

Every pharmaceutical company on earth runs into the same wall. The science is genuinely hard. The data infrastructure is genuinely worse. A 2020 industry survey suggested that scientists spend something like 60 to 80 percent of their time finding, cleaning and re-formatting data before they can do anything useful with it. Generative AI did not fix this. It made it loud.

Co-founders Abhishek Jha and Swetabh Pathak noticed the gap in 2015. Jha had just finished a postdoc at MIT and a stint as a senior scientist at Agios Pharmaceuticals. Pathak came out of IIT Delhi with a mathematics-and-computer-science background. They began with a question that, in retrospect, sounds almost banal - what if you treated biology's data-quality problem like a software problem?

Almost nobody had. Bio was full of brilliant tooling for analysis and almost nothing for curation. The plumbing was missing.

Everyone wanted an AI model. Almost no one wanted to wash the dishes that fed it. - The TechBio thesis in one sentence
The founders' bet

Build the dishwashers

Jha and Pathak's bet was unfashionable in the way only good bets are. The idea was that a horizontal platform for cleaning, harmonizing and indexing biomedical data could become more valuable than any single drug it helped discover. Not because the platform itself was sexy, but because the alternative - every pharma company hiring its own private army of bioinformaticians - was both expensive and slow.

They called the platform Polly, after the parrot, because the system's job was to learn biological patterns and repeat them back, cleanly. Then they did the unglamorous work of getting Polly to read scientific literature, parse messy lab files, reconcile differing schemas, and spit out machine-learning-ready datasets across genomics, transcriptomics, proteomics, metabolomics and clinical records.
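
To make "reconcile differing schemas" concrete, here is a minimal sketch in Python. It is not Polly's code. The field names, the alias table, the synonym table and the `harmonize` function are all invented for illustration, and a real curation pipeline would presumably lean on ontologies and NLP rather than hand-written dictionaries.

```python
# Hypothetical illustration only: not Polly's actual code or schema.
from typing import Any

# Assumed mapping from source-specific field names to one shared vocabulary.
FIELD_ALIASES = {
    "sample_id": "sample_id",
    "SampleID": "sample_id",
    "organism": "organism",
    "species": "organism",
    "Spezies": "organism",   # German-language source
    "tissue": "tissue",
    "Gewebe": "tissue",      # German-language source
}

# Assumed mapping from free-text values to controlled terms, per field.
VALUE_SYNONYMS = {
    "organism": {
        "human": "Homo sapiens",
        "mensch": "Homo sapiens",
        "homo sapiens": "Homo sapiens",
    },
    "tissue": {"liver": "liver", "leber": "liver"},
}


def harmonize(record: dict[str, Any]) -> dict[str, Any]:
    """Rename known fields to the shared schema and normalize their values."""
    clean: dict[str, Any] = {}
    for key, value in record.items():
        field = FIELD_ALIASES.get(key)
        if field is None:
            continue  # unknown fields are simply dropped in this sketch
        if isinstance(value, str):
            stripped = value.strip()
            synonyms = VALUE_SYNONYMS.get(field, {})
            # Fall back to the stripped original when no synonym is known.
            value = synonyms.get(stripped.lower(), stripped)
        clean[field] = value
    return clean


records = [
    {"SampleID": "S1", "Spezies": "Mensch", "Gewebe": "Leber"},
    {"sample_id": "S2", "species": "human ", "tissue": "liver", "notes": "rerun"},
]
harmonized = [harmonize(r) for r in records]
```

Both records come out with the same three keys and the same controlled terms, which is the property a downstream model depends on.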

It took years. The first customers were small biotechs. The first big logos were Pfizer and Janssen. By 2022, the company had grown enough to attract a $16 million Series A led by Eight Roads Ventures, with F-Prime Capital, IvyCap and Hyperplane joining in. Total funding to date sits at about $23.5 million - modest by frothy AI-era standards, ample for a company that has chosen to grow on revenue more than checks.

The unsexy idea, well executed, is almost always the better business than the sexy idea, poorly executed. - A founder's lesson, learned in slow motion
A short company timeline

From benzene ring to biomedical AGI

2015

Elucidata founded in Delhi by Abhishek Jha and Swetabh Pathak.

2018

Polly launches commercially. First biopharma contracts.

2019

YourStory Tech30 list. First marquee customer: Pfizer.

2021

Wins the NCI AI-Readiness Challenge.

2022

$16M Series A. Eight Roads Ventures leads.

2024

#8 Biotech on Fast Company's Most Innovative Companies.

2026

Launches Elucidata AI Labs with hubs in SF, Boston, India.

The product, briefly

Polly, the parrot that reads PubMed

Polly is a data-centric ML-Ops platform. Translated from the marketing dialect, it is three things stacked. A data lake of more than 115 terabytes of curated biomedical content, drawn from 30-plus public and proprietary sources. A Bio-NLP engine that reads scientific papers and structured files and turns them into something a model can train on. And a workbench where pharma data scientists actually run their experiments.

PRODUCT

Polly

The data-centric platform. Ingests, harmonizes, curates. Used across genomics, transcriptomics, proteomics, metabolomics and clinical records.

SDK

Polly Python

The open-source library that lets data teams pull curated datasets into their notebooks the way they pull Pandas.

SERVICE

Data Concierge

The managed-service offering. Elucidata scientists build bespoke datasets for a specific drug program, on top of Polly.

NEW

AI Labs

Launched in 2026. Combines agentic AI with the company's curated data layer to pursue "biomedical AGI". A big phrase. They are trying it anyway.

Polly is, depending on the day, a parrot, a janitor, a librarian and a translator. It is all of those because biology is. - The platform, accurately described
The proof, in numbers

A scoreboard, not a sales deck

Elucidata is the kind of company whose customer list does most of the persuading. Pfizer. Genentech. Janssen. Alnylam. Eli Lilly. The Bill & Melinda Gates Foundation. Stanford. The platform supports 40-plus drug programs at or beyond the IND (investigational new drug) stage, which is the point at which a drug stops being theoretical and starts costing real money.

Elucidata, by the unglamorous numbers

Sources: company materials, PR Newswire, Crunchbase
Datasets curated: 2.5M+
ML-ready data: 115 TB
Pharma customers: 100+
IND-stage programs: 40+
Public sources: 30+
Employees: ~170
Total funding: $23.5M

CHART · Bars scaled visually; numbers are absolute. Yes, datasets are several orders of magnitude bigger than headcount - which is the whole point.
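
The "several orders of magnitude" aside can be checked against the chart's own figures. The sketch below takes "2.5M+" at its floor of 2,500,000 and "~170" as exactly 170. Both are rounded inputs, so the result is approximate.

```python
import math

datasets_curated = 2_500_000  # "2.5M+" from the chart, taken at its floor
employees = 170               # "~170" from the chart, approximate

ratio = datasets_curated / employees
orders_of_magnitude = math.log10(ratio)

print(f"Datasets per employee: about {ratio:,.0f}")
print(f"Orders of magnitude: about {orders_of_magnitude:.1f}")
```

That works out to roughly 14,700 curated datasets per employee, a little over four orders of magnitude.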

The Pfizer collaboration is, by Elucidata's own accounting, one of the cleaner showcases. The two organizations used Polly's integrated-omics pipeline to study metabolic changes during T-cell activation - a corner of immunology so technical that the case study practically requires a glossary. Pfizer kept the IP. Elucidata kept the credibility.

Founded: 2015
Employees: 170
Total funding: $23.5M
Curated data: 115 TB
Fast Co. Biotech '24: #8
The mission

Make biology run on harmonized data

The official mission statement is the kind of thing companies write on a wall. Elucidata wants to make biomedical data AI-ready so that drug discovery moves at the speed of computing rather than the speed of curation. The unofficial version is shorter. Make the data not embarrassing.

Culturally, the company sits in an unusual place. Scientists and engineers in roughly equal measure. Customers who are themselves PhDs. A San Francisco headquarters on Market Street, a Boston outpost, and a center of gravity in Delhi, where much of the engineering team still lives. The blend shows up in how the product gets built - it is one of the few platforms where a benchmark conversation can swing from "what's the F1 on entity extraction" to "what is actually happening in the cell" in the same standup.

If your data is wrong, your AI is wrong, your drug is wrong, your patient is wrong. The chain starts with the schema. - A truism, somewhat ungentle
Why it matters tomorrow

The boring layer that makes the exciting layer possible

The next decade of biomedicine will be shaped less by the model architecture race and more by the question of whose data actually trains anything useful. Most clinical and omics data still lives in proprietary silos, with idiosyncratic schemas, varying provenance, and no shared vocabulary. The labs that solve that problem will not get the Nobel. They will get the contracts.

Elucidata has chosen to be that lab. Whether the AI Labs experiment with agentic biomedical AI pans out is genuinely uncertain - "biomedical AGI" is a phrase that demands skepticism. But the underlying business is plain. Curate the data. Sell the access. Let the rest of the industry build on top.

It is now Tuesday morning, again, in a different pharma data lake. The same twelve terabytes of expression files exist somewhere on AWS. This time, a junior scientist does not have to know German or remember Excel. She runs a query against Polly. The dataset arrives clean. The model trains. The slide deck is ready by Thursday. The drug program does not slip. That is what a quiet revolution looks like before anyone calls it one.

Elucidata is what happens when you decide the most important problem in biotech is the one nobody wants on their resume. - Closing scene
Watch

Interviews & product demos

YouTube · Channel Elucidata on YouTube

Demos, webinars and customer talks from the team.

Case Study · Pfizer x Elucidata: T-cell metabolism

How the joint program used integrated omics to find new networks.

Product · Polly platform tour

A walkthrough of the data-centric ML-Ops workbench.
