BREAKING: Array database eats genome for breakfast. SciDB powered the NIH 1000 Genomes browser. Amgen · Janssen · Alnylam · Bristol Myers Squibb on board. Co-founder holds the Turing Award. REVEAL turns UK Biobank into a discovery engine. EST. 2010 · Waltham, Massachusetts
Paradigm4. The data company that decided biology was too interesting to fit in a spreadsheet.
Company · Life-Sciences Data

Paradigm4.

The scientific data platform that thinks in arrays, not rows - and helps drug hunters turn messy biology into evidence they can bet on.

EST. 2010 · WALTHAM, MA · REVEAL · SciDB · ~28 PEOPLE
01 — The Scene

A scientist asks biology a hard question. The answer arrives before lunch.

Somewhere inside a pharmaceutical company, a researcher types a question that used to take a team and a fortnight: which genetic variants, across half a million people, track with a disease that imaging and wearables both flag? The query touches genomes, clinical records, scans, and sensor streams at once. It comes back in minutes. That quiet moment - a population-scale question answered interactively - is what Paradigm4 sells. Not a dashboard. A change in tempo.

Paradigm4 is a small, deeply technical company in Waltham, Massachusetts, that builds software for the people who hunt drug targets and chase precision medicine. Its products, REVEAL and the SciDB engine underneath it, exist to make enormous, mismatched scientific datasets behave like a single thing you can reason about. The company has roughly 28 employees. Its customer list reads like a who's who of pharma. That contrast is the whole story.

Most software wants your data to be tidy. Biology refuses. Paradigm4 took biology's side.
02 — The Problem They Saw

Spreadsheets were built for accountants. Genomes are not invoices.

For decades, the default way to store data has been the table: rows and columns, like a ledger. It is a wonderful tool for tracking transactions. It is a clumsy one for tracking a million genetic variants measured across hundreds of thousands of patients, layered with images, lab values, and the readings of a watch. Force that kind of data into rows and the math you actually want to do - the linear algebra at the heart of modern analytics - grinds.
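A toy sketch of that mismatch, in Python. It assumes nothing about how SciDB or REVEAL work internally; the data, the dosage encoding, and the function names `rows_to_matrix` and `variant_covariances` are hypothetical. Genotypes start out the ledger way, one row per (patient, variant) measurement, get pivoted into a patient-by-variant array, and then every variant is scored against a phenotype with a single matrix-vector product.

```python
from statistics import mean

# Row-oriented "ledger" layout: one record per (patient, variant, dosage).
# Dosage = count of alternate alleles (0, 1 or 2). Hypothetical toy data.
ROWS = [
    ("p1", "v1", 0), ("p1", "v2", 1),
    ("p2", "v1", 2), ("p2", "v2", 0),
    ("p3", "v1", 1), ("p3", "v2", 2),
]
# One quantitative phenotype value per patient (hypothetical).
PHENOTYPE = {"p1": 1.0, "p2": 3.0, "p3": 2.0}


def rows_to_matrix(rows):
    """Pivot (patient, variant, dosage) rows into a dense patient x variant array.

    Missing (patient, variant) pairs are filled with 0 for simplicity.
    """
    patients = sorted({p for p, _, _ in rows})
    variants = sorted({v for _, v, _ in rows})
    p_idx = {p: i for i, p in enumerate(patients)}
    v_idx = {v: j for j, v in enumerate(variants)}
    matrix = [[0] * len(variants) for _ in patients]
    for patient, variant, dosage in rows:
        matrix[p_idx[patient]][v_idx[variant]] = dosage
    return patients, variants, matrix


def variant_covariances(matrix, y):
    """Sample covariance of every variant column with phenotype vector y.

    Centering y alone is enough, because the centered values sum to zero,
    so X^T y_c / (n - 1) yields every column's covariance in one
    matrix-vector product.
    """
    n = len(y)
    y_bar = mean(y)
    y_c = [v - y_bar for v in y]
    n_vars = len(matrix[0])
    return [
        sum(row[j] * yc for row, yc in zip(matrix, y_c)) / (n - 1)
        for j in range(n_vars)
    ]


patients, variants, X = rows_to_matrix(ROWS)
y = [PHENOTYPE[p] for p in patients]
print(dict(zip(variants, variant_covariances(X, y))))  # {'v1': 1.0, 'v2': -0.5}
```

In a row store, that pivot (or an equivalent join and group-by) has to run before any of the math can start; an array-native layout keeps the data in matrix shape to begin with, which is the advantage this section describes.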

The result, across the life sciences, was a familiar tax: scientists waiting on engineers, datasets copied and re-copied, questions narrowed to fit the tooling rather than the biology. Discovery moved at the speed of plumbing.

The bottleneck in modern biology was rarely the science. It was the file format.
The least glamorous sentence in drug discovery, and the truest.

Someone needed to model scientific data the way scientists actually think about it - as vectors and multidimensional arrays - and then make a computer chew through it at scale. That someone turned out to be an unlikely pair.

03 — The Founders' Bet

A Turing Award winner and a machine-vision pioneer placed the same wager.

The bet started in a lab. In 2008, MIT professor Michael Stonebraker - who would later win the Turing Award, computing's highest honor - began building a database that stored data in multidimensional arrays rather than rows. The early experiments confirmed something he suspected: array storage offered large efficiency advantages, and it let analytical tools built on linear algebra run on huge datasets in ways tables simply could not.

In 2010 he turned the project into a company, and brought in Marilyn Matz to run it. Matz was no first-timer. She had co-founded Cognex, an industrial machine-vision company that went public in 1989. She had already built one hard-technology business from nothing; she signed up to do it again, this time pointed at the genome.

One founder had proven the database could work. The other had proven she could turn difficult technology into a company. They were betting on the same thing from opposite ends.
On Stonebraker & Matz

Their wager was specific and a little stubborn: that the array model was not a niche trick but the right foundation for an entire scientific discipline drowning in heterogeneous data. The funding that followed was modest by today's standards - a Series A in 2015 backed by Golden Seeds, Sigma Partners, Kepha Partners, MIT, and Koa Labs - and the company stayed small. The thesis, not the headcount, was the asset.

04 — The Product

SciDB does the thinking. REVEAL does the talking.

Underneath everything sits SciDB: a massively parallel, transaction-safe, array-oriented database and computational engine. Its basic unit is not the row but the vector and the multidimensional array - which means single-cell data, omics data, imaging, instrument readings, wearables, and environmental signals all slot into the same natural shape. Math reads it directly.

On top sits REVEAL, the translational informatics platform that scientists actually touch, with its PheGe Browser for exploring how genes and phenotypes line up. REVEAL is built for integrative, longitudinal, population-scale exploration - the kind of question that spans modalities and refuses to sit still.

REVEAL

The multimodal evidence platform. Connects omics, clinical, behavioral, imaging and environmental data for biomarker-guided discovery.

SciDB

The array-native engine. Massively parallel, transaction-safe, and built so linear algebra runs at population scale.

flexFS

Data management and file handling for scientific datasets that long ago stopped fitting on one machine.

Trusted Research Environment

Secure, compliant infrastructure for sensitive data and collaboration that legal teams can live with.

The trick is not storing the data. Anyone can hoard data. The trick is being able to ask it a question and mean it.
A philosophy disguised as a product spec.

The short, stubborn history

2008
The lab. Stonebraker starts building an array database at MIT and confirms the efficiency advantage.
2010
The company. Paradigm4 is founded; Marilyn Matz joins as CEO to commercialize SciDB.
2015
The fuel. Series A round closes with Golden Seeds, Sigma Partners, Kepha Partners, MIT and Koa Labs.
2018
The biobank. REVEAL precision-medicine platform launches for UK Biobank data with industry-leading adopters.
2020
The pivot in. Paradigm4 moves into single-cell data and helps pharma respond to COVID-19; MIT News profiles the company.

Fifteen years, no land-grab. Proof that you can build something serious without building something loud.

05 — The Proof

The customer list is the argument.

A small company gets the benefit of the doubt exactly once. After that, the names on the contracts do the persuading. Paradigm4's roster includes Amgen, Janssen, Alnylam Pharmaceuticals, Bristol Myers Squibb, Maze Therapeutics, and Agios. Beyond pharma, SciDB powered the NIH NCBI 1000 Genomes browser, and research at the NIH and Stanford has run on the platform. Novartis and Foundation Medicine have been among its users.

Amgen · Janssen · Alnylam · Bristol Myers Squibb · Maze Therapeutics · Agios · NIH / NCBI · Stanford · Novartis

We've discovered insights that we wouldn't have known without this system. Paradigm4's platform is the foundation of our target discovery efforts.
Paul Nioi · SVP Research, Alnylam

Why arrays, in one chart

Illustrative profile of the data Paradigm4 is built to handle

UK Biobank scale: ~500k people
Data modalities: omics + imaging + clinical + wearables
Team size: ~28 people
Founded: 2010

The point isn't the numbers themselves. It's the mismatch: a 28-person team handling half-a-million-person datasets across four data modalities. That ratio only works if the data model does the heavy lifting.

06 — The Mission

Turn data into evidence - the kind you'd risk a drug program on.

Paradigm4 frames its mission plainly: turn multimodal data into evidence that reduces risk and builds conviction. In an era when every company touts AI-generated hypotheses, the harder job is checking whether those hypotheses are true. That requires real, integrated, multimodal evidence - genomes lined up against phenotypes, scans, and outcomes - not a confident-sounding model output.

It is, in a sense, an unfashionable mission. Paradigm4 is not promising to replace the scientist. It is promising to hand the scientist a sharper instrument and then get out of the way.

Anyone can generate a hypothesis. Paradigm4 is in the duller, more valuable business of finding out whether it holds.
07 — Why It Matters Tomorrow

The next decade of medicine runs on data that doesn't fit.

Population-scale biobanks keep growing. Single-cell datasets explode in size. Wearables stream continuously. Imaging gets richer. None of it arranges itself politely into rows, and the appetite to ask questions across all of it at once is only sharpening. The companies that can interrogate that mess interactively will find targets first. The ones still copying spreadsheets will read about it later.

Which brings us back to that researcher and the question answered before lunch. A decade ago, that moment didn't exist; the question would have been trimmed to fit the tools, the answer would have arrived a fortnight late, and three modalities would have been quietly dropped along the way. Paradigm4's contribution is unglamorous and large: it changed what counts as a reasonable question to ask. The genome stopped being too big. Biology stopped being too messy. The instrument finally caught up to the curiosity.

They didn't make biology simpler. They made it answerable. That's the whole company.
