A drug discovery company that treats biology as a data problem - and then refuses to apologize for it.
Walk into insitro's South San Francisco building today and you can almost trip over the joke. On one side: robotic arms moving plates of human cells to a rhythm only they understand. On the other: an open floor of machine-learning engineers staring at loss curves that, somewhere underneath the math, are about a person with ALS. There are no whiteboards covered in molecules drawn by hand. The molecules are in a database. The database is the point.
insitro is a drug discovery and development company - that part is conventional. What's unusual is everything else. The company generates its own biology at scale, in-house, on equipment that doesn't care whether it's Tuesday. It then feeds that biology, along with human genetic and clinical data, into models that look for the part most drug developers miss: a real, causal link between a gene and a disease. The output isn't a paper. It's a target. Sometimes it's a partnered program. Eventually, the bet is, it's a medicine.
Ninety percent of drug candidates that enter clinical trials never become an approved medicine. This is not new information. Pharma has known it, accepted it, and priced it into the cost of a new medicine for decades. The standard explanation is that biology is hard. The honest explanation is that the way targets get chosen is, mostly, an educated guess about a system nobody has fully observed.
The model has been: a researcher publishes a paper that links a gene to a disease, a pharma team finds the paper compelling, and a billion dollars later, the molecule fails because the gene wasn't actually causal, or the patient population wasn't actually the right one, or the cell line wasn't actually a good stand-in for a human. The literature has well-known reproducibility issues. The clinic, predictably, inherits them.
insitro looked at that pipeline and noticed something that, in fairness, others had noticed too: the inputs were bad. Not because the scientists were bad - they weren't - but because the data they were working with was small, noisy, and built one paper at a time. You cannot machine-learn your way out of that. You have to generate the data differently.
Daphne Koller spent eighteen years on the Stanford computer science faculty, won a MacArthur "genius" grant, then co-founded Coursera and helped it reach over a hundred million learners. By any reasonable accounting, she had already done the thing. The next thing did not have to be hard.
She picked the hard thing. In 2018, Koller founded insitro on a premise that, if you said it at a biotech cocktail party in 2017, would have gotten you a polite smile and a quick exit. The premise: that the gating problem in drug discovery is not chemistry, and not clinical trial design - it's data. Specifically, the lack of biological data generated under conditions consistent enough for modern machine learning to do useful work on it.
Investors did not need a long pitch. ARCH, a16z, Foresite, Third Rock, and GV backed the Series A. Two years later, T. Rowe Price came in. In 2021, Canada Pension Plan Investment Board led a $400M Series C. Total raised: about $643M, plus more than $100M in collaboration revenue along the way. None of that, on its own, proves the thesis. It does prove that a lot of people with capital have decided the thesis is at least worth a real test.
The interesting story in techbio is rarely a single announcement. It is the cadence of small, expensive decisions that compound. Here is insitro's, abbreviated.
insitro's central piece of technology is called Virtual Human. It is, in the company's words, a genetically anchored causal AI engine. In plainer English: it ingests cellular data (a lot of it, generated inside the building) and clinical data (genetic, imaging, electronic health records) and tries to find the points where a real, intervenable mechanism connects a gene to a disease. The output is a ranked list of potential drug targets, with confidence levels attached and, importantly, an idea of which patients those drugs would actually help.
Sitting next to it is TherML, the platform that takes those targets and helps design the molecule. The two systems share infrastructure and, more importantly, share a feedback loop: the wet lab generates data that retrains the models, the models propose new experiments, the lab runs them, and the cycle, theoretically, gets tighter every quarter. That is the part that is hard to copy. The chemistry is not the moat. The cycle is.
Virtual Human: Causal AI for target discovery. Genetics + cell models + clinical data. The thing that points at what to drug.
TherML: The molecule-design partner. Takes targets from Virtual Human and helps design the medicine.
The wet lab: Robotic cell biology at scale. Same experiment, repeated identically, ten thousand times.
The pipeline: Internal and partnered programs in ALS, metabolic disease (including NASH/MASH), and oncology.
The honest version of any techbio profile is: we will not know if any of this worked until a medicine works. insitro has not yet shipped an approved drug. Neither has any other AI-native biotech. The interesting question, in the meantime, is whether sophisticated buyers - the people whose job is to be skeptical of biology hype - are willing to put real money on the platform.
That question has a partial answer. Bristol Myers Squibb extended its ALS collaboration with insitro in 2026, nominating two more targets that came out of Virtual Human. insitro hit its first operational milestone in its NASH collaboration with Gilead Sciences. Eli Lilly signed on for metabolism work. Pharma is famously slow to pay milestones it does not have to pay. That insitro is collecting them is, if not proof, at least a signal.
If you ask insitro what it is trying to do, the answer is some version of that sentence. It is not a flashy mission statement, which is part of the appeal. The phrase "better medicines" is doing a lot of quiet work: it means therapies that work in the patients who actually need them, not therapies that work on average across a heterogeneous population while quietly failing the people in the middle.
This is where the company's choice of indications starts to look less like a list and more like a philosophy. ALS - a disease with grim odds and very little working pharmacology. NASH and metabolic disease - enormous patient populations where the biology is tangled in a hundred ways. Oncology - the place where stratifying patients by molecular subtype already pays off. These are not easy diseases. They are diseases where the standard playbook has run out of ideas, which is why a different playbook might actually be tolerated.
There are two ways to evaluate a techbio company in 2026. The first is to ask whether it has shipped a medicine, which is a fair but premature question for almost everyone in the category, insitro included. The second is to ask whether the work it is doing now would, if successful, change the shape of the industry. By that second standard, insitro is doing something worth watching.
The promise is not that AI will discover drugs while humans sleep. The promise is humbler and more interesting: that a company can run the loop of biology and machine learning tightly enough, on consistent enough data, that each new target it nominates is meaningfully better than the last. If that compounding effect is real, the cost curve of drug discovery starts to bend. If it isn't, the field will have learned something equally valuable, which is where the limits actually are.
Back to the lab in South San Francisco. The robots are still moving plates. The engineers are still staring at loss curves. The difference, eight years in, is that the loss curves are now attached to real targets, real pharma partners, and real patients waiting downstream. Whether or not insitro is the company that ships the first AI-discovered medicine, it is one of the few that are genuinely trying. The rest of the industry, including the parts that used to roll their eyes, is paying attention.