Building a map of cancer biology no human eye could ever read - one hundred million tumor cells at a time.
RON ALFA / NOETIK / SOUTH SAN FRANCISCO, CA
Before he ever touched a pipette, Ron Alfa won a medal for an essay about the birth of the placebo. That 2008 William Osler Medal - awarded by the American Association for the History of Medicine to a UCSD undergraduate who had not yet enrolled in medical school - was a signal flare for what was coming: a career that could not decide between the history of ideas and the biology of cells, and eventually decided not to choose. The essay was called "Redefining Inert: The Birth of the Placebo in American Medicine." The irony of its subject is not lost on anyone following Noetik's work.
Today, Alfa is Co-Founder and CEO of Noetik, a South San Francisco biotech that has assembled what it describes as one of the world's largest collections of multimodal human tumor data - and built AI foundation models on top of it. The company's OCTO-VC platform processes hundreds of millions of spatially resolved tumor cells across four data modalities: spatial transcriptomics, spatial proteomics, H&E imaging, and whole exome sequencing. In January 2026, GSK licensed those models in a five-year, $50 million+ deal - one of the first large-scale transactions to monetize a biological foundation model as a recurring enterprise software asset.
Most drug discovery is still guesswork.
- Ron Alfa
The problem Alfa is solving is not subtle. Roughly 95% of cancer drugs that enter clinical trials fail. The standard explanation pins the blame on pharmacology - bad molecules, bad targets, bad biology. Alfa disagrees. His bet is that a significant portion of those failures are patient selection errors: effective drugs tested on the wrong populations, in the wrong tumor microenvironments, against cancers that only superficially resemble each other. "Cancer might be the most misunderstood disease out there," he has said. "It's not one disease, it's a family of diseases." And the corollary: "Many of these 'failed' treatments actually work! But we're not looking at the right patients with the right tumors."
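The patient-selection argument can be made concrete with a little arithmetic. The sketch below is illustrative only: the subtype fraction and response rates are hypothetical numbers chosen for the example, not figures from Noetik or from any trial, and `blended_response_rate` is a name introduced here.

```python
def blended_response_rate(responder_fraction, rate_in_responders, rate_in_others):
    """Overall response rate when a trial enrolls a mix of two tumor subtypes.

    responder_fraction: share of enrolled patients whose tumors suit the drug.
    rate_in_responders: response rate within that well-matched subtype.
    rate_in_others:     response rate among everyone else enrolled.
    """
    if not 0.0 <= responder_fraction <= 1.0:
        raise ValueError("responder_fraction must be between 0 and 1")
    return (responder_fraction * rate_in_responders
            + (1.0 - responder_fraction) * rate_in_others)


# Hypothetical inputs: 15% of enrolled tumors are the "right" subtype,
# the drug works in 60% of those and in 5% of all the others.
unselected_trial = blended_response_rate(0.15, 0.60, 0.05)

# Same drug, but the trial enrolls only the well-matched subtype.
selected_trial = blended_response_rate(1.0, 0.60, 0.05)

print(f"unselected trial: {unselected_trial:.1%}")
print(f"selected trial:   {selected_trial:.1%}")
```

With these made-up inputs, a drug that helps 60% of the right patients shows an overall response rate of only about 13% in an unselected trial. The molecule is the same in both cases; only the enrollment differs.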
The son of immigrants from Egypt and the Palestinian Territories, born into a family of refugees from Lebanon, Alfa came to science through an unusual route. After completing a BS in Animal Physiology and Neuroscience at UCSD summa cum laude, he detoured to University College London on a Wellcome Trust fellowship, earning an MA in the History of Medicine. He then returned to California to enter Stanford's Medical Scientist Training Program (MSTP), emerging with both an MD and a PhD in Neuroscience. His doctoral work sat at the intersection of cellular biology and computational modeling - preparation, in retrospect, for everything that followed.
In 2009, before enrolling at Stanford, he received the Paul and Daisy Soros Fellowship for New Americans - a highly competitive award for immigrants and the children of immigrants who show exceptional academic potential. He was 22.
The six years Alfa spent at Recursion Pharmaceuticals shaped his understanding of what machine learning could and could not do for drug discovery. He joined the company at seed stage, when it was a handful of people running automated microscopy experiments in a Salt Lake City warehouse, and rose to Senior Vice President and Head of Research - eventually acting Chief Scientific Officer - as it grew into a publicly traded company. At Recursion, he oversaw scientific organizations and portfolio strategy across rare disease, neuroscience, oncology, and immunology, advancing multiple programs from discovery through clinical development.
The experience confirmed his conviction that the limiting factor was not compute or model architecture - it was data quality and relevance. "The most important thing for any application of machine learning is the data," he told Pixel Scientia. When Recursion's approach - high-throughput cell microscopy plus deep learning - proved powerful for rare diseases but less tractable for the harder contextual questions in oncology, Alfa and co-founder Jacob Rinaldi stepped out to build something different.
Train models that can do what humans cannot do - that can understand biology we haven't discovered yet.
- Ron Alfa
Noetik was incorporated in September 2023. Before the company could train its first foundation models, it spent nearly two years acquiring and curating actual human tumors - tissue samples with paired spatial transcriptomics, proteomics, H&E slides, and whole exome sequencing. The resulting dataset is described as "one of the largest collections of multimodal tumor data that exists anywhere on Earth." DCVC led a $14 million seed round to fund that acquisition phase. The approach is deliberately slow and expensive. Alfa has been explicit: there is no shortcut to the data, and the data is the moat.
In August 2024, Polaris Partners led a $40 million Series A - oversubscribed, with Khosla Ventures, Breakout Ventures, and existing backers DCVC, Zetta Venture Partners, and Catalio Capital Management all participating. Total funding to date: approximately $62 million. Amy Schulman of Polaris joined the board.
The company's current platform centers on two flagship AI systems. OCTO-VC is a virtual cell foundation model - self-supervised, trained on spatially resolved tumor data, capable of simulating gene expression, cell states, and tumor-immune interactions at single-cell resolution. TARIO-2, announced in early 2026, is an autoregressive transformer trained on one of the world's largest tumor spatial transcriptomics datasets; it can predict an approximately 19,000-gene spatial expression map from nothing more than the standard H&E pathology slide that every cancer patient already has. The practical implication: a hospital with only the most basic pathological infrastructure can access the equivalent of a full spatial genomics readout.
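To picture what "an expression map from an H&E slide" means as data, the sketch below walks through the input and output shapes only. It is not Noetik's method and does not reproduce an autoregressive transformer: the tile size, the synthetic 8x8 "slide", and every function name are assumptions made for illustration, and `placeholder_predictor` is a trivial stand-in for a trained model.

```python
from typing import Callable, Dict, List, Tuple

N_GENES = 19_000          # approximate gene count quoted in the article
Tile = List[List[float]]  # one image patch, as rows of pixel intensities
Predictor = Callable[[Tile], List[float]]  # hypothetical model interface


def split_into_tiles(image: List[List[float]], size: int) -> Dict[Tuple[int, int], Tile]:
    """Cut a 2D image into non-overlapping size x size tiles keyed by grid position."""
    rows, cols = len(image), len(image[0])
    tiles: Dict[Tuple[int, int], Tile] = {}
    for r in range(0, rows - size + 1, size):
        for c in range(0, cols - size + 1, size):
            tiles[(r // size, c // size)] = [row[c:c + size] for row in image[r:r + size]]
    return tiles


def predict_expression_map(image: List[List[float]], size: int,
                           predictor: Predictor) -> Dict[Tuple[int, int], List[float]]:
    """Apply the predictor to every tile: one expression vector per grid position."""
    return {pos: predictor(tile) for pos, tile in split_into_tiles(image, size).items()}


def placeholder_predictor(tile: Tile) -> List[float]:
    """Stand-in for a trained model: repeats the tile's mean intensity for every gene."""
    pixels = [p for row in tile for p in row]
    mean = sum(pixels) / len(pixels)
    return [mean] * N_GENES


# Synthetic 8x8 grayscale "slide"; a real H&E image would be far larger and in color.
slide = [[float((r + c) % 7) for c in range(8)] for r in range(8)]
expression_map = predict_expression_map(slide, 4, placeholder_predictor)
print(len(expression_map), "tiles,", len(expression_map[(0, 0)]), "values per tile")
```

The point is the shape of the result: each location on the slide ends up with its own vector of roughly 19,000 predicted expression values, which is what makes the output a spatial map and not a single per-patient readout.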
Alongside the models, Noetik operates a high-throughput in vivo CRISPR Perturb-Map platform - a tool for systematically knocking out genes in real tumor models and measuring the consequences in spatial context. The combination of in vivo perturbation data and foundation model-based prediction is what Alfa calls a route to "targets we couldn't have found any other way."
The GSK deal, announced January 8, 2026, is the first major validation of Noetik's licensing business model. GSK received a non-exclusive license to OCTO-VC in two cancer types - non-small cell lung cancer and colorectal cancer - for a five-year period, with annual subscription fees and $50 million in upfront capital and near-term milestones. Alfa called it "a shift for the biopharma industry." DCVC's commentary framed it as "among the first and largest transactions monetizing a biological foundation model as a scalable enterprise asset." The implication is clear: Noetik intends to be the operating system of cancer drug discovery, not just another drug developer.
In 2018, while still at Recursion, Alfa delivered a TEDMED talk titled "What if we could map all of human biology?" He described a future in which aggregating cellular data at scale would let researchers test and deliver drugs to patients in a fraction of current timelines. He has spent the following eight years attempting to build exactly that - narrowed from all of human biology to the particular and brutal territory of the tumor microenvironment. The constraint turned out to be a source of power. Cancer, unlike most diseases, accumulates its own multimodal data: tissue, genomics, pathology, treatment response. Noetik collects all of it.
What Alfa appears to believe - and what Noetik's architecture reflects - is that the revolution in cancer treatment will not come from a single breakthrough molecule. It will come from knowing, before anyone takes a pill, exactly which patient and which tumor that molecule was built for. "One of the biggest problems we can impact," he has said, "is predicting clinical success." The placebo essay, the Wellcome Trust fellowship, the history of medicine degree - none of it was a detour. It was research into how humans misread data, how they mistake correlation for mechanism, how the story we tell about a drug shapes whether we think it works. The AI is doing something similar, at a scale no human historian could ever manage.
Noetik's thesis: many "failures" are patient-selection failures, not pharmacology failures. The right drug is meeting the wrong tumor.
OCTO-VC: A self-supervised foundation model trained on hundreds of millions of spatially resolved human tumor cells across four data modalities. Simulates gene expression, cell states, and tumor-immune interactions at single-cell resolution. Licensed to GSK in January 2026.
TARIO-2: An autoregressive transformer trained on one of the world's largest tumor spatial transcriptomics datasets. Predicts approximately 19,000-gene spatial expression maps from standard H&E pathology slides - the same slides already taken from every cancer patient, at every hospital, globally.
CRISPR Perturb-Map: A high-throughput in vivo CRISPR system for systematically knocking out genes in real tumor models and measuring consequences in spatial context. Combines functional genomics with multimodal spatial biology to identify targets that computational models alone could not find.