It's 2 a.m. in Berkeley. A model spits out a string of 1,300 amino acids that has never existed. By Tuesday, that string is cutting DNA inside a human cell.
The lab where the prompt is a protein
Profluent is what happens when a former Salesforce AI researcher decides that the next "language" worth modeling isn't English. It's the four-letter alphabet of DNA and the twenty-letter alphabet of protein, and the texts to learn from are 115 billion sequences accumulated by life over four billion years of trial and error.
The company sits in the unglamorous middle of two industries that rarely talk to each other. One half is wet lab: pipettes, sequencers, freezers. The other half is GPU clusters running transformer models on the world's largest protein corpus. Profluent's central wager is that those two halves, if wired together correctly, will out-design evolution.
Why biology stayed analog while everything else went digital
For decades, drug discovery has been a search problem dressed up as a science. A team picks a target, screens libraries, runs assays, repeats. A successful new antibody can take five years and a few hundred million dollars to find. A new enzyme for industrial use is often borrowed - sometimes politely, sometimes not - from a bacterium someone scraped out of a hot spring.
The status quo has worked, in the way that horse-drawn carriages worked. It also leaves entire categories of medicine off the table because the chemistry just isn't there yet. CRISPR, the gene-editing tool behind the 2020 Nobel Prize in Chemistry, still relies mostly on the same handful of natural Cas proteins it did in 2012. They're large, finicky about where they cut, and the patent landscape around them is, charitably, a swamp.
Profluent's founders looked at this and asked an uncomfortable question. If transformer models can write working code from a comment, why can't they write working proteins from a specification?
Scaling laws, but for ribosomes
Ali Madani spent years at Salesforce Research training what would become ProGen - one of the first protein language models shown, in Nature Biotechnology in 2023, to generate brand-new proteins that actually folded and functioned. He left to found Profluent in 2022 with Alexander Meeske, a microbiologist whose lab work on CRISPR systems gave the new company an instant bench-to-byte loop.
The bet was philosophical as much as technical: that the now-familiar story of large language models - bigger model plus more data equals predictable, measurable gains - would replay itself in biology. At the time this was not the consensus view. Polite biologists called it premature. Less polite ones called it nonsense.
It turned out to be neither. In April 2025 Profluent published evidence that protein generation follows scaling laws of its own. Doubling parameters and training data doesn't just produce more output. It produces better output, in ways the team can chart on a graph and put in a slide deck.
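A scaling law of this kind is usually written as a power law, loss = a * N^b with b negative, which plots as a straight line on log-log axes. The sketch below fits that line by least squares. The data points are synthetic and exist only for illustration; they are not Profluent's measurements, and `fit_power_law` is a hypothetical helper name.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b in log-log space. Returns (a, b)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((x - mean_x) ** 2 for x in log_x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_x, log_y))
    b = sxy / sxx                      # slope of the log-log line
    a = math.exp(mean_y - b * mean_x)  # intercept, mapped back from log space
    return a, b


# Synthetic illustration data (NOT real results): model size vs. validation loss,
# generated from an exact power law with exponent -0.1.
params = [1e8, 2e8, 4e8, 8e8, 1.6e9]
loss = [3.0 * (p / 1e8) ** -0.1 for p in params]

a, b = fit_power_law(params, loss)
change_per_doubling = 2 ** b  # multiplicative change in loss when size doubles

print(f"exponent b = {b:.3f}")
print(f"loss multiplier per doubling = {change_per_doubling:.3f}")
```

Because the toy data comes from an exact power law, the fit recovers the exponent it was generated with. Real measurements would scatter around the line, and the interesting question is how well the line predicts the next, larger model.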
One platform, several flavors of useful
Profluent doesn't sell a single product so much as a stack. At the bottom: the Protein Atlas, an internal dataset of more than 115 billion protein sequences. In the middle: a family of foundation models - ProGen3, proseLM, Protein2PAM - that can be prompted with structure, function, target-site constraints, or all three at once. At the top: a wet lab that takes the model's best guesses and tells the model when it's wrong.
OpenCRISPR-1
The world's first AI-designed gene editor, free for any researcher or company to use. A polite middle finger to the CRISPR IP morass.
ProGen3
Latest frontier protein language model. Used to design compact gene editors that are smaller and easier to deliver than Cas9.
Protein Atlas
115B+ sequences in a single training corpus - the largest publicly known.
proseLM & Protein2PAM
Used to design proteins under explicit constraints: target a particular DNA motif, fold to a specific shape, hit a function spec.
Partner Programs
B2B engagements with pharma and biotech on bespoke antibodies, enzymes and editors.
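The "particular DNA motif" in the proseLM & Protein2PAM entry above refers, for CRISPR editors, to the PAM: the short protospacer-adjacent motif a Cas protein must find next to its target before it cuts. As background not stated in this article, natural SpCas9 recognizes NGG, where N is any base. The sketch below only scans a DNA string for a motif written in IUPAC codes; it is a toy illustration of what a PAM constraint looks like, not how Profluent's models work. The function name and the example sequence are made up.

```python
import re

# Subset of IUPAC nucleotide codes, mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]",  # any base
    "R": "[AG]",    # purine
    "Y": "[CT]",    # pyrimidine
}


def find_motif(dna, motif):
    """Return 0-based start positions of every (possibly overlapping) motif hit
    on the given strand."""
    pattern = "".join(IUPAC[code] for code in motif.upper())
    # A lookahead makes the match zero-width, so overlapping sites are all found.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", dna.upper())]


# Made-up example sequence, forward strand only.
sites = find_motif("ATGGCGGTAGG", "NGG")
print(sites)  # [1, 4, 8]
```

An editor designed for a different PAM would swap the motif string, which is why PAM preference determines which sites in a genome an editor can reach.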
The capital that bought a lot of GPUs
Chart of funding rounds. Source: Profluent / BusinessWire / Crunchbase. Heights are scaled to round size, not valuation, which Profluent doesn't disclose.
When the model is right, the cells listen
The skeptical reader is allowed a moment of doubt here. AI-designed proteins are a great press release. Working AI-designed proteins are a different category of object. Profluent's case rests on a few public data points that are hard to wave away.
OpenCRISPR-1 is the headline. The sequence diverges from natural SpCas9 by more than 400 amino acid substitutions - effectively a new enzyme nature did not author. It nonetheless cuts DNA in human cells with efficiency comparable to the canonical version. The team released the sequence in April 2024 under terms that allow commercial use, which is the protein-design equivalent of dropping a mixtape.
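To put "more than 400 substitutions" in proportion, here is a minimal sketch that counts mismatches between two aligned sequences and converts the count to percent identity. The toy sequences are made up. The length of 1,368 residues for natural SpCas9 is a background fact not stated in this article, and the calculation assumes an equal-length alignment with no insertions or deletions, so treat the result as a rough ceiling, not a measured value.

```python
def count_substitutions(seq_a, seq_b):
    """Count positions where two already-aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(seq_a, seq_b) if x != y)


def percent_identity(seq_a, seq_b):
    """Share of aligned positions that match, as a percentage."""
    matches = len(seq_a) - count_substitutions(seq_a, seq_b)
    return 100.0 * matches / len(seq_a)


# Toy, made-up sequences: two substitutions across ten residues.
toy_natural = "MKTAYIAKQR"
toy_design = "MKSAYLAKQR"
toy_subs = count_substitutions(toy_natural, toy_design)
toy_identity = percent_identity(toy_natural, toy_design)
print(toy_subs, toy_identity)  # 2 80.0

# Back-of-envelope ceiling for the real case.
SPCAS9_LENGTH = 1368     # residues in natural SpCas9 (assumption: not from the article)
MIN_SUBSTITUTIONS = 400  # lower bound, from "more than 400" in the text
identity_ceiling = 100.0 * (SPCAS9_LENGTH - MIN_SUBSTITUTIONS) / SPCAS9_LENGTH
print(f"identity to SpCas9 is at most about {identity_ceiling:.1f}%")
```

Under those assumptions, roughly three residues in ten differ from the natural enzyme, which is far outside the range of ordinary point-mutation engineering.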
Beyond the editor: Eli Lilly's 2025 deal for AI-designed recombinases is a vote of confidence with money attached. Recombinases are the next frontier in gene editing because they can swap large stretches of DNA without breaking the double helix. Designing a good one has historically been a multi-year academic project. Lilly is paying Profluent to do it on a deadline.
Open-source where it counts, paid where it scales
Profluent could have kept OpenCRISPR-1 under lock and key. It would have been the obvious move. Instead, the company released the sequence, gave it a permissive use policy, and made the bet that an ecosystem of academic and small-biotech users beats a moat of patent attorneys.
The mission, as the company puts it, is to make biology programmable. Less elegantly: to compress the cycle time of inventing a new therapy from years to weeks, and to expand the menu of things biology can be asked to do. That includes industrial enzymes for greener manufacturing, antibodies for diseases that don't have great ones yet, and genome edits for conditions today's CRISPR can't reach.
Three things that might amuse you
- OpenCRISPR-1 is more than 400 mutations away from natural SpCas9. By evolutionary distance, it is roughly to SpCas9 what a cat is to a wolf.
- Profluent's Protein Atlas - 115B sequences - is, by sequence count, larger than every protein database NCBI maintains, combined.
- The founding bet, that scaling laws apply to biology, was floated before GPT-4 had been released. It now looks less like a bet and more like a thesis.
Back to the 2 a.m. printout
If Profluent is right - and a growing list of partners and reviewers thinks it might be - the printout in that opening scene stops being a curiosity. It becomes the new normal. Every editor, every antibody, every industrial enzyme gets a model-generated draft before any human pipette touches a tube. The wet lab becomes the place where you verify a hypothesis, not the place where you stumble onto one.
That is at once a smaller change than self-driving cars and a larger one. Smaller, because most people will never see it happen. Larger, because the things it makes possible - cures, materials, food, fuel - touch parts of life that the digital revolution mostly skipped.
Back in Berkeley, the printout is still running. It will keep running. The interesting question is no longer whether AI can write a working protein. It is which proteins we should ask it to write next.