In May 2026, Zyphra did something the rest of the industry had quietly assumed was off the table. It trained a frontier-class reasoning model without touching a single Nvidia GPU. ZAYA1-8B - an 8-billion-parameter mixture-of-experts model with fewer than a billion parameters active at any moment - came off a cluster of AMD Instinct MI300X chips and proceeded to trade blows with models ten times its size on math, code, and reasoning.
That is the Zyphra posture in brief: do more with less, and do it in the open. The company is not chasing the largest model. It is chasing the densest one - the most intelligence per parameter, the most capability per watt, and the most performance it can get without a data center the size of a town.
The dominant story of modern AI is a story about size. Bigger models, bigger clusters, bigger bills. The capability lives in a handful of labs, runs on a handful of vendors' chips, and reaches the rest of us through a metered API. Convenient, certainly. Open, not exactly.
Zyphra's founders saw a different bottleneck. The constraint was not intelligence. It was access - to weights you can read, to datasets you can audit, to hardware you are not locked into, and to deployment that does not phone home. An organization that wants to own its AI, run it on its own terms, and understand what the model is doing has, for most of the boom, been politely told to rent instead.
So the company set itself an awkward, expensive goal: prove that frontier-grade AI can be small, open, and cheap to run - and that you can build it without the industry's default chip. Awkward goals tend to make the best companies.
Zyphra was founded in 2020 by Krithik Puthalath, Beren Millidge, Tomas Figliolia, and Danny Martinelli. Millidge, the chief scientist, did postdoctoral research at Oxford and had a hand in early AI-safety work before turning to architecture. Puthalath runs the company as CEO. Figliolia leads model architecture. Their shared bet was that the transformer was not the last word.
The wager paid out where it counts - in conviction from people who have seen this movie before. The Series A was led by Jaan Tallinn, an early backer of both DeepMind and Anthropic, with AMD, Gaia, and AlphaJWC Ventures following. A hundred million dollars, a billion-dollar valuation, and a first institutional round that minted a unicorn.
Zyphra is less a model and more a family of them, sitting on top of an inference cloud. The research wing chases efficiency; the cloud wing turns it into something a company can deploy. The through-line is the same in both: state-space architectures, open licenses, and a stubborn refusal to assume the biggest model is the best one.
An SSM-hybrid family pairing Mamba state-space blocks with a global shared attention layer. Low inference cost, built to run on more devices. Vision-language variants too.
A mixture-of-experts reasoning family trained on AMD MI300X. Fewer than a billion active parameters, frontier-class math and code. Plus a diffusion-converted preview.
Expressive text-to-speech with high-fidelity voice cloning, shipped under Apache 2.0. Clone a voice and read the license that lets you.
A multiplayer general-agent assistant bringing long-horizon agents, search, and productivity tooling to enterprise teams.
A full-stack AI platform built on AMD - for developers, enterprises, and hyperscalers. Data sovereignty, customization, no vendor lock-in.
Open, large-scale pretraining datasets used across the ML community, with billions of cumulative downloads.
Four founders set out to prove the transformer is not the only path to capable AI.
An SSM-hybrid foundation model and a widely adopted open dataset put Zyphra on the research map.
Expressive text-to-speech with voice cloning, released fully open.
The Series A, led by Jaan Tallinn; AMD, Gaia, and AlphaJWC Ventures join. A unicorn in one round.
A reasoning MoE built without Nvidia, plus a diffusion preview with up to 7.7x speedup.
Zyphra's whole thesis rests on a ratio - capability divided by size. The clearest way to see it is to line up active parameters. ZAYA1-8B keeps fewer than a billion live at inference time while reaching for results that usually require far more.
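The arithmetic behind that ratio is simple enough to sketch. The Python below is illustrative only and rests on loud assumptions: it treats "fewer than a billion" as an upper bound of exactly one billion active parameters, it compares against a hypothetical dense 70-billion-parameter model, and it uses the common rule of thumb of roughly two floating-point operations per active parameter per generated token, which ignores attention cost and memory bandwidth. The class and function names are made up for this example and say nothing about capability, only about size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSize:
    """Parameter counts for a model. All figures here are illustrative."""
    name: str
    total_params: float   # every weight stored on disk
    active_params: float  # weights actually used for one token

    @property
    def active_fraction(self) -> float:
        # Share of the model that is live at inference time.
        return self.active_params / self.total_params

    @property
    def forward_flops_per_token(self) -> float:
        # Rule of thumb (an assumption): ~2 FLOPs per active parameter per token.
        return 2.0 * self.active_params


def compute_ratio(big: ModelSize, small: ModelSize) -> float:
    """How many times more per-token compute `big` needs than `small`."""
    return big.forward_flops_per_token / small.forward_flops_per_token


# Assumption: 1e9 is an upper bound standing in for "fewer than a billion".
moe = ModelSize("ZAYA1-8B (upper bound)", total_params=8e9, active_params=1e9)
# Assumption: a hypothetical dense model where every parameter is active.
dense = ModelSize("hypothetical dense 70B", total_params=70e9, active_params=70e9)

if __name__ == "__main__":
    print(f"{moe.name}: {moe.active_fraction:.1%} of weights active per token")
    print(f"{dense.name} needs about {compute_ratio(dense, moe):.0f}x the per-token compute")
```

Under those assumptions, at most one weight in eight is live for any given token, and the dense comparison model spends roughly seventy times the compute per token. Whether the smaller model matches it on quality is a separate question that only benchmarks can answer.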
Most labs hedge. Zyphra committed. ZAYA1 was trained end to end on AMD Instinct MI300X GPUs with AMD Pensando Pollara networking, on infrastructure provided by IBM Cloud. AMD is also an investor. It is a tidy alignment of incentives: a chipmaker that wants to prove its hardware can train frontier models, and a lab that wants to prove you do not need the default vendor to do it.
AMD MI300X clusters and Pollara networking power ZAYA1; Zyphra Cloud is built full-stack on AMD.
IBM Cloud hosted the large-scale AMD training runs behind Zyphra's models.
Zyphra's stated mission is to democratize AI - open models, open datasets, and a cloud organizations can run on their own terms. It rallies around four principles: transparency in reasoning, data sovereignty, domain customization, and distributed deployment without lock-in. The slogan is shorter than the principles, and it is on the homepage.
Here is where the stakes rise. If a model with fewer than a billion active parameters can do the work of a seventy-billion-parameter one, the economics of AI stop favoring only the companies that can afford the largest clusters. Capability moves closer to the edge - onto laptops, into private data centers, behind firewalls that never open. The metered API stops being the only door.
And if all of that can be trained on hardware that is not the industry default, then the supply chain itself loosens. That is the quiet radicalism of Zyphra's work: not a louder model, but a cheaper, more portable, more inspectable one. Skeptics should note the model has shipped, the weights are downloadable, and the dataset has billions of downloads. This is not a manifesto. It is on Hugging Face.