The team that built the engine behind half the AI internet stopped being a research project - and became a company.
Stanford, California. Twenty-seven people. One open-source inference engine quietly moving trillions of tokens a day for names you already know. This is what "infrastructure" looks like before anyone notices it's load-bearing.
Right now, somewhere in a data center you will never visit, a request hits a large language model and comes back faster than it has any right to. Odds are good the code that made it fast is called SGLang - and as of May 2026, the people who wrote it work at a company called RadixArk.
Here is the strange part. You could have used RadixArk's work for years without ever seeing the name. SGLang is an open-source inference engine - the layer that actually serves a model once the training is done. It runs across hundreds of thousands of GPUs and generates trillions of tokens a day for Google, Microsoft, NVIDIA, Oracle, AMD, Nebius, LinkedIn, xAI and Thinking Machines Lab. It was free. It still is.
That is the setup for a business plan that sounds, at first, like a mistake: build the fastest engine in AI, give it away, and let the largest companies on earth run it for nothing. Then charge for the one thing they would rather not do themselves - keeping it running. RadixArk raised $100 million on that idea. Accel led. Spark Capital co-led. NVIDIA, AMD and MediaTek - three chipmakers who compete with each other - all wrote checks into the same round.
The company grew out of LMSYS, a non-profit research collaboration spanning Stanford, UC Berkeley, Carnegie Mellon and UC San Diego. SGLang started there in 2023 as an academic project. Three years is a long time in AI; long enough for a research artifact to quietly become infrastructure that everyone depends on and nobody owns. RadixArk is the attempt to keep it that way while still paying salaries.
Our mission is simple yet ambitious: make frontier-level AI infrastructure open and accessible to everyone.
Ying Sheng, CEO. Built inference systems for xAI's Grok models before co-creating SGLang. A researcher out of the LMSYS/Stanford orbit who now runs the company commercializing it. Her framing: a "crucible" that can repeatedly produce cutting-edge AI.
AI infrastructure and modeling veteran with roots at NVIDIA. Brings the systems and hardware perspective that keeps SGLang fast across a moving target of chips and model architectures.
Drawn from academic labs and industry - xAI, NVIDIA, and the LMSYS research community. Small enough to move fast, connected enough that their code already touches most of the AI internet.
The open-source inference engine for serving modern LLMs and VLMs at scale. Fast, flexible, and already the production choice for Google, Microsoft, Oracle and xAI. The "radix" in RadixArk nods to the radix-attention trick at the heart of its speed - reusing computation instead of repeating it.
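To make "reusing computation instead of repeating it" concrete, here is a minimal sketch of prefix reuse. It is not SGLang's implementation: it uses a plain token-level trie rather than a compressed radix tree, it stores no actual attention state, and the class names and token ids are hypothetical. It only shows the bookkeeping idea, that a second request sharing a prompt prefix with an earlier one needs new work only for the tokens that differ.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One cached token position; children are keyed by the next token id."""
    children: dict = field(default_factory=dict)


class PrefixCache:
    """Toy token-level trie recording which prompt prefixes were already processed."""

    def __init__(self):
        self.root = Node()

    def match_prefix(self, tokens):
        """Return how many leading tokens of `tokens` are already cached."""
        node, matched = self.root, 0
        for tok in tokens:
            nxt = node.children.get(tok)
            if nxt is None:
                break
            node, matched = nxt, matched + 1
        return matched

    def insert(self, tokens):
        """Cache `tokens`; return how many tokens had to be newly computed."""
        node, computed = self.root, 0
        for tok in tokens:
            if tok not in node.children:
                node.children[tok] = Node()
                computed += 1
            node = node.children[tok]
        return computed


cache = PrefixCache()
system_prompt = [101, 7, 7, 42]        # hypothetical token ids shared by both requests
first = system_prompt + [5, 6]
second = system_prompt + [9]

computed_first = cache.insert(first)    # empty cache: every token is new work
reused = cache.match_prefix(second)     # the shared system prompt is found in the trie
computed_second = cache.insert(second)  # only the differing suffix is new work

print(computed_first, reused, computed_second)
```

In a real serving engine the nodes would point at cached attention state on the GPU and the cache would need an eviction policy; the sketch leaves both out.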
Miles is an open-source reinforcement-learning framework for large-scale LLM and VLM post-training, forked from and co-evolving with slime. This is the "teach the model to get better over time" layer - the other half of the model lifecycle beyond serving.
The revenue engine: managed hosting, tooling and support built on top of SGLang and Miles. Developers, startups, enterprises and research labs get speed, control and performance without babysitting the infrastructure. The Databricks and Elastic playbook, pointed at AI inference.
Seed · $100,000,000 · May 2026 · $400M post-money. One of the largest AI-infrastructure seed rounds on record - and a rare instance of three competing hardware makers backing the same neutral ground.
SGLang is the absolute best inference framework for large language models.
The name nods to "radix attention" - the technique, built on a radix tree, that lets SGLang reuse computation across requests with a shared prefix and run fast.
Roughly 27 people, yet their code touches an estimated half of the AI internet's inference traffic.
NVIDIA, AMD and MediaTek - direct competitors - all invested in the same round.
SGLang began as a non-profit academic project, not a startup deck.
CEO Ying Sheng previously built inference for xAI's Grok models.
Return to where we started - a request hitting a model, an answer coming back fast. Nothing about that moment looks different than it did a year ago. That is the point. RadixArk's whole bet is that the layer making it fast should stay open, stay shared, and stay owned by no single company, even as it turns into a business.
The market it sits in - AI inference - has become a battleground where most players guard their advantage. RadixArk's advantage is that it already supplies most of the ammunition, and it did so in public. If it works, the data center gets cheaper and faster for everyone, and the name stays exactly as invisible as good infrastructure should be.