Breaking
$150M SEED: Inferact emerges from stealth at an $800M valuation
vLLM: 83,000+ GitHub stars and counting
BACKERS: a16z and Lightspeed co-lead the round
SCALE: Powers inference at Amazon-grade workloads
ROOTS: Born at UC Berkeley's Sky Computing Lab
2024: Named a Sequoia Open Source Fellow

Simon Mo

He gives the software away for free. Then he raised $150 million to make it run faster.

Lead maintainer. Reluctant CEO. Permanent optimizer.

The most important software in AI is the part you never see.

Training gets the headlines. Inference pays the bills - every time a model answers a question, somewhere a GPU is doing the work and a meter is running, and somebody is footing the bill.

Simon Mo builds the meter-shrinking machinery. As lead maintainer of vLLM, the open-source inference engine born in a Berkeley lab, he spends his days on the unglamorous problem of memory management - how to pack more answers into the same silicon. The trick has a name, PagedAttention, and a payoff: large language models that run cheaper and faster without anyone rewriting their model.
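The packing idea can be sketched in a few lines. What follows is a toy illustration of paged allocation in general, not vLLM's actual code: the class name, the block size of 16, the pre-reserved limit of 512, and the sequence lengths are all hypothetical. Instead of reserving one large contiguous region per request, memory is handed out in small fixed-size blocks as each sequence grows, so the unused tail is never bigger than one block.

```python
BLOCK_SIZE = 16  # tokens per block; a hypothetical value for this sketch


class PagedKVCache:
    """Toy block allocator: each sequence maps to a list of block ids."""

    def __init__(self, num_blocks: int, block_size: int = BLOCK_SIZE):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # pool of physical blocks
        self.block_tables: dict[str, list[int]] = {}  # sequence id -> block ids
        self.lengths: dict[str, int] = {}  # sequence id -> tokens stored

    def append_token(self, seq_id: str) -> None:
        """Reserve room for one more token, taking a new block only when needed."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        if length == len(table) * self.block_size:  # no block yet, or last one is full
            if not self.free_blocks:
                raise MemoryError("no free blocks left")
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = length + 1

    def free(self, seq_id: str) -> None:
        """Return a finished sequence's blocks to the pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

    def wasted_slots(self) -> int:
        """Slots reserved but unused: fewer than block_size per sequence."""
        return sum(
            len(self.block_tables[s]) * self.block_size - self.lengths[s]
            for s in self.block_tables
        )


def contiguous_waste(lengths: list[int], max_len: int) -> int:
    """Slots wasted if every sequence pre-reserves max_len contiguous slots."""
    return sum(max_len - n for n in lengths)


cache = PagedKVCache(num_blocks=64)
lengths = [5, 40, 100]  # hypothetical sequence lengths, in tokens
for i, n in enumerate(lengths):
    for _ in range(n):
        cache.append_token(f"seq{i}")

print("paged waste:", cache.wasted_slots())
print("pre-reserved waste:", contiguous_waste(lengths, max_len=512))
```

In this toy setup the paged allocator strands a few dozen slots while the pre-reserved layout strands more than a thousand. The numbers are made up; the direction is the point.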

In November 2025 he and his fellow vLLM maintainers turned the project's gravity into a company, Inferact. Two months later it stepped out of stealth with a $150 million seed round at an $800 million valuation, co-led by Andreessen Horowitz and Lightspeed. The product they are commercializing is the same one they keep giving away.

What he optimizes

// the economics of running a model
Throughput: higher
Memory waste: lower
Cost / token: lower
Community reach: global

Illustrative - the direction of vLLM's design goals, not benchmarked figures.
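The link between throughput and cost per token is plain arithmetic, and a short sketch makes it concrete. The GPU price and token rates below are hypothetical placeholders chosen for round numbers, not measurements of vLLM or of any real hardware, and the function name is an invention for this example.

```python
def cost_per_million_tokens(gpu_dollars_per_hour: float, tokens_per_second: float) -> float:
    """Dollars spent on GPU time to generate one million tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_dollars_per_hour / tokens_per_hour * 1_000_000


# Hypothetical inputs: same GPU price, throughput doubled.
baseline = cost_per_million_tokens(gpu_dollars_per_hour=2.0, tokens_per_second=1000.0)
doubled = cost_per_million_tokens(gpu_dollars_per_hour=2.0, tokens_per_second=2000.0)

print(f"baseline: ${baseline:.3f} per million tokens")
print(f"doubled throughput: ${doubled:.3f} per million tokens")
```

With the hardware bill held fixed, doubling throughput halves the cost of every token, which is why the two rows above move in opposite directions.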

Seed round: $150M
Valuation: $800M
vLLM GitHub stars: 83K+
Serving systems: ~8 yrs
vLLM is used by Amazon's cloud service and the shopping app.
- Simon Mo, on who runs the engine

Eight years in the serving-systems trenches

Mo did not parachute into the inference boom. He grew up in it. Before vLLM there was Ray Serve, which he helped build from zero to one at Anyscale - the model-serving layer on top of the Ray distributed framework. Before that, he was a student researcher at UC Berkeley's RISELab, asking the question he is still asking: how do you make machine-learning serving more efficient, more ergonomic, more scalable?

His fingerprints sit across the open-source map - Ray, Modin, Clipper - and his production scars run through GPU inference work and Kubernetes multi-tenancy. The throughline is not a product. It is a problem: serving is hard, and almost nobody wants to do the plumbing. Mo wanted to do the plumbing.

At PyCon 2021 he stood up to talk about "Patterns of ML Models in Production." Four years later he was delivering a keynote at the PyTorch Conference on vLLM. Same lane, bigger stage.

Then: Ray Serve
Built from scratch at Anyscale - the serving layer that taught him how production models actually break.

Now: vLLM
Lead maintainer of the inference engine, now stewarded by the PyTorch Foundation.

Roots: RISELab / Sky
Berkeley research on ML systems and cloud infrastructure - the lab lineage behind Spark and Ray.

Title: CEO, Inferact
Co-founder of the company commercializing vLLM, while the project stays open.

From a small meetup to an $800M valuation

2021: Software engineer at Anyscale, building Ray Serve. Speaks at PyCon on patterns of ML in production.
2023: Co-creates vLLM at Berkeley's Sky Computing Lab and hosts the project's first community meetup with a handful of labmates.
2024: vLLM is named one of the first Sequoia Open Source Fellows.
2025: Co-founds Inferact in November with his vLLM co-maintainers. Keynotes the PyTorch Conference.
2026: Inferact emerges from stealth: $150M seed at an $800M valuation, co-led by a16z and Lightspeed.

Computer science, math, and philosophy

An odd transcript for a man who counts GPU memory pages for a living.

At Berkeley he studied all three. The philosophy part is the tell. Inference is a question of how to allocate scarce, expensive resources fairly and fast - which is, if you squint, an ethics problem dressed up in CUDA.

Today he is a PhD student in Berkeley's Sky Computing Lab, advised by Joseph Gonzalez and Ion Stoica - the professor whose labs produced Spark and Ray and spun out Databricks and Anyscale. Mo is, in the most literal sense, building the company while finishing the degree. The startup did not pull him out of research. It walked out of the research with him.

FACT 01

His GitHub badges include "Galaxy Brain" and "Arctic Code Vault Contributor."

FACT 02

vLLM's "v" nods to virtual memory - the PagedAttention trick that made the project famous.

FACT 03

His advisor, Ion Stoica, is the same researcher behind Databricks and Anyscale.

FACT 04

Inferact kept vLLM open - the project is now governed under the PyTorch Foundation.

Give it away. Then build the business around the gift.

The conventional startup hides the crown jewels. Mo's company runs the opposite play: keep vLLM free, open, and community-governed, then sell the speed, reliability, and operational muscle that hyperscalers need to run it at scale. The funding isn't a pivot away from open source. It is a wager that the open project and the company make each other stronger.
