Simon Mo

The Dispatch

The most important software in AI is the part you never see.

Training gets the headlines. Inference pays the bills - every time a model answers a question, somewhere a GPU is doing the work, and somebody is feeding the meter.

Simon Mo builds the meter-shrinking machinery. As lead maintainer of vLLM, the open-source inference engine born in a Berkeley lab, he spends his days on the unglamorous problem of memory management - how to pack more answers into the same silicon. The trick has a name, PagedAttention, and a payoff: large language models that run cheaper and faster without anyone rewriting their model.
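For readers who want the mechanism: in PagedAttention, as the vLLM authors describe it, each request's attention (KV) cache is stored in fixed-size blocks that need not sit next to each other in memory. A per-request block table maps logical blocks to physical ones, much like an operating system's page table. The toy Python sketch here models only that bookkeeping. The class names `BlockAllocator` and `Request`, the pool size, and the 16-token block size are hypothetical illustrations, not vLLM's API, and no GPU memory is involved.

```python
from dataclasses import dataclass, field


class BlockAllocator:
    """Toy pool of fixed-size KV-cache blocks (hypothetical; not vLLM's API)."""

    def __init__(self, num_blocks: int) -> None:
        self.free_blocks = list(range(num_blocks))

    def allocate(self) -> int:
        if not self.free_blocks:
            raise MemoryError("no free KV-cache blocks")
        return self.free_blocks.pop()

    def release(self, block_id: int) -> None:
        self.free_blocks.append(block_id)


@dataclass
class Request:
    """One request's view of its cache: a block table, like an OS page table."""

    block_size: int = 16  # tokens per block; an assumed value for illustration
    num_tokens: int = 0
    block_table: list = field(default_factory=list)  # logical index -> physical block id

    def append_token(self, allocator: BlockAllocator) -> None:
        # Grab a new physical block only when the current last block is full.
        if self.num_tokens % self.block_size == 0:
            self.block_table.append(allocator.allocate())
        self.num_tokens += 1

    def wasted_slots(self) -> int:
        # Idle slots can only exist in the last, partially filled block.
        return len(self.block_table) * self.block_size - self.num_tokens

    def free(self, allocator: BlockAllocator) -> None:
        # Return every block to the shared pool when the request finishes.
        for block_id in self.block_table:
            allocator.release(block_id)
        self.block_table.clear()
        self.num_tokens = 0


allocator = BlockAllocator(num_blocks=8)
req = Request()
for _ in range(20):  # generate 20 tokens
    req.append_token(allocator)
print(len(req.block_table), req.wasted_slots())  # 2 12
```

Because blocks are handed out only as tokens arrive, this 20-token toy request holds two 16-token blocks and leaves 12 slots idle. Reserving one contiguous buffer for a hypothetical 2,048-token maximum up front would leave 2,028 idle. That gap is the kind of waste paging is meant to reclaim.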

In November 2025 he and his fellow vLLM maintainers turned the project's gravity into a company, Inferact. Two months later it stepped out of stealth with a $150 million seed round at an $800 million valuation, co-led by Andreessen Horowitz and Lightspeed. The product they are commercializing is the same one they keep giving away.

What he optimizes

// the economics of running a model

Throughput: higher
Memory waste: lower
Cost per token: lower
Community reach: global

Illustrative - the direction of vLLM's design goals, not benchmarked figures.

$150M seed round
$800M valuation
83K+ vLLM GitHub stars
~8 years in serving systems

The Long Game

Eight years in the serving-systems trenches

Mo did not parachute into the inference boom. He grew up in it. Before vLLM there was Ray Serve, which he helped build from zero to one at Anyscale - the model-serving layer on top of the Ray distributed framework. Before that, he was a student researcher at UC Berkeley's RISELab, asking the question he is still asking: how do you make machine-learning serving more efficient, more ergonomic, more scalable?

His fingerprints sit across the open-source map - Ray, Modin, Clipper - and his production scars run through GPU inference work and Kubernetes multi-tenancy. The throughline is not a product. It is a problem: serving is hard, and almost nobody wants to do the plumbing. Mo wanted to do the plumbing.

At PyCon 2021 he stood up to talk about "Patterns of ML Models in Production." Four years later he was delivering a keynote at the PyTorch Conference on vLLM. Same lane, bigger stage.

Then

Ray Serve

Helped build it from scratch at Anyscale - the serving layer that taught him how production models actually break.

Now

vLLM

Lead maintainer of the inference engine, now stewarded by the PyTorch Foundation.

Roots

RISELab / Sky

Berkeley research on ML systems and cloud infrastructure - the lab lineage behind Spark and Ray.

Title

CEO, Inferact

Co-founder of the company commercializing vLLM, while the project stays open.

The Timeline

From a small meetup to an $800M valuation

2021

Software engineer at Anyscale, building Ray Serve. Speaks at PyCon on patterns of ML in production.

2023

Co-creates vLLM at Berkeley's Sky Computing Lab and hosts the project's first community meetup with a handful of labmates.

2024

vLLM is named one of the first Sequoia Open Source Fellows.

2025

Co-founds Inferact in November with his vLLM co-maintainers. Keynotes the PyTorch Conference.

2026

Inferact emerges from stealth: $150M seed at an $800M valuation, co-led by a16z and Lightspeed.

The Backstory

Computer science, math, and philosophy

An odd transcript for a man who counts GPU memory pages for a living.

At Berkeley he studied all three. The philosophy part is the tell. Inference is a question of how to allocate scarce, expensive resources fairly and fast - which is, if you squint, an ethics problem dressed up in CUDA.

Today he is a PhD student in Berkeley's Sky Computing Lab, advised by Joseph Gonzalez and Ion Stoica - the professor whose labs produced Spark and Ray and spun out Databricks and Anyscale. Mo is, in the most literal sense, building the company while finishing the degree. The startup did not pull him out of research. It walked out of the research with him.

FACT 01

His GitHub badges include "Galaxy Brain" and "Arctic Code Vault Contributor."

FACT 02

vLLM's "v" nods to virtual memory - the PagedAttention trick that made the project famous.

FACT 03

His advisor, Ion Stoica, is the same researcher behind Databricks and Anyscale.

FACT 04

Inferact kept vLLM open - the project is now governed under the PyTorch Foundation.

The Bet

Give it away. Then build the business around the gift.

The conventional startup hides the crown jewels. Mo's company runs the opposite play: keep vLLM free, open, and community-governed, then sell the speed, reliability, and operational muscle that hyperscalers need to run it at scale. The funding isn't a pivot away from open source. It is a wager that the open project and the company make each other stronger.

Inferact - founded by the people who maintain the code

The Rolodex

Where to find Simon Mo

inferact.ai · GitHub / simon-mo · X / @simon_mo_ · LinkedIn · vLLM @ Berkeley Sky · TechCrunch: the $150M round
