Breaking
Inferact raises $150M seed at $800M valuation · a16z & Lightspeed co-lead the round · vLLM runs on 400,000+ GPUs worldwide · Used in production by Meta, Google, Character.ai & Amazon · 2,000+ contributors, 50+ core developers · The code stays open source
Company Profile · AI Infrastructure · San Francisco

Inferact

The people who built vLLM - the open-source engine already serving a slice of the world's AI - turned it into a company. Then two rival venture firms co-led the round.

FOUNDED 2025 · SEED $150M · VALUATION $800M · ~22 PEOPLE

Inferact logo and wordmark
The wordmark. Four Berkeley researchers, one open-source engine, and a company that was already running on hundreds of thousands of GPUs before it had a name. Inferact, 2026.
$150M Seed Round · $800M Valuation · 400k+ GPUs Running vLLM · 2,000+ Contributors
The Story

A research project that quietly ate the internet's AI bills

Here is a fun way to think about Inferact. Most startups spend their first two years trying to convince anyone at all to use their software. Inferact started the opposite way. Before the company had a name, a bank account, or a valuation, its software - an open-source engine called vLLM - was already running, at any given moment, on more than 400,000 GPUs around the world. The problem was never distribution. The problem was that all of it was free.

vLLM was incubated in 2023 at UC Berkeley, in the lab of Databricks co-founder Ion Stoica - the same lab that, in a bit of academic sibling rivalry, also produced its main competitor, SGLang. It does the unglamorous half of artificial intelligence. Not the training of models, which gets the headlines and the magazine covers, but the inference - the part where a trained model actually answers your question, over and over, for every user, forever. That part is expensive. vLLM makes it cheaper and faster by handling the fiddly, low-level work of batching requests, caching keys and values, and squeezing performance out of whatever chip it lands on.
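
To make "caching keys and values" a little more concrete, here is a toy Python sketch. It is not vLLM code and calls no vLLM API. The function name `kv_work` and the prompt and output lengths are hypothetical, picked only to count how many per-token key/value computations a text-generating model would perform with and without a cache.

```python
def kv_work(prompt_len, new_tokens, use_cache):
    """Count per-token key/value computations while generating new_tokens.

    Toy model (an assumption, not vLLM's implementation): producing each
    new token needs keys and values for every token already in context.
    """
    cached = 0  # tokens whose key/value pair is already stored
    work = 0
    for step in range(new_tokens):
        context = prompt_len + step  # tokens visible at this step
        if use_cache:
            # Only tokens that are not cached yet need computing.
            work += context - cached
            cached = context
        else:
            # Without a cache, everything is recomputed at every step.
            work += context
    return work


# Hypothetical request: a 100-token prompt and a 50-token answer.
print(kv_work(100, 50, use_cache=False))  # 6225
print(kv_work(100, 50, use_cache=True))   # 149
```

In this simplified count the cached version does work proportional to the length of the conversation, while the uncached version grows roughly with its square. That gap is one plausible reading of why inference engines bother with the bookkeeping.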

The thing about doing the unglamorous half well is that everyone quietly ends up depending on you. vLLM is used in production by Meta, Google, Character.ai, Amazon's cloud, and the Amazon Shopping app, among thousands of others. Which raises the obvious question that eventually confronts every wildly successful open-source project: how do you turn a thing people love precisely because it is free into a business without ruining the thing they love?

"Open source, especially how vLLM itself is structured, is critical to the AI infrastructure in the world."
Simon Mo · Co-Founder & CEO
The Model

Give it away, then build the layer on top

Inferact's answer is deliberately counterintuitive: keep vLLM open, pour paid engineering resources back into it, and build a commercial product - a next-generation "universal inference layer" - on the same foundation. The optimizations developed for money flow back to the community. The pitch to the 50-plus core developers who could work anywhere is that here, they get paid to keep doing the exact thing they already care about.

It is also a smart read on where the hardware is going. As CEO Simon Mo frames it, the enormous GPU clusters being built today for training don't stay training clusters for long - within months they get repurposed to serve models. Inference, in other words, eventually eats everything.

Mission

Grow the world's inference engine

Make AI inference cheaper and faster by growing vLLM as the default engine for serving models - and accelerate AI progress in the process.

Vision

A universal inference layer

Any model, any chip, run efficiently - with open-source vLLM at the center, and Inferact collaborating with existing providers rather than competing with them.

The Team

Four maintainers who turned a repo into a company

  • Simon Mo - Co-Founder & CEO
  • Woosuk Kwon - Co-Founder
  • Kaichao You - Co-Founder
  • Roger Wang - Co-Founder

All four are founding maintainers of vLLM. CEO Simon Mo was a Berkeley doctoral student before leading the company; co-founder Kaichao You is a Tsinghua University special-award winner. Team size is roughly 22 people.

What They Build

Products & services

Open Source · Live

vLLM

The open-source LLM inference and serving engine created by the founders. It manages continuous batching, KV caching, and low-level chip optimization so diverse models run efficiently across varied hardware. Runs on 400,000+ GPUs concurrently worldwide, with 2,000+ contributors and 50+ core developers.
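
For readers who want a feel for what "continuous batching" means, the sketch below is a minimal toy scheduler in Python. It is an illustration under stated assumptions, not vLLM's scheduler: the names `Request`, `workload`, `continuous_batching` and `static_batching` are hypothetical, each decode step is assumed to produce one token per running request, and the four-request workload is made up.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    tokens_left: int  # decode steps still needed (assumed positive)


def workload():
    # Hypothetical workload: one long request among short ones.
    return [Request("A", 2), Request("B", 6), Request("C", 2), Request("D", 2)]


def continuous_batching(requests, max_batch):
    """Toy scheduler: refill free batch slots after every decode step."""
    waiting = deque(requests)
    running = []
    finished_at = {}
    step = 0
    while waiting or running:
        # Admit waiting requests into any free slots before the next step.
        while waiting and len(running) < max_batch:
            running.append(waiting.popleft())
        step += 1
        # One decode step yields one token for every running request.
        for req in running:
            req.tokens_left -= 1
            if req.tokens_left == 0:
                finished_at[req.name] = step
        # Finished requests leave right away, freeing their slots.
        running = [r for r in running if r.tokens_left > 0]
    return finished_at


def static_batching(requests, max_batch):
    """Baseline: the next batch starts only when the whole batch is done."""
    finished_at = {}
    step = 0
    for i in range(0, len(requests), max_batch):
        batch = requests[i:i + max_batch]
        start = step
        step += max(r.tokens_left for r in batch)
        for r in batch:
            finished_at[r.name] = start + r.tokens_left
    return finished_at


print(continuous_batching(workload(), max_batch=2))
print(static_batching(workload(), max_batch=2))
```

With two slots, the toy continuous scheduler finishes all four requests in 6 steps, because C and D slip into the slot A vacates while B is still generating. The static baseline takes 8, since C and D wait for B to finish. Real engines juggle memory limits and prompt processing as well, which this sketch leaves out.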

Commercial · In Development

Universal Inference Layer

A planned next-generation commercial engine built on the vLLM foundation - designed to serve any model on any hardware at lower cost, while collaborating with existing inference providers rather than replacing them.

Follow The Money

A seed round the size of a Series B

[Chart: Inferact seed round $150M and valuation $800M vs. SGLang-commercializer RadixArk (rival) at a ~$400M valuation. Bars are relative. Jan 2026.]

The round was co-led by Andreessen Horowitz and Lightspeed Venture Partners - two firms that don't often share a table - with Sequoia Capital, Altimeter Capital, Redpoint Ventures, and ZhenFund joining. a16z's stated reasoning: inference demand grows super-linearly as AI agents take on longer tasks and generate more tokens.

a16z (co-lead) · Lightspeed (co-lead) · Sequoia Capital · Altimeter · Redpoint · ZhenFund
Who Uses It

Running where the models actually live

  • Meta - production inference
  • Google - production inference
  • Character.ai - production inference
  • Amazon Web Services - cloud service
  • Amazon Shopping app
  • Thousands of organizations via open-source vLLM
Achievements

On the record

  • Raised $150M seed at an $800M valuation (Jan 2026)
  • Built vLLM into the most widely adopted open-source inference engine
  • 400,000+ GPUs running the engine concurrently, worldwide
  • Both a16z and Lightspeed as co-leads of a single round
  • Adopted in production by Meta, Google, Character.ai & Amazon
The Margins

Five things that amuse

1. The software was running on hundreds of thousands of GPUs before the company officially had a name.
2. vLLM and its main rival SGLang were incubated in the same Berkeley lab, under the same professor, in 2023.
3. The $150M seed is larger than many companies' Series B - for a team of roughly 22 people.
4. Co-founder Kaichao You is a Tsinghua University special-award winner.
5. CEO Simon Mo went from Berkeley doctoral student to running infrastructure a chunk of the AI world depends on.
WATCH & READ · Investor notes and interviews: a16z: Investing in Inferact · Lightspeed on Inferact · TechCrunch coverage
Video: search "Inferact" or "Simon Mo vLLM" on YouTube for founder talks and demos. (No official channel published at time of writing.)