Baseten - AI Inference Infrastructure
AI INFRASTRUCTURE · SAN FRANCISCO · EST. 2019

Baseten

The GPU layer your AI runs on - whether you know it or not.

$5B Valuation
$585M Total Raised
100+ Enterprise Customers
~200 Employees (2026)

Four Engineers, One Side Project, Zero Regrets

The four people who built Baseten - Tuhin Srivastava, Amir Haghighat, Philip Howes, and Pankaj Gupta - all knew each other from Gumroad, the creator monetization platform. It was not a garage myth or a dorm room legend. It was a group of engineers who looked at the emerging AI infrastructure problem, decided someone needed to solve it properly, and concluded that someone should be them.

They founded Baseten in 2019, before the current wave of LLM hype, before "inference" became a word in every VC pitch deck, and before NVIDIA's market cap became a number requiring scientific notation. The timing, in retrospect, was sharp. The space they chose - the plumbing that makes AI work in production - turned out to be exactly where the real difficulty lies.

The team brings an unusual mix of depth. Philip Howes holds a PhD in mathematics. Pankaj Gupta came in with years at Uber and Twitter. Tuhin Srivastava leads as CEO with a focus on enterprise go-to-market. Amir Haghighat anchors the technical direction as CTO. That combination - research rigor, large-scale systems experience, product instinct, and commercial focus - shaped what Baseten became: a company that engineers actually trust.

THE FOUNDING TEAM
Tuhin Srivastava CEO & Co-founder
Amir Haghighat CTO & Co-founder
Philip Howes Co-founder (PhD, Mathematics)
Pankaj Gupta Co-founder (ex-Uber, Twitter)

The Problem Nobody Talks About Until It Breaks

Training an AI model is the part everyone finds interesting. Deploying it so that 10,000 users can hit it simultaneously, with sub-second latency, without cold start delays wrecking their experience, without GPU costs spiraling out of control - that is the part that keeps engineers awake at night.

Baseten is a platform for running AI models in production. Put less clinically: if your company built an AI feature and it actually works when real people use it, there is a reasonable chance Baseten's infrastructure is doing a significant amount of the heavy lifting underneath.

The company runs dedicated and serverless GPU compute, handles the routing, manages the scheduling, and wraps it in tooling that lets engineering teams treat model serving like any other API call. Cursor uses it. Notion uses it. HeyGen - the AI video generation company - uses it. Clay, Quora, Writer, Superhuman, Abridge, Retool. When companies that are themselves known for their AI products need a reliable foundation for inference, a notable number of them end up at Baseten.
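That "any other API call" framing is concrete: a dedicated deployment exposes a plain HTTPS predict endpoint. The sketch below builds such a request with only the Python standard library. The URL pattern (`model-{id}.api.baseten.co/production/predict`) and the `Api-Key` header follow Baseten's documented convention at the time of writing, but treat the model ID and key as placeholders.

```python
import json
import urllib.request

def build_predict_request(model_id: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to a dedicated deployment's predict endpoint."""
    url = f"https://model-{model_id}.api.baseten.co/production/predict"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Api-Key {api_key}",  # key issued in the Baseten workspace
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_predict_request("abc123", "YOUR_API_KEY", {"prompt": "Hello"})
# urllib.request.urlopen(req) would actually send it; omitted here to stay offline.
```

From the caller's side there is nothing model-specific about this: it is the same request you would write against any JSON API.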

THE MISSION

"Make AI inference fast, reliable, and production-ready so developers and enterprises can deploy any model at any scale."

The core insight is simple even if the execution is not: there is a massive gap between "we fine-tuned a model" and "our users get a response in 200ms from anywhere in the world." Baseten closes that gap. They built proprietary networking - the Baseten Delivery Network - optimized specifically for GPU workloads, in a way that general-purpose cloud providers have not prioritized.

Six Tools. One Stack.

Baseten's product line has grown from a single inference platform into a full operating environment for AI teams:

Inference Stack

Dedicated and serverless GPU inference with near-zero cold starts and proprietary networking. The core product that everything else builds on.

Model APIs

Ready-to-call endpoints for popular open-source models. No infrastructure to manage - just an API key and a prompt.
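As a sketch of how little is involved: Baseten's Model APIs follow the familiar OpenAI chat-completions request shape, so a call is one JSON POST. The endpoint URL, model slug, and auth scheme below are assumptions for illustration; substitute the values from your own dashboard.

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder; issued in the Baseten dashboard
body = {
    "model": "deepseek-ai/DeepSeek-V3",  # hypothetical model slug
    "messages": [{"role": "user", "content": "Say hello in one word."}],
}
req = urllib.request.Request(
    "https://inference.baseten.co/v1/chat/completions",  # assumed endpoint
    data=json.dumps(body).encode(),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# resp = json.load(urllib.request.urlopen(req))
# print(resp["choices"][0]["message"]["content"])
```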

Truss

Open-source framework for packaging and deploying ML models. Predates most of the comparable tools in the ecosystem. 1.1K+ GitHub stars.
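The shape of a Truss model is small enough to show inline. A scaffold (`truss init`) puts a `Model` class with `load` and `predict` hooks in `model/model.py`; the toy model below stands in for real weights, so this is a sketch rather than a deployable example.

```python
# Minimal Truss-style model class. In a real Truss scaffold this lives in
# model/model.py alongside a config.yaml, and `truss push` deploys it.
class Model:
    def __init__(self, **kwargs):
        self._model = None

    def load(self):
        # Called once at startup: load weights, warm caches, etc.
        # A toy stand-in for a real model:
        self._model = lambda text: text.upper()

    def predict(self, model_input: dict) -> dict:
        # Called per request with the parsed JSON body.
        return {"output": self._model(model_input["prompt"])}

model = Model()
model.load()
print(model.predict({"prompt": "hello"}))  # {'output': 'HELLO'}
```

The split between `load` (once, at startup) and `predict` (per request) is what lets the platform keep weights warm and avoid paying the load cost on every call.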

Baseten Chains

Compound AI pipeline system for chaining models and logic steps together in production. For when one model call is not enough.
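The pattern Chains productionizes can be illustrated in plain Python. The pipeline below is a sketch of the compound-AI idea only, with stub functions standing in for model calls - it does not use the Chains API itself, where each step would run as its own scalable deployment.

```python
def transcribe(audio_chunk: str) -> str:
    # Stand-in for a speech-to-text model call.
    return f"transcript({audio_chunk})"

def summarize(text: str) -> str:
    # Stand-in for an LLM summarization call.
    return f"summary({text})"

def pipeline(chunks: list[str]) -> str:
    # Fan out over chunks, then reduce: the kind of multi-step flow
    # that Chains packages behind a single production endpoint.
    transcripts = [transcribe(c) for c in chunks]
    return summarize(" ".join(transcripts))

print(pipeline(["a", "b"]))  # summary(transcript(a) transcript(b))
```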

Delivery Network (BDN)

Proprietary global network layer that optimizes GPU routing, scheduling, and data transfer. The invisible layer that makes latency numbers look good.

Training Infrastructure

GPU cluster access for fine-tuning and training workloads, launched in 2025. Rounds out the full model lifecycle under one roof.

The Names Behind The Numbers

One hundred enterprise customers is a headline. The list behind it is more telling. These are not companies experimenting with AI - they are companies whose core product is AI, and they chose Baseten to run it.

Cursor Notion HeyGen Clay Abridge Quora Writer Superhuman Retool OpenEvidence + 90 more

The company reports near-zero churn among these accounts. In SaaS, churn is the honest signal. Customers who depend on Baseten for production traffic do not leave lightly - and, by the company's own numbers, they barely leave at all.

The business model follows the use: usage-based pricing on GPU compute hours for dedicated deployments, per-token or per-request billing for serverless endpoints. Enterprise contracts carry SLAs. If inference is your product, you pay for inference.
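A back-of-envelope comparison of the two billing modes makes the trade-off concrete. All rates below are hypothetical placeholders, not Baseten's actual pricing.

```python
def dedicated_cost(gpu_hours: float, rate_per_gpu_hour: float) -> float:
    # Dedicated deployments: pay for provisioned GPU time.
    return gpu_hours * rate_per_gpu_hour

def serverless_cost(tokens: int, rate_per_million_tokens: float) -> float:
    # Serverless endpoints: pay per token processed.
    return tokens / 1_000_000 * rate_per_million_tokens

# e.g. 720 GPU-hours at a made-up $2.50/hr vs 300M tokens at a made-up $0.50/M
print(dedicated_cost(720, 2.50), serverless_cost(300_000_000, 0.50))  # 1800.0 150.0
```

The crossover point between the two modes depends entirely on utilization, which is why steady high-volume workloads tend toward dedicated capacity and spiky ones toward serverless.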

From Seed to $5 Billion

Six funding rounds in seven years, accelerating sharply into the $350M Series E of January 2026. The acceleration tracks directly with the explosion of production AI use - more companies building AI products means more need for the infrastructure underneath them.

2019
Seed Round
Undisclosed

General Catalyst

2021
Series A
$10M

Greylock Partners, General Catalyst

2022
Series B
$40M

Spark Capital, Greylock Partners, General Catalyst

2023
Series C
$75M

IVP, Spark Capital, Greylock Partners

2024
Series D
$110M

IVP, Andreessen Horowitz, Spark Capital

2026
Series E - Unicorn Round
$350M

NVIDIA ($150M strategic), IVP, a16z, Spark Capital, Greylock Partners

The NVIDIA Series E investment is not a passive financial bet. It is a signal about where GPU inference sits in the AI stack - and who NVIDIA thinks will own that layer.

NVIDIA Writes a $150M Check. AWS Signs a Deal.

In the same quarter, Baseten closed two partnerships that would take most companies a decade to land separately. The NVIDIA strategic investment and the AWS Strategic Collaboration Agreement together mean Baseten has deep access to the chips that power inference and the cloud that runs the workloads.

NVIDIA

$150M strategic investor in the Series E. The partnership focuses on inference optimization across NVIDIA's Hopper (H100) and Blackwell GPU architectures. Not just money - access to hardware roadmaps and engineering collaboration that open-market customers do not get.

Amazon Web Services

Strategic Collaboration Agreement signed December 2025. Preferred cloud infrastructure access and joint enterprise go-to-market. The AWS SCA framework is typically reserved for partners AWS is making a long-term bet on.

The Scoreboard

Numbers tell a cleaner story than press releases:

100x
Inference volume growth, YoY through 2025
95%
Win rate in head-to-head competitive evaluations
133%
Year-over-year headcount growth through mid-2025
91+
Public repos on GitHub under basetenlabs org

Latest from Baseten HQ

JAN 2026

Closed $350M Series E led by NVIDIA with $150M strategic investment. Valuation reached $5B. One of the largest inference infrastructure rounds on record.

DEC 2025

Signed AWS Strategic Collaboration Agreement for preferred cloud infrastructure and joint enterprise go-to-market with Amazon Web Services.

DEC 2025

Acquired Parsed, a reinforcement learning startup, adding training optimization capabilities to the platform.

2025

Launched Training Infrastructure - GPU cluster access for fine-tuning and training. Baseten now covers the full model lifecycle from training through production inference.

2025

Crossed 100 enterprise customers. Reported 100x year-over-year inference volume growth. Headcount grew 133% through mid-year to approximately 200 employees.

The Details That Don't Make the Press Release

ORIGIN NOTE

All four co-founders worked together at Gumroad - the creator payment platform - before starting Baseten. The company was built by a team that already knew how to work together.

ACADEMIC EDGE

Co-founder Philip Howes holds a PhD in mathematics. In a field that often confuses "we used a transformer" with deep technical understanding, that matters.

OPEN SOURCE EARLY

Baseten's Truss framework for model packaging predates most of the comparable tools now common in the MLOps ecosystem. They built the open-source tooling before it was fashionable to do so.

BIG SYSTEMS BACKGROUND

Co-founder Pankaj Gupta previously worked at Uber and Twitter - two companies that had to solve massive-scale infrastructure problems before most others knew those problems existed.

THE NVIDIA SIGNAL

NVIDIA's $150M direct investment in Baseten's Series E is, by most accounts, one of NVIDIA's largest direct bets on a single inference infrastructure startup. They do not write checks like that on sentiment alone.

ENTERPRISE TECH 30

Baseten landed on the Enterprise Tech 30 list - a recognition that puts them in company with the infrastructure and tooling vendors that enterprise engineering teams rely on most.

AI Inference GPU Compute LLM MLOps Open Source Developer Tools Enterprise SaaS Machine Learning