The four people who built Baseten - Tuhin Srivastava, Amir Haghighat, Philip Howes, and Pankaj Gupta - all knew each other from Gumroad, the creator monetization platform. It was not a garage myth or a dorm room legend. It was a group of engineers who looked at the emerging AI infrastructure problem, decided someone needed to solve it properly, and concluded that someone should be them.
They founded Baseten in 2019, before the current wave of LLM hype, before "inference" became a word in every VC pitch deck, and before NVIDIA's market cap became a number requiring scientific notation. The timing, in retrospect, was sharp. The space they chose - the plumbing that makes AI work in production - turned out to be exactly where the real difficulty lies.
The team brings an unusual mix of backgrounds. Philip Howes holds a PhD in mathematics. Pankaj Gupta came in with years at Uber and Twitter. Tuhin Srivastava leads as CEO with a focus on enterprise go-to-market. Amir Haghighat anchors the technical direction as CTO. That combination - research rigor, large-scale systems experience, product instinct, and commercial focus - shaped what Baseten became: a company that engineers actually trust.
Training an AI model is the part everyone finds interesting. Deploying it so that 10,000 users can hit it simultaneously, with sub-second latency, without cold start delays wrecking their experience, without GPU costs spiraling out of control - that is the part that keeps engineers awake at night.
Baseten is a platform for running AI models in production. Put less clinically: if your company built an AI feature and it actually works when real people use it, there is a reasonable chance Baseten's infrastructure is doing a significant amount of the heavy lifting underneath.
The company runs dedicated and serverless GPU compute, handles the routing, manages the scheduling, and wraps it in tooling that lets engineering teams treat model serving like any other API call. Cursor uses it. Notion uses it. HeyGen - the AI video generation company - uses it. Clay, Quora, Writer, Superhuman, Abridge, Retool. When companies that are themselves known for their AI products need a reliable foundation for inference, a notable number of them end up at Baseten.
The company's stated mission: "Make AI inference fast, reliable, and production-ready so developers and enterprises can deploy any model at any scale."
The core insight is simple even if the execution is not: there is a massive gap between "we fine-tuned a model" and "our users get a response in 200ms from anywhere in the world." Baseten closes that gap. They built proprietary networking - the Baseten Delivery Network - optimized specifically for GPU workloads, in a way that general-purpose cloud providers have not prioritized.
Baseten's product line has grown from a single inference platform into a full operating environment for AI teams:
Dedicated and serverless GPU inference with near-zero cold starts and proprietary networking. The core product that everything else builds on.
Ready-to-call endpoints for popular open-source models. No infrastructure to manage - just an API key and a prompt.
Truss, the open-source framework for packaging and deploying ML models. Predates most of the comparable tools in the ecosystem. 1.1K+ GitHub stars.
Compound AI pipeline system for chaining models and logic steps together in production. For when one model call is not enough.
Proprietary global network layer that optimizes GPU routing, scheduling, and data transfer. The invisible layer that makes latency numbers look good.
GPU cluster access for fine-tuning and training workloads, launched in 2025. Rounds out the full model lifecycle under one roof.
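For the ready-to-call endpoints in the list above, inference really is just an HTTP request. A minimal sketch of assembling that request, assuming the per-model URL shape and `Api-Key` authorization header pattern Baseten has documented; the model ID, key, and prompt are placeholders, and the current API docs are the authority on the exact format:

```python
import json

def build_predict_request(model_id: str, api_key: str, prompt: str) -> dict:
    """Assemble the pieces of a Baseten model-endpoint call.

    The URL shape and header format below follow Baseten's documented
    pattern at the time of writing, but treat them as illustrative:
    verify against the current API reference before relying on them.
    """
    return {
        "url": f"https://model-{model_id}.api.baseten.co/production/predict",
        "headers": {
            "Authorization": f"Api-Key {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"prompt": prompt}),
    }

# Sending it is a single POST, e.g. with the requests library:
#   requests.post(req["url"], headers=req["headers"], data=req["body"])
req = build_predict_request("abc123", "YOUR_API_KEY", "Summarize this ticket.")
```

That is the whole integration surface for the serverless product: no cluster, no scheduler, no driver versions, just a URL and a key.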
One hundred enterprise customers is a headline. The list behind it is more telling. These are not companies experimenting with AI - they are companies whose core product is AI, and they chose Baseten to run it.
The company reports near-zero churn among these accounts. In SaaS, churn is the honest signal. Customers who depend on Baseten for production traffic do not leave lightly - and apparently, they do not leave at all.
The business model follows the use: usage-based pricing on GPU compute hours for dedicated deployments, per-token or per-request billing for serverless endpoints. Enterprise contracts carry SLAs. If inference is your product, you pay for inference.
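The blended bill those two models produce is simple arithmetic. A sketch with deliberately hypothetical rates (these numbers are placeholders, not Baseten's published pricing):

```python
def monthly_cost(dedicated_gpu_hours: float, hourly_rate: float,
                 serverless_tokens: int, per_million_token_rate: float) -> float:
    """Combine dedicated (per GPU-hour) and serverless (per-token) billing.

    All rates are hypothetical placeholders for illustration only.
    """
    dedicated = dedicated_gpu_hours * hourly_rate
    serverless = (serverless_tokens / 1_000_000) * per_million_token_rate
    return dedicated + serverless

# e.g. one dedicated GPU running 24/7 for a 30-day month, plus 50M
# serverless tokens, at made-up rates of $6.50/hr and $0.40/M tokens:
cost = monthly_cost(24 * 30, 6.50, 50_000_000, 0.40)
```

The point of the structure is alignment: a customer whose product is inference sees their Baseten bill scale with exactly the traffic their own revenue scales with.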
Six funding rounds in seven years. Three of them in the 12 months before January 2026. The acceleration tracks directly with the explosion of production AI use - more companies building AI products means more need for the infrastructure underneath them.
General Catalyst
Greylock Partners, General Catalyst
Spark Capital, Greylock Partners, General Catalyst
IVP, Spark Capital, Greylock Partners
IVP, Andreessen Horowitz, Spark Capital
NVIDIA ($150M strategic), IVP, a16z, Spark Capital, Greylock Partners
In the same quarter, Baseten closed two partnerships that would take most companies a decade to land separately. The NVIDIA strategic investment and the AWS Strategic Collaboration Agreement together mean Baseten has deep access to the chips that power inference and the cloud that runs the workloads.
$150M strategic investor in the Series E. Partnership focuses on inference optimization on H100 and Blackwell GPU architectures. Not just money - access to hardware roadmaps and engineering collaboration that open-market customers do not get.
Strategic Collaboration Agreement signed December 2025. Preferred cloud infrastructure access and joint enterprise go-to-market. The AWS SCA framework is typically reserved for partners AWS is making a long-term bet on.
Numbers tell a cleaner story than press releases:
Closed $350M Series E led by NVIDIA with $150M strategic investment. Valuation reached $5B. One of the largest inference infrastructure rounds on record.
Signed AWS Strategic Collaboration Agreement for preferred cloud infrastructure and joint enterprise go-to-market with Amazon Web Services.
Acquired Parsed, a reinforcement learning startup, adding training optimization capabilities to the platform.
Launched Training Infrastructure - GPU cluster access for fine-tuning and training. Baseten now covers the full model lifecycle from training through production inference.
Crossed 100 enterprise customers. Reported 100x year-over-year inference volume growth. Headcount grew 133% through mid-year to approximately 200 employees.
All four co-founders worked together at Gumroad - the creator payment platform - before starting Baseten. The company was built by a team that already knew how to work together.
Co-founder Philip Howes holds a PhD in mathematics. In a field that often confuses "we used a transformer" with deep technical understanding, that matters.
Baseten's Truss framework for model packaging predates most of the comparable tools now common in the MLOps ecosystem. They built the open-source tooling before it was fashionable to do so.
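Truss's packaging model centers on a single `Model` class with `load` and `predict` hooks, which the serving layer discovers in the packaged `model/model.py`. A minimal sketch assuming that documented interface; the "model" here is a stdlib stand-in so the example runs without any ML dependencies:

```python
# model/model.py - the entry point a Truss package exposes.
# The class shape (load/predict) matches Truss's documented interface
# at the time of writing; the sentiment "model" is a toy stand-in.

class Model:
    def __init__(self, **kwargs):
        self._model = None

    def load(self):
        # Runs once per replica at startup, so requests never pay the
        # weight-loading cost. A real model would load checkpoints here.
        positive = {"good", "great", "excellent"}
        self._model = (
            lambda text: "positive"
            if set(text.lower().split()) & positive
            else "neutral"
        )

    def predict(self, model_input):
        # Called per request with the deserialized JSON body.
        return {"label": self._model(model_input["text"])}

if __name__ == "__main__":
    m = Model()
    m.load()
    print(m.predict({"text": "great latency numbers"}))
```

The split between `load` and `predict` is the design choice that matters: it is what lets the platform scale replicas up and down without cold requests re-paying initialization.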
Co-founder Pankaj Gupta previously worked at Uber and Twitter - two companies that had to solve massive-scale infrastructure problems before most others knew those problems existed.
NVIDIA's $150M direct investment in Baseten's Series E is, by most accounts, one of NVIDIA's largest direct bets on a single inference infrastructure startup. They do not write checks like that on sentiment alone.
Baseten landed on the Enterprise Tech 30 list - a recognition that puts them in company with the infrastructure and tooling vendors that enterprise engineering teams rely on most.