Palo Alto, California · Founded 2022

Deep Infra runs the models
everyone else trains.

The inference cloud that turns hundreds of open-source AI models into a single, pay-per-use API call - on GPUs it actually owns.

Category: AI Inference Cloud Team: ~25 Raised: $125M+

The Deep Infra mark. The name reads like a warning label and a promise at the same time: the infrastructure is deep, and you don't have to look at it.

Dispatch · The Scene Today

Somewhere in eight US data centers, a GPU is answering a question it has never seen.

It will do this about five trillion times this week, and almost nobody using the answer will know Deep Infra exists.

That is rather the point. A developer in Lisbon ships a chatbot. A startup in Austin builds an agent that reads invoices. A research team swaps one open-source model for a newer one over coffee. None of them buy a graphics card, sign a data-center lease, or learn what "time-to-first-token" means. They change a base URL, paste an API key, and the request lands on hardware Deep Infra bought, racked, and tuned. The company is roughly 25 people. The bill of materials is enormous.

"While massive resources flowed into model training, the infrastructure to actually deploy models in production lagged badly behind."- Deep Infra, on the gap it was built to close

190+

Open-source models served

~5T

Tokens / week

US data centers

$0.06

Per million tokens, from

The Tension

The world spent a fortune teaching machines to think. It forgot to pay for the thinking itself.

For three years the headlines were all about training: bigger clusters, longer runs, more parameters. Useful, expensive, photogenic. But a trained model is just a file until something runs it - and running it, millions of times a second, for paying customers who expect a reply before they get bored, turns out to be a different and unglamorous engineering problem.

Open-source models made this worse in the nicest possible way. Suddenly there were dozens of excellent models - Llama, Qwen, DeepSeek, Kimi, GLM, Mistral - free to download and miserable to operate. You needed GPUs you couldn't buy, quantization tricks you didn't have time to learn, and uptime nobody promised you. The model was free. The inference was the bill.

A trained model is a brilliant idea on a hard drive. Inference is the part where it has to show up to work.- The problem Deep Infra sells against

So most teams did the expensive thing: they rented a single closed model from a single vendor, paid the markup, and hoped that vendor's roadmap matched theirs. The alternative - run the open stuff yourself - was a second full-time company. Deep Infra's wager was that almost no one actually wants to run a data center. They just want the answer.

The Bet

Three engineers who had already moved hundreds of millions of messages decided to move tokens instead.

Deep Infra was founded in 2022 by Nikola Borisov, Yessenzhar Kanapin, and Georgios Papoutsis - veterans of the team behind imo, a messenger that scaled to hundreds of millions of users. That is the relevant biographical detail. They had already learned, the hard way, that the difference between a demo and a service is the boring middle: reliability, latency, and the unromantic discipline of keeping things running while everyone else celebrates the launch.

Meet the founders

Nikola Borisov

Co-Founder & CEO

Yessenzhar Kanapin

Co-Founder

Georgios Papoutsis

Co-Founder

Their bet had an unfashionable shape: own the hardware. Plenty of inference startups resell capacity from the big clouds and live on the margin in between. Deep Infra went the other way and bought its own GPUs, including, recently, crates of Nvidia's Blackwell chips. It is a heavier, riskier path. It is also the only one that lets you control price, latency, and a strict no-logging promise all at once.

Anyone can rent a GPU by the hour. Deep Infra's idea was to own the racks and rent you the hard part instead.- On the company's contrarian capital choice

The Receipts

From a seed to a five-trillion-token habit.

2022

Founded in Palo Alto by the team behind the imo messenger to fix the AI deployment gap.

2023

Seed funding from Felicis and early believer Georges Harik; the serverless model API goes live.

April 2025

$18M Series A led by Felicis, with advisor Georges Harik, to expand the model catalog and GPU fleet.

2025-26

Scales inference volume more than 8,000x since seed; takes delivery of Nvidia Blackwell GPUs.

May 2026

$107M Series B co-led by 500 Global and Georges Harik - Nvidia, Samsung Next, Supermicro, Peak6, Upper90, Crescent Cove, A.Capital and Felicis join.

The Product

One API. Hundreds of models. A bill that fits on a startup's credit card.

Deep Infra's pitch is almost rude in its simplicity: the API is OpenAI-compatible, so switching usually means changing a base URL and a key. Behind that door sit 190+ open-source models across text, image and video generation, speech, and embeddings - Kimi, Qwen, DeepSeek, GLM, Llama, gpt-oss, and the rest of the open frontier. Pricing starts around $0.06 per million tokens, with no minimums and no setup fee. For everything that isn't text, you pay by the second of inference.

Serverless Inference API

190+ open-source models behind one OpenAI-compatible endpoint. Text, images, video, speech, embeddings. Pay per token or per inference-second.

Dedicated GPU Instances

Private endpoints on Nvidia A100, H100, H200, B200 and B300 with autoscaling, for teams that need guaranteed, isolated capacity.

Private Deployments

Host your own custom or fine-tuned models on Deep Infra's optimized stack, behind a strict no-logging private endpoint.

GPU Rental

Rent raw GPU capacity in Deep Infra's US-based data centers for the workloads that don't fit a neat API.

Switching from a closed model to Deep Infra is a two-line diff. The hard part is admitting it was ever harder than that.- On the OpenAI-compatible API

The Proof

The numbers do the arguing.

Investors tend to be skeptical of inference startups, and reasonably so - margins are thin and the hyperscalers are circling. What changed the conversation for Deep Infra was the slope of one line: how much it raised, round over round, as usage compounded.

Funding raised, round over round

USD millions · seed → Series B

~$2.6M

Seed
2023

$18M

Series A
Apr 2025

$107M

Series B
May 2026

Figures from public funding announcements; seed amount is third-party reported and approximate.

The capital tells one story; the customers tell another. Deep Infra now processes close to five trillion tokens a week for developers and enterprises running production and agentic workloads, having scaled volume more than 8,000x since its seed. The Series B itself reads like a hardware-and-distribution alliance: Nvidia and Supermicro on the silicon and servers, Samsung Next and 500 Global and Peak6 on the cap table.

Who's in the room

Nvidia

Investor and chip supplier - Deep Infra runs its GPUs, Blackwell included.

Supermicro

Investor and server hardware partner for the data-center build-out.

500 Global

Co-led the $107M Series B.

Samsung Next

Strategic investor in the Series B round.

The Mission

Make running a model as boring as serving a web page.

Deep Infra organizes itself around four pillars it repeats like a creed: reliability for critical applications, performance, privacy through a strict no-logging policy, and deep full-stack infrastructure expertise. None of those are slogans you'd put on a t-shirt. All of them are the difference between a side project and something a business will route real revenue through.

Privacy here is a default, not a feature tier: prompts and outputs aren't retained. The most interesting thing Deep Infra does with your data is forget it.- On the no-logging policy

The company is SOC 2 and ISO 27001 certified, which is the unsexy paperwork that lets a regulated enterprise say yes. That posture - own the hardware, keep nothing, prove it on paper - is the whole personality of the place. It is an infrastructure company that would prefer you never think about infrastructure.

Why It Matters Tomorrow

The model you deploy next year doesn't exist yet. That's the whole business.

Open-source AI moves in weeks. A model that's state-of-the-art today is a footnote by the next quarter, and teams that hard-wired themselves to one closed vendor get to feel that pain personally. Deep Infra is selling optionality: a single place to run whatever the open community ships next, at a price set by a company that owns the metal instead of renting it.

Things that are quietly true about Deep Infra

Switching from a closed model usually means changing a base URL and a key - nothing more.
The founders previously scaled the imo messenger to hundreds of millions of users.
It owns its GPUs rather than reselling cloud capacity - rare for a company its size.
A 25-person team moves trillions of tokens a week.
The no-logging policy means your prompts and outputs aren't kept around.

Back in those eight data centers, the GPU is still answering questions it has never seen - a few trillion more than when you started reading. The difference Deep Infra makes is not that the question gets answered. It's that the person asking never had to think about the machine, the lease, the chip shortage, or the markup. The infrastructure stayed deep, and out of sight, which is exactly where the founders always wanted it. The model is the star. Deep Infra is content to be the stage.

Deep Infra runs the models
everyone else trains.

Somewhere in eight US data centers, a GPU is answering a question it has never seen.

The world spent a fortune teaching machines to think. It forgot to pay for the thinking itself.

Three engineers who had already moved hundreds of millions of messages decided to move tokens instead.

Meet the founders

From a seed to a five-trillion-token habit.

One API. Hundreds of models. A bill that fits on a startup's credit card.

Serverless Inference API

Dedicated GPU Instances

Private Deployments

GPU Rental

The numbers do the arguing.

Funding raised, round over round

Who's in the room

Nvidia

Supermicro

500 Global

Samsung Next

Make running a model as boring as serving a web page.

The model you deploy next year doesn't exist yet. That's the whole business.

Things that are quietly true about Deep Infra

Demos, talks & deep dives

Find Deep Infra

Share this profile

Deep Infra runs the modelseveryone else trains.

Somewhere in eight US data centers, a GPU is answering a question it has never seen.

The world spent a fortune teaching machines to think. It forgot to pay for the thinking itself.

Three engineers who had already moved hundreds of millions of messages decided to move tokens instead.

Meet the founders

From a seed to a five-trillion-token habit.

One API. Hundreds of models. A bill that fits on a startup's credit card.

Serverless Inference API

Dedicated GPU Instances

Private Deployments

GPU Rental

The numbers do the arguing.

Funding raised, round over round

Who's in the room

Nvidia

Supermicro

500 Global

Samsung Next

Make running a model as boring as serving a web page.

The model you deploy next year doesn't exist yet. That's the whole business.

Things that are quietly true about Deep Infra

Demos, talks & deep dives

Find Deep Infra

Share this profile

Deep Infra runs the models
everyone else trains.