The inference cloud that turns hundreds of open-source AI models into a single, pay-per-use API call - on GPUs it actually owns.
It will do this about five trillion times this week, and almost nobody using the answer will know Deep Infra exists.
That is rather the point. A developer in Lisbon ships a chatbot. A startup in Austin builds an agent that reads invoices. A research team swaps one open-source model for a newer one over coffee. None of them buy a graphics card, sign a data-center lease, or learn what "time-to-first-token" means. They change a base URL, paste an API key, and the request lands on hardware Deep Infra bought, racked, and tuned. The company is roughly 25 people. The bill of materials is enormous.
For three years the headlines were all about training: bigger clusters, longer runs, more parameters. Useful, expensive, photogenic. But a trained model is just a file until something runs it - and running it, millions of times a second, for paying customers who expect a reply before they get bored, turns out to be a different and unglamorous engineering problem.
Open-source models made this worse in the nicest possible way. Suddenly there were dozens of excellent models - Llama, Qwen, DeepSeek, Kimi, GLM, Mistral - free to download and miserable to operate. You needed GPUs you couldn't buy, quantization tricks you didn't have time to learn, and uptime nobody promised you. The model was free. The inference was the bill.
So most teams did the expensive thing: they rented a single closed model from a single vendor, paid the markup, and hoped that vendor's roadmap matched theirs. The alternative - run the open stuff yourself - was a second full-time company. Deep Infra's wager was that almost no one actually wants to run a data center. They just want the answer.
Deep Infra was founded in 2022 by Nikola Borisov, Yessenzhar Kanapin, and Georgios Papoutsis - veterans of the team behind imo, a messenger that scaled to hundreds of millions of users. That is the relevant biographical detail. They had already learned, the hard way, that the difference between a demo and a service is the boring middle: reliability, latency, and the unromantic discipline of keeping things running while everyone else celebrates the launch.
Their bet had an unfashionable shape: own the hardware. Plenty of inference startups resell capacity from the big clouds and live on the margin in between. Deep Infra went the other way and bought its own GPUs, including, recently, crates of Nvidia's Blackwell chips. It is a heavier, riskier path. It is also the only one that lets you control price, latency, and a strict no-logging promise all at once.
Deep Infra's pitch is almost rude in its simplicity: the API is OpenAI-compatible, so switching usually means changing a base URL and a key. Behind that door sit 190+ open-source models across text, image and video generation, speech, and embeddings - Kimi, Qwen, DeepSeek, GLM, Llama, gpt-oss, and the rest of the open frontier. Pricing starts around $0.06 per million tokens, with no minimums and no setup fee. For everything that isn't text, you pay by the second of inference.
190+ open-source models behind one OpenAI-compatible endpoint. Text, images, video, speech, embeddings. Pay per token or per inference-second.
Private endpoints on Nvidia A100, H100, H200, B200 and B300 with autoscaling, for teams that need guaranteed, isolated capacity.
Host your own custom or fine-tuned models on Deep Infra's optimized stack, behind a strict no-logging private endpoint.
Rent raw GPU capacity in Deep Infra's US-based data centers for the workloads that don't fit a neat API.
Investors tend to be skeptical of inference startups, and reasonably so - margins are thin and the hyperscalers are circling. What changed the conversation for Deep Infra was the slope of one line: how much it raised, round over round, as usage compounded.
The capital tells one story; the customers tell another. Deep Infra now processes close to five trillion tokens a week for developers and enterprises running production and agentic workloads, having scaled volume more than 8,000x since its seed. The Series B itself reads like a hardware-and-distribution alliance: Nvidia and Supermicro on the silicon and servers, Samsung Next and 500 Global and Peak6 on the cap table.
Investor and chip supplier - Deep Infra runs its GPUs, Blackwell included.
Investor and server hardware partner for the data-center build-out.
Co-led the $107M Series B.
Strategic investor in the Series B round.
Deep Infra organizes itself around four pillars it repeats like a creed: reliability for critical applications, performance, privacy through a strict no-logging policy, and deep full-stack infrastructure expertise. None of those are slogans you'd put on a t-shirt. All of them are the difference between a side project and something a business will route real revenue through.
The company is SOC 2 and ISO 27001 certified, which is the unsexy paperwork that lets a regulated enterprise say yes. That posture - own the hardware, keep nothing, prove it on paper - is the whole personality of the place. It is an infrastructure company that would prefer you never think about infrastructure.
Open-source AI moves in weeks. A model that's state-of-the-art today is a footnote by the next quarter, and teams that hard-wired themselves to one closed vendor get to feel that pain personally. Deep Infra is selling optionality: a single place to run whatever the open community ships next, at a price set by a company that owns the metal instead of renting it.
Back in those eight data centers, the GPU is still answering questions it has never seen - a few trillion more than when you started reading. The difference Deep Infra makes is not that the question gets answered. It's that the person asking never had to think about the machine, the lease, the chip shortage, or the markup. The infrastructure stayed deep, and out of sight, which is exactly where the founders always wanted it. The model is the star. Deep Infra is content to be the stage.