DEEP INFRA CLOSES $107M SERIES B - MAY 2026 NEARLY 5 TRILLION TOKENS PROCESSED PER WEEK BACKED BY NVIDIA, 500 GLOBAL, FELICIS & SAMSUNG NEXT TOKEN VOLUME UP 25x SINCE SERIES A 150+ OPEN-SOURCE MODELS, 8 U.S. DATA CENTERS DEEP INFRA CLOSES $107M SERIES B - MAY 2026 NEARLY 5 TRILLION TOKENS PROCESSED PER WEEK BACKED BY NVIDIA, 500 GLOBAL, FELICIS & SAMSUNG NEXT TOKEN VOLUME UP 25x SINCE SERIES A 150+ OPEN-SOURCE MODELS, 8 U.S. DATA CENTERS
Deep Infra // Palo Alto // The Inference Cloud

Nikola
Borisov

He bet a company on the least glamorous word in artificial intelligence: inference. It is now printing trillions of tokens a week.

Nikola Borisov, co-founder and CEO of Deep Infra
Nikola Borisov, window seat. The founder who prefers plumbing to spotlights.
Share LinkedIn X / Twitter Facebook Instagram
The Story

Betting on the boring half of AI

Every conversation about artificial intelligence tends to sprint toward the same place: bigger models, longer training runs, the frontier lab that trained the newest brain. Nikola Borisov looked at all of that and went the other direction. He co-founded Deep Infra on a plainer question - once a model exists, who runs it, and how cheaply? That question, unglamorous as it sounds, turns out to be where the money and the bottleneck actually live.

Borisov is the co-founder and chief executive of Deep Infra, a Palo Alto company that operates a dedicated inference cloud for open-source and open-weight AI models. In plain terms: developers point an API at Deep Infra, and Deep Infra runs the model for them - fast, and for less. The company supports more than 150 open-source models through OpenAI-compatible APIs, and it does so from its own hardware in eight U.S. data centers rather than renting capacity from a hyperscaler. That ownership is the whole trick.

We've tried to excel in terms of providing the best cost. And we can do that because we own and operate the hardware, and we have a very optimized inference stack.
- Nikola Borisov, CEO, Deep Infra

The numbers give the thesis teeth. By mid-2026 Deep Infra was processing close to five trillion tokens a week. Since its Series A the volume of tokens it handles had grown roughly twenty-five fold, and revenue had tripled since the start of the year. Those are not the metrics of a science project. They are the metrics of infrastructure that people quietly depend on.

Why inference, and why now

Borisov's contrarian read is that the industry has been staring at the wrong cost line. Founders obsess over which model to use and what it costs to train. But models get trained once; they get run forever. As AI shifts from single chat replies to agents that fire off fifty, a hundred, sometimes hundreds of model calls to finish one task, the expensive part is no longer the training - it is the running. Deep Infra's Series B announcement put it bluntly: inference is no longer a thin layer on top of the stack, it is the system constraint that will define most workloads.

There is a second half to the argument, and it is the reason a company built on open models can thrive at all. Borisov argues that open-source models now trail the best closed models by only three to five percent on benchmarks - a gap he measures in months, not years. If the open models are six months behind rather than five years behind, then the smart move is not to bet everything on a single frontier lab keeping its lead forever. It is to build the cheapest, fastest place to run whichever good-enough model a developer wants today.

Open source models are now just 3 to 5 percent behind the best closed source models - about six months of lag time, not five years.
- Nikola Borisov, on the shrinking model gap
By the Numbers
$107M
Series B, May 2026
~5T
Tokens / Week
25x
Growth Since Series A
150+
Open Models Served

Figures reported by Deep Infra and press coverage around the May 2026 Series B. Total capital raised across seed, Series A and Series B is roughly $133M.

Before the Cloud

A decade of making things fast

Long before he owned GPUs, Borisov was the kid winning algorithm contests. He took a gold medal at the 12th Balkan Olympiad in Informatics in 2004, then carried that competitive streak to Northwestern University, where he ran the ACM chapter and won the ICPC Mid-Central Regional Championship in both 2006 and 2007. He graduated in 2010 with a computer science degree and an economics minor - a combination that reads, in hindsight, like a spoiler for a career spent squeezing cost out of compute.

He spent the next stretch of his career at imo.im, the messaging platform, joining in 2010 and climbing to Director of Engineering by 2020. There he built and scaled distributed backends serving more than 200 million users across global data centers, wrestling with the exact problems - latency, cost per request, keeping services up at planetary scale - that would later define Deep Infra. He then joined the founding team at HalloApp as a backend engineer, building its backend in Erlang, ProtoBuf, AWS and Redis. Erlang, a language born for 1990s telephone switches, tells you something about the kind of engineer he is: he reaches for whatever actually holds up under load.

We wanted to provide GPUs and a low-cost way of deploying trained machine learning models.

Inference is no longer a thin layer on top of an AI stack. It's the system constraint that will define the majority of workloads.

The Company

From stealth to NVIDIA's short list

Borisov co-founded Deep Infra in 2022 alongside Georgios Papoutsis and Yessenzhar Kanapin. The company came out of stealth in 2023 with an $8 million seed round led by A.Capital Ventures and Felicis. An $18 million Series A followed in 2025, led by Felicis with investor Georges Harik. Then, in May 2026, the big one: a $107 million Series B co-led by 500 Global and Harik, with a roster that reads like an AI-infrastructure who's who - NVIDIA, Felicis, Samsung Next, Crescent Cove, PEAK6, Supermicro and Upper90.

NVIDIA's presence is more than a check. Deep Infra has been an early collaborator on Blackwell GPUs and the coming Vera Rubin architecture, which means Borisov's team gets its hands on next-generation silicon before most of the market. For a company whose entire edge is running models cheaply and quickly, early access to the fastest chips is oxygen. The Series B money is earmarked for the obvious things: more global compute, deeper developer tooling, and support for the next wave of open-source and agentic models.

What is striking is how small the team is relative to the throughput. A company of roughly two dozen people, headquartered on Middlefield Road in Palo Alto, quietly moving trillions of tokens a week. That ratio - enormous volume, lean crew - is the clearest fingerprint of Borisov's engineering instincts. Efficiency is not a marketing line here. It is the product.

Curiosities

Things worth knowing

Teenage formGold at the Balkan Olympiad in Informatics came in 2004 - years before his first line of production code. The competitive wiring came standard.
Own the metalDeep Infra buys and operates its own GPUs instead of renting cloud capacity. It is the least fashionable and most decisive choice the company made.
Scale, quietlyRoughly five trillion tokens a week move through infrastructure most people have never heard of - a number that grew 25x in a single funding cycle.
Telecom rootsHe helped build HalloApp's founding backend in Erlang, a language better known for 1990s phone switches than for modern AI.
Find Him

Links & sources

Sources: Deep Infra, Crunchbase, The Org, VentureBeat, SiliconANGLE, Edge Infrastructure Review, Midstage Accelerator podcast, Grokipedia. Facts drawn from public reporting as of mid-2026.