BREAKING • Gimlet Labs closes $80M Series A led by Menlo Ventures • $92M total raised • 3x to 10x inference speedups, same cost, same power • Eight-figure revenue out of stealth • kforge writes CUDA, ROCm, Metal kernels from PyTorch • 26 people. One multi-silicon cloud. • d-Matrix partnership announced • San Francisco, 255 Potrero Ave
Vol. XXVI • Issue 06 • San Francisco • YesPress Profile

Gimlet Labs

The applied research lab that decided AI workloads should not care which chip they are running on - and then convinced the frontier labs to pay for the privilege.

The four faces between you and your next inference bill. Co-founders Omid Azizi, Natalie Serrino, Zain Asgar and Michelle Nguyen, photographed for the Series A announcement.
Series A · $80M
The Lede

A Quiet Room on Potrero Avenue, And A Very Loud Cloud

It is a Tuesday afternoon in San Francisco's Potrero neighborhood. The Gimlet Labs office is full of monitors and conspicuously empty of marketing materials. Engineers stare at flame graphs. A whiteboard lists six accelerator codenames - some you can buy, some you cannot yet. Somewhere in this building, a model is being scheduled across a CPU, a GPU, and an unnamed third piece of silicon. Nobody outside the room is supposed to know which chip runs which part. That is, in essence, the whole product.

Gimlet is twenty-six people. Most of them have shipped infrastructure that other people now depend on. Their CEO, Zain Asgar, was a GPU architect at NVIDIA and an engineering lead at Google AI before he became, in the most polite Stanford way, the kind of person who teaches a Wednesday class and runs an eight-figure startup the rest of the week. The co-founders worked together on Pixie, the eBPF observability tool that New Relic bought in 2020. Pixie taught them how to make something invisible useful. Gimlet is the same trick, applied to a much more expensive layer of the stack.

"Gimlet has created what it claims is the first and only multi-silicon inference cloud - software that allows an AI workload to be simultaneously run across diverse types of hardware." - TechCrunch, March 2026

The Bottleneck Nobody Wants To Own

The story of AI in 2026 is mostly a story of arithmetic. Inference is now the largest single line item in many AI companies' budgets. Buying a frontier GPU has the dignity of a Hermès reservation. Power is rationed, racks are rationed, and somewhere a procurement officer is begging for any chip that is faster than the lawyer is. Into this scene walks a small, polite startup with a strange offer: keep your GPUs, keep your CPUs, keep whatever the AMD rep brings to the next meeting. Gimlet's software will use all of them, simultaneously, and pretend it was always meant to work that way.

The technical claim is that the multi-silicon inference cloud reliably delivers three to ten times the inference throughput for the same power and the same dollars. The non-technical claim is more interesting: silicon vendor lock-in is a software problem, not a hardware problem. If you can compile to any backend, the moat dries up. That argument has been made before - by Modular, by OctoML, by anyone who ever spelled the letters M-L-I-R out loud at a dinner party. Gimlet's argument is that it has the system shipping, the kernels being written by a tool called kforge, and the eight-figure revenue to defend the slide.
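The same-cost, same-power claim is easiest to read as arithmetic. Below is a minimal sketch, assuming a deployment's hourly cost and power draw stay fixed while only its throughput changes; the `Deployment` class and the dollar, wattage and token figures are hypothetical illustrations, not Gimlet's numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """A fixed inference deployment: what it costs, what it draws, what it serves."""

    dollars_per_hour: float   # all-in hourly cost of the hardware (hypothetical input)
    watts: float              # average power draw
    tokens_per_second: float  # sustained inference throughput

    def dollars_per_million_tokens(self) -> float:
        tokens_per_hour = self.tokens_per_second * 3600
        return self.dollars_per_hour / tokens_per_hour * 1_000_000

    def joules_per_token(self) -> float:
        # A watt is one joule per second, so J/token = W / (tokens/s).
        return self.watts / self.tokens_per_second


def with_speedup(base: Deployment, speedup: float) -> Deployment:
    """Same dollars and same power; only throughput is multiplied."""
    return Deployment(base.dollars_per_hour, base.watts, base.tokens_per_second * speedup)


if __name__ == "__main__":
    # Hypothetical baseline: $36/hour, 10 kW, 10,000 tokens/s.
    base = Deployment(dollars_per_hour=36.0, watts=10_000.0, tokens_per_second=10_000.0)
    for k in (1, 3, 10):
        d = with_speedup(base, k)
        print(f"{k:>2}x: ${d.dollars_per_million_tokens():.3f} per 1M tokens, "
              f"{d.joules_per_token():.3f} J/token")
```

Under this model a 3x speedup cuts cost and energy per token to one third, and 10x cuts them to one tenth. The comparison only holds if both runs serve the same model on the same workload.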

The Tool That Refuses To Write CUDA

kforge is the bit that makes the engineers smile. It is a tool that takes PyTorch and emits optimized low-level kernels - CUDA, ROCm, Metal - without a human writing any of them. This is the part of the job that hardware companies have long staffed with people who can hold a memory hierarchy in their heads. Gimlet's bet is that this work is, finally, ready to be automated. If they are right, the cost of supporting a new accelerator drops from a small team and a roadmap to an afternoon. If they are wrong, they are wrong with eight figures of revenue, which is not the worst place to be wrong.
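To make "emits kernels for several backends" concrete, here is a toy sketch that is emphatically not kforge. It takes one elementwise expression and produces CUDA, HIP (the CUDA-like dialect used on ROCm) and Metal Shading Language source for it. The function names and the `{name}` placeholder convention are hypothetical. The hard part kforge claims to automate, choosing tiling, memory layout and fusion per chip, is not attempted here.

```python
# Toy elementwise-kernel emitter: one expression in, three backend sources out.
# Illustrative only: this is not kforge, it does not optimize, and it only
# handles float32 elementwise expressions written with {name} placeholders.


def _body(expr: str, inputs: list[str]) -> str:
    # Turn each {name} placeholder into an indexed load, e.g. {x} -> x[i].
    return expr.format(**{name: f"{name}[i]" for name in inputs})


def emit_cuda(name: str, expr: str, inputs: list[str], header: str = "") -> str:
    params = ", ".join(f"const float* __restrict__ {n}" for n in inputs)
    return (
        header
        + f'extern "C" __global__ void {name}({params}, float* __restrict__ out, int n) {{\n'
        + "    int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
        + f"    if (i < n) out[i] = {_body(expr, inputs)};\n"
        + "}\n"
    )


def emit_hip(name: str, expr: str, inputs: list[str]) -> str:
    # HIP accepts CUDA-style kernel syntax; only the runtime header differs.
    return emit_cuda(name, expr, inputs, header="#include <hip/hip_runtime.h>\n")


def emit_metal(name: str, expr: str, inputs: list[str]) -> str:
    k = len(inputs)
    params = [f"device const float* {n} [[buffer({j})]]" for j, n in enumerate(inputs)]
    params += [
        f"device float* out [[buffer({k})]]",
        f"constant uint& n [[buffer({k + 1})]]",
        "uint i [[thread_position_in_grid]]",
    ]
    joined = ",\n    ".join(params)
    return (
        "#include <metal_stdlib>\nusing namespace metal;\n"
        + f"kernel void {name}(\n    {joined}) {{\n"
        + f"    if (i < n) out[i] = {_body(expr, inputs)};\n"
        + "}\n"
    )


BACKENDS = {"cuda": emit_cuda, "rocm": emit_hip, "metal": emit_metal}

if __name__ == "__main__":
    for backend, emit in BACKENDS.items():
        print(f"// --- {backend} ---")
        print(emit("scale_add", "{x} * 2.0f + {y}", ["x", "y"]))
```

The three outputs share one body line; everything that differs is launch model, address-space qualifiers and headers, which is roughly the surface a multi-backend compiler has to abstract before it can start optimizing.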

Sitting on top of kforge is Gimlet Cloud, a serverless platform for AI agents. The pitch reads like a dare to the hyperscalers: deploy a single agent or a tangled multi-agent system, hand it some custom code and a few data sources, and let Gimlet handle scheduling, scaling and optimization. The customer never sees which chip ran which step. The customer also never sees the bill they would have paid otherwise.
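One way to picture "the customer never sees which chip ran which step" is a placement loop like the one below. It is a hypothetical sketch, not Gimlet Cloud's scheduler. The pools, step kinds and rates are invented, and steps are treated as independent even though real agent graphs have dependencies. Each step goes to whichever pool would finish it earliest given the work already queued there, a standard greedy list-scheduling heuristic.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    rate: dict[str, float]   # work units per second, by step kind (hypothetical)
    ready_at: float = 0.0    # time at which this pool next becomes free, in seconds


@dataclass(frozen=True)
class Step:
    name: str
    kind: str     # e.g. "prefill", "decode", "tool"
    work: float   # abstract work units


def schedule(steps: list[Step], pools: list[Pool]) -> dict[str, str]:
    """Greedy earliest-finish-time placement of independent steps onto pools."""
    placement: dict[str, str] = {}
    for step in steps:
        able = [p for p in pools if p.rate.get(step.kind, 0.0) > 0.0]
        if not able:
            raise ValueError(f"no pool can run steps of kind {step.kind!r}")
        # Pick the pool that would finish this step soonest, counting its queue.
        best = min(able, key=lambda p: p.ready_at + step.work / p.rate[step.kind])
        best.ready_at += step.work / best.rate[step.kind]
        placement[step.name] = best.name
    return placement


if __name__ == "__main__":
    pools = [
        Pool("gpu", {"prefill": 100.0, "decode": 20.0}),
        Pool("high-mem", {"decode": 50.0}),
        Pool("cpu", {"tool": 10.0, "decode": 2.0}),
    ]
    steps = [
        Step("plan/prefill", "prefill", 100.0),
        Step("plan/decode", "decode", 50.0),
        Step("search-tool", "tool", 10.0),
        Step("answer/decode", "decode", 50.0),
    ]
    print(schedule(steps, pools))
```

With these sample rates, prefill lands on the GPU, both decode steps on the high-memory pool, and the tool call on the CPU.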

"Gimlet Labs reliably speeds AI inference up by 3x to 10x for the same cost and power." - TechCrunch coverage of the Series A

The Money And The People Behind It

The seed round was twelve million dollars, closed quietly enough that most people learned about it from the stealth-exit press release in October 2025. The angel list is the part you forward to friends. Bill Coughran of Sequoia. Stanford's Nick McKeown. Former VMware CEO Raghu Raghuram. Intel's CEO Lip-Bu Tan. A roster like that is not a fundraising tactic. It is a signal that the systems crowd believes someone has, finally, found a respectable way to disagree with NVIDIA.

The Series A, announced in March 2026, was eighty million dollars, led by Menlo Ventures, with Eclipse Ventures, Prosperity7, Triatomic and Factory along for the round. That brings the total to ninety-two million. At twenty-six employees, the per-head valuation joke writes itself - and so does the per-head revenue joke. The company would prefer you skip both and read the partnership announcement with d-Matrix, which claims up to ten-fold speedups on frontier workloads when you put Gimlet's compiler and d-Matrix's silicon in the same room.

What This Actually Lets You Do

Here is the practical version. If you are a frontier AI lab, you can stop turning down chips because your software stack only speaks one dialect. If you are a smaller AI company, you can ship a multi-agent product to a serverless platform that takes care of the scheduling. If you are a chip startup, you can have your hardware supported on day one without writing the kernels yourself. If you are an enterprise that has been watching inference budgets eat margin, you can probably get a quote back that does not require a small mortgage. None of this is glamorous in the way that a foundation model is glamorous. It is the kind of unglamorous that compounds.

The Mood In The Room

Back in the Potrero office, an engineer makes a joke about kernel quality scores. Someone else points at a graph nobody else can read and says something is now within five percent of hand-tuned. A model finishes its run, split across the three machines under the table, and exits cleanly. Nobody claps. Gimlet Labs does not have a culture of clapping. They have a culture of shipping the thing that makes the next thing possible. That is, in the end, what an applied research lab is supposed to do, and they appear to be doing it.

What They Ship

Three Layers Of The Same Argument

Gimlet's argument: AI workloads should not know what silicon they run on. Here is the stack that makes the argument true.

PLATFORM

Gimlet Cloud

Serverless inference for AI agents - from a single agent to multi-agent systems with custom code and external data. Scheduling, scaling and optimization, handled.

COMPILER

kforge

Autonomously generates optimized low-level kernels directly from PyTorch across CUDA, ROCm and Metal. No hand-written kernels. No vendor lock-in.

INFRASTRUCTURE

Multi-Silicon Cloud

One workload, many chips - simultaneously. CPUs, GPUs, high-memory systems, accelerators. 3x to 10x more throughput at the same cost and power.
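"One workload, many chips, simultaneously" implies splitting work so that no chip sits idle while another is still finishing. Below is a minimal sketch, assuming per-device throughputs have already been measured (the device names and numbers are hypothetical, and this is not Gimlet's scheduler). It divides a batch in proportion to throughput, using largest-remainder rounding with exact fractions so the integer shares always add up to the batch size.

```python
import math
from fractions import Fraction


def split_batch(batch_size: int, throughput: dict[str, float]) -> dict[str, int]:
    """Split a batch across devices in proportion to measured throughput.

    Largest-remainder rounding keeps the integer shares summing to batch_size,
    and no device ends up more than one item away from its exact share.
    """
    if batch_size < 0:
        raise ValueError("batch_size must be non-negative")
    rates = {d: Fraction(t) for d, t in throughput.items()}  # exact, even for floats
    total = sum(rates.values())
    if total <= 0 or any(r < 0 for r in rates.values()):
        raise ValueError("throughputs must be non-negative with a positive sum")
    exact = {d: batch_size * r / total for d, r in rates.items()}
    shares = {d: math.floor(x) for d, x in exact.items()}
    leftover = batch_size - sum(shares.values())  # always in [0, number of devices)
    # Hand the remaining items to the devices with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda d: exact[d] - shares[d], reverse=True)
    for d in by_remainder[:leftover]:
        shares[d] += 1
    return shares


if __name__ == "__main__":
    # Hypothetical measured throughputs, in requests per second.
    print(split_batch(64, {"gpu": 900.0, "accelerator": 450.0, "cpu": 50.0}))
```

Proportional shares mean each device finishes its slice at about the same time. Transfer and synchronization overheads, which a real multi-silicon scheduler has to account for, are ignored here.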

By The Numbers

An Infographic In Four Acts

$92M · Total Raised
26 · People
10x · Peak Speedup
3 · GPU Backends

Reported Inference Throughput Gain

Baseline: 1x
Gimlet (low end): 3x
Gimlet (peak): 10x
The Cast

Five Founders, One Distributed System

They built Pixie together. New Relic bought it. They reassembled.

Zain Asgar
Co-Founder · CEO
Michelle Nguyen
Co-Founder
Omid Azizi
Co-Founder
Natalie Serrino
Co-Founder
James Bartlett
Co-Founder
Recent History

The Year Of Gimlet

2025 · OCT
Out of stealth. Public launch with a $12M seed and eight-figure revenue. Gimlet Cloud and kforge become things you can buy.
2026 · MAR
Series A. $80M led by Menlo Ventures. Eclipse, Prosperity7, Triatomic and Factory join.
2026 · MAR
d-Matrix partnership. Joint engineering for up to 10x speedups on frontier AI workloads.
2026 · JUN
Still 26 people. Still in Potrero. Still shipping.
Marginalia

Things Worth Knowing

FUN FACT 01

Zain Asgar is an adjunct professor at Stanford. The kind of professor who teaches on Wednesdays and ships compilers the other six days.

FUN FACT 02

The angel list includes Intel CEO Lip-Bu Tan. Read that sentence twice.

FUN FACT 03

Headquartered at 255 Potrero Avenue - a few blocks from most of the AI companies they sell to.

FUN FACT 04

Eight-figure revenue on a 26-person headcount. A rare ratio in 2026 AI.


Back in the Potrero office, the flame graphs flatten. Three chips finish a job that one chip used to. Nobody claps. Somewhere a procurement officer sighs in relief. The next model is already queued.