A Quiet Room on Potrero Avenue, And A Very Loud Cloud
It is a Tuesday afternoon in San Francisco's Potrero neighborhood. The Gimlet Labs office is full of monitors and conspicuously empty of marketing materials. Engineers stare at flame graphs. A whiteboard lists six accelerator codenames - some you can buy, some you cannot yet. Somewhere in this building, a model is being scheduled across a CPU, a GPU, and an unnamed third piece of silicon. Nobody outside the room is supposed to know which. That is, in essence, the whole product.
Gimlet is twenty-six people. Most of them have shipped infrastructure that other people now depend on. Their CEO, Zain Asgar, was a GPU architect at NVIDIA and an engineering lead at Google AI before he became, in the most polite Stanford way, the kind of person who teaches a Wednesday class and runs a startup with eight-figure revenue the rest of the week. The co-founders worked together on Pixie, the eBPF observability tool that New Relic bought in 2020. Pixie taught them how to make something invisible useful. Gimlet is the same trick, applied to a much more expensive layer of the stack.
The Bottleneck Nobody Wants To Own
The story of AI in 2026 is mostly a story of arithmetic. Inference is now the largest single line item in many AI companies' budgets. Buying a frontier GPU has the dignity of a Hermès reservation. Power is rationed, racks are rationed, and somewhere a procurement officer is begging for any chip that shows up faster than the lawyers do. Into this scene walks a small, polite startup with a strange offer: keep your GPUs, keep your CPUs, keep whatever the AMD rep brings to the next meeting. Gimlet's software will use all of them, simultaneously, and pretend it was always meant to work that way.
The technical claim is that the multi-silicon inference cloud reliably delivers three to ten times the inference throughput for the same power and the same dollars. The non-technical claim is more interesting: silicon vendor lock-in is a software problem, not a hardware problem. If you can compile to any backend, the moat dries up. That argument has been made before - by Modular, by OctoML, by anyone who ever spelled the letters M-L-I-R out loud at a dinner party. Gimlet's argument is that it has the system shipping, the kernels being written by a tool called kforge, and the eight-figure revenue to defend the slide.
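The arithmetic behind that claim is simple enough to sketch. If throughput rises by some factor while the hourly bill stays put, the cost per token falls by the same factor. The snippet below is a back-of-envelope illustration only: the hourly cost and baseline throughput are made-up numbers, not Gimlet figures, and `cost_per_million_tokens` is a hypothetical helper.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Dollars spent to generate one million tokens at a steady rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


# Assumed, illustrative baseline: not a Gimlet or vendor figure.
HOURLY_COST_USD = 36.0
BASELINE_TOKENS_PER_SECOND = 1_000.0

baseline = cost_per_million_tokens(HOURLY_COST_USD, BASELINE_TOKENS_PER_SECOND)
for speedup in (3, 10):  # the low and high ends of the claimed range
    improved = cost_per_million_tokens(
        HOURLY_COST_USD, BASELINE_TOKENS_PER_SECOND * speedup
    )
    print(f"{speedup}x throughput: ${baseline:.2f} -> ${improved:.2f} per million tokens")
```

The point of holding the hourly cost fixed is that the claim is about the same power and the same dollars: the only lever that moves is how many tokens come out the other end.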
The Tool That Refuses To Write CUDA
kforge is the bit that makes the engineers smile. It is a tool that takes PyTorch code and emits optimized low-level kernels for CUDA, ROCm and Metal without a human writing any of them. This is the part of the job that hardware companies have long staffed with people who can hold a memory hierarchy in their heads. Gimlet's bet is that this work is, finally, ready to be automated. If they are right, the cost of supporting a new accelerator drops from a small team and a roadmap to an afternoon. If they are wrong, they are wrong with eight figures of revenue, which is not the worst place to be wrong.
Sitting on top of kforge is Gimlet Cloud, a serverless platform for AI agents. The pitch reads like a dare to the hyperscalers: deploy a single agent or a tangled multi-agent system, hand it some custom code and a few data sources, and let Gimlet handle scheduling, scaling and optimization. The customer never sees which chip ran which step. The customer also never sees the bill they would have paid otherwise.
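What "handling the scheduling" could mean is easier to see in miniature. The sketch below is a toy, not Gimlet's scheduler: the device names, step names and per-step costs are all invented, and `assign_steps` is a hypothetical helper that simply hands each step of an agent pipeline to the cheapest device able to run it.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    # Hypothetical cost (arbitrary units) to run each kind of step here.
    # A step missing from the table means this device cannot run it.
    cost_by_step: dict


def assign_steps(steps, devices):
    """Map each step to the cheapest device that supports it (greedy, per step)."""
    plan = {}
    for step in steps:
        candidates = [d for d in devices if step in d.cost_by_step]
        if not candidates:
            raise ValueError(f"no device can run step {step!r}")
        best = min(candidates, key=lambda d: d.cost_by_step[step])
        plan[step] = best.name
    return plan


# Invented fleet and pipeline, for illustration only.
devices = [
    Device("cpu", {"tokenize": 1.0, "retrieve": 2.0, "decode": 50.0}),
    Device("gpu", {"prefill": 4.0, "decode": 8.0}),
    Device("accelerator-x", {"decode": 3.0}),
]
steps = ["tokenize", "retrieve", "prefill", "decode"]

plan = assign_steps(steps, devices)
for step in steps:
    print(f"{step:>8} -> {plan[step]}")
```

A production scheduler would also have to weigh data movement between devices, memory limits and queueing, none of which this greedy version attempts. It only shows the shape of the idea: the caller names the steps, and the placement is decided somewhere they never look.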
The Money And The People Behind It
The seed round was twelve million dollars, closed quietly enough that most people learned about it from the stealth-exit press release in October 2025. The angel list is the part you forward to friends. Bill Coughran of Sequoia. Stanford's Nick McKeown. Former VMware CEO Raghu Raghuram. Intel's CEO Lip-Bu Tan. A roster like that is not a fundraising tactic. It is a signal that the systems crowd believes someone has, finally, found a respectable way to disagree with NVIDIA.
The Series A, announced in March 2026, was eighty million dollars, led by Menlo Ventures, with Eclipse Ventures, Prosperity7, Triatomic and Factory along for the round. That brings the total to ninety-two million. At twenty-six employees, the per-head valuation joke writes itself - and so does the per-head revenue joke. The company would prefer you skip both and read the partnership announcement with d-Matrix, which claims ten-fold speedups on frontier workloads when you put Gimlet's compiler and d-Matrix's silicon in the same room.
What This Actually Lets You Do
Here is the practical version. If you are a frontier AI lab, you can stop turning down chips because your software stack only speaks one dialect. If you are a smaller AI company, you can ship a multi-agent product to a serverless platform that takes care of the scheduling. If you are a chip startup, you can have your hardware supported on day one without writing anything yourself. If you are an enterprise that has been watching inference budgets eat margin, you can probably get a quote back that does not require a small mortgage. None of this is glamorous in the way that a foundation model is glamorous. It is the kind of unglamorous that compounds.
The Mood In The Room
Back in the Potrero office, an engineer makes a joke about kernel quality scores. Someone else points at a graph nobody else can read and says something is now within five percent of hand-tuned. A model finishes its run, split across the three machines under the table, and exits cleanly. Nobody claps. Gimlet Labs does not have a culture of clapping. They have a culture of shipping the thing that makes the next thing possible. That is, in the end, what an applied research lab is supposed to do, and they appear to be doing it.