There is a number that keeps appearing in conversations with Zain Asgar: 15 to 30 percent. That is how much of the time the high-end GPUs powering most AI inference are actually doing useful work. The rest is idle. The machines are running, the power is burning, the bills are arriving - and the compute is sitting there. Multiply that waste across an industry spending hundreds of billions on AI infrastructure and you have a problem that sounds impossible but is, Asgar believes, entirely solvable. So he is solving it.
Gimlet Labs, the company Asgar co-founded and leads as CEO, describes itself as the first multi-silicon inference cloud. The pitch is elegant in the way that the best infrastructure ideas always are: instead of running every AI workload on the most expensive GPU available, route each component of the workload to the hardware it actually fits. Compute-bound tasks go to high-throughput GPUs. Memory-bound tasks move to higher-bandwidth accelerators. Network-bound stages land on fast interconnect nodes. The result, Asgar says, is inference that runs 3-10x faster at the same cost - or the same speed for a fraction of the price.
From Circuit Design to Google Lens
Asgar's path to this problem is unusually direct. He began at NVIDIA as an intern while still an undergraduate at the University of Minnesota, Twin Cities, and kept working there through his entire Stanford PhD in electrical engineering - a dissertation focused on GPU energy modeling and analysis. He did not emerge from academia into industry; he ran them simultaneously, building a mental model of the full hardware stack, from circuit level to software, that almost nobody else carries.
At Google AI, that full-stack thinking produced something unexpected. Asgar was tasked with getting AI to run on the constrained hardware of mobile devices - low power, limited memory, far from data centers. His project explored how to map computer vision workloads to chips that had not been designed for them. The output became Google Lens, the real-time image search product now used by hundreds of millions of people. He did not set out to build a consumer product. He set out to solve a hardware-software fit problem, and a consumer product was what that solution turned into.
Pixie Labs: The First Exit
In 2018, Asgar co-founded Pixie Labs with a thesis about Kubernetes observability. The insight was that as infrastructure moved to containerized, ephemeral environments, traditional monitoring approaches were breaking down. Pixie used eBPF kernel instrumentation to give developers instant, no-instrumentation visibility into running clusters - you deployed it once and your entire Kubernetes environment became transparent.
New Relic acquired Pixie Labs in December 2020. But the acquisition was not the end of Pixie's story. After the deal closed, Pixie was contributed to the Cloud Native Computing Foundation as an open source project under the Apache 2.0 license. The company Asgar built kept running after he left - a rare outcome that says something about how thoroughly the product worked and how deeply the community had adopted it.
The Problem With Running Everything on GPUs
After the New Relic acquisition, Asgar started watching the AI infrastructure landscape with fresh eyes. What he saw troubled him. The industry had converged on a single answer to every AI workload: NVIDIA GPUs. The chips were extraordinary, the software ecosystem was deep, the benchmarks were good. And for training large models, the choice made sense. But inference - actually running AI in production, serving requests to users and agents - was a different beast.
Inference workloads are not monolithic. A single query to an AI agent might touch prefill stages that are compute-bound, decode stages that are memory-bandwidth-bound, and routing logic that barely stresses any specialized hardware at all. Running all of it on a $40,000 GPU was like shipping every package in a city using long-haul semi trucks. The trucks work. They are just not the right tool for every leg of the journey.
Asgar founded Gimlet Labs to build the routing layer. The platform disaggregates AI workloads into their component stages and dispatches each to hardware it fits: NVIDIA H100s for the compute-heavy parts, AMD chips where memory bandwidth is the constraint, Cerebras wafer-scale accelerators for specific model architectures, d-Matrix Corsair chips for other workloads, Intel and ARM silicon in the mix. The system manages orchestration, optimization, and scheduling - developers deploy to Gimlet's cloud and the routing happens automatically.
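Gimlet's actual scheduler is not public, but the core idea - classify each stage of an inference workload by its bottleneck, then dispatch it to matching hardware - can be sketched in a few lines. The backend names, the roofline threshold, and the FLOP counts below are illustrative assumptions, not Gimlet's real configuration:

```python
from dataclasses import dataclass

# Hypothetical backend pool for illustration only; Gimlet's real
# hardware mix, APIs, and scheduling policy are not public.
BACKENDS = {
    "compute_bound": "nvidia_h100",     # high-throughput GPUs for prefill
    "memory_bound": "amd_mi300x",       # high-bandwidth accelerators for decode
}

@dataclass
class Stage:
    name: str
    flops: float        # arithmetic work in the stage
    bytes_moved: float  # memory traffic in the stage

def classify(stage: Stage, knee: float = 100.0) -> str:
    """Classify a stage by arithmetic intensity (FLOPs per byte moved).

    Stages above the roofline "knee" are compute-bound; below it they
    are memory-bandwidth-bound. The knee value here is illustrative.
    """
    intensity = stage.flops / stage.bytes_moved
    return "compute_bound" if intensity >= knee else "memory_bound"

def route(stages: list[Stage]) -> dict[str, str]:
    """Map each stage to the backend its bottleneck fits."""
    return {s.name: BACKENDS[classify(s)] for s in stages}

# Prefill does lots of math per byte read; decode streams the model's
# weights for every generated token, so it is bandwidth-starved.
plan = route([
    Stage("prefill", flops=4e12, bytes_moved=2e9),  # intensity 2000
    Stage("decode", flops=8e9, bytes_moved=2e9),    # intensity 4
])
print(plan)  # → {'prefill': 'nvidia_h100', 'decode': 'amd_mi300x'}
```

The roofline-style split above is a standard way to reason about prefill versus decode; the real system presumably weighs cost, availability, and interconnect as well.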
You're wasting hundreds of billions of dollars because you're just leaving idle resources.
- Zain Asgar, Co-Founder & CEO, Gimlet Labs
kforge: AI Agents Writing GPU Code
The multi-silicon routing problem has a second layer. Even when you know which chip to use, getting peak performance out of it requires optimized low-level kernels - small programs that tell the hardware exactly how to move data and execute operations. Writing these by hand is one of the most specialized jobs in computer science. Staffing a team to write them for six different hardware backends is nearly impossible.
Gimlet Labs built kforge to solve this. The system uses AI agents to autonomously generate optimized kernels from PyTorch code, exploring hardware-specific designs across CUDA, ROCm, and Metal backends. The AI writes the code that makes the AI run faster. It is recursive in a way that feels almost inevitable in hindsight, though nobody built it until Asgar's team did.
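kforge's agent loop is not public, but its published description - generate candidate kernels, check them, keep the fastest - follows the classic generate-and-test shape of autotuners. The sketch below uses plain Python functions as stand-ins for candidate kernels; a real system would emit CUDA, ROCm, or Metal code and benchmark it on device:

```python
import time

# Illustrative only: these Python functions stand in for generated
# kernel candidates computing the same op (elementwise square).
def kernel_naive(xs):
    out = []
    for x in xs:
        out.append(x * x)
    return out

def kernel_comprehension(xs):
    return [x * x for x in xs]

CANDIDATES = [kernel_naive, kernel_comprehension]

def benchmark(kernel, data, repeats=50):
    """Time a candidate and return (elapsed seconds, its output)."""
    start = time.perf_counter()
    for _ in range(repeats):
        result = kernel(data)
    return time.perf_counter() - start, result

def search(candidates, data, reference):
    """Generate-and-test loop: keep the fastest candidate that is correct."""
    best, best_time = None, float("inf")
    for kernel in candidates:
        elapsed, result = benchmark(kernel, data)
        if result != reference:   # reject incorrect candidates outright
            continue
        if elapsed < best_time:
            best, best_time = kernel, elapsed
    return best

data = list(range(1000))
winner = search(CANDIDATES, data, reference=[x * x for x in data])
print(winner.__name__)
```

The crucial step an agentic system adds is the generation side: proposing new, hardware-specific candidates rather than choosing from a fixed list. The correctness check against a reference output is what makes autonomous generation safe to deploy.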
The research behind kforge was published in early 2026, co-authored by Asgar. The paper sits alongside broader academic work on efficient and scalable agentic AI systems - evidence that Gimlet Labs is not just shipping product but genuinely advancing the science of heterogeneous inference.
The $80M Series A and What It Signals
In March 2026, Menlo Ventures led an $80 million Series A for Gimlet Labs, bringing total funding to $92 million. The investor list is telling: Eclipse Ventures, Prosperity7, Triatomic, and Factory joined the round. Angel investors included Bill Coughran, a former Google engineering VP who built Google's infrastructure at scale, and Lip-Bu Tan, the CEO of Intel. When the CEO of one of the world's largest chip companies writes a personal check into your startup, it is a statement about which direction the hardware wind is blowing.
The customer list is equally pointed. Gimlet Labs publicly confirmed it added one of the top three frontier AI labs and one of the top three hyperscalers as customers after its October 2025 launch. More specifically, the company is helping OpenAI optimize models on Cerebras chips - the compute powering Codex-Spark. OpenAI using non-NVIDIA hardware at scale, routing through Gimlet's platform, is the proof-of-concept for everything Asgar has been arguing since he started the company.
Teaching While Building
Asgar holds an adjunct professorship in computer science at Stanford. Teaching while running a 30-person startup that just raised $80M is not the path of least resistance. But it tracks with a career where the academic and the practical have always run in parallel - the PhD student who stayed at NVIDIA, the engineer who kept publishing research papers while shipping Gimlet products. The boundary between building and learning never fully closed for him.
That depth shows up in how Asgar talks about the problem. He does not describe Gimlet Labs in terms of product features. He frames it as an architectural transition - the same kind of shift that happened when personal computers moved from single-tasking to operating systems, when data centers moved from bare metal to virtualization, when mobile made the browser behave like an app. Each of those transitions unlocked an order of magnitude more from the same hardware. He thinks AI inference is about to make the same leap.
"We're going to see software finally pivot away from a 'constrained to bare metal' world into full-blown operating systems," he said in a December 2025 podcast interview. "Just like in PCs, data centers, and mobile, that transition really allowed those applications to take on so much more." Gimlet Labs, in this framing, is not an optimization play. It is a bet on a structural change in how AI compute works - the same bet Asgar has been making, in different forms, since he first studied why GPUs use more energy than they need to.