Mohak Sharma

The Dispatch

A tool for watching machines think

Mohak Sharma runs HoneyHive, a New York company that does something most people never knew they needed until their AI started misbehaving: it watches.

Every company in 2025 wants an AI agent. Fewer of them know what their agent actually did at 3 a.m. on a Tuesday, three steps into a chain of reasoning, when it quietly decided to do the wrong thing. That gap — between the demo that dazzles and the deployment that breaks — is the entire reason HoneyHive exists, and the reason Sharma spends his days there.

HoneyHive is an observability and evaluation platform for AI agents. In plainer terms: it traces every step a multi-step AI pipeline takes, scores it against custom metrics, flags the failures, and lets engineers replay what went wrong. It is built natively on OpenTelemetry, the open standard engineers already trust for monitoring ordinary software, and extended to the much stranger problem of monitoring software that improvises.

Sharma describes the mission in deceptively small terms. The simplest problem the company solves, the founders say, is telling customers what their agents are doing and why. It sounds modest. It is not. As AI systems take more steps and make more decisions on their own, "why" becomes the hardest word in software.

The simplest problem that we solve for our customers is telling them what their agents are doing and why.

— The HoneyHive founding thesis

The market noticed. In April 2025 HoneyHive announced a $5.5M seed round led by Insight Partners, with George Mathew joining the board. Add an earlier $1.9M pre-seed led by Zero Prime Ventures and the company has raised $7.4M total. The angel list reads like a who's-who of data infrastructure: Jordan Tigani, CEO of MotherDuck, and Savin Goyal, CTO of Outerbounds, both put money in. Firms like 468 Capital, MVP Ventures, AIX Ventures, and Firestreak Ventures rounded out the cap table.

Why Insight, and why now? Because the category Sharma is building in barely existed three years ago. When HoneyHive launched in October 2022, "AI observability" was not a phrase any procurement department typed into a search bar. The first wave of AI tooling was about getting a model to respond at all. The second wave — the one Sharma bet on early — is about getting it to respond correctly, repeatedly, and provably, under the kind of scrutiny a regulated enterprise applies to anything that touches a customer.

The technical choice that signals how seriously HoneyHive takes that scrutiny is OpenTelemetry. Rather than inventing a proprietary format that locks customers in, the company built on the open standard that engineering teams already use to monitor conventional software. It is a quietly confident decision. It says: we are not a novelty bolted onto your stack, we are part of the plumbing. For an enterprise buyer deciding whether to trust an AI system with real money and real liability, plumbing is exactly the right metaphor.

Origin

It started with a roommate

Most founding stories get romanticized after the fact. This one is refreshingly literal: Sharma and his co-founder Dhruv Singh were roommates at Columbia University, and they talked about building a company together before either of them had any idea what it would be.

Then they went their separate ways to find out. Sharma studied operations research at Columbia — the math of decisions, optimization, and systems under uncertainty, which is a strangely perfect education for someone who now sells reliability to AI teams. He worked in strategy at Accenture, traded at Citi, and then landed at the document-automation company Templafy as a product manager building out its data and AI platforms.

Singh, meanwhile, became a software engineer on the OpenAI Innovation team inside Microsoft's Office of the CTO. He had a front-row seat to large language models back in 2021, when, as he later put it, almost no one outside Microsoft even knew what an LLM was. He kept running into the same wall: these systems behaved unpredictably, and nobody had a good way to evaluate them.

No one outside Microsoft knew about LLMs.

— Dhruv Singh, co-founder & CTO, on 2021

Two roommates, two halves of the same problem. Sharma had watched data and AI platforms get built; Singh had watched LLMs misbehave at the frontier. They both saw teams shipping promising AI prototypes that shattered in production, with no proper tools to understand the damage or fix it systematically. In October 2022 they launched HoneyHive to close that gap, framing it as a new DevOps stack for AI — especially for agentic, multi-step workflows where the failure modes are subtle and the stakes are high.

The Numbers

Small team, loud results

The traction came fast. Logged requests on the platform grew more than 50x over the course of 2024. Early customers included Fortune 100 enterprises in insurance and banking — the exact industries where an AI making a confident, wrong decision is not a funny anecdote but a regulatory event.

The customer outcomes are the part Sharma can point to without overselling. One customer improved its agent's web-browsing accuracy by 340%. Another compressed its development cycles fivefold. These are not vanity metrics; they are the difference between an AI feature that survives a board review and one that gets quietly killed.

Agent web-browsing accuracy gain: +340%

Logged request growth, 2024: 50x

Development cycle speed-up: 5x

What HoneyHive actually does for those teams is unglamorous and essential. It tracks every step of a pipeline, not just the final answer, so when something goes wrong on step seven of a twelve-step chain, an engineer can see it instead of guessing. It lets teams attach custom metrics, run continuous evaluations, and catch regressions before a customer does. It handles the firehose of data that AI systems produce, the logs and traces that pile up faster than any human could read them. It is, in the founders' framing, the difference between hoping an AI works and knowing it does.
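For readers who want to see the idea rather than take it on faith, here is a minimal sketch of per-step tracing with a custom metric. It is not HoneyHive's API and it does not use OpenTelemetry; it is plain Python, and every name in it (`StepRecord`, `run_traced`, `first_failure`, the `non_empty` metric, the three-step pipeline) is a hypothetical stand-in. The point it illustrates is the one above: a run whose final answer looks acceptable can still hide a failed step in the middle.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class StepRecord:
    """One recorded step of a pipeline run (hypothetical structure)."""
    index: int
    name: str
    output: Any
    duration_s: float
    scores: dict[str, float] = field(default_factory=dict)


def run_traced(
    steps: list[tuple[str, Callable[[Any], Any]]],
    metrics: dict[str, Callable[[Any], float]],
    initial: Any,
) -> list[StepRecord]:
    """Run steps in order, recording and scoring every intermediate output."""
    records = []
    value = initial
    for index, (name, step) in enumerate(steps, start=1):
        started = time.perf_counter()
        value = step(value)
        elapsed = time.perf_counter() - started
        # Score the output of this step, not just the final answer.
        scores = {metric_name: metric(value) for metric_name, metric in metrics.items()}
        records.append(StepRecord(index, name, value, elapsed, scores))
    return records


def first_failure(records: list[StepRecord], threshold: float = 0.5):
    """Return the earliest step with any score below the threshold, or None."""
    for record in records:
        if any(score < threshold for score in record.scores.values()):
            return record
    return None


# Hypothetical three-step pipeline with a simulated bug in the middle step.
steps = [
    ("plan", lambda text: text.strip()),
    ("retrieve", lambda text: ""),  # simulated bug: silently drops everything
    ("answer", lambda text: text or "I don't know"),
]
# Hypothetical custom metric: 1.0 if the step produced any output, else 0.0.
metrics = {"non_empty": lambda output: 1.0 if output else 0.0}

records = run_traced(steps, metrics, "  What is the refund policy?  ")
failed = first_failure(records)
print(failed.index, failed.name)
```

In this toy run the last step passes the metric because it falls back to a default answer, while the trace still pins the failure on the second step. A real platform would replace the lambdas with model and tool calls and export the records as spans, but the shape of the check is the same.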

And here is the part that makes other founders wince with envy: the whole thing ran on roughly eight people. The team operates at enterprise scale by using AI relentlessly across its own work. Singh's estimate is that each person does the work of five. The company that helps you trust AI agents is, fittingly, run by a tiny human crew leaning hard on AI of their own.

What's Next

The oversight problem

Sharma is not building a dashboard company. The longer arc he and Singh describe is the oversight problem: as agents reason in more steps and at higher speeds, human ability to monitor them simply runs out. Their answer is provocative — meta agents, AI systems whose job is to help build, evaluate, and improve other AI systems, with humans watching the watchers.

It is a big claim from a small company, which is exactly the kind of bet that defines this moment in software. The TAM they cite is $36.2B, and they suspect the real number is larger. Near term, the roadmap is concrete: more enterprise integrations, flexible deployment (SaaS, single-tenant, on-premises), and richer tools for simulating multi-turn agent behavior before it ever reaches a customer.

There is a neat symmetry in all of this. Sharma studied operations research, the discipline of making good decisions inside complex systems you cannot fully see. He now sells that same capability to companies whose AI systems have become exactly that — complex, partly opaque, and consequential. The student of decision-making under uncertainty grew up to sell instruments for it. HoneyHive is, in a sense, operations research pointed at the most uncertain machines we have ever shipped.

For now, Sharma's pitch stays grounded in a single, stubborn truth about the AI era. The agents are getting smarter. Knowing what they did, and why, is the only thing that lets anyone sleep. He built the thing that answers the question. The rest of the industry is still learning to ask it.
