Comet

The Dispatch

A model is running in production right now. Comet knows what it's doing.

Somewhere tonight, a recommendation model is deciding what 40 million people see next. A fraud system is approving a payment. A chatbot is answering a customer who has no idea it is a chatbot. Each of these is a guess dressed up as a decision. Most companies cross their fingers. A growing number of them watch the whole thing on Comet instead.

Comet is an AI developer platform. That is a tidy phrase for a messy job: tracking every experiment, every metric, every model version, and - increasingly - every word a large language model says back to a user. It is the flight recorder for machine learning, and it has become indispensable precisely because the alternative is flying blind.

The company does not make headlines the way the model labs do. It is not announcing a new frontier model every quarter, and it is not promising artificial general intelligence by Tuesday. What it offers is less thrilling and far more useful: a record. When a model behaves strangely, Comet is the difference between a team that can answer "why" and a team that can only shrug. In an industry built on confident guesses, that is a surprisingly rare commodity.

"Comet pitches itself as doing for machine learning what GitHub did for code."

- TechCrunch, on the company's positioning

The Problem

Machine learning had a memory problem.

For years, building a model looked like this: a data scientist ran an experiment, got a promising result, tweaked a few numbers, ran it again, and then - somewhere around the fortieth attempt - could no longer remember which version actually worked. Results lived in notebooks emailed back and forth. Reproducing last month's breakthrough was less science than archaeology.

This was fine when models were toys. It became a liability when they started approving loans and steering cars. The industry had built systems it could not explain, could not reproduce, and frequently could not trust. Everyone agreed this was a problem. Almost no one wanted to do the unglamorous work of fixing it.

Above: the natural habitat of the lost model - a Jupyter notebook, last opened "for two seconds."

"Reproducibility isn't glamorous. So naturally, someone built a whole company around it."

- The recurring theme of the MLOps era

The Bet

Two NLP engineers who'd felt the pain firsthand.

Gideon Mendels and Nimrod Lahav had earned their scars. At their previous company, GroupWize, they trained and deployed more than 50 natural-language models across 15 languages, processing billions of chats. Before that, Mendels worked on hate-speech and deception detection at Columbia University and Google. They knew exactly how the sausage got made, and how often it got lost.

In 2017 they founded Comet on a contrarian bet: that the boring middle of machine learning - the tracking, the versioning, the monitoring - was where the real value sat. Not the flashy model. The plumbing around it. Investors took some convincing, and then took less convincing: the company raised a $13M Series A in April 2021, and a $50M Series B just six months later.

The speed of that second round told its own story. Investors do not double down in half a year on a hunch. They do it when the usage charts are unambiguous, and Comet's were. The thesis that nobody wanted to fund in 2017 - that machine learning needed an operations layer as serious as the models themselves - had become conventional wisdom. The term "MLOps" went from jargon to job title, and Comet had a head start.

"Before Comet, the founders deployed 50+ NLP models in 15 languages across billions of chats. They had earned the right to be annoyed."

- From the company's origin story

The Comet Trajectory

A short flight log

2017

Liftoff

Mendels and Lahav found Comet in New York with a pitch: be GitHub for machine learning.

2021

$13M Series A

April. Scale Venture Partners, Two Sigma Ventures, Trilogy, and the Amazon Alexa Fund back the MLOps standard.

2021

$50M Series B

November. OpenView leads, six months after the A. The plumbing thesis pays off.

2023

State of the field

Comet publishes its MLOps Industry Report, putting numbers to how teams really build models.

2024

Opik arrives

Comet open-sources Opik, an end-to-end platform to trace, evaluate, and monitor LLM applications.

2025

Agents get a babysitter

Opik Agent Optimizer and Guardrails ship, plus integrations with LangChainJS, Autogen, and Agno.

The Product

From experiment tracking to watching agents think.

The original Comet platform did one thing extremely well: it logged everything. Metrics, hyperparameters, code, datasets, and artifacts, all captured automatically so a team could compare two thousand runs without losing the one that mattered. Then it added a model registry and production monitoring, so the model that worked in the lab kept working in the wild.
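For readers who want to see the idea in miniature, here is a rough sketch of what experiment tracking boils down to: record each run's settings and results, then ask which run won. It is plain Python, not Comet's SDK. The `Run` and `RunLog` names, the `best` helper, and the sample numbers are all hypothetical, invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """One training attempt: its settings and the metrics it produced."""
    name: str
    params: dict
    metrics: dict = field(default_factory=dict)


class RunLog:
    """Hypothetical in-memory store; a real tracker would persist runs remotely."""

    def __init__(self):
        self.runs = []

    def log(self, name, params, **metrics):
        # Copy the dicts so later edits by the caller cannot rewrite history.
        run = Run(name=name, params=dict(params), metrics=dict(metrics))
        self.runs.append(run)
        return run

    def best(self, metric):
        """Return the run with the highest recorded value for `metric`."""
        scored = [r for r in self.runs if metric in r.metrics]
        if not scored:
            raise ValueError(f"no run recorded metric {metric!r}")
        return max(scored, key=lambda r: r.metrics[metric])


# Made-up runs, standing in for the "fortieth attempt" problem.
log = RunLog()
log.log("run-01", {"lr": 0.1, "epochs": 5}, accuracy=0.81)
log.log("run-02", {"lr": 0.01, "epochs": 5}, accuracy=0.88)
log.log("run-03", {"lr": 0.001, "epochs": 5}, accuracy=0.84)

winner = log.best("accuracy")
print(winner.name, winner.params)
```

The point of the sketch is the shape of the record, not the storage: once parameters and metrics are captured together at the moment a run finishes, "which version actually worked" becomes a lookup instead of a memory test.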

When generative AI arrived, Comet did the obvious thing and the hard thing at once. The obvious thing: build for LLMs. The hard thing: open-source it. Opik, launched in 2024, traces an entire LLM pipeline - including tangled RAG and multi-agent setups - and runs evaluations to flag hallucinations and regressions in real time. It is free, it lives on GitHub, and developers actually use it.
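Tracing a pipeline can sound more exotic than it is. The sketch below shows the general technique in plain Python: wrap each step so that its inputs, output, and duration are recorded as it runs. This is not Opik's API. The `traced` decorator, the `TRACE` list, and the toy `retrieve`, `generate`, and `answer` steps are assumptions made up for this example, with a dictionary lookup standing in for retrieval and a trivial function standing in for the model call.

```python
import functools
import time

TRACE = []  # hypothetical in-memory trace; a real tool would ship spans to a server


def traced(fn):
    """Record the name, inputs, output, and duration of each call to `fn`."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        TRACE.append({
            "step": fn.__name__,
            "inputs": {"args": args, "kwargs": kwargs},
            "output": result,
            "seconds": time.perf_counter() - start,
        })
        return result
    return wrapper


@traced
def retrieve(question):
    # Stand-in for a vector search over a document store.
    docs = {"refund": "Refunds are issued within 14 days."}
    return [text for key, text in docs.items() if key in question.lower()]


@traced
def generate(question, context):
    # Stand-in for an LLM call: answer from context or admit ignorance.
    return context[0] if context else "I don't know."


@traced
def answer(question):
    return generate(question, retrieve(question))


reply = answer("How do I get a refund?")
for span in TRACE:
    print(span["step"], "->", span["output"])
```

Spans are appended when a step finishes, so the inner steps appear before the outer one that called them. With a record like this, a bad answer can be traced back to the step that produced it, which is the property the evaluations then build on.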

Experiment Management

Track, compare, explain, and reproduce ML experiments with automatic logging of everything that matters.

Model Registry & Monitoring

Version and govern models, then watch them in production for drift and performance decay.

Opik

Open-source tracing, evaluation, and monitoring for LLM apps, RAG systems, and agentic workflows.

Agent Optimizer & Guardrails

Automated prompt tuning and real-time output validation so agents misbehave a lot less.

Four products, one obsession: knowing what the model did and why.
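As a rough illustration of what "real-time output validation" can mean, here is a minimal guardrail sketch in plain Python: check a model's reply against a few rules before it reaches the user, and substitute a fallback if it fails. It is not the Opik Guardrails API. The rules, the `check_output` and `guarded` names, and the fallback message are hypothetical choices made for this example.

```python
import re

# Hypothetical rules, chosen only for illustration.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
BANNED_PHRASES = ("guaranteed returns",)


def check_output(text):
    """Return a list of rule violations; an empty list means the text passes."""
    problems = []
    if EMAIL.search(text):
        problems.append("contains an email address")
    for phrase in BANNED_PHRASES:
        if phrase in text.lower():
            problems.append(f"contains banned phrase: {phrase}")
    return problems


def guarded(text, fallback="Sorry, I can't share that."):
    """Pass the text through if it is clean, otherwise return the fallback."""
    return text if not check_output(text) else fallback


print(guarded("Your order ships on Friday."))
print(guarded("Write to jane.doe@example.com for guaranteed returns."))
```

Real guardrails typically use far richer checks than a regular expression and a phrase list, but the control flow is the same: validate first, release second.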

"Opik helps you understand what your LLM application is doing, measure how well it's working, and systematically make it better."

- Opik documentation

The Proof

The numbers behind the confidence.

Talk is abundant in AI. Adoption is not. Comet's case rests on the unsexy fact that a great many people rely on it. Over 150,000 developers use Opik. Roughly 450 enterprise, startup, and academic teams use Comet. The customer list reads like a who's who of companies that cannot afford to ship a broken model: Uber, Etsy, Affirm, Zappos, Cepsa.

The funding follows the same logic. Around $70 million raised across seed, Series A, and Series B, from Scale Venture Partners, Two Sigma Ventures, Trilogy Equity Partners, OpenView, and the Amazon Alexa Fund. The partnerships fill in the rest of the picture: Comet sits on the AWS Marketplace, integrates with Amazon SageMaker and Bedrock, and plugs into the LangChain ecosystem that so many LLM teams now build on. None of this is glamorous. All of it is load-bearing.

Funding, round by round (USD raised; cumulative ~$70M)

Seed: ~$6.8M

Series A (2021): $13M

Series B (2021): $50M

Sources: TechCrunch, SiliconANGLE, Crunchbase. Seed figure approximate.

150K+ Opik developers

~450 teams on Comet

14 countries

Founded 2017

"Trusted by Uber, Etsy, Affirm and Zappos - companies that find out fast when a model is wrong."

- From Comet's customer roster

The Mission

Empower developers and teams to achieve business value with AI.

It is a plain sentence, and Comet means it plainly. The company is not trying to build the smartest model in the world. It is trying to make sure the models other people build can be trusted, repeated, and shipped. That distinction matters. The hard part of AI was never the demo. It was everything that happens after the demo goes to production and meets reality.

Comet's wager is that as AI moves from clever experiments to load-bearing infrastructure, the tools that measure and govern it become non-negotiable. Regulators are circling. Customers are skeptical. "It seemed to work in testing" is not a defense anyone wants to give in a deposition.

"The hard part of AI was never the demo. It was everything that happened after."

- The case for observability, in one line

Why It Matters Tomorrow

Agents are coming. Someone has to watch them.

The next wave of AI does not just answer questions - it takes actions. Multi-step agents that call tools, query databases, and make decisions on a user's behalf. They are powerful, and they are exactly the kind of system that fails quietly and expensively. Comet's 2025 push into agent optimization and guardrails is a bet that the world will soon need a way to supervise software that supervises itself.

So return to that model running in production tonight, deciding what millions of people see, approving the payment, answering the customer. The difference between a company that sleeps soundly and one that does not is no longer the cleverness of the model. It is whether anyone can see what it is doing. Comet built the window. That turns out to be the whole game.
