Exhibit A: the logo that follows your model around
The quiet infrastructure that keeps AI honest - from the first training run to the last production hallucination.
New York, NY • Founded 2017 • Makers of Opik
Somewhere tonight, a recommendation model is deciding what 40 million people see next. A fraud system is approving a payment. A chatbot is answering a customer who has no idea it is a chatbot. Each of these is a guess dressed up as a decision. Most companies cross their fingers. A growing number of them watch the whole thing on Comet instead.
Comet is an AI developer platform. That is a tidy phrase for a messy job: tracking every experiment, every metric, every model version, and - increasingly - every word a large language model says back to a user. It is the flight recorder for machine learning, and it has become indispensable precisely because the alternative is flying blind.
The company does not make headlines the way the model labs do. It is not announcing a new frontier model every quarter, and it is not promising artificial general intelligence by Tuesday. What it offers is less thrilling and far more useful: a record. When a model behaves strangely, Comet is the difference between a team that can answer "why" and a team that can only shrug. In an industry built on confident guesses, that is a surprisingly rare commodity.
"Comet pitches itself as doing for machine learning what GitHub did for code."
- TechCrunch, on the company's positioning
For years, building a model looked like this: a data scientist ran an experiment, got a promising result, tweaked a few numbers, ran it again, and then - somewhere around the fortieth attempt - could no longer remember which version actually worked. Results lived in notebooks emailed back and forth. Reproducing last month's breakthrough was less science than archaeology.
This was fine when models were toys. It became a liability when they started approving loans and steering cars. The industry had built systems it could not explain, could not reproduce, and frequently could not trust. Everyone agreed this was a problem. Almost no one wanted to do the unglamorous work of fixing it.
"Reproducibility isn't glamorous. So naturally, someone built a whole company around it."
- The recurring theme of the MLOps era
Gideon Mendels and Nimrod Lahav had earned their scars. At their previous company, GroupWize, they trained and deployed more than 50 natural-language models across 15 languages, processing billions of chats. Before that, Mendels worked on hate-speech and deception detection at Columbia University and Google. They knew exactly how the sausage got made, and how often it got lost.
In 2017 they founded Comet on a contrarian bet: that the boring middle of machine learning - the tracking, the versioning, the monitoring - was where the real value sat. Not the flashy model. The plumbing around it. Investors took some convincing, and then took less convincing: the company raised a $13M Series A in April 2021 and a $50M Series B roughly six months later.
The speed of that second round told its own story. Investors do not double down in half a year on a hunch. They do it when the usage charts are unambiguous, and Comet's were. The thesis that nobody wanted to fund in 2017 - that machine learning needed an operations layer as serious as the models themselves - had become conventional wisdom. The term "MLOps" went from jargon to job title, and Comet had a head start.
"Before Comet, the founders deployed 50+ NLP models in 15 languages across billions of chats. They had earned the right to be annoyed."
- From the company's origin story
The original Comet platform did one thing extremely well: it logged everything. Metrics, hyperparameters, code, datasets, and artifacts, all captured automatically so a team could compare two thousand runs without losing the one that mattered. Then it added a model registry and production monitoring, so the model that worked in the lab kept working in the wild.
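For readers who want the idea in concrete terms, here is a toy sketch of what "logging everything" buys a team. It is not Comet's API: the `Run` class, the `best_run` helper, and the made-up accuracy formula are all hypothetical, written with the Python standard library only to show why recording parameters next to metrics makes the winning run easy to find again.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """One training run: its settings and the metrics it produced (hypothetical)."""
    name: str
    params: dict
    metrics: dict = field(default_factory=dict)

    def log_metric(self, key, value):
        # Store the latest value recorded for this metric.
        self.metrics[key] = value


def best_run(runs, metric):
    """Return the run with the highest value for `metric`.

    Runs that never logged the metric are skipped; raises ValueError if none did.
    """
    scored = [r for r in runs if metric in r.metrics]
    return max(scored, key=lambda r: r.metrics[metric])


runs = []
for lr in (0.1, 0.01, 0.001):
    run = Run(name=f"lr-{lr}", params={"learning_rate": lr})
    # Stand-in for a real training loop: an invented score that peaks at lr=0.01.
    run.log_metric("accuracy", round(1.0 - abs(lr - 0.01), 3))
    runs.append(run)

winner = best_run(runs, "accuracy")
print(winner.name, winner.params)
```

Because every run carries its own parameters, the question "which version actually worked?" becomes a lookup rather than an excavation. A real tracking platform does the same bookkeeping automatically and at far larger scale.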
When generative AI arrived, Comet did the obvious thing and the hard thing at once. The obvious thing: build for LLMs. The hard thing: open-source it. Opik, launched in 2024, traces an entire LLM pipeline - including tangled RAG and multi-agent setups - and runs evaluations to flag hallucinations and regressions in real time. It is free, it lives on GitHub, and developers actually use it.
Track, compare, explain, and reproduce ML experiments with automatic logging of everything that matters.
Version and govern models, then watch them in production for drift and performance decay.
Open-source tracing, evaluation, and monitoring for LLM apps, RAG systems, and agentic workflows.
Automated prompt tuning and real-time output validation so agents misbehave a lot less.
"Opik helps you understand what your LLM application is doing, measure how well it's working, and systematically make it better."
- Opik documentation
Talk is abundant in AI. Adoption is not. Comet's case rests on the unsexy fact that a great many people rely on it. Over 150,000 developers use Opik. Roughly 450 enterprise, startup, and academic teams use Comet. The customer list reads like a who's who of companies that cannot afford to ship a broken model: Uber, Etsy, Affirm, Zappos, Cepsa.
The funding follows the same logic. Around $70 million raised across seed, Series A, and Series B, from Scale Venture Partners, Two Sigma Ventures, Trilogy Equity Partners, OpenView, and the Amazon Alexa Fund. The partnerships fill in the rest of the picture: Comet sits on the AWS Marketplace, integrates with Amazon SageMaker and Bedrock, and plugs into the LangChain ecosystem that so many LLM teams now build on. None of this is glamorous. All of it is load-bearing.
"Trusted by Uber, Etsy, Affirm and Zappos - companies that find out fast when a model is wrong."
- From Comet's customer roster
It is a plain sentence, and Comet means it plainly. The company is not trying to build the smartest model in the world. It is trying to make sure the models other people build can be trusted, repeated, and shipped. That distinction matters. The hard part of AI was never the demo. It was everything that happens after the demo goes to production and meets reality.
Comet's wager is that as AI moves from clever experiments to load-bearing infrastructure, the tools that measure and govern it become non-negotiable. Regulators are circling. Customers are skeptical. "It seemed to work in testing" is not a defense anyone wants to give in a deposition.
"The hard part of AI was never the demo. It was everything that happened after."
- The case for observability, in one line
The next wave of AI does not just answer questions - it takes actions. Multi-step agents that call tools, query databases, and make decisions on a user's behalf. They are powerful, and they are exactly the kind of system that fails quietly and expensively. Comet's 2025 push into agent optimization and guardrails is a bet that the world will soon need a way to supervise software that supervises itself.
So return to that model running in production tonight, deciding what millions of people see, approving the payment, answering the customer. The difference between a company that sleeps soundly and one that does not is no longer the cleverness of the model. It is whether anyone can see what it is doing. Comet built the window. That turns out to be the whole game.