He kept inheriting models nobody could explain. So he built the place where the explaining lives.
Walk into most machine-learning teams and ask how last quarter's model got built - which data, which settings, what failed before the thing finally worked - and you'll get shrugs. Gideon Mendels heard those shrugs at Columbia, at Google, at a startup he co-founded, and decided the shrug itself was the product. Comet, the company he runs from 100 Sixth Avenue in Manhattan, is the institutional memory that machine learning never had.
Today Comet sits inside the workflows of ML teams at Ancestry, Cepsa, Etsy and Uber. It tracks the datasets, the code changes, the experiment history and the models, automatically, so that the person who inherits the project six months from now doesn't start from zero. The early elevator line was blunt: do for machine learning what GitHub did for code. Mendels has spent the better part of a decade making that sentence less of a boast and more of a description.
He is, by his own framing, three things stacked on top of each other - a computer scientist, an NLP researcher, and an entrepreneur. The combination is the point. Most founders in his category are one of those. The ones who last tend to be all three.
Before any of the AI, Mendels built GoMatkot.com into Israel's largest exporter of racquets and sports goods. It was acquired in 2017 - the same year he turned his attention fully to Comet.
In Columbia's Spoken Language Processing Group, he worked on the IARPA Babel program - speech recognition for low-resource languages, funded by US intelligence research. Hard problems, small data.
At Google he built models to detect hate speech in YouTube comments. One of the messier, more thankless NLP problems of its era - and a crash course in why model behavior needs auditing.
He co-founded GroupWize, where the team trained and deployed more than 50 NLP models across 15 languages, running over billions of chat interactions. The tracking problem got personal here.
Mendels started as a software engineer around 2009 and drifted into machine learning in grad school, mostly through language modeling. That drift is the whole story. He didn't arrive at AI through hype - he arrived through the specific grind of making speech recognition work for languages with barely any training data, and through the specific frustration of detecting hate speech at internet scale.
Each role left the same residue. A model would work, sort of, and then the next person - sometimes Mendels himself - would have to rebuild understanding from nothing. No record of what had been tried. No log of what failed. The experiment was the asset, and everyone kept throwing it away. In the early days, he has noted, data scientists were quite literally emailing each other Jupyter notebooks to share work.
Comet started as the fix for that. The first version was, in spirit, a black box recorder for model training: log every run, every parameter, every metric, automatically, so the history compounds instead of evaporating. GeekWire covered it in 2017 as a "GitHub-like management system for machine learning." The framing stuck because it was accurate.
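The black-box-recorder idea is simple enough to sketch in a few lines of Python. What follows is an illustration using only the standard library, not Comet's SDK: the names `RunRecorder`, `log_metric` and `load_history`, and the one-file-per-run JSON-lines layout, are all assumptions made up for this example.

```python
import json
import time
import uuid
from pathlib import Path


class RunRecorder:
    """Hypothetical append-only recorder: one JSON-lines file per training run."""

    def __init__(self, root, params):
        self.run_id = uuid.uuid4().hex[:8]
        self.path = Path(root) / f"{self.run_id}.jsonl"
        self.path.parent.mkdir(parents=True, exist_ok=True)
        # Write the settings once, up front, so whoever inherits the run
        # can see how it was configured.
        self._write({"type": "params", "params": params})

    def log_metric(self, name, value, step):
        """Append one metric reading for the given training step."""
        self._write({"type": "metric", "name": name, "value": value, "step": step})

    def _write(self, record):
        # Stamp every record with the run it belongs to and when it was written.
        record["run_id"] = self.run_id
        record["time"] = time.time()
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")


def load_history(root):
    """Read back every record from every run stored under root."""
    records = []
    for path in sorted(Path(root).glob("*.jsonl")):
        with path.open(encoding="utf-8") as f:
            records.extend(json.loads(line) for line in f)
    return records
```

A training loop would create one `RunRecorder` per run and call `log_metric` at each step; six months later, `load_history` hands back what was tried and how it went. The append-only file is the design choice that matters here: nothing is overwritten, so the history accumulates.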
The funding caught up fast. In April 2021 Comet raised a $13M Series A. Roughly seven months later, in November, it closed a $50M Series B led by OpenView, with Scale Venture Partners, Trilogy Equity Partners and Two Sigma Ventures along for the ride. At the time the company reported 5x growth in annual recurring revenue and 150-plus customers. Redis co-founder Ofer Bengal and AI2's Oren Etzioni signed on as strategic advisors. Total funding reportedly reached $74.8M.
The experiment-tracking problem Mendels built Comet to solve was a machine-learning problem. Then large language models rewrote the job description, and the question changed from "how was this model trained" to "why did this agent just do that, and can I trust it in production."
In September 2024 Comet shipped Opik, a fully open-source platform to evaluate, test and monitor LLM applications - logging traces and spans, scoring outputs, comparing versions across RAG chatbots, code assistants and complex agentic systems. You can download it from GitHub and run it locally. Mendels' pitch is characteristically unromantic: the gap between an AI agent that demos well and one that works reliably in production is enormous, and it closes only with real observability and evaluation.
Mendels keeps a running video series on machine learning operations and the realities of running AI in production - including the unglamorous middle ground between a demo and a deployment.