The Story
The Data Whisperer
Before Ari Morcos decided what AI models should eat, he spent years studying what mouse brains do when they're deciding. His Harvard PhD research - tracking populations of neurons in parietal cortex as mice accumulated evidence for decisions - turned out to be better preparation for building AI companies than any MBA ever could be. Both are questions of signal versus noise. Both are questions of what gets learned, and why.
In late 2023, Morcos left Meta's FAIR lab - one of the most prestigious AI research positions on earth - to co-found DatologyAI. The premise is disarmingly simple: the thing nobody was building was the thing that mattered most. Not another foundation model. Not another GPU cluster. The question of what data those models train on.
"Models are what they eat," he says. It's become something of a mantra - quoted in TechCrunch, repeated on podcasts, cited in funding pitches. But behind the catchphrase sits a decade of hard-won research: two years at DeepMind in London, five years at Meta AI, two Outstanding Paper awards from the top ML venues on the planet, and an h-index of 35 at an age when most researchers are still working through their second postdoc.
Models are what they eat - models are a reflection of the data on which they're trained. Not all data are created equal, and some training data are vastly more useful than others.
- Ari Morcos, CEO of DatologyAI, in TechCrunch (2024)
Origin Story
A Conference, a Conversation, a Career
It starts at NIPS 2015 - now NeurIPS, but back then still held in Montreal with enough intimacy that a grad student could walk into a hallway conversation with DeepMind researchers and end up with a job offer. That's exactly what happened to Morcos. He was finishing his PhD, studying how mouse brains think, and someone from DeepMind's London office asked the right question at the right moment.
London. DeepMind. 2016 to 2018. Those were the years when reinforcement learning was eating the world - AlphaGo had just beaten Lee Sedol, and the lab was the centre of gravity for AI ambition globally. Morcos was there for the deep learning generalization work, the representation comparison methods, the early inquiries into what neural networks actually learn versus what we assume they learn.
Then Meta came calling. September 2018, Menlo Park. He joined FAIR - Facebook AI Research - as a research scientist, and over five years rose to Senior Staff. The work there produced some of his most-cited research: Model Soups (averaging fine-tuned model weights to boost accuracy without increasing inference cost, now with 1,786 citations), the lottery ticket hypothesis extensions, data pruning methods. His 2022 NeurIPS Outstanding Paper on beating power-law scaling through data pruning planted a seed that would become DatologyAI.
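The Model Soups result is simple enough to sketch in a few lines: given several fine-tuned checkpoints of the same architecture, average their weights parameter by parameter, and the merged model often outperforms any single checkpoint at no extra inference cost. A minimal illustration, using plain dictionaries of NumPy arrays as stand-ins for real checkpoints (the model names and shapes here are hypothetical, not from the paper):

```python
import numpy as np

def uniform_soup(checkpoints):
    """Average a list of state dicts (parameter name -> weight array) element-wise."""
    keys = checkpoints[0].keys()
    return {k: np.mean([ckpt[k] for ckpt in checkpoints], axis=0) for k in keys}

# Three hypothetical fine-tuned checkpoints of the same tiny model.
rng = np.random.default_rng(0)
ckpts = [{"w": rng.normal(size=(4, 4)), "b": rng.normal(size=4)} for _ in range(3)]

soup = uniform_soup(ckpts)
# One merged set of weights, same architecture as the inputs - so
# serving the soup costs exactly the same as serving one checkpoint.
assert soup["w"].shape == (4, 4)
```

The key design point is that averaging happens in weight space, not in output space: unlike an ensemble, the soup is a single model, so inference cost does not grow with the number of checkpoints merged.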
The insight wasn't subtle. If you could pick better data - smarter data, more efficiently - you could train better models without throwing more compute at the problem. In an industry obsessed with parameter counts and GPU hours, this was almost contrarian. It was also almost certainly right.
Over a decade of ML research has made Morcos one of the most-cited active researchers in the field. His work spans neural network generalization, data efficiency, lottery tickets, self-supervised learning, and representation theory - all converging on the question: what makes a model actually learn?
He has mentored researchers who are now at Google Brain, Anthropic, Apple, MosaicML (acquired by Databricks for $1.3B), and Harvard.
The Company
What DatologyAI Is Actually Building
Here is the problem DatologyAI is solving: every major AI company has accumulated vast stores of data - petabytes of text, images, video, audio, genomic sequences, geospatial records. The bottleneck isn't collecting more. It's knowing what's worth training on. That work has historically been done by hand, by teams of researchers running expensive ablation experiments, trying to figure out which combinations of data make models actually work.
DatologyAI automates that process. Morcos and his co-founders - Matthew Leavitt and Bogdan Gaza, the latter formerly an engineering lead at Amazon and Twitter - built a platform that identifies which data is high-value, which is redundant, and which is actively harmful to model quality. It handles petabytes. It works on-premises or in private cloud. It covers text, images, video, audio, tabular data, genomic data, and geospatial formats.
The results, when they work, are striking: equivalent model performance at 1/10 the cost. Or equivalent cost with substantially better performance. In a world where training a frontier model can cost hundreds of millions of dollars, even a 30% improvement in data efficiency represents enormous value.
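The core curation idea can be illustrated with a toy pruning pass: score every example, then drop the redundant and low-value ones before training. The sketch below uses exact-duplicate hashing and a length floor as crude stand-ins - DatologyAI's actual scoring methods are proprietary and far more sophisticated, so treat this purely as an illustration of the filter-before-training shape:

```python
import hashlib

def curate(examples, min_len=20):
    """Toy curation pass: drop normalized duplicates and very short examples.

    A stand-in for real data curation; production systems score redundancy,
    quality, and harmfulness with learned models, not string hashing.
    """
    seen, kept = set(), []
    for text in examples:
        norm = " ".join(text.lower().split())          # normalize case/whitespace
        digest = hashlib.sha1(norm.encode()).hexdigest()
        if len(norm) < min_len or digest in seen:      # low-value or redundant
            continue
        seen.add(digest)
        kept.append(text)
    return kept

corpus = [
    "The quick brown fox jumps over the lazy dog.",
    "the quick  brown fox jumps over the lazy dog.",  # duplicate after normalization
    "ok",                                             # too short to be informative
    "Training data quality often matters more than quantity.",
]
print(curate(corpus))  # keeps only the first and last examples
```

Even this crude filter shows where the economics come from: every example removed before training saves its full share of GPU hours, which is why curation gains compound at petabyte scale.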
"Identifying the right data at scale is extremely challenging and a frontier research problem. Our approach leads to models that train dramatically faster while simultaneously increasing performance on downstream tasks."
The Cap Table is a Who's Who of AI
When your seed round includes three of the most influential figures in machine learning history, something is working. That describes DatologyAI's $57.65M in funding.
Achievements
The Record
The Philosophy
The Bitter Lesson, Applied
Rich Sutton's "Bitter Lesson" - the AI community's bracing reckoning from 2019 - argued that approaches leveraging scale and computation consistently beat hand-designed systems. Morcos absorbed that lesson, and then extended it: if compute wins over clever engineering, then the question becomes what you do with your compute budget. And that's a data question.
"Data has a really nice advantage," he's said. "Because if you understand what's good or bad about data, it's actually quite easy to make an improvement based on that understanding." The key word is understanding. Not intuition, not convention - understanding. This is where his neuroscience background resurfaces. Studying brains taught him that mechanism matters. You don't improve something you can't explain.
His advice to researchers considering leaving academia for industry is characteristically direct: go straight to industry, because the opportunities and resources are far greater, and that's where the hardest problems are typically being worked on. He practices what he preaches: he left one of the best research jobs outside academia to go build the company.
His dogs are named Maui and Loki. He and his wife Julia travel to places that are hard to get to. He reads about the World Wars and the Cold War - histories of systems under extreme pressure, where resource constraints force creative solutions. There may be no cleaner metaphor for what DatologyAI is doing in AI.
Career Timeline
From Mouse Brains to Machine Learning
Data has a really nice advantage - if you understand what's good or bad about data, it's actually quite easy to make an improvement based on that understanding.
- Ari Morcos, on the data-centric approach to AI (Imbue Podcast, 2024)
Analysis
The Researcher-Founder Archetype
There is a particular kind of AI founder emerging in the 2020s - the researcher-turned-entrepreneur. Not the person who read the papers, but the person who wrote them. Morcos is one of a handful who made that transition from the absolute top tier of industrial research (Meta FAIR is perhaps the closest thing to Bell Labs that exists in AI today) to building a company around a research insight, rather than a market opportunity.
The distinction matters. Most AI startups start with a use case and work backward. DatologyAI started with a research result - data pruning beats scaling - and worked forward. That inversion gives the company an unusual depth of technical credibility. Customers like Thomson Reuters and Arcee AI don't hire DatologyAI because of a sales deck. They hire it because the papers are cited by the people they trust.
"It was like expanding my research team with 30 world-class researchers," Lucas Atkins from Arcee AI said of working with DatologyAI. That's not a testimonial about software. That's a testimonial about trust in deep expertise.
What Morcos is building, underneath the company, is something harder to copy than a product: a scientific reputation converted into commercial credibility. The 12,000 citations aren't decorative. They're the moat.