The Story
The Data Whisperer
Before Ari Morcos decided what AI models should eat, he spent years studying what mouse brains do when they're deciding. His Harvard PhD research - tracking populations of neurons in parietal cortex as mice accumulated evidence for decisions - turned out to be better preparation for building AI companies than any MBA ever could be. Both are questions of signal versus noise. Both are questions of what gets learned, and why.
In late 2023, Morcos left Meta's FAIR lab - one of the most prestigious AI research positions on earth - to co-found DatologyAI. The premise is disarmingly simple: the thing nobody was building was the thing that mattered most. Not another foundation model. Not another GPU cluster. The question of what data those models train on.
"Models are what they eat," he says. It's become something of a mantra - quoted in TechCrunch, repeated on podcasts, cited in funding pitches. But behind the catchphrase sits a decade of hard-won research: two years at DeepMind in London, five years at Meta AI, two Outstanding Paper awards from the top ML venues on the planet, and an h-index of 35 at an age when most researchers are still working through their second postdoc.
Models are what they eat - models are a reflection of the data on which they're trained. Not all data are created equal, and some training data are vastly more useful than others.
- Ari Morcos, CEO of DatologyAI, in TechCrunch (2024)
Origin Story
A Conference, a Conversation, a Career
It starts at NIPS 2015 - now NeurIPS, but back then still held in Montreal with enough intimacy that a grad student could walk into a hallway conversation with DeepMind researchers and end up with a job offer. That's exactly what happened to Morcos. He was finishing his PhD, studying how mouse brains think, and someone from DeepMind's London office asked the right question at the right moment.
London. DeepMind. 2016 to 2018. Those were the years when reinforcement learning was eating the world - AlphaGo had just beaten Lee Sedol, and the lab was the centre of gravity for AI ambition globally. Morcos was there for the deep learning generalization work, the representation comparison methods, the early inquiries into what neural networks actually learn versus what we assume they learn.
Then Meta came calling. September 2018, Menlo Park. He joined FAIR - Facebook AI Research - as a research scientist, and over five years rose to Senior Staff. The work there produced some of his most-cited research: Model Soups (averaging fine-tuned model weights to boost accuracy without increasing inference cost, now with 1,786 citations), the lottery ticket hypothesis extensions, data pruning methods. His 2022 NeurIPS Outstanding Paper on beating power-law scaling through data pruning planted a seed that would become DatologyAI.
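The Model Soups result is simple enough to sketch in a few lines: given several fine-tuned checkpoints of the same architecture, average their weights parameter by parameter, and the merged model often outperforms any single checkpoint at no extra inference cost. A minimal illustration, using plain dictionaries of NumPy arrays as stand-ins for real checkpoints (the model names and shapes here are hypothetical, not from the paper):

```python
import numpy as np

def uniform_soup(checkpoints):
    """Average a list of state dicts (parameter name -> weight array) element-wise."""
    keys = checkpoints[0].keys()
    return {k: np.mean([ckpt[k] for ckpt in checkpoints], axis=0) for k in keys}

# Three hypothetical fine-tuned checkpoints of the same tiny model.
rng = np.random.default_rng(0)
ckpts = [{"w": rng.normal(size=(4, 4)), "b": rng.normal(size=4)} for _ in range(3)]

soup = uniform_soup(ckpts)
# One merged set of weights, same architecture as the inputs - so
# serving the soup costs exactly the same as serving one checkpoint.
assert soup["w"].shape == (4, 4)
```

The key design point is that averaging happens in weight space, not in output space: unlike an ensemble, the soup is a single model, so inference cost does not grow with the number of checkpoints merged.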
The insight wasn't subtle. If you could pick better data - smarter data, more efficiently - you could train better models without throwing more compute at the problem. In an industry obsessed with parameter counts and GPU hours, this was almost contrarian. It was also almost certainly right.
Over a decade of ML research has made Morcos one of the most-cited active researchers in the field. His work spans neural network generalization, data efficiency, lottery tickets, self-supervised learning, and representation theory - all converging on the question: what makes a model actually learn?
He has mentored researchers who are now at Google Brain, Anthropic, Apple, MosaicML (acquired by Databricks for $1.3B), and Harvard.
The Company
What DatologyAI Is Actually Building
Here is the problem DatologyAI is solving: every major AI company has accumulated vast stores of data - petabytes of text, images, video, audio, genomic sequences, geospatial records. The bottleneck isn't collecting more. It's knowing what's worth training on. That work has historically been done by hand, by teams of researchers running expensive ablation experiments, trying to figure out which combinations of data make models actually work.
DatologyAI automates that process. Morcos and his co-founders - Matthew Leavitt and Bogdan Gaza, the latter formerly an engineering lead at Amazon and Twitter - built a platform that identifies which data is high-value, which is redundant, and which is actively harmful to model quality. It handles petabytes. It works on-premises or in private cloud. It covers text, images, video, audio, tabular data, genomic data, and geospatial formats.
The results, when they work, are striking: equivalent model performance at 1/10 the cost. Or equivalent cost with substantially better performance. In a world where training a frontier model can cost hundreds of millions of dollars, even a 30% improvement in data efficiency represents enormous value.
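The core curation idea can be illustrated with a toy pruning pass: score every example, then drop the redundant and low-value ones before training. The sketch below uses exact-duplicate hashing and a length floor as crude stand-ins - DatologyAI's actual scoring methods are proprietary and far more sophisticated, so treat this purely as an illustration of the filter-before-training shape:

```python
import hashlib

def curate(examples, min_len=20):
    """Toy curation pass: drop normalized duplicates and very short examples.

    A stand-in for real data curation; production systems score redundancy,
    quality, and harmfulness with learned models, not string hashing.
    """
    seen, kept = set(), []
    for text in examples:
        norm = " ".join(text.lower().split())          # normalize case/whitespace
        digest = hashlib.sha1(norm.encode()).hexdigest()
        if len(norm) < min_len or digest in seen:      # low-value or redundant
            continue
        seen.add(digest)
        kept.append(text)
    return kept

corpus = [
    "The quick brown fox jumps over the lazy dog.",
    "the quick  brown fox jumps over the lazy dog.",  # duplicate after normalization
    "ok",                                             # too short to be informative
    "Training data quality often matters more than quantity.",
]
print(curate(corpus))  # keeps only the first and last examples
```

Even this crude filter shows where the economics come from: every example removed before training saves its full share of GPU hours, which is why curation gains compound at petabyte scale.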
"Identifying the right data at scale is extremely challenging and a frontier research problem. Our approach leads to models that train dramatically faster while simultaneously increasing performance on downstream tasks."
The Cap Table is a Who's Who of AI
When your seed round includes three of the most influential figures in machine learning history, something is working. That describes DatologyAI's $57.65M in funding.
Achievements
The Record
The Philosophy
The Bitter Lesson, Applied
Rich Sutton's "Bitter Lesson" - the AI community's bracing reckoning from 2019 - argued that approaches leveraging scale and computation consistently beat hand-designed systems. Morcos absorbed that lesson, and then extended it: if compute wins over clever engineering, then the question becomes what you do with your compute budget. And that's a data question.
"Data has a really nice advantage," he's said. "Because if you understand what's good or bad about data, it's actually quite easy to make an improvement based on that understanding." The key word is understanding. Not intuition, not convention - understanding. This is where his neuroscience background resurfaces. Studying brains taught him that mechanism matters. You don't improve something you can't explain.
His advice to researchers considering leaving academia for industry is characteristically direct: go straight to industry, because the opportunities and resources are far greater, and that's where the hardest problems are typically being worked on. He practices what he preaches: he left one of the best research jobs outside academia to go build the company.
His dogs are named Maui and Loki. He and his wife Julia travel to places that are hard to get to. He reads about the World Wars and the Cold War - histories of systems under extreme pressure, where resource constraints force creative solutions. There may be no cleaner metaphor for what DatologyAI is doing in AI.
Career Timeline
From Mouse Brains to Machine Learning
Data has a really nice advantage - if you understand what's good or bad about data, it's actually quite easy to make an improvement based on that understanding.
- Ari Morcos, on the data-centric approach to AI (Imbue Podcast, 2024)
Analysis
The Researcher-Founder Archetype
There is a particular kind of AI founder emerging in the 2020s - the researcher-turned-entrepreneur. Not the person who read the papers, but the person who wrote them. Morcos is one of a handful who made that transition from the absolute top tier of industrial research (Meta FAIR is perhaps the closest thing to Bell Labs that exists in AI today) to building a company around a research insight, rather than a market opportunity.
The distinction matters. Most AI startups start with a use case and work backward. DatologyAI started with a research result - data pruning beats scaling - and worked forward. That inversion gives the company an unusual depth of technical credibility. Customers like Thomson Reuters and Arcee AI don't hire DatologyAI because of a sales deck. They hire it because the papers are cited by the people they trust.
"It was like expanding my research team with 30 world-class researchers," Lucas Atkins from Arcee AI said of working with DatologyAI. That's not a testimonial about software. That's a testimonial about trust in deep expertise.
What Morcos is building, underneath the company, is something harder to copy than a product: a scientific reputation converted into commercial credibility. The 12,000 citations aren't decorative. They're the moat.