Data & AI
Sarah Catanzaro leads Amplify Partners' $700M Fund V
Portfolio: RunwayML - LangChain - Modal Labs - Hex - DatologyAI - WarpStream
Former US Secret Service threat intelligence architect
At Amplify since 2017 - promoted to General Partner in 2022
Mapped Somali pirate networks before she mapped the ML stack
Speaker at PyTorch Conference 2025
Early investor in tools every serious AI team runs on
General Partner · Amplify Partners · Menlo Park, CA

Sarah Catanzaro

Venture Capital • Data Infrastructure • AI & ML Tools

Before she was funding the AI infrastructure stack, she was mapping Somali pirate networks with incomplete data. The method never changed - only the problem set.

$700M Fund V
16+ Investments
4 Exits
2017 Joined Amplify
Data Infrastructure • ML Tools • Pre-Product Bets • Technical Founders • Stanford • Enterprise AI
$700M Amplify Fund V
$1.25B LangChain Valuation
$410M RunwayML Raised
4 Portfolio Exits

She Read the Data Before Anyone Else Did

The year was 2005. A Stanford graduate named Sarah Catanzaro took a job that most people couldn't explain at dinner parties: she was applying computational linguistics and network analysis to track Somali pirate organizations and model insurgent behavior. The tools were rough. The data was incomplete. The stakes were non-negotiable.

Two decades later, she's a General Partner at Amplify Partners, running one of Silicon Valley's sharpest early-stage funds focused on data infrastructure and machine learning tooling. The tools are more sophisticated. The problems are less likely to involve actual pirates. The core work - making sense of complex systems from incomplete data - hasn't changed.

We don't invest in people who are just enamored with new technologies. We invest in people who want to solve problems and want to build products.

- Sarah Catanzaro

Catanzaro grew up in a home where experimentation was ambient - her father is a molecular biologist, her mother a psychiatrist and clinical researcher. Playing with liquid nitrogen in her father's lab wasn't unusual. Asking hard questions about human behavior wasn't either. When 9/11 happened during her college years at Stanford, a question crystallized that would shape the next decade: what motivates people toward violence?

That question sent her to the Center for Advanced Defense Studies, where she worked with MIT AI researchers and Carnegie Mellon computer scientists to answer it at scale. Then to Qinetiq, where she was deployed to the US Secret Service to build threat intelligence systems that used NLP to protect presidential candidates - including Barack Obama - during the 2012 election cycle.

From the Secret Service, she moved through Cyveillance and Palantir (where she watched government agencies struggle with data integration in real time), eventually landing at Mattermark as Head of Data. Mattermark was trying to do what she'd always done: extract signal from messy, incomplete information about private companies. She ran the data team. She applied machine learning to startup intelligence. And she started noticing something.

The tools available to data teams were broken. Not in obvious ways - they existed, they mostly worked, they got the job done. But they weren't designed for how real teams actually worked. They didn't integrate with each other. They required enormous manual effort. They weren't built for practitioners; they were built for engineers with days to spare.

The Insight That Built a Career

Sarah Catanzaro's investment thesis didn't come from whitepapers. It came from spending years as the person trying to do the work - building data dictionaries, debugging integration points, and realizing that the tools she wished she had didn't exist yet.

That practitioner's eye is what got her in the door at Amplify Partners in 2017. She was introduced by Shivon Zilis, and the firm was looking for something specific: someone who had actually done the work. "Impossibly smart with no discernible ego" is how she was described internally. She joined as a Principal. By 2020, she was Partner. By 2022, General Partner.

Amplify's model is thesis-driven investing at the earliest stages - often pre-product, sometimes pre-revenue, occasionally pre-complete-team. Catanzaro's thesis covers every point where data is collected, stored, managed, analyzed, and modeled: essentially the entire data and ML stack. And she has the practitioner credibility to evaluate it at a technical level that most VCs can't match.

She maintains hands-on data work specifically to preserve that conviction. Not as a performance of humility, but as a practical necessity. If she can't evaluate a technical claim herself, she doesn't trust the investment.

Without good data, there are no good models. But good data is really, really hard to get.

- Sarah Catanzaro, on leading DatologyAI's seed round

Her portfolio is a map of problems she identified before they became consensus. RunwayML - creative AI before generative AI was mainstream. Hex - collaborative analytics notebooks for modern data teams. Modal Labs - serverless compute built for ML workflows. DatologyAI - data curation for model training, which she seeded in February 2024 and described as addressing the root problem that most AI projects fail to acknowledge. LangChain - the orchestration layer for agent engineering, an investment made at a $1.25 billion valuation.

The exits tell the story just as clearly: Bayes acquired by Airtable, Einblick by Databricks, Eppo by Datadog, WarpStream by Confluent. These aren't flukes. They're companies that Catanzaro identified as solving real problems in the data stack, before the acquirers recognized they needed to own those solutions.

On the subject of MLOps - the operational layer that gets ML models from experiment to production - she has been consistently blunt. "It's still just as hard to get a model into production," she said in 2022, even as hundreds of new vendors claimed otherwise. "Tools are not well integrated with each other. Teams must cobble together all of these pieces." The market has since validated this assessment with aggressive M&A activity across the sector.

What Catanzaro looks for in founders is specific and unsentimental. She wants people who are obsessed with a problem, not with a technology. She wants founders who understand why they're building what they're building - not just the what, but the urgent, specific problem that makes this startup necessary right now. She'll take pre-product. She won't take pre-conviction.

What it really takes to succeed in venture is stamina. Being always on, always aware, and comfortable with uncertainty.

- Sarah Catanzaro

She has been candid about what venture requires from women specifically. Female VCs must hustle harder without traditional networks, she's said - and she uses that harder path to emphasize the importance of role models. She's spoken at Strata Data Conference, apply(conf) by Tecton, and PyTorch Conference 2025. She launched a weekly newsletter in 2019 because existing data newsletters were either too shallow or too academic - none were built for practitioners who needed content they could use on a Tuesday morning before a sprint.

The year she had her first child, she made three new investments, closed four follow-on rounds, helped orchestrate a nine-figure exit, grew her team, and attended more conferences and dinners than she could count. "This isn't a humble brag," she wrote on X. "I'm straight up bragging."

The precision of that statement is very Sarah Catanzaro. She tracks what matters, reports it accurately, and doesn't dress it up. The data is the story.

Bets on the Boring, Critical Stack

The companies that every serious AI team ends up running on. She found them first.

RunwayML - Creative AI • $410M raised
LangChain - Agent engineering • $1.25B valuation
Modal Labs - Serverless ML compute • $110M
Hex - Collaborative analytics
DatologyAI - AI data curation • seed led
MotherDuck - Serverless analytics • $100M
David AI - AI infrastructure • $80M
CedarDB - Next-gen database
Datafold - Data reliability
Eppo - Experimentation • Acq. Datadog
WarpStream - Kafka-compatible streaming • Acq. Confluent
Einblick - AI analytics • Acq. Databricks

Opinions on the Record

"I don't spend most of my time talking to startups or hearing pitches. I spend most of my time talking to data and ML practitioners to better understand their needs - to really zero in on where there's white space in the market."

- On her sourcing process

"Model performance hinges on rigorous data curation. Now, post-training researchers are learning the same lesson - but they don't have access to high quality annotations."

- On AI's foundational problem

"People see GPT outputs and overlook critical constraints like data access and talent availability. The gap between demo and deployment is where most companies actually fail."

- On AI hype cycles

"Data represents the world and human behavior. That makes it powerful and dangerous in equal measure - particularly for marginalized populations."

- On data ethics

"Even fastest-growing startups prove emotionally taxing and physically exhausting. Founders need to know what they're signing up for - not the highlight reel version."

- On founder expectations

"Three new investments, four follow-on rounds, one 9-figure exit, a growing team, and LOTS of conferences. The year I gave birth to my first kid. I'm straight up bragging."

- X / Twitter, January 2025

From Pirates to Platform Companies

Stanford University
B.S. degree. 9/11 happens and a question forms: what motivates people toward violence? The question shapes the next decade.
C4ADS - Center for Advanced Defense Studies
Program Director. Applied computational linguistics (with MIT AI researchers) and network analysis (with CMU computer scientists) to counter-terrorism. Mapped Somali pirate organizations.
Qinetiq / US Secret Service
Defense contractor deployment to the Secret Service. Built NLP-powered threat intelligence systems to protect presidential candidates including Barack Obama during the 2012 election.
Cyveillance
Cyber intelligence analyst. Continued specialized threat intelligence work in the private sector.
Palantir Technologies
Embedded analyst. Watched government agencies struggle with data integration at scale - a front-row seat to the gap between data tooling and real-world needs.
Mattermark - Head of Data (2014-2016)
Built the data team from scratch. Applied machine learning to private company intelligence. Started recognizing the fundamental brokenness of existing data tooling.
Canvas Ventures - Data Partner (2016)
First venture experience. Sourced multiple strong investments. Realized VC was a natural trajectory for domain experts.
Amplify Partners - Principal (2017)
Joined as the firm's first "practitioner turned investor." Introduced by Shivon Zilis. Focused on data infrastructure and ML tooling from day one.
Amplify Partners - Partner (2020)
Promoted from Principal. Portfolio growing across data infrastructure, MLOps, and developer tools.
Amplify Partners - General Partner (2022-present)
Leading Fund V ($700M). Active board member and GP. Three investments + nine-figure exit in the year she had her first child.

What Founders Say

"Sarah knows her stuff and always goes the extra mile. She's not just showing up to board meetings - she's in the weeds with you."

Barry McCardel - CEO, Hex

"Her technical depth, integrity, and network sets her apart. She can evaluate a distributed systems architecture and then help you close your first enterprise customer."

Erik Bernhardsson - CEO, Modal Labs

Beyond the Fund

🧬

Grew up visiting her molecular biologist father's lab, where playing with liquid nitrogen was considered a normal Saturday activity. Origin story for a lifelong comfort with experimentation.

🏴‍☠️

Before machine learning was a mainstream career path, she was building network analysis packages to map Somali pirate organizations. The data was incomplete. The pirates were real.

🏨

Maintains a personal curated database of 300+ hotels and resorts. For a person whose professional life involves evaluating information systems, this tracks.

🐾

Coaches her Cavapoo. The dog has not yet raised a round, but given the portfolio, it's only a matter of time.

⛷️

Has a documented weakness for ski destinations - likely the only GP whose deal flow and travel calendar are both optimized around altitude.

📰

Founded a weekly data newsletter in 2019 because the gap between soundbite content and academic research was too wide. Built for practitioners. No fluff.

How She Thinks About the Bet

The Problem

Founders must be obsessed with a specific, urgent problem - not infatuated with technology. She asks: what breaks if this company doesn't exist? If the answer isn't obvious, neither is the investment.

The Stack

Tools that enable collecting, storing, managing, analyzing, and modeling data more effectively. She invests across the full ML stack - not just in model layers that get the press, but in the infrastructure underneath.

The Stage

Pre-product. Sometimes pre-revenue. Sweet spot at $5M checks. She evaluates technical architectures herself - hands-on data work maintained specifically to preserve conviction. No credentialing required to get a meeting.
