BREAKING
Brian Raymond raises $40M Series B for Unstructured - NVIDIA, Databricks, IBM back the bet  •  Former CIA officer turned AI founder closes $65M total funding in under two years  •  Unstructured named 2024 IA40 winner in AI Infrastructure  •  800,000+ downloads, 2,500+ GitHub projects - the open source flywheel is spinning  •  Raymond at Databricks Data + AI Summit 2024: the 'first mile of AI' problem is real
Founder / CEO - Unstructured

Brian Raymond

AI Infrastructure  ·  San Francisco, CA  ·  $65M Raised

"The easy button for LLM data preparation" - from a man who once briefed the President on classified intelligence.

Founder  ·  AI Infrastructure  ·  Ex-CIA  ·  Series B  ·  Open Source
$65M Total Raised
800K+ Downloads
2,500+ GitHub Projects
120 Team Members

The Spy Who Solved AI's Dirtiest Problem

The "first mile" of enterprise AI is unglamorous, underestimated, and worth everything.

Before the PowerPoint slides and the term sheets, there was a problem. Not the kind that shows up in TED talks - the kind that shows up at 11pm when a data engineer at a Fortune 100 company is staring at a wall of unreadable PDFs and wondering why the AI demo stopped working in production.

Brian Raymond spotted that problem from an unusual perch. At Primer AI, where he led the national security group building AI tools for US government clients, he watched the same bottleneck appear again and again: the models were fine, the compute was available, but the raw files - PowerPoints, scanned contracts, audio recordings, Word docs - sat there stubbornly, refusing to become the clean JSON that language models could actually use.

In 2022, he left to fix it. With two former Primer colleagues - Matt Robinson, a PhD data scientist with an intelligence community background, and Crag Wolfe, a seasoned infrastructure architect - Raymond founded Unstructured. The company builds what its name says: the tooling to turn unstructured data into LLM-ready formats.

"If you go to a large investment bank or Walmart, the data exists in PowerPoints, Google docs, Slack messages, and audio recordings, with nothing to help data scientists get that into JSON to feed to models."
- Brian Raymond, Founder & CEO, Unstructured

The insight was straightforward enough to articulate but hard enough to solve that no one had quite done it. Most AI infrastructure conversation in 2022 centered on models - bigger, faster, cheaper. Raymond's bet was that none of it would reach production without a working pipeline from raw enterprise data to the model's input. He called it the "first mile of AI."

Two years in, the bet is looking sharp. Unstructured has raised $65 million in total: a $25 million combined seed and Series A announced in July 2023, which folds in the earlier $5 million seed from Bain Capital Ventures and a Series A led by Madrona, plus a $40 million Series B closed in March 2024 with Menlo Ventures leading and Databricks Ventures, IBM Ventures, and NVIDIA's NVentures all participating.

Unstructured supports 25+ file types - PDFs, PowerPoints, images, audio, scanned documents - and delivers standardized JSON output for RAG pipelines, fine-tuning jobs, and enterprise AI workflows.
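To make the idea of "many file types in, one JSON shape out" concrete, here is a minimal sketch using only the Python standard library. It is not Unstructured's code or API: the `Element` schema, the `PARSERS` registry, and the `to_elements` and `to_json` helpers are hypothetical names invented for illustration, and only two trivial formats (plain text and CSV) are handled.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass
from pathlib import Path


# Hypothetical element schema, invented for this sketch.
@dataclass
class Element:
    type: str      # e.g. "Title", "NarrativeText", "TableRow"
    text: str      # the extracted text content
    source: str    # originating filename
    filetype: str  # lowercased file extension


def parse_text(raw: str) -> list[tuple[str, str]]:
    """Split on blank lines; short single-line blocks without a final period count as titles."""
    blocks = [b.strip() for b in raw.split("\n\n") if b.strip()]
    parsed = []
    for block in blocks:
        is_title = "\n" not in block and len(block) < 60 and not block.endswith(".")
        parsed.append(("Title" if is_title else "NarrativeText", block))
    return parsed


def parse_csv(raw: str) -> list[tuple[str, str]]:
    """Turn each non-empty CSV row into one pipe-delimited table-row element."""
    reader = csv.reader(io.StringIO(raw))
    return [("TableRow", " | ".join(row)) for row in reader if row]


# Registry mapping file extensions to parsers (assumed design, two formats only).
PARSERS = {".txt": parse_text, ".csv": parse_csv}


def to_elements(filename: str, raw: str) -> list[Element]:
    """Dispatch on file extension and wrap the parser output in the shared schema."""
    ext = Path(filename).suffix.lower()
    if ext not in PARSERS:
        raise ValueError(f"unsupported file type: {ext!r}")
    return [Element(kind, text, filename, ext) for kind, text in PARSERS[ext](raw)]


def to_json(elements: list[Element]) -> str:
    """Serialize elements to one JSON array, regardless of where they came from."""
    return json.dumps([asdict(e) for e in elements], indent=2)


if __name__ == "__main__":
    memo = to_elements("memo.txt", "Quarterly Update\n\nRevenue grew this quarter.")
    table = to_elements("rounds.csv", "round,amount\nSeries B,40\n")
    print(to_json(memo + table))
```

The point of the design is that downstream consumers (a RAG indexer, a fine-tuning job) only ever see one schema; supporting a new file type means adding one parser to the registry rather than touching every consumer.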

The open source library became the fastest proof of traction. By spring 2023 - before the Series A had even closed - Unstructured had logged over 800,000 downloads and appeared in more than 2,500 GitHub projects. Developers building RAG applications, LLM fine-tuning pipelines, and document parsing tools had quietly adopted it because there was nothing else that worked as reliably across document types.

Raymond is deliberately unbothered by the unglamorous positioning. "It played to our benefit that we're unsexy," he said in an interview - a line that sums up his operating philosophy. In a market crowded with model providers and AI chatbots chasing the same spotlight, being the company that reliably processes PDFs and scanned tables is a competitive moat disguised as a limitation.


From Langley to LangChain

Raymond's path to AI infrastructure ran through some unusual places. He joined the Central Intelligence Agency in 2009, spending five years as an intelligence officer - drafting presidential briefings, training at the CIA's Sherman Kent School for intelligence analysis, and serving as a daily briefer for senior officials at the White House and State Department. The work was about turning chaotic information - signals, reports, fragmentary intelligence - into coherent, actionable assessments. Sound familiar?

In June 2014, he moved across the Potomac to the White House, serving on the National Security Council as Director for Iraq. For over a year, he sat in rooms where President Obama and Vice President Biden were deciding US policy toward ISIS and the broader Middle East. He synthesized classified and open-source intelligence and translated it for decision-makers who needed clarity, not raw data.

The pattern - turning raw, messy information into something useful - would follow him out of government. After an MBA at Tuck School of Business at Dartmouth, Raymond pivoted to investment banking at Harris Williams & Co., then to AI at Primer, where the same data-to-insight challenge appeared at enterprise scale.

"I would not start with technology in search of a problem. I would search for a problem in need of technology."
- Brian Raymond, on founding advice

When he co-founded Unstructured with Robinson and Wolfe, the team had an unusual combination: intelligence community operational instincts, deep NLP research (Robinson's PhD), and production infrastructure experience (Wolfe's architecture background). The founding team had already watched each other work under pressure on high-stakes AI projects. They started building knowing the problem was real because they had personally hit the wall.


On AI Hype and the Production Gap

Raymond is measured about AI's current moment in a way that's rare among infrastructure CEOs who would normally benefit from amplifying the hype. In interviews, he's named the dissonance directly: two contradictory narratives are running simultaneously - one that AGI is imminent, and another that all the current AI progress is an expensive flop. He doesn't fully accept either.

What he does accept is the difficulty of production. "It's still very hard," he told one podcast host. "The instances in which these workflows or applications are making it from prototype to production - there's a lot of pattern matching." This isn't pessimism; it's the diagnosis that created his company's market. The gap between AI demos and AI production is exactly where Unstructured lives.

On the debate between RAG and ever-expanding context windows, Raymond has staked a clear position: "RAG will continue to be the dominant paradigm." His reasoning is practical - enterprise users working in integrated copilot environments want embedded workflows, not document upload interfaces. The context window may grow, but the need for clean, preprocessed, structured data doesn't disappear. If anything, it grows with it.

"RAG will continue to be the dominant paradigm."
ON AI ARCHITECTURE
"LLMs still perform terribly at parsing tables without demarcating lines."
ON DOCUMENT PROCESSING
"Our vision is to connect human generated data with foundation models."
ON COMPANY MISSION
"Unstructured must exist and win for AI to reach its full potential."
ON MARKET POSITION

The $65M Infrastructure Stack

The Series B roster tells its own story. Menlo Ventures led. Databricks Ventures joined, underscoring the data lakehouse giant's interest in the preprocessing layer that feeds its platform. IBM Ventures wrote a check - relevant given IBM's enterprise customer base and its own AI push. NVIDIA's investment arm NVentures participated, signaling that GPU compute companies understand the data pipeline has to work for the hardware to get used at scale.

Alongside the institutional investors came notable angels: Sacramento Kings Chairman Vivek Ranadivé, Datastax CEO Chet Kapoor, and investor Allison Pickens. Tim Tully of Menlo Ventures joined the board. The earlier Series A investor roster had included Harrison Chase, founder of LangChain, and Bob van Luijt, CEO of Weaviate - both direct ecosystem partners whose users depend on Unstructured's output.

At 120 employees and with government sector clients processing between 500,000 and 2 million documents daily, Unstructured is operating at a scale that most AI startups are still drawing on whiteboards. Raymond's intelligence community background - where ingesting and synthesizing enormous document volumes was an operational requirement, not an aspirational goal - may be the most quietly valuable credential in AI infrastructure today.

From Briefing the President to Pitching VCs

2009
Central Intelligence Agency

Intelligence Officer for five years. Drafted assessments for the President of the United States. Served as a daily intelligence briefer for the White House and State Department. Trained at the CIA's Sherman Kent School for intelligence analysis.

2014
White House - National Security Council

Director for Iraq. Advised President Obama and Vice President Biden on Iraq, ISIS, and Middle East policy. Synthesized classified and open-source intelligence for senior decision-makers.

2015
Tuck School of Business, Dartmouth

MBA. Pivoted from government to business, laying the groundwork for a private sector career in AI and finance.

2016
Harris Williams & Co. - Investment Banking

Two years in financial advisory and investment banking. Built the commercial and financial fluency that would later inform his approach to raising and deploying venture capital.

2018
Primer AI - VP Global Public Sector

Led the national security group building AI solutions for US government agencies and Fortune 100 companies. Identified the core data ingestion bottleneck that would become Unstructured's founding insight.

2022
Founded Unstructured

Co-founded with Matt Robinson and Crag Wolfe. Raised seed funding from Bain Capital Ventures. Built and launched open-source ETL library for LLM data preprocessing.

2023
Series A - $25M, 800K+ Downloads

Led by Madrona. Ecosystem investors included LangChain's Harrison Chase and Weaviate's Bob van Luijt. Open source library hit 800,000+ downloads and 2,500+ GitHub integrations before the round closed.

2024
Series B - $40M from NVIDIA, Databricks, IBM

Led by Menlo Ventures. Total funding reaches $65M. Named 2024 IA40 winner in AI infrastructure. Spoke at Databricks Data + AI Summit. Team grows to 120 people.

$65M in Under Two Years

Seed Round
$5M
2022
Led by: Bain Capital Ventures
Initial bet on the data preprocessing problem before enterprise AI became a buzzword.
Series A
$25M
July 2023
Led by: Madrona
Also: M12, Bain Capital Ventures, Mango Capital, MongoDB Ventures, Shield Capital
Angels: Harrison Chase (LangChain), Bob van Luijt (Weaviate)
Series B
$40M
March 2024
Led by: Menlo Ventures
Also: Databricks Ventures, IBM Ventures, NVIDIA NVentures
Board: Tim Tully (Menlo) joins
ALL INVESTORS
Menlo Ventures  ·  NVIDIA NVentures  ·  Databricks Ventures  ·  IBM Ventures  ·  Madrona  ·  Bain Capital Ventures  ·  M12 (Microsoft)  ·  Mango Capital  ·  MongoDB Ventures  ·  Shield Capital  ·  Vivek Ranadivé  ·  Chet Kapoor  ·  Harrison Chase  ·  Bob van Luijt

The Specifics

25+

File types Unstructured can process - PDFs, PowerPoints, images, audio, HTML, and more. If enterprise data lives in it, Unstructured has probably built a parser for it.

CIA

Two of Unstructured's three co-founders have intelligence community backgrounds. The skill set of turning raw data into actionable structured output is, it turns out, directly transferable to LLM data pipelines.

2M

Documents per day, at the high end. Government sector clients are running Unstructured at between 500,000 and 2 million documents daily - processing volumes that would have been national security infrastructure not long ago.

2,500

GitHub projects were already using Unstructured before the Series A closed in 2023. The open source community adopted the tool on its merits before Raymond needed to sell it.

WH

Brian Raymond briefed senior officials at the White House and State Department daily as a CIA intelligence officer. Few AI infrastructure founders have had a morning meeting that high-stakes.

Kings

Sacramento Kings Chairman Vivek Ranadivé joined the Series B as an angel investor. The NBA owner and tech entrepreneur knows enterprise software - he co-founded TIBCO before founding the Sacramento Kings ownership group.