Profile
The Spy Who Solved AI's Dirtiest Problem
The "first mile" of enterprise AI is unglamorous, underestimated, and worth everything.
Before the PowerPoint slides and the term sheets, there was a problem. Not the kind that shows up in TED talks - the kind that shows up at 11pm when a data engineer at a Fortune 100 company is staring at a wall of unreadable PDFs and wondering why the AI demo stopped working in production.
Brian Raymond spotted that problem from an unusual perch. At Primer AI, where he led the national security group building AI tools for US government clients, he watched the same bottleneck appear again and again: the models were fine, the compute was available, but the raw files - PowerPoints, scanned contracts, audio recordings, Word docs - sat there stubbornly, refusing to become the clean JSON that language models could actually use.
In 2022, he left to fix it. With two former Primer colleagues - Matt Robinson, a PhD data scientist with an intelligence community background, and Crag Wolfe, a seasoned infrastructure architect - Raymond founded Unstructured. The company builds what its name says: the tooling to turn unstructured data into LLM-ready formats.
"If you go to a large investment bank or Walmart, the data exists in PowerPoints, Google docs, Slack messages, and audio recordings, with nothing to help data scientists get that into JSON to feed to models."
- Brian Raymond, Founder & CEO, Unstructured
The insight was easy to articulate but hard to act on, which is why no one had quite solved it. Most AI infrastructure conversation in 2022 centered on models - bigger, faster, cheaper. Raymond's bet was that none of it would reach production without a working pipeline from raw enterprise data to the model's input. He called it the "first mile of AI."
Two years in, the bet is looking sharp. Unstructured has raised $65 million across three rounds: a $5 million seed from Bain Capital Ventures, a Series A led by Madrona that brought the combined seed and Series A total to $25 million in July 2023, and a $40 million Series B closed in March 2024, with Menlo Ventures leading and Databricks Ventures, IBM Ventures, and NVIDIA's NVentures all participating.
Unstructured supports 25+ file types - PDFs, PowerPoints, images, audio, scanned documents - and delivers standardized JSON output for RAG pipelines, fine-tuning jobs, and enterprise AI workflows.
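To make "standardized JSON output" concrete, here is a minimal sketch of the idea in plain Python. It is not Unstructured's library or its actual schema: the Element class, the partition_text function, and the title heuristic are all hypothetical, chosen only to show raw text becoming a list of typed, model-ready records.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Element:
    """One piece of a parsed document (hypothetical schema for illustration)."""
    type: str       # e.g. "Title" or "NarrativeText"
    text: str
    metadata: dict


def partition_text(raw: str, filename: str) -> list[Element]:
    """Split raw text into blank-line-separated blocks and label each one.

    Heuristic (an assumption, not a real parser): a short, single-line block
    with no closing punctuation is treated as a title.
    """
    elements = []
    for block in raw.split("\n\n"):
        text = " ".join(block.split())  # collapse internal whitespace
        if not text:
            continue
        single_line = "\n" not in block.strip()
        is_title = single_line and len(text) <= 60 and text[-1] not in ".!?:"
        kind = "Title" if is_title else "NarrativeText"
        elements.append(Element(kind, text, {"filename": filename}))
    return elements


def to_json(elements: list[Element]) -> str:
    """Serialize the elements as a JSON array of objects."""
    return json.dumps([asdict(e) for e in elements], indent=2)


sample = (
    "Quarterly Review\n\n"
    "Revenue grew in the third quarter.\nCosts were flat.\n\n"
    "Next Steps\n\n"
    "Hire two engineers."
)
elements = partition_text(sample, "review.txt")
print(to_json(elements))
```

Real enterprise files are far messier than this sample, which is the point of the company: the hard part is producing the same tidy records from scanned contracts, slide decks, and audio.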
The open source library became the fastest proof of traction. By spring 2023 - before the Series A had even closed - Unstructured had logged over 800,000 downloads and appeared in more than 2,500 GitHub projects. Developers building RAG applications, LLM fine-tuning pipelines, and document parsing tools had quietly adopted it because there was nothing else that worked as reliably across document types.
Raymond is deliberately unbothered by the unglamorous positioning. "It played to our benefit that we're unsexy," he said in an interview - a line that sums up his operating philosophy. In a market crowded with model providers and AI chatbots chasing the same spotlight, being the company that reliably processes PDFs and scanned tables is a competitive moat disguised as a limitation.
From Langley to LangChain
Raymond's path to AI infrastructure ran through some unusual places. He joined the Central Intelligence Agency in 2009, spending five years as an intelligence officer - drafting presidential briefings, training at the CIA's Sherman Kent School for intelligence analysis, and serving as a daily briefer for senior officials at the White House and State Department. The work was about turning chaotic information - signals, reports, fragmentary intelligence - into coherent, actionable assessments. Sound familiar?
In June 2014, he moved across the Potomac to the White House, serving on the National Security Council as Director for Iraq. For over a year, he sat in rooms where President Obama and Vice President Biden were deciding US policy toward ISIS and the broader Middle East. He synthesized classified and open-source intelligence and translated it for decision-makers who needed clarity, not raw data.
The pattern - turning raw, messy information into something useful - would follow him out of government. After an MBA at the Tuck School of Business at Dartmouth, Raymond pivoted to investment banking at Harris Williams & Co., then to AI at Primer, where the same data-to-insight challenge appeared at enterprise scale.
"I would not start with technology in search of a problem. I would search for a problem in need of technology."
- Brian Raymond, on founding advice
When he co-founded Unstructured with Robinson and Wolfe, the team had an unusual combination: intelligence community operational instincts, deep NLP research (Robinson's PhD), and production infrastructure experience (Wolfe's architecture background). The founding team had already watched each other work under pressure on high-stakes AI projects. They started building knowing the problem was real because they had personally hit the wall.
On AI Hype and the Production Gap
Raymond is measured about AI's current moment in a way that's rare among infrastructure CEOs who would normally benefit from amplifying the hype. In interviews, he's named the dissonance directly: two contradictory narratives are running simultaneously - one that AGI is imminent, and another that all the current AI progress is an expensive flop. He doesn't fully accept either.
What he does accept is the difficulty of production. "It's still very hard," he told one podcast host. "The instances in which these workflows or applications are making it from prototype to production - there's a lot of pattern matching." This isn't pessimism; it's the diagnosis that created his company's market. The gap between AI demos and AI production is exactly where Unstructured lives.
On the debate between RAG and ever-expanding context windows, Raymond has staked a clear position: "RAG will continue to be the dominant paradigm." His reasoning is practical - enterprise users working in integrated copilot environments want embedded workflows, not document upload interfaces. The context window may grow, but the need for clean, preprocessed, structured data doesn't disappear. If anything, it grows with it.
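For readers unfamiliar with the pattern, the retrieval step in RAG can be sketched in a few lines of Python. This is a toy under stated assumptions: production systems typically rank chunks by embedding similarity, whereas this sketch substitutes simple word overlap, and the chunks, query, and function names are all invented for illustration.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, chunk: str) -> int:
    """Toy relevance score: number of token occurrences shared by query and chunk."""
    shared = Counter(tokenize(query)) & Counter(tokenize(chunk))
    return sum(shared.values())


def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks with the highest overlap score."""
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)
    return ranked[:k]


def build_prompt(query: str, chunks: list[str], k: int = 1) -> str:
    """Put the retrieved chunks ahead of the question, as a RAG prompt would."""
    context = "\n".join(retrieve(query, chunks, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical preprocessed chunks, standing in for cleaned document text.
chunks = [
    "The contract renews on 1 March and runs for two years.",
    "Invoices are payable within 30 days of receipt.",
    "Either party may terminate with 90 days written notice.",
]
query = "When are invoices payable?"
print(build_prompt(query, chunks))
```

The sketch only works because the chunks are already clean sentences. If they were garbled text from a badly parsed PDF, neither word overlap nor embeddings would find the right passage, which is the dependency Raymond is pointing to.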
"RAG will continue to be the dominant paradigm."
- Brian Raymond, on AI architecture
"LLMs still perform terribly at parsing tables without demarcating lines."
- Brian Raymond, on document processing
"Our vision is to connect human generated data with foundation models."
- Brian Raymond, on company mission
"Unstructured must exist and win for AI to reach its full potential."
- Brian Raymond, on market position
The $65M Infrastructure Stack
The Series B roster tells its own story. Menlo Ventures led. Databricks Ventures joined, underscoring the data lakehouse giant's interest in the preprocessing layer that feeds its platform. IBM Ventures wrote a check - relevant given IBM's enterprise customer base and its own AI push. NVIDIA's investment arm NVentures participated, signaling that GPU compute companies understand the data pipeline has to work for the hardware to get used at scale.
Alongside the institutional investors came notable angels: Sacramento Kings Chairman Vivek Ranadivé, DataStax CEO Chet Kapoor, and investor Allison Pickens. Tim Tully of Menlo Ventures joined the board. The earlier Series A investor roster had included Harrison Chase, founder of LangChain, and Bob van Luijt, CEO of Weaviate - both direct ecosystem partners whose users depend on Unstructured's output.
With 120 employees and government sector clients processing between 500,000 and 2 million documents daily, Unstructured is operating at a scale that most AI startups are still drawing on whiteboards. Raymond's intelligence community background - where ingesting and synthesizing enormous document volumes was an operational requirement, not an aspirational goal - may be the most quietly valuable credential in AI infrastructure today.