25M+ monthly downloads · 1 billion documents parsed · 48,300+ GitHub stars · $19M Series A raised March 2025 · 90+ Fortune 500 customers · 300,000+ LlamaParse users · Jerry Liu & Simon Suo: Forbes 30 Under 30 · Started as a tweet in November 2022 · Salesforce Agentforce runs on LlamaIndex · SOC 2 Type 2 certified
AI INFRASTRUCTURE ENTERPRISE · SAN FRANCISCO · FOUNDED 2022

LlamaIndex

It started as a tweet. It became the backbone of enterprise AI.

On November 8, 2022, Jerry Liu posted a small open-source project called GPT Index on Twitter. No press release. No launch event. No venture capital. Just a GitHub link and a modest description. Within months it had thousands of GitHub stars, a Discord server buzzing with developers, and a name change that would prove prescient: LlamaIndex. Today its LlamaParse engine has processed more than one billion documents, and its framework powers everything from Salesforce Agentforce to KPMG's consulting workflows. The distance between that tweet and this company is not just about timing. It is about building something developers actually needed.

Python & TypeScript SDK · Open Source · RAG & Agents · Enterprise SaaS · ~80 Employees · SF, CA
48K+
GitHub Stars
25M+
Monthly Downloads
1B+
Documents Parsed
300K+
LlamaParse Users
90+
Fortune 500 Customers
300+
LlamaHub Integrations

A Hackathon Project That Wouldn't Stay Small


The story begins at a hackathon. In October 2022, during an internal event at Robust Intelligence, where he was then an AI engineer, Jerry Liu had an idea: what if you could give a large language model access to your own documents? The concept was simple enough to build in a weekend. The result was GPT Index, a Python library for connecting LLMs to external data. Liu put it on GitHub. Two weeks later, he tweeted about it.

The response was not instant fame. It was something more durable: steady, compounding interest from developers who had the same problem. Engineers at startups, in big tech, in consulting firms were all running into the same wall - they had powerful language models, but no reliable way to hook those models up to their own data. GPT Index gave them a scaffold to work with.

By early 2023, Liu had quit his job and brought in Simon Suo, a former colleague from Uber's AI research division, as co-founder and CTO. The project was renamed LlamaIndex, a nod to Meta's LLaMA model family, which was gaining traction in the open-source community. Within weeks of the rename, LlamaIndex trended as the number one AI repository on all of GitHub.

On Origins
LlamaIndex started as a side project at an internal Robust Intelligence hackathon in October 2022 - then became a tweet - then became a company.

- From the LlamaIndex origin story (paraphrased from public sources)

The Problem: Enterprises Drown in Documents


Every large enterprise has a document problem. Contracts in PDFs. Invoices in spreadsheets with merged cells. Handwritten notes. Multi-page research reports. Slide decks with embedded charts. Scanned forms from the 1990s. These documents contain valuable information that AI models could theoretically use - but only if someone could extract, parse, and index that information reliably first.

That is the gap LlamaIndex fills. The core open-source framework (available in Python and TypeScript) provides modular building blocks for connecting LLMs to external data. Developers use it to build RAG (retrieval-augmented generation) pipelines, which pull relevant document chunks at query time and feed them to a language model. LlamaHub, the ecosystem's integration library, lists over 300 connectors to data sources, vector stores, and LLM providers.
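To make the retrieval step concrete, here is a minimal, dependency-free sketch of the RAG pattern described above: split documents into chunks, score each chunk against the query, and hand the best matches to the model as context. It is not LlamaIndex's API. The word-overlap scoring is a stand-in for the vector embeddings a real pipeline would use, and the helper names (`chunk`, `to_vector`, `cosine`, `retrieve`, `build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical helpers sketching the RAG pattern; this is NOT LlamaIndex's API.


def chunk(text: str, size: int = 40) -> list[str]:
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def to_vector(text: str) -> Counter:
    """Bag-of-words term counts, standing in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    q = to_vector(query)
    return sorted(chunks, key=lambda c: cosine(q, to_vector(c)), reverse=True)[:top_k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the prompt a RAG pipeline would send to the language model."""
    joined = "\n---\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


if __name__ == "__main__":
    docs = [
        "The master services agreement renews automatically every twelve months unless cancelled.",
        "Invoice 1042 totals 18,500 dollars and is due within 30 days.",
        "The office cafeteria serves lunch from noon until two.",
    ]
    chunks = [piece for doc in docs for piece in chunk(doc)]
    question = "When is invoice 1042 due?"
    # Only the invoice chunk shares terms with the question, so it is the one retrieved.
    print(build_prompt(question, retrieve(question, chunks, top_k=1)))
```

In a production pipeline the bag-of-words vectors would be replaced by embeddings held in a vector store, which is the kind of component LlamaHub's connectors plug in.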

On top of the open-source layer, LlamaIndex has built a commercial stack. LlamaParse is the jewel of the product line: an agentic document parsing engine that handles 90+ file types and, according to the company, achieves 90-95% straight-through processing rates. By the same comparison, traditional enterprise OCR solutions manage 60-70% at best, which leaves 30-40% of documents for manual review. That accuracy advantage is not incidental. It is the product's primary selling proposition.

LlamaCloud, the SaaS and VPC platform, bundles parsing, indexing, and retrieval into an end-to-end workflow with enterprise controls: RBAC, SSO, data residency options, and SOC 2 Type 2 compliance. For companies in regulated industries - finance, healthcare, legal - those last two items are non-negotiable. Getting certified is tedious. LlamaIndex got certified anyway, in December 2024.

Two Uber AI Engineers Who Found a Better Problem


Jerry Liu (CEO) and Simon Suo (CTO) did not meet at a startup incubator or a conference. They overlapped at Uber's AI research division, which is where a lot of the quiet groundwork for LlamaIndex was probably laid, both in technical intuition and in understanding what production-grade AI systems actually require. Uber's AI infrastructure is known for its rigor. Engineers there learn that elegant prototypes and reliable production systems are very different things.

Liu has described the early days of LlamaIndex as moving fast and learning in public. The company was transparent on Discord, responsive on GitHub, and consistent about shipping. That culture - open-source first, developer-centric - still defines how LlamaIndex operates. In 2024, both Liu and Suo were named to the Forbes 30 Under 30 list in the Enterprise Technology category. They had built a real company by then, not just a popular repository.

The six-month mark after launching as a company offered a useful snapshot: 16,000 GitHub stars, 20,000 Twitter followers, 200,000 monthly downloads, 6,000 Discord members. None of those numbers required a PR firm. They came from developers solving real problems and telling other developers.

From Side Project to Enterprise Platform


Oct 2022
GPT Index built at an internal Robust Intelligence hackathon by Jerry Liu.
Nov 8, 2022
Liu tweets the project. Initial traction begins organically on GitHub.
Early 2023
Renamed to LlamaIndex. Trends #1 on GitHub AI repositories. Simon Suo joins as co-founder and CTO.
Jun 2023
$8.5M seed round led by Greylock Partners. Notable angels include Jack Altman and Lenny Rachitsky.
Nov 2024
Microsoft Azure integration announced at Microsoft Ignite. 50% reduction in report generation time for joint customers.
Dec 2024
SOC 2 Type 2 certification achieved. LlamaReport launched.
Mar 2025
$19M Series A led by Norwest Venture Partners. LlamaParse and LlamaCloud reach general availability. EU GDPR-compliant SaaS launched.
May 2025
Strategic investments from Databricks Ventures and KPMG LLP. LlamaParse v2 launched at up to 50% lower cost.
Jan 2026
LiteParse open-sourced - a lightweight local document parser. Agent Client Protocol integration launched.
Mar 2026
Gemini Embedding 2 + LlamaParse integration released for searchable audio knowledge bases.

Who Uses It and Why


LlamaIndex's customer list reads like a cross-section of serious enterprise computing. Rakuten uses it. The Carlyle Group - one of the world's largest private equity firms - uses it. KPMG, which also made a strategic investment, uses it across consulting engagements. Salesforce built Agentforce on top of LlamaIndex's async workflow abstractions. Over 90 Fortune 500 companies have adopted LlamaCloud, according to the company.

The common thread is unstructured data at scale. These organizations process tens of thousands of documents monthly. Legal contracts. Financial filings. Client reports. When accuracy drops even a few percentage points, the cost shows up in manual review hours, compliance risks, and delayed decisions. LlamaParse's parsing accuracy, pitched at 90-95% straight-through processing, makes a commercial case that is easy to calculate.
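The arithmetic behind that pitch is straightforward. The sketch below takes the straight-through-processing rates quoted above at face value and counts how many documents would still fall through to a human reviewer. The monthly volume of 10,000 documents and the function name `manual_review_load` are illustrative assumptions, not company data.

```python
def manual_review_load(monthly_docs: int, stp_rate: float) -> int:
    """Documents per month NOT processed straight through, i.e. needing a human."""
    if not 0.0 <= stp_rate <= 1.0:
        raise ValueError("stp_rate must be between 0 and 1")
    # round() absorbs float noise such as 10_000 * (1 - 0.9) == 999.99...
    return round(monthly_docs * (1.0 - stp_rate))


if __name__ == "__main__":
    volume = 10_000  # hypothetical monthly document volume
    for label, rate in [
        ("legacy OCR, low", 0.60),
        ("legacy OCR, high", 0.70),
        ("LlamaParse claim, low", 0.90),
        ("LlamaParse claim, high", 0.95),
    ]:
        print(f"{label:>22}: {manual_review_load(volume, rate):,} documents to review")
```

At that volume, moving from 60-70% to 90-95% straight-through processing shrinks the monthly manual-review queue from 3,000-4,000 documents to 500-1,000.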

The developer community provides the pipeline. With 300,000 registered LlamaParse users and 25 million monthly package downloads, LlamaIndex has the kind of bottom-up adoption that enterprise sales teams struggle to manufacture. Developers try the open-source framework, build something that works, and then bring it upstairs. The paid products follow the same data path that the free ones already proved out.

The Stack It Plugs Into


Microsoft Azure
Deep integration with Azure OpenAI Service and Azure AI Search. Announced at Microsoft Ignite 2024.
Databricks
Strategic investment. LlamaIndex addresses production-ready LLM needs on the Databricks platform.
Salesforce
LlamaIndex's open-source framework provides the async workflow abstractions behind Agentforce's concurrent agents.
KPMG
Strategic investor and enterprise customer. Uses LlamaIndex across industry-specific AI solutions.
AWS
S3VectorStore integration for enterprise-grade agent workflows on AWS infrastructure.
NVIDIA
Joint case studies including a sales assistant use case (October 2024).

Everything in the Llama Family

LlamaIndex has expanded from a single framework into a product family covering the full lifecycle of enterprise document AI. Each product targets a specific bottleneck in the pipeline.

LlamaParse
ENTERPRISE
Agentic document parsing for 90+ file types. PDFs, spreadsheets, handwritten notes, multi-page layouts. 90-95% straight-through processing vs 60-70% for legacy OCR. 1 billion documents and counting.
LlamaCloud
SAAS / VPC
End-to-end enterprise RAG and agent workflows. Data connectors, parsing, indexing, retrieval. RBAC, SSO, SOC 2 Type 2, EU GDPR compliant. Also available as a VPC deployment.
LlamaAgents
NEW
One-click document agent deployment with ready-to-use templates for invoice processing, contract review, and claims handling. Skip the boilerplate.
LlamaSheets
Transforms broken spreadsheets with merged cells into AI-ready Parquet files using 40+ cell-level features. For anyone who has ever cursed at Excel.
LlamaSplit
Intelligently separates bundled documents into distinct sections. Useful when every incoming PDF is a different animal.
LlamaReport
Transforms document databases into structured, human-readable reports. Launched December 2024.
LlamaTrace
Observability for LlamaIndex apps, co-developed with Arize AI. Because AI pipelines need monitoring too.
LiteParse
OPEN SOURCE
Lightweight local document parser, open-sourced January 2026. Built from LlamaParse learnings, designed to run on your own hardware.
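The merged-cell problem that LlamaSheets targets is easy to illustrate. When cells are merged, spreadsheet files typically keep the value only in the top-left cell of the range and leave the rest empty, so a naive row-by-row reader loses the label for every row after the first. The sketch below shows the simplest repair, copying each merged range's anchor value into every cell it covers. It is an illustrative assumption, not LlamaSheets' method (which the company describes as using 40+ cell-level features), and the grid layout and the `(top, left, bottom, right)` range format are hypothetical.

```python
from typing import Optional

Grid = list[list[Optional[str]]]
# Hypothetical range format: (top_row, left_col, bottom_row, right_col), inclusive, 0-indexed.
MergedRange = tuple[int, int, int, int]


def unmerge(grid: Grid, merged: list[MergedRange]) -> Grid:
    """Copy each merged range's top-left value into every cell the range covers."""
    out = [row[:] for row in grid]  # work on a copy; do not mutate the caller's grid
    for top, left, bottom, right in merged:
        anchor = grid[top][left]
        for r in range(top, bottom + 1):
            for c in range(left, right + 1):
                out[r][c] = anchor
    return out


if __name__ == "__main__":
    # "Q1" is merged across two rows in column 0, so the second row arrives blank.
    sheet = [
        ["Quarter", "Region", "Revenue"],
        ["Q1", "EMEA", "1.2M"],
        [None, "APAC", "0.9M"],
    ]
    for row in unmerge(sheet, [(1, 0, 2, 0)]):
        print(row)
```

Once every row carries its own labels, it can stand alone as a record, which is what a columnar format such as Parquet expects.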

The Capital Stack

Seed Round · June 2023
$8.5M
Led by Greylock Partners
Greylock · Jack Altman · Lenny Rachitsky · Mathilde Collin
Series A · March 2025
$19M
Led by Norwest Venture Partners
Norwest Venture Partners · Greylock
Strategic · May 2025
Undisclosed
Strategic investors with deep enterprise distribution
Databricks Ventures · KPMG LLP
Track Record

Milestones Worth Noting

Trended #1 on GitHub AI repositories in March 2023, within weeks of the rename to LlamaIndex.
Forbes 30 Under 30 (2024) - Enterprise Technology - for both Jerry Liu and Simon Suo.
SOC 2 Type 2 certified December 2024. EU GDPR-compliant SaaS launched March 2025.
Over 1 billion documents processed through LlamaParse as of 2025.
300+ integration packages in the LlamaHub ecosystem - covering vector stores, LLM providers, and data connectors.
10,000+ organizations using the platform, including 90+ Fortune 500 companies.
48,300+ GitHub stars and 7,100+ forks on the main repository as of 2025.
Salesforce Agentforce - one of the year's most talked-about enterprise AI products - runs on LlamaIndex's async workflow engine.