
YesPress  /  AI Infrastructure

Fireworks AI

The fastest way to run AI - without the headache

Seven engineers from Meta and Google - five of them core builders of PyTorch - walked out the door and decided to give every company in the world access to the same AI infrastructure that powers the biggest tech platforms on the planet. Three years later: a $4B valuation, 10,000+ customers, and 10 trillion tokens processed every day.

Founded 2022 · Redwood City, CA · AI Inference
$4B Valuation (Oct 2025)
$315M ARR (Feb 2026)
10T+ Tokens/Day
10K+ Customers

When PyTorch engineers get bored

In 2022, Lin Qiao was Head of PyTorch at Meta - which is roughly equivalent to being the caretaker of the most important piece of AI infrastructure in the world. PyTorch powers the research of nearly every major AI lab, runs inside Meta's own products, and underpins most of the AI papers published in academic journals. A comfortable perch, to say the least.

She left anyway. And she brought six colleagues with her.

The problem she kept running into wasn't a shortage of clever AI models. It was everything else. Companies across every industry wanted to deploy generative AI but had nowhere to start. No GPU infrastructure. No inference stack. No team with the depth to build it. The models existed. The capability to run them efficiently - at scale, at low cost, with production-grade reliability - was gated behind Meta, Google, and a handful of hyperscalers.

Fireworks AI's founding thesis was simple: democratize the infrastructure. Build the inference layer that every company needs but almost no company can build themselves. The name is a direct nod to PyTorch's flame logo - spreading the same fire that lit up Meta's AI stack across the rest of the industry.

The Founder on the Mission

"Companies wanted to prioritize AI but lacked the infrastructure, resources, and talent. We wanted to spread that PyTorch flame across industries."
- Lin Qiao, Co-Founder & CEO, Fireworks AI

Seven engineers who built the internet's AI backbone

The founding team is a collector's item. Five of the seven co-founders worked directly on PyTorch at Meta - the framework that runs inside nearly every serious AI research project and production system. The other two come from Meta Ads infrastructure and Google Vertex AI. This isn't a team that read about AI infrastructure. They built it.

Lin Qiao - Co-Founder & CEO
Benny Chen - Co-Founder
Chenyu Zhao - Co-Founder
Dmytro Dzhulgakov - Co-Founder
Dmytro Ivchenko - Co-Founder
James Reed - Co-Founder
Pawel Garbacki - Co-Founder

Lin Qiao holds a PhD in Computer Science from UC Santa Barbara focused on distributed systems and machine learning. Before PyTorch, she worked at Facebook, LinkedIn, and IBM. Her co-founders bring comparably deep credentials: PyTorch core maintenance, Newsfeed ML systems, Google Vertex AI, and Meta Ads infrastructure at scale.

Fun fact: 5 of 7 founders built the framework that almost every AI model in production runs on. Then they left to build the layer underneath it.

The full stack for AI in production

Fireworks AI is not a model company. It doesn't train foundation models or try to compete with OpenAI or Anthropic on raw capability. Instead, it solves a different problem: how do you take an open-source model and run it in production, fast, cheaply, and reliably, without hiring a team of GPU engineers?

The platform covers the full lifecycle - from model selection to fine-tuning to deployment to real-time inference at scale. The OpenAI-compatible API means developers can swap in Fireworks without rewriting their code. The catalog includes hundreds of models across text, image, audio, and multimodal workloads.

Serverless Inference
Run hundreds of open-source models with no GPU setup, no cold starts, and pay-per-token pricing. 300+ tokens/second throughput on popular models.
FireAttention
Proprietary CUDA kernel series. V1 was 4x faster than vLLM at launch. V4 on NVIDIA Blackwell B200 GPUs delivers 3.5x throughput vs. SGLang on H200 using FP4 precision.
FireOptimizer
Adaptive speculative execution engine that cuts inference latency by up to 3x in production workloads. Cursor used it to halve its generation latency.
Fine-Tuning
Supervised and reinforcement fine-tuning for models up to 1T+ parameters via LoRA-based methods. 2x more cost-efficient than competitors. Deploy up to 100 fine-tuned models at no additional cost.
Enterprise Platform
HIPAA, GDPR, SOC 2 compliant. 99.99% API uptime SLA. Dedicated infrastructure options for companies that need full isolation.
OpenAI-Compatible API
Drop-in replacement for the OpenAI API. Switch models, cut costs, gain speed - without touching your application code.
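The drop-in compatibility described above amounts to reusing the OpenAI request shape with a different base URL and model name. A minimal sketch using only the standard library - the model identifier here is illustrative, so check the Fireworks docs for current names:

```python
import json

# OpenAI-compatible endpoint base. An existing OpenAI integration keeps its
# request shape and only swaps this URL plus the model identifier.
FIREWORKS_BASE_URL = "https://api.fireworks.ai/inference/v1"

def build_chat_request(api_key: str, model: str, user_message: str) -> dict:
    """Assemble the URL, headers, and JSON body for a chat completion call.

    This is the same payload an application would already be sending to
    OpenAI's /chat/completions endpoint - no application code changes beyond
    the base URL and model name.
    """
    return {
        "url": f"{FIREWORKS_BASE_URL}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": user_message}],
        }),
    }

# Illustrative model name - actual identifiers live in the Fireworks catalog.
req = build_chat_request(
    "MY_API_KEY",
    "accounts/fireworks/models/llama-v3p1-8b-instruct",
    "Hello!",
)
print(req["url"])
```

In practice this request would be sent with any HTTP client (or the official OpenAI SDK pointed at the Fireworks base URL); the sketch stops at building the payload to show how little changes.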

On the AI Landscape

"The next wave of quality is not going to be one of 'single model solves all problems.' The future will involve hundreds of small expert models solving narrower sets of problems."
- Lin Qiao, Co-Founder & CEO

Who Runs on Fireworks

Samsung · Uber · Shopify · DoorDash · Notion · Cursor · Perplexity · Upwork · Sourcegraph

A funding history that keeps accelerating

Fireworks AI raised a $25M Series A in early 2024. Seven months later, Sequoia led a $52M Series B at roughly a $552M valuation. Fifteen months after that, the company closed a $250M Series C at a $4B valuation - a 7x jump in 15 months. The investor roster reads like a who's who of Silicon Valley's most plugged-in technologists.

Round Amount Date Key Investors
Seed Undisclosed 2022-2023 -
Series A $25M Jan 2024 -
Series B $52M Jul 2024 Sequoia, NVIDIA, AMD, Databricks, Benchmark, Sheryl Sandberg, Frank Slootman
Series C $250M Oct 2025 Lightspeed, Index Ventures, Sequoia, NVIDIA, AMD, MongoDB, Databricks

Strategic investors include NVIDIA and AMD - both chipmakers betting that Fireworks will be the layer that puts their hardware to work for the most demanding AI workloads. Sheryl Sandberg (former Meta COO) and Frank Slootman (former Snowflake CEO) are among the personal angels backing the company.

What three years can look like

🔥 $4B valuation reached less than three years after founding
📈 $315M annualized revenue run-rate - up 416% year-over-year as of Feb 2026
10 trillion+ tokens processed per day as of Oct 2025
🚀 Grew from 1,000 to 10,000+ customers between Series B and C - in about 15 months
⚙️ FireAttention V1 was 4x faster than vLLM at launch with no quality tradeoff
🎯 Cursor achieved 2x reduction in generation latency using Fireworks speculative decoding
🌐 Acquired Hathora - real-time container orchestration across 14 global regions
🏆 #14 on Enterprise Tech 30 (Mid Stage) ranking
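FireOptimizer itself is proprietary, but the general speculative decoding technique behind the Cursor result can be sketched: a cheap draft model proposes several tokens ahead, and the expensive target model verifies them all in one parallel pass, accepting any agreeing prefix at once. The toy below uses stand-in "models" (plain functions over the token sequence) purely to show why this cuts the number of sequential passes through the large model:

```python
def speculative_decode(target_next, draft_next, prompt, k=4, max_new=12):
    """Toy speculative decoding loop (generic technique, not Fireworks' code).

    `draft_next(seq)` is a cheap model's next-token guess; `target_next(seq)`
    is the authoritative model. The draft proposes k tokens; one target pass
    verifies them; the agreed prefix is accepted in bulk. On a mismatch the
    target's own token is appended, so the output exactly matches plain
    greedy decoding with the target alone.
    """
    seq = list(prompt)
    target_passes = 0
    while len(seq) - len(prompt) < max_new:
        # 1. Draft proposes k tokens autoregressively (cheap).
        spec, proposal = list(seq), []
        for _ in range(k):
            t = draft_next(spec)
            proposal.append(t)
            spec.append(t)
        # 2. One target pass scores all k positions in parallel
        #    (simulated here position by position).
        target_passes += 1
        n_accept, check = 0, list(seq)
        for t in proposal:
            if target_next(check) != t:
                break
            n_accept += 1
            check.append(t)
        seq.extend(proposal[:n_accept])
        # 3. On rejection, keep the target's token from the same pass,
        #    guaranteeing progress and exact target output.
        if n_accept < k:
            seq.append(target_next(seq))
    return seq[len(prompt):len(prompt) + max_new], target_passes

# Stand-in models: the draft agrees with the target except at a few positions.
target = lambda s: len(s) % 5
draft = lambda s: len(s) % 5 if len(s) % 7 else -1

out, passes = speculative_decode(target, draft, prompt=[0, 1, 2])
print(out, passes)  # 12 tokens produced in only a handful of target passes
```

The output is identical to running the target model alone, token by token - the speedup comes entirely from needing far fewer sequential passes through the expensive model.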

Running on every major cloud

Fireworks AI operates across NVIDIA, AMD, AWS, Google Cloud, and Oracle Cloud GPU infrastructure. In 2025, the company signed a Strategic Collaboration Agreement with AWS and achieved AWS Generative AI Competency Partner status. A multi-year deal with Microsoft Azure brought Fireworks into Azure Foundry as a native option.

On the model side, partnerships with Meta (Llama), Mistral AI, and Stability AI ensure the catalog stays current with the latest open-source releases. Databricks and MongoDB are both strategic investors and integration partners - Fireworks handles inference for ML workflows built on Databricks and RAG pipelines built on MongoDB.

AWS · Microsoft Azure · NVIDIA · AMD · Google Cloud · Oracle Cloud · Databricks · MongoDB · Meta (Llama) · Mistral AI

Why one model will never be enough

The popular narrative in AI has been about chasing the biggest, smartest foundation model. GPT-4, then GPT-5. Claude Sonnet, then Claude Opus. Each release touted as the one that finally cracks general intelligence.

Fireworks AI is betting on a different future. Lin Qiao has argued publicly that the next phase of AI quality won't come from making models bigger - it'll come from making them more specialized. Hundreds of small, expert models trained on curated domain-specific data, each solving a narrow problem better than any general-purpose model can.

That future is convenient for a platform that already hosts hundreds of models and has built fine-tuning infrastructure capable of creating new ones cheaply. Fireworks isn't just serving the current AI market - it's building toward one where every company has its own model, its own inference stack, and its own AI product, not a dependency on OpenAI's pricing decisions.

On AI Agents

"I'm very bullish on agents. I think it's going to blossom."
- Lin Qiao, Co-Founder & CEO

The Hathora acquisition

In March 2026, Fireworks AI acquired Hathora Inc., a real-time container orchestration platform spanning 14 regions across two bare-metal providers and four cloud environments. The move was framed as a talent and infrastructure acquisition - Hathora's engineering culture, described by Lin Qiao as focused on "every millisecond and every routing decision," maps directly onto the demands of cutting-edge AI inference.

The practical effect: sub-20ms routing decisions across a global network, applied to inference workloads that demand low latency at any location. For developers running AI products with global user bases, that's the difference between an AI assistant that feels responsive and one that doesn't.
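The routing decision described above can be illustrated with a toy policy: measure latency to each region, filter out unhealthy ones, and send the request to the nearest. Region names and figures below are hypothetical, not Fireworks' actual topology - the real system makes these choices from live network measurements:

```python
def pick_region(latencies_ms: dict[str, float], healthy: set[str]) -> str:
    """Route to the healthy region with the lowest measured latency.

    Toy sketch of latency-aware routing: `latencies_ms` maps region name to
    the most recent round-trip measurement for this client.
    """
    candidates = {r: ms for r, ms in latencies_ms.items() if r in healthy}
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=candidates.get)

# Hypothetical measurements for one client.
measured = {"us-west": 12.0, "eu-central": 48.0, "ap-south": 95.0}
print(pick_region(measured, healthy={"us-west", "eu-central"}))
```

A production router layers much more on top (load, capacity, model placement), but the core shape - per-client latency measurements feeding a fast selection - is what a sub-20ms decision budget implies.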

From PyTorch to $4 billion

2022
Fireworks AI, Inc. founded in Redwood City, CA by seven engineers from Meta and Google. Seed round closed.
Jan 2024
Series A of ~$25M raised to build out the inference platform.
Jul 2024
$52M Series B led by Sequoia Capital at ~$552M valuation. NVIDIA, AMD, Databricks, and Benchmark among investors.
Oct 2025
$250M Series C at $4B valuation. Co-led by Lightspeed and Index Ventures. Company crosses $280M ARR and 10,000+ customers. FireAttention V4 released for NVIDIA Blackwell B200 GPUs.
2025
Multi-year partnership with Microsoft Azure Foundry. Strategic Collaboration Agreement with AWS. AWS Generative AI Competency Partner status awarded.
Feb 2026
ARR hits $315M annualized run-rate - up 416% year-over-year.
Mar 2026
Acquires Hathora Inc. for real-time global container orchestration across 14 regions.

Six facts about Fireworks AI

01

Five of the seven co-founders were core engineers on the PyTorch team at Meta. They didn't just use the most popular AI framework - they wrote it.

02

The company name is a direct nod to PyTorch's flame logo. The vision: spreading that same flame to every industry that needs it.

03

Valuation grew 7x in 15 months - from ~$552M at Series B in July 2024 to $4B at Series C in October 2025.

04

10,000+ customers reached without a large traditional sales force. Growth was largely developer-led and product-led.

05

Sheryl Sandberg (ex-Meta COO) and Frank Slootman (ex-Snowflake CEO) are personal angel investors - alongside Alexandr Wang of Scale AI and Howie Liu of Airtable.

06

The team is ~166 people. Processing 10 trillion tokens a day. That's roughly 60 billion tokens per employee per day.

What's been happening

MAR
2026
Acquired Hathora Inc. - a real-time container orchestration platform across 14 global regions - to enhance inference routing speed and reduce latency worldwide.
FEB
2026
Annualized revenue run-rate reaches $315M, up 416% year-over-year. One of the fastest-growing infrastructure companies in AI.
OCT
2025
$250M Series C closed at $4B valuation. Crossed $280M ARR and 10,000+ customer milestones simultaneously. FireAttention V4 released for NVIDIA B200 GPUs.
2025
Multi-year partnership with Microsoft Azure Foundry announced. Strategic Collaboration Agreement with AWS signed. AWS Generative AI Competency Partner status achieved.
AI Inference · LLM · Open-Source · GPU · Generative AI · API · Fine-Tuning · Enterprise · PyTorch · ML Infrastructure · Cloud