Breaking
50 billion new facts added to Diffbot Knowledge Graph - Jan 2026
Diffy LLM grounds answers in a trillion-fact graph
10+ billion entities indexed and counting
First open-source GraphRAG implementation, January 2025
Cisco · DuckDuckGo · Snapchat · Adobe · Salesforce
Founded 2008. Still crawling.
Dispatch · Menlo Park, CA · Vol. XVIII

Diff·bot

A 31-person company in a Menlo Park office is quietly assembling the largest fact-checked database of the public web. Most people have used it without knowing.

Founded 2008 · ~31 employees · $13M raised · Trillion+ facts
Diffy, the resident mascot - reading the web so you don't have to.
The Scene

It is a Tuesday morning in 2026, and somewhere in Menlo Park a fleet of crawlers is reading the internet. Not skimming it. Reading it - rendering each page in a virtual browser, looking at it the way a human would, and writing down what it sees. Article. Author. Date. Product. Price. Person. Title. Employer. A patient act of comprehension, repeated billions of times before lunch.

The robots do not get bored. They have been doing this since 2008, before the iPhone had an App Store, before "AI" was a marketing department. They feed a database now measured in trillions.

The company running them is called Diffbot. It has 31 employees.

"Diffbot's AI model doesn't guess - it knows, thanks to a trillion-fact knowledge graph." — VentureBeat, 2025

Most AI companies make headlines by generating things. Diffbot's bet has always been quieter and stranger: that the more interesting problem is reading. That structure is the bottleneck. That if you can turn the web into a database, every downstream question - search, sales, research, grounded generation - gets easier.

Eighteen years later, that bet is paying off in a way the rest of the industry is now scrambling to imitate.

1T+ · Facts in graph · across 10B+ entities
2008 · Year founded · older than the iPad
31 · Employees · vs. most of the web
$13M · Total raised · across Seed + Series A
The Idea

What if the web came pre-structured?

The pitch is almost embarrassingly simple. The web is the largest and richest dataset humanity has ever made. It is also a mess - text wrapped in markup wrapped in ads wrapped in JavaScript. To use it, you scrape. To trust it, you cross-check. To scale it, you give up.

Diffbot's answer is to treat web pages the way a person does: visually. Its extractors render a page in a real browser, look at the pixels, and decide what the page is about. Article? Product? Forum thread? A profile of a person? Then the relevant fields - byline, price, ratings, author - come out as structured JSON. No fragile CSS selectors. No site-specific scripts.

Run that across the entire public web, continuously, for almost two decades, and you do not have a scraper. You have a knowledge graph.
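For readers who want to see what "structured JSON" means in practice, here is a minimal sketch of the request-and-read step. It assumes an endpoint of the form `https://api.diffbot.com/v3/<page type>` that takes `token` and `url` query parameters and returns an `objects` list; check Diffbot's current documentation before relying on any of that. The helper names and the sample response are hypothetical, written by hand for this example, and nothing here touches the network.

```python
from urllib.parse import urlencode

# Assumed endpoint pattern, for illustration only; verify against current docs.
API_BASE = "https://api.diffbot.com/v3"


def build_extract_url(page_type: str, page_url: str, token: str) -> str:
    """Build a GET URL asking an extraction API to read one page."""
    query = urlencode({"token": token, "url": page_url})
    return f"{API_BASE}/{page_type}?{query}"


def summarize(response: dict) -> dict:
    """Pick a few fields from a response shaped like {"objects": [{...}]}."""
    first = (response.get("objects") or [{}])[0]
    return {key: first.get(key) for key in ("type", "title", "author", "date")}


# Hypothetical response, hand-written for this example (not real API output).
sample = {
    "objects": [
        {
            "type": "article",
            "title": "Hello",
            "author": "A. Writer",
            "date": "2026-01-06",
            "text": "...",
        }
    ]
}

print(build_extract_url("article", "https://example.com/story", "YOUR_TOKEN"))
print(summarize(sample))
```

The point of the sketch is what is missing: there is no selector, no per-site rule, only a page type and a URL going in and named fields coming out.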

A trillion facts is not a metaphor

Diffbot's graph now spans more than 10 billion entities - people, organizations, articles, products, places - linked by over a trillion facts. In January 2026 alone the company added 50 billion new facts, 30 million new organizations, and 600 million new articles. Most companies celebrate quarterly product launches. Diffbot celebrates a slow news month.

The grounded-AI bet

When generative AI arrived, the industry's first instinct was to make models bigger. Diffbot's instinct, predictably, was to make them honest. In January 2025 the company released Diffbot LLM - a fine-tune of Meta's Llama 3.3, plugged directly into the knowledge graph. Ask it a question and it answers with citations to specific facts, retrieved at query time. The company called it the first open-source GraphRAG implementation. You can try it at diffy.chat.

The model is not a chatbot dressed up to look like a research tool. It is a research tool that learned to chat.
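The retrieve-then-cite pattern is easier to see in miniature. The sketch below is not Diffbot LLM and involves no language model at all: it is a toy lookup over three facts quoted in this dispatch, with placeholder source labels, showing only the rule that an answer is assembled from facts fetched at query time and that each one carries its source. Every name in it (`Fact`, `GRAPH`, `retrieve`, `answer`) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    value: str
    source: str  # where the fact was read; placeholder labels in this toy


# Toy graph built from figures quoted in this dispatch; sources are placeholders.
GRAPH = [
    Fact("Diffbot", "founded", "2008", "source-1"),
    Fact("Diffbot", "headquarters", "Menlo Park, CA", "source-2"),
    Fact("Diffbot LLM", "released", "January 2025", "source-3"),
]


def retrieve(subject: str, predicate: str) -> list:
    """Fetch matching facts at query time rather than recalling them."""
    return [f for f in GRAPH if f.subject == subject and f.predicate == predicate]


def answer(subject: str, predicate: str) -> str:
    """Answer only from retrieved facts, and attach each fact's source."""
    facts = retrieve(subject, predicate)
    if not facts:
        return "No supporting fact found."
    return "; ".join(f"{f.value} [{f.source}]" for f in facts)


print(answer("Diffbot", "founded"))
print(answer("Diffbot", "ceo"))
```

The second call matters as much as the first: when nothing is retrieved, the toy says so instead of inventing an answer.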

Who uses it

Diffbot's customer list reads like the back of a cereal box you didn't know you'd been eating from. DuckDuckGo's instant answers. Snapchat's link previews. AOL. Bing. Adobe. Cisco. eBay. Salesforce. Samsung. CBS Interactive. If you have ever pasted a URL into a chat app and watched a tidy preview appear, the pipework was often Diffbot's.

The newer products - Enhance and LeadGraph - take the same graph and aim it at sales teams. Funding events become searchable. Org charts become reliable. The thing you used to pay three vendors for becomes one query.

The patient company

Diffbot has been profitable. Diffbot has not gone public. Diffbot has not raised a Series B. It raised $10 million in February 2016, on top of an earlier $2 million seed from Matrix Partners and Tencent, and then it went back to work. Eighteen years. Thirty-one people. A trillion facts. In an industry that confuses motion with progress, this counts as a philosophical statement.

Founder Mike Tung - patent lawyer turned Stanford AI grad student turned engineer at eBay, Yahoo, and Microsoft - did not set out to build a unicorn. He set out to build a map of human knowledge. The map turned out to be the unicorn.

By the Numbers

The graph, in monthly intake.

A rough sketch of what Diffbot's crawlers added to the knowledge graph in a recent month. Bars are scaled relative to one another - the point is the order of magnitude, not the decimal.

New entities added per month (relative scale)

Articles: 600M
Facts: 50B+
Organizations: 30M
People: tens of millions
Products: millions
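How the original bars were scaled is not stated. One plausible approach, sketched here as an assumption, is a logarithmic scale: it keeps 30 million visible next to 50 billion, which a linear scale would not. The figures are the January 2026 numbers quoted in this dispatch; the function name and the 40-character width are arbitrary choices for illustration.

```python
import math

# Monthly intake figures quoted in this dispatch (January 2026).
INTAKE = {
    "Facts": 50_000_000_000,
    "Articles": 600_000_000,
    "Organizations": 30_000_000,
}


def bar_widths(counts: dict, max_width: int = 40) -> dict:
    """Scale bars by log10 so values orders of magnitude apart stay visible."""
    top = max(math.log10(value) for value in counts.values())
    return {
        name: round(max_width * math.log10(value) / top)
        for name, value in counts.items()
    }


widths = bar_widths(INTAKE)
for name, width in widths.items():
    print(f"{name:<14}{'#' * width}")
```

On a linear scale the Organizations bar would be well under a single character at this width, which is why the order of magnitude is the thing to read.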
What's on offer

Six products. One graph.

Knowledge Graph

10B+ entities, trillion+ facts. Queryable, refreshable, sourced. The thing under everything else.

Extract APIs

Article, Product, Discussion, Image, Video, Analyze. Point at a URL, get structured JSON back.

Crawlbot

Run extraction across entire sites at scale. The polite, distributed cousin of your homemade scraper.

Natural Language API

Pull entities, relationships, and sentiment out of raw text - same ontology as the graph.

Diffbot LLM (Diffy)

Open-source GraphRAG model on Llama 3.3. Cites the graph. Try it at diffy.chat.

Enhance & LeadGraph

Enrich CRM records, track funding events, find decision-makers - powered by the same graph.

In Production At

The pipework behind names you know.

Cisco · DuckDuckGo · Snapchat · Adobe · Salesforce · Samsung · eBay · AOL · CBS Interactive · Bing · Instapaper
The Founder

From patent law to a map of knowledge.

Mike Tung
Founder & CEO · Diffbot · 2008–present

UC Berkeley EECS. Stanford AI Lab. Stints as a patent lawyer and as an engineer at eBay, Yahoo, and Microsoft. Started Diffbot the year smartphones learned to multitask, and has spent the time since trying to teach robots to read.

Recent Dispatches

What Diffbot has been up to.

Jan 2026
Knowledge Graph gains 50 billion new facts, 30M organizations, 600M articles in a single month.
Nov 2025
Extraction APIs add an LLM-ready markdown output format, including interactive elements - built for the generation of agents that read the web.
Jan 2025
Diffbot LLM launches: the first open-source GraphRAG implementation, grounded in the company's knowledge graph.
Feb 2016
Series A: $10M from Tencent, Bloomberg Beta, and Matrix Partners.
2008
Mike Tung founds Diffbot to read the web the way a person would.

It is still that Tuesday morning in Menlo Park. The crawlers are still reading. At January's pace, the graph grew by about a million facts for every minute you spent on this page. Somewhere a salesperson opens a CRM and finds a lead enriched, a developer pastes a URL and gets clean JSON back, a chatbot answers a question with a citation instead of a guess. Most of the people on the receiving end will never know the name Diffbot. That is, in a way, the point.

— End of dispatch —