Fangjin Yang

The Story

The engineer who wouldn't wait for a query

In 2011, programmatic advertising was eating the internet - and it was eating it fast. Bidding decisions needed to happen in milliseconds. Analytics needed to keep pace. At a startup called Metamarkets, Fangjin Yang and a small team built what would eventually become Apache Druid because nothing else was fast enough.

That was not a product launch. It was an act of engineering necessity. They were not building a database company. They were building a tool for clients who needed real-time analytics on ad campaigns and could not afford to wait three seconds for a dashboard to refresh. What they built turned out to be useful for everyone else too.

Yang has spent over a decade on the same core problem - how do you give thousands of users simultaneous, interactive, sub-second access to enormous datasets? Most analytics tools give you a batched answer. Yang wanted to give you a live one. He still does.

"When we started Druid, there just weren't that many databases that were really specialized at powering these different forms of data applications, where you could have thousands or tens of thousands of users."

- Fangjin Yang

Background

From Waterloo to the Apache Foundation

Yang trained as an electrical and computer engineer at the University of Waterloo - two degrees, both applied science. He came out of one of Canada's most technically rigorous universities and moved into software engineering, first at Cisco, then at Metamarkets.

Metamarkets is where the story bends. The company was building a user-facing analytics engine for programmatic advertising firms. Clients needed millisecond response times. The existing database options - Hadoop, traditional relational databases, early columnar stores - were too slow or too rigid. So Yang and his colleagues built their own: a distributed, columnar, in-memory data store with automatic time-partitioning and aggressive indexing.

They named it Druid. Then they open-sourced it - not as a growth hack, but because they had been shaped by open-source software and felt the obligation to give something back. That decision changed the arc of everything. Druid spread to dozens of industries. Users found problems the Metamarkets team had never imagined. A community formed. Eventually, the Apache Software Foundation came calling.

"What you need is a very interactive, almost a 'Google-esque' experience... People want to do that with data as well."

- Fangjin Yang, on what Druid was designed to deliver

Imply

Building the company around the database

In 2015, Yang, Gian Merlino, and Vadim Ogievetsky founded Imply. The thesis was simple: Druid had proven itself in production. Real companies, real scale, real workloads. Now it needed a commercial wrapper - managed infrastructure, visualization, enterprise support, and cloud deployment.

They were backed initially by Khosla Ventures. The product combined a Druid backend with Pivot, a visualization engine that let non-engineers explore data directly. Over time, Imply Enterprise became the on-prem offering, Imply Polaris became the fully managed database-as-a-service (Druid in the cloud, without the ops burden), and Imply Hybrid bridged the gap between the two.

The customer list reads like a shortlist of companies that have already figured out that real-time analytics is not optional. Netflix, Atlassian, Salesforce, Confluent - companies that handle enormous, fast-moving data and cannot afford dashboards that are twelve hours stale.

Imply by the Numbers

2011

Druid first built at Metamarkets

2015

Imply founded

5

Funding Rounds

180+

Employees

77%

ARR Growth 2023→2024

Thesis

The counterintuitive bet that paid off

Yang's read on what enterprises actually need from a database is different from the popular narrative. Streaming ingestion - the ability to ingest data in real-time from Kafka or Kinesis - gets most of the press coverage. But Yang noticed early that this was not the core value for most of his customers.

"Streaming ingestion is a very small part of the value that our customers actually get from the database. Half of our customers don't even use streaming ingestion."

- Fangjin Yang

What they actually wanted was a database that could serve a large number of concurrent users with sub-second query performance on billions of rows. A product that felt like Google Search applied to your own data. That specific problem - high concurrency, low latency, at scale - is what drove every architectural decision in Druid's design. Column-oriented storage. Automatic bitmap indexing. Aggressive pre-aggregation. Horizontal partitioning by time.

The technical choices were not accidents. They were deliberate answers to a question most database vendors were not asking.
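To make those four design choices concrete, here is a minimal Python sketch. It is an illustration only, not Druid's actual implementation: the sample events, the `Segment` class, and the `rollup`, `build_segments`, and `query` helpers are all hypothetical names invented for this example. It rolls raw events up to hourly rows (pre-aggregation), groups them into one segment per hour (time partitioning), stores each segment column by column, and uses integer bitmasks as a stand-in for bitmap indexes.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw ad events: (ISO timestamp, publisher, country, clicks).
RAW_EVENTS = [
    ("2011-06-01T10:05:00", "news.example", "US", 1),
    ("2011-06-01T10:20:00", "news.example", "US", 2),
    ("2011-06-01T10:45:00", "blog.example", "CA", 1),
    ("2011-06-01T11:10:00", "news.example", "CA", 4),
    ("2011-06-01T11:30:00", "blog.example", "US", 3),
]


def hour_bucket(ts):
    """Truncate an ISO timestamp to the start of its hour."""
    return datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0)


def rollup(events):
    """Pre-aggregate: one row per (hour, publisher, country), clicks summed."""
    totals = defaultdict(int)
    for ts, publisher, country, clicks in events:
        totals[(hour_bucket(ts), publisher, country)] += clicks
    return sorted((h, p, c, n) for (h, p, c), n in totals.items())


class Segment:
    """One time partition, stored column by column with bitmap indexes."""

    def __init__(self, rows):
        # Column-oriented layout: one list per column instead of one tuple per row.
        self.columns = {
            "publisher": [r[1] for r in rows],
            "country": [r[2] for r in rows],
            "clicks": [r[3] for r in rows],
        }
        # Bitmap index: per dimension value, an int whose bit i is set
        # when row i holds that value.
        self.bitmaps = {}
        for dim in ("publisher", "country"):
            index = defaultdict(int)
            for i, value in enumerate(self.columns[dim]):
                index[value] |= 1 << i
            self.bitmaps[dim] = dict(index)

    def sum_clicks(self, **filters):
        """Sum clicks over rows matching every dimension=value filter."""
        clicks = self.columns["clicks"]
        mask = (1 << len(clicks)) - 1  # start with every row selected
        for dim, value in filters.items():
            mask &= self.bitmaps[dim].get(value, 0)  # AND the bitmaps together
        return sum(c for i, c in enumerate(clicks) if (mask >> i) & 1)


def build_segments(events):
    """Partition rolled-up rows by time: one segment per hour."""
    by_hour = defaultdict(list)
    for row in rollup(events):
        by_hour[row[0]].append(row)
    return {hour: Segment(rows) for hour, rows in by_hour.items()}


def query(segments, start, end, **filters):
    """Sum clicks in [start, end), scanning only segments inside that range."""
    return sum(
        seg.sum_clicks(**filters)
        for hour, seg in segments.items()
        if start <= hour < end
    )
```

A call such as `query(build_segments(RAW_EVENTS), datetime(2011, 6, 1, 10), datetime(2011, 6, 1, 11), country="US")` never touches the 11:00 segment, and within the 10:00 segment it reads only the rows whose bit is set in the "US" bitmap. That is the general idea behind keeping query cost tied to the data a question actually needs rather than to the size of the whole dataset.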

"Those problems at scale are incredibly difficult technical problems that take a group of data engineers a decade in order to do it well."

- Fangjin Yang, on why distributed analytics is still hard

Career Arc

Timeline

~2009

Software Engineer at Cisco - early career foundation in enterprise systems

2011

Lead Engineer at Metamarkets; co-creates Apache Druid to power real-time programmatic advertising analytics

2011-2014

Druid open-sourced; community grows across ad tech, finance, and enterprise software verticals

2015

Co-founds Imply with Gian Merlino and Vadim Ogievetsky; initial backing from Khosla Ventures

2021

Becomes an angel investor and a scout for Andreessen Horowitz (a16z) while leading Imply as CEO

2022

Imply raises $100M Series D; valuation crosses $1B; total funding reaches $215.3M

2023

Named Datanami Person to Watch; Imply Polaris cloud service scales enterprise customer base

2024

Imply reports $63M ARR - 77% year-over-year growth; 100 enterprise customers

Record

What he's built

Co-created Apache Druid, now governed by the Apache Software Foundation
Founded Imply in 2015; scaled to $1.1B valuation by 2022
Raised $215.3M in total funding across five rounds
Grew Imply to $63M ARR with 77% year-over-year growth in 2024
Built enterprise relationships with Netflix, Atlassian, Salesforce, and Confluent
Named Datanami Person to Watch in 2023
Active angel investor and a16z scout since 2021
Published technical writing with O'Reilly on analytics stack design and Druid architecture
Speaker at Data Council, O'Reilly, and enterprise data conferences worldwide

Off the Record

The details that don't fit the press release

His Twitter/X bio says "Aspiring rapper." He joined Twitter in June 2009 and has apparently kept the same bio through the entire journey from engineer to unicorn CEO.

Apache Druid was not planned as a product. It was built under deadline pressure at Metamarkets because no existing tool could handle millisecond analytics for programmatic ad bidding.

Yang open-sourced Druid before building a commercial product around it - a rare move driven by philosophical commitment to open source, not a go-to-market strategy.

His GitHub handle is simply "fjy" - just initials, no drama. The commit history goes back over a decade and follows the entire evolution of the Druid codebase.

He holds two engineering degrees from the University of Waterloo - a BASc in Electrical Engineering and an MASc in Computer Engineering - and now oversees sales, product, and strategy at a billion-dollar SaaS company.

Yang became an a16z scout in 2021 while still actively running Imply - investing in other people's ideas while scaling his own. Classic operator move.

Aspiration

Still solving the same hard problem

Yang's driving question has not changed since 2011: why can you search all of human knowledge in 0.3 seconds, but asking your own database a simple analytical question takes minutes? The gap between search and analytics felt like a product failure. Druid and Imply are his answers.

The bet is that every company will eventually run applications where users need to explore data interactively - not just engineers, not just analysts, but thousands of users at once. Customer-facing analytics. Operational dashboards. Embedded insights. That use case requires a fundamentally different database architecture than what most companies run today.

Whether it's the next decade of Imply Polaris scaling cloud deployments or the open-source Druid community building new connectors and features, Yang is positioned at the center of the real-time analytics stack. He was there when the problem first surfaced. He is still building the solution.

"In its early days, Druid was adopted for a set of use cases in a handful of industries. Today, developers have shown its applicability across all industries - and the use cases have expanded exponentially."

- Fangjin Yang on the evolution of Apache Druid

What He Built

The technical foundation

Apache Druid

Column-oriented, distributed, open-source OLAP database. Millisecond queries. Ingestion of millions of events per second. Now a project of the Apache Software Foundation.

Imply Polaris

Fully managed Druid-as-a-service. No infrastructure ops. Automatic scaling. Built for teams that want Druid's performance without the cluster management.

Imply Enterprise

Commercial Druid with advanced visualization, enterprise security, and dedicated support. Used by Netflix, Atlassian, and Confluent.

Pivot / Clarity

Visual analytics interface that sits atop Druid. Makes high-cardinality data exploration possible without writing SQL for every question.

Find Him

Links & Resources

🌐 Imply.io
💼 LinkedIn
🐦 Twitter / X
⌨️ GitHub (fjy)
📚 O'Reilly Articles
▶️ CUBE Interview
▶️ Imply.io CUBE Talk
🎙️ SE Daily Podcast