The engineer who wouldn't wait for a query
In 2011, programmatic advertising was eating the internet - and it was eating it fast. Bidding decisions needed to happen in milliseconds. Analytics needed to keep pace. At a startup called Metamarkets, Fangjin Yang and a small team built what would eventually become Apache Druid because nothing else was fast enough.
That was not a product launch. It was an act of engineering necessity. They were not building a database company. They were building a tool for clients who needed real-time analytics on ad campaigns and could not afford to wait three seconds for a dashboard to refresh. What they built turned out to be useful for everyone else too.
Yang has spent over a decade on the same core problem - how do you give thousands of users simultaneous, interactive, sub-second access to enormous datasets? Most analytics tools give you a batched answer. Yang wanted to give you a live one. He still does.
"When we started Druid, there just weren't that many databases that were really specialized at powering these different forms of data applications, where you could have thousands or tens of thousands of users." - Fangjin Yang
From Waterloo to the Apache Foundation
Yang trained as an electrical and computer engineer at the University of Waterloo - two degrees, both applied science. He came out of one of Canada's most technically rigorous universities and moved into software engineering, first at Cisco, then at Metamarkets.
Metamarkets is where the story bends. The company was building a user-facing analytics engine for programmatic advertising firms. Clients needed millisecond response times. The existing database options - Hadoop, traditional RDBMS, early columnar stores - were too slow or too rigid. So Yang and his colleagues built their own: a distributed, columnar, in-memory data store with automatic time-partitioning and aggressive indexing.
They named it Druid. Then they open-sourced it - not as a growth hack, but because they had been shaped by open-source software and felt the obligation to give something back. That decision changed the arc of everything. Druid spread to dozens of industries. Users found problems the Metamarkets team had never imagined. A community formed. Eventually, the Apache Software Foundation came calling.
"What you need is a very interactive, almost a 'Google-esque' experience... People want to do that with data as well."
- Fangjin Yang, on what Druid was designed to deliver
Building the company around the database
In 2015, Yang, Gian Merlino, and Vadim Ogievetsky founded Imply. The thesis was simple: Druid had proven itself in production. Real companies, real scale, real workloads. Now it needed a commercial wrapper - managed infrastructure, visualization, enterprise support, and cloud deployment.
They were backed initially by Khosla Ventures. The product combined a Druid backend with Pivot, a visualization engine that let non-engineers explore data directly. Over time, Imply Enterprise became the on-prem offering, Imply Hybrid bridged the gap, and Imply Polaris became their fully managed database-as-a-service - Druid in the cloud, without the ops burden.
The customer list reads like a shortlist of companies that have already figured out that real-time analytics is not optional. Netflix, Atlassian, Salesforce, Confluent - companies that handle enormous, fast-moving data and cannot afford dashboards that are twelve hours stale.
The counterintuitive bet that paid off
Yang's read on what enterprises actually need from a database differs from the popular narrative. Streaming ingestion - the ability to ingest data in real time from Kafka or Kinesis - gets most of the press coverage. But Yang noticed early that this was not the core value for most of his customers.
"Streaming ingestion is a very small part of the value that our customers actually get from the database. Half of our customers don't even use streaming ingestion." - Fangjin Yang
What they actually wanted was a database that could serve a large number of concurrent users with sub-second query performance on billions of rows. A product that felt like Google Search applied to your own data. That specific problem - high concurrency, low latency, at scale - is what drove every architectural decision in Druid's design. Column-oriented storage. Automatic bitmap indexing. Aggressive pre-aggregation. Horizontal partitioning by time.
The technical choices were not accidents. They were deliberate answers to a question most database vendors were not asking.
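To make those four ideas concrete, here is a minimal Python sketch of pre-aggregation, time partitioning, and bitmap indexing over a toy table. It is an illustration under stated assumptions, not Druid's actual implementation or API: the event data, the column layout, and every function name (`hour_bucket`, `rollup`, `partition_by_hour`, `build_bitmap_index`, `sum_clicks`) are hypothetical, and plain Python integers stand in for the compressed bitmaps a real engine would use.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical raw ad events: (ISO timestamp, publisher, country, clicks).
RAW_EVENTS = [
    ("2011-06-01T10:00:05Z", "site-a", "US", 1),
    ("2011-06-01T10:00:40Z", "site-a", "US", 1),
    ("2011-06-01T10:01:10Z", "site-b", "CA", 1),
    ("2011-06-01T11:15:00Z", "site-a", "CA", 1),
]


def hour_bucket(ts: str) -> str:
    """Truncate an ISO-8601 UTC timestamp to the start of its hour."""
    parsed = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return parsed.strftime("%Y-%m-%dT%H:00Z")


def rollup(events):
    """Pre-aggregate: keep one row per (hour, publisher, country), summing clicks."""
    totals = defaultdict(int)
    for ts, publisher, country, clicks in events:
        totals[(hour_bucket(ts), publisher, country)] += clicks
    return [(hour, pub, ctry, n) for (hour, pub, ctry), n in sorted(totals.items())]


def partition_by_hour(rows):
    """Time partitioning: group rows by hour so a query reads only the hours it needs."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[0]].append(row)
    return dict(parts)


def build_bitmap_index(rows, column):
    """Map each distinct value in a column to an int whose set bits mark matching rows."""
    index = defaultdict(int)
    for pos, row in enumerate(rows):
        index[row[column]] |= 1 << pos
    return dict(index)


def sum_clicks(rows, bitmap):
    """Sum the clicks column over the rows whose bit is set in the bitmap."""
    return sum(row[3] for pos, row in enumerate(rows) if (bitmap >> pos) & 1)


rows = rollup(RAW_EVENTS)                      # 4 raw events collapse into 3 rows
partitions = partition_by_hour(rows)           # one partition per hour
by_publisher = build_bitmap_index(rows, 1)     # column 1 = publisher
by_country = build_bitmap_index(rows, 2)       # column 2 = country

# Filter "publisher = site-a AND country = CA" with a single bitwise AND,
# then aggregate only the rows that survive the filter.
matching = by_publisher["site-a"] & by_country["CA"]
print(sum_clicks(rows, matching))
```

The point of the sketch is the shape of the work: the rollup shrinks the data before any query runs, the partitions let a time-bounded query skip irrelevant hours, and the filter becomes a bitwise operation rather than a row-by-row scan.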
"Those problems at scale are incredibly difficult technical problems that take a group of data engineers a decade in order to do it well."
- Fangjin Yang, on why distributed analytics is still hard
Timeline
What he's built
- Co-created Apache Druid, now governed by the Apache Software Foundation
- Founded Imply in 2015; scaled to $1.1B valuation by 2022
- Raised $215.3M in total funding across five rounds
- Grew Imply to $63M ARR with 77% year-over-year growth in 2024
- Built enterprise relationships with Netflix, Atlassian, Salesforce, and Confluent
- Named Datanami Person to Watch in 2023
- Active angel investor and a16z scout since 2021
- Published technical writing on O'Reilly on analytics stack design and Druid architecture
- Speaker at Data Council, O'Reilly, and enterprise data conferences worldwide
The details that don't fit the press release
His Twitter/X bio says "Aspiring rapper." He joined Twitter in June 2009 and has apparently kept the same bio through the entire journey from engineer to unicorn CEO.
Apache Druid was not planned as a product. It was built under deadline pressure at Metamarkets because no existing tool could handle millisecond analytics for programmatic ad bidding.
Yang open-sourced Druid before building a commercial product around it - a rare move driven by philosophical commitment to open source, not a go-to-market strategy.
His GitHub handle is simply "fjy" - just initials, no drama. The commit history goes back over a decade and follows the entire evolution of the Druid codebase.
He holds two engineering degrees from the University of Waterloo - BASc in Electrical Engineering, MASc in Computer Engineering - and now oversees sales, product, and strategy at a billion-dollar SaaS company.
Yang became an a16z scout in 2021 while still actively running Imply - investing in other people's ideas while scaling his own. Classic operator move.
Still solving the same hard problem
Yang's driving question has not changed since 2011: why can you search all of human knowledge in 0.3 seconds, but asking your own database a simple analytical question takes minutes? The gap between search and analytics felt like a product failure. Druid and Imply are his answers.
The bet is that every company will eventually run applications where users need to explore data interactively - not just engineers, not just analysts, but thousands of users at once. Customer-facing analytics. Operational dashboards. Embedded insights. That use case requires a fundamentally different database architecture than what most companies run today.
Whether it's the next decade of Imply Polaris scaling cloud deployments or the open-source Druid community building new connectors and features, Yang is positioned at the center of the real-time analytics stack. He was there when the problem first surfaced. He is still building the solution.
"In its early days, Druid was adopted for a set of use cases in a handful of industries. Today, developers have shown its applicability across all industries - and the use cases have expanded exponentially." - Fangjin Yang, on the evolution of Apache Druid