Gian Merlino

Profile

The Engineer Who Couldn't Find the Right Database - So He Built It

Gian Merlino arrives mid-stride. As CTO and co-founder of Imply, he leads the technical vision for the company that put Apache Druid - the open-source real-time analytics database he co-created - into a commercial product used by engineers and data teams at enterprises worldwide. Imply crossed the unicorn threshold in May 2022, raising $100 million in a Series D that pushed the company's valuation to $1.1 billion. But Merlino was writing code long before the term "unicorn" became useful to him.

His GitHub handle is @gianm. He has 167 followers, follows 4 people, and has over 1,000 commits to Apache Druid. For a co-founder of a billion-dollar company, that is a specific kind of discipline: staying close to the code even as the headcount grows, the funding rounds compound, and the company's footprint expands to 180 employees across San Francisco and beyond.

Imply offers Druid as a managed cloud service and enterprise platform - making the same database architecture that powers real-time analytics at scale accessible without the painful operational burden of self-managing a distributed system. Annual recurring revenue stands at approximately $63 million, backed by investors including Andreessen Horowitz, Bessemer Venture Partners, and Thoma Bravo.

Imply Funding Snapshot

$215M Total Raised

$100M Series D (2022)

$1.1B Valuation

$63M Annual Recurring Revenue (approx.)

180 Employees

The Problem That Became a Product

It was 2011. Merlino had joined Metamarkets Group as a data infrastructure engineer after a stint at Yahoo. Metamarkets was building real-time advertising analytics - dashboards that needed to query billions of events with hundreds of dimensions, served to a thousand users at once, with sub-second response times. Every existing database they evaluated failed on at least one of those requirements.

Merlino, Fangjin Yang, Eric Tschetter, and the Metamarkets team built Apache Druid to fill that gap. The core insight was architectural: instead of choosing between real-time ingestion and fast analytical queries, they designed a system that handled both simultaneously - with a columnar storage format, bitmap indexes, and a segmented time-series architecture that made it possible to query freshly ingested data without sacrificing query speed.
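To make the bitmap-index idea concrete, here is a minimal Python sketch, assuming a toy in-memory column rather than anything resembling Druid's actual segment format. Each distinct string value is dictionary-encoded to an integer id, and each id gets a bitmap (a plain Python int used as a bitset) marking the rows that hold it. A filter such as `country = 'US' AND device = 'phone'` then becomes a single bitwise AND. The class and function names are hypothetical; production systems use compressed bitmaps (Druid supports Roaring and Concise formats) rather than raw integers.

```python
class BitmapIndexedColumn:
    """Toy dictionary-encoded string column with one bitmap per distinct value."""

    def __init__(self, values):
        # Dictionary encoding: each distinct value maps to a small integer id.
        self.ids = {v: i for i, v in enumerate(sorted(set(values)))}
        self.encoded = [self.ids[v] for v in values]
        # One bitset per id: bit r is set when row r holds that value.
        self.bitmaps = [0] * len(self.ids)
        for row, value_id in enumerate(self.encoded):
            self.bitmaps[value_id] |= 1 << row

    def bitmap_for(self, value):
        """Return the bitset of rows equal to value (0 if the value never occurs)."""
        value_id = self.ids.get(value)
        return 0 if value_id is None else self.bitmaps[value_id]


def matching_rows(bitmap):
    """Expand a bitset into the sorted list of row numbers it contains."""
    rows, row = [], 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


country = BitmapIndexedColumn(["US", "DE", "US", "FR", "US"])
device = BitmapIndexedColumn(["phone", "phone", "desktop", "phone", "phone"])

# WHERE country = 'US' AND device = 'phone'
print(matching_rows(country.bitmap_for("US") & device.bitmap_for("phone")))  # [0, 4]
# WHERE country IN ('DE', 'FR')
print(matching_rows(country.bitmap_for("DE") | country.bitmap_for("FR")))  # [1, 3]
```

The filters are resolved entirely with bitwise operations on the bitmaps, without comparing strings row by row.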

Apache Druid - By the Numbers

Co-created at Metamarkets in 2012. Open-sourced the same year. Donated to the Apache Software Foundation in 2018, where it entered the Apache Incubator and later graduated to a top-level project. Merlino served as its inaugural PMC chair and remains among its most prolific contributors.

2012 Year Created

1,000+ Merlino's Commits

Top-Level Apache Project

<1 sec Target Query Time

The paper they published - "Druid: A Real-time Analytical Data Store" - drew attention from engineers across the industry who recognized the same problem in their own environments. The open-source release ignited a community. By 2015, when Merlino, Yang, and Vadim Ogievetsky (Chief Experience Officer) founded Imply, they weren't commercializing a theory. They were packaging a battle-tested system that the community was already running in production.

"You can never find a single perfect system. Think about data on a temperature-based spectrum to evaluate different approaches."

- Gian Merlino, ApacheCon @Home Keynote

From Caltech to Kafka Summits

Merlino graduated from the California Institute of Technology with a BS in Computer Science around 2007. Caltech is known for producing a remarkable density of rigorous, quantitatively precise engineers. From Caltech, he joined Yahoo as a senior software engineer, working on large-scale server infrastructure before moving to Metamarkets.

The career arc tells a consistent story: each move was toward harder distributed systems problems with higher scale requirements. Yahoo gave him the engineering fundamentals at internet scale. Metamarkets gave him the specific forcing function that produced Druid. Imply gave him the platform to turn that work into a company.

At Imply, his technical focus has been on the areas where Druid's architecture demands the most careful thinking: SQL query planning and optimization, multi-stage query execution (MSQ), hash-join optimization, memory efficiency, and the ingestion robustness that keeps high-volume pipelines stable under pressure. The multi-stage query engine - which he championed - fundamentally changed what was possible with SQL-based data ingestion in Druid, closing the gap between traditional data warehouse SQL and Druid's real-time capabilities.

"Now we're excited to show the world just how nimble it can be with the addition of multi-stage queries and SQL-based ingestion," Merlino said at the announcement of a major Druid open-source contribution in 2022. The commitment included a financial guarantee for Apache Druid users - a signal that Imply intended to deepen its investment in the community, not just extract from it.
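The general shape of multi-stage execution can be sketched in a few lines of Python. This is a generic two-stage aggregation (partial aggregation, hash shuffle, final merge), assuming in-process lists stand in for workers; it is not Druid's MSQ engine, which schedules stages as distributed tasks. `stage_one`, `stage_two`, and `run_two_stage_sum` are hypothetical names chosen for illustration.

```python
from collections import Counter, defaultdict


def stage_one(partition, num_reducers):
    """Partially aggregate one input partition, then hash-partition the
    partial sums so that each key is routed to exactly one stage-two worker."""
    partial = Counter()
    for key, value in partition:
        partial[key] += value
    outputs = defaultdict(Counter)
    for key, subtotal in partial.items():
        # hash() is stable within one process, which is all this sketch needs.
        outputs[hash(key) % num_reducers][key] += subtotal
    return outputs


def stage_two(partials):
    """Merge every partial aggregate that was shuffled to this worker."""
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts
    return merged


def run_two_stage_sum(partitions, num_reducers=2):
    # Stage 1: each input partition would run on its own worker.
    shuffled = defaultdict(list)
    for partition in partitions:
        for reducer, partial in stage_one(partition, num_reducers).items():
            shuffled[reducer].append(partial)
    # Stage 2: each reducer merges only the keys routed to it, so the
    # per-reducer results are disjoint and can simply be combined.
    result = {}
    for reducer in range(num_reducers):
        result.update(stage_two(shuffled[reducer]))
    return result


partitions = [
    [("us", 3), ("de", 1)],
    [("us", 2), ("fr", 7)],
    [("de", 4)],
]
print(sorted(run_two_stage_sum(partitions).items()))  # [('de', 5), ('fr', 7), ('us', 5)]
```

Because the shuffle splits groups across workers, no single worker has to hold every group at once, which is the general motivation for stage-based execution of large aggregations and ingestion jobs.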

* * *

Career Milestones

🎓 Caltech CS '07: B.S. Computer Science

⚙️ Yahoo Engineer: Server infrastructure at scale

📊 Druid Co-Creator: 2012 at Metamarkets

🏛️ Inaugural PMC Chair: Apache Druid project

🦄 Unicorn Co-Founder: Imply - $1.1B valuation

🎤 Keynote Speaker: Current 2022 · ApacheCon · SREday

Running a Unicorn While Still Merging PRs

There is a version of the Gian Merlino story in which he transitions out of hands-on engineering as Imply scales - delegates the codebase, focuses on strategy and fundraising, becomes a talking-head CTO. That version hasn't materialized. The GitHub commit history tells a different story.

Merlino's @gianm contributions to Apache Druid span critical paths: the SQL planner built on Apache Calcite, the COALESCE and SEARCH function handling, filter optimizations that shave milliseconds from hot query paths, and system fields in input sources that enable more powerful ingestion patterns. These are not cosmetic contributions - they're the kind of diffs that require deep familiarity with how the system actually works under load.
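A predicate like `x IN (1, 2, 3) OR x BETWEEN 10 AND 20 OR x BETWEEN 15 AND 25` can be normalized into one sorted set of non-overlapping ranges, which is the idea behind Apache Calcite's SEARCH operator and its Sarg ("search argument") representation. The Python sketch below shows only that normalization plus a binary-search membership test, assuming closed numeric intervals; it does not reproduce Calcite's or Druid's code, and the function names are hypothetical.

```python
import bisect


def normalize_ranges(ranges):
    """Merge closed intervals (lo, hi) into a sorted list of non-overlapping ones."""
    merged = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1]:
            # Overlaps the previous interval: extend it instead of adding a new one.
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return [tuple(r) for r in merged]


def in_range_set(range_set, x):
    """Binary-search for the only interval that could contain x."""
    lows = [lo for lo, _ in range_set]
    i = bisect.bisect_right(lows, x) - 1
    return i >= 0 and x <= range_set[i][1]


# x IN (1, 2, 3) OR x BETWEEN 10 AND 20 OR x BETWEEN 15 AND 25
ranges = [(1, 1), (2, 2), (3, 3), (10, 20), (15, 25)]
range_set = normalize_ranges(ranges)
print(range_set)  # [(1, 1), (2, 2), (3, 3), (10, 25)]
print(in_range_set(range_set, 2), in_range_set(range_set, 5))  # True False
```

Once the ranges are merged, a filter can probe them with one binary search (or translate them into index lookups) instead of evaluating each OR branch separately.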

In May 2025, Merlino stepped down as Apache Druid's inaugural PMC chair, handing the role to Abhishek Agarwal. He did so not because the project was struggling, but because he believed rotating leadership was the right signal for a mature, healthy open-source community. The distinction matters: stepping down as a demonstration of governance health, rather than in response to failure, is rare enough to be worth noticing.

The broader philosophy that surfaces across his talks is anti-dogmatic: systems should be evaluated on a "temperature spectrum," not shoehorned into binary categories. Hot data (fresh, frequently queried) and cold data (archived, analytical) require different architectural choices. Merlino's mental model doesn't privilege one approach over another - it demands honest assessment of the actual workload.
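One hedged way to make the temperature spectrum concrete is an ordered, first-match-wins rule list that routes data to storage tiers by age. Druid's own retention rules are also evaluated in order, with the first matching rule applying, but the tier names, thresholds, and `tier_for` function below are hypothetical, chosen only to illustrate the hot-to-cold gradient.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical rule list: (maximum age, tier), evaluated top to bottom.
# The first rule whose age window covers the data wins.
RULES = [
    (timedelta(days=7), "hot"),    # fresh, frequently queried
    (timedelta(days=90), "warm"),  # occasionally queried
    (None, "cold"),                # None means no age limit (catch-all)
]


def tier_for(data_time, now, rules=RULES):
    """Pick a storage tier for data whose event timestamp is data_time."""
    age = now - data_time
    for max_age, tier in rules:
        if max_age is None or age <= max_age:
            return tier
    raise ValueError("rule list has no catch-all rule")


now = datetime(2025, 10, 1, tzinfo=timezone.utc)
print(tier_for(now - timedelta(days=1), now))    # hot
print(tier_for(now - timedelta(days=30), now))   # warm
print(tier_for(now - timedelta(days=400), now))  # cold
```

Keeping the thresholds in data rather than code is what allows the spectrum to be tuned to the actual workload instead of forcing everything into a single hot or cold category.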

* * *

Career Timeline

From Caltech to Unicorn CTO

2007
Graduates from Caltech with BS in Computer Science. Joins Yahoo as a senior software engineer.
2011
Joins Metamarkets Group, taking on data infrastructure challenges at advertising analytics scale.
2012
Co-creates Apache Druid with Fangjin Yang, Eric Tschetter, and colleagues. Publishes "Druid: A Real-time Analytical Data Store."
2014
Speaks at QCon San Francisco on hybrid batch/real-time data architectures. Druid gains industry traction.
2015
Co-founds Imply Data with Fangjin Yang (CEO) and Vadim Ogievetsky (CXO) to commercialize Apache Druid.
2018
Apache Druid donated to Apache Software Foundation. Merlino named inaugural PMC chair. Speaks at Strata Data Conference on Druid's SQL layer.
2021
Keynotes ApacheCon @Home. Articulates the "temperature spectrum" framework for evaluating data systems.
2022
Imply raises $100M Series D at $1.1B valuation. Announces major open-source Druid contribution with financial guarantee. Keynotes Current 2022 in Austin.
2025
Steps down as Apache Druid PMC chair (transitioning to Abhishek Agarwal) to model healthy open-source governance. Keynotes SREday San Francisco.

What He Says on Stage

Merlino speaks at the conferences where serious data infrastructure engineers gather: QCon, Strata, Current (formerly Kafka Summit), ApacheCon, O'Reilly Data. His talks don't follow the "here's why our technology is amazing" format. They tend to take a structural question - how should you think about hot vs. cold data? what can observability learn from BI? - and work through it analytically, often using Druid as an example of a specific design decision rather than a product demo.

His 2025 keynote at SREday San Francisco asked what the observability world could learn from the evolution of business intelligence: specifically, how decoupling data collection from querying - the BI playbook - might offer a model for observability at scale. It's the kind of cross-domain synthesis that's only possible if you've actually operated data systems under production conditions across different problem domains.

What Observability Can Learn From BI: Decoupling for Speed, Scale, and Flexibility
SREday San Francisco · October 2025 · Keynote

The Next Generation of Kafka Summit - Current 2022 Keynote
Current 2022 · Austin, TX · October 2022

ApacheCon @Home Thursday Keynote
ApacheCon @Home · 2021 · Keynote

The SQL Layer in Apache Druid
Strata Data Conference · 2018

Hybrid Batch/Real-time Data Architectures
QCon San Francisco · 2014 & 2015

Inside Apache Druid's Storage and Query Engine
Carnegie Mellon Database Group


* * *

Quick Takes

Numbers That Tell the Story

1K+

Personal commits to Apache Druid as @gianm on GitHub

2012

Year Druid was created - before "real-time analytics" was a category

$215M

Total raised by Imply across all funding rounds

4

People Gian follows on GitHub. Selective. Deliberate.

The Convergence Thesis

The underlying bet at Imply - and in Merlino's broader thinking - is that the traditional separation between OLTP and OLAP systems is collapsing. Real-time data ingestion is no longer exotic engineering. Sub-second query performance across billions of events is no longer reserved for companies with elite infrastructure teams. What Druid was built to handle at Metamarkets in 2011 is now a commodity expectation.

Imply's product direction reflects this: an elastic consumption platform with auto-scaling, multi-tenant architecture, serverless analytics, and integrations across cloud providers. The SQL-based ingestion and multi-stage query engine that Merlino helped ship changed how teams interact with Druid, reducing the operational complexity that made it difficult to adopt outside large engineering organizations.

The aspirational version of this story is a world where any organization - not just ones with dedicated data infrastructure teams - can query massive streams of event data at speed. Merlino has been building toward that version since 2012. The unicorn valuation is a data point in that direction, not the destination.
