Chang She

$30M

Series A / June 2025

20M+

Lance downloads

~2006

Early pandas user

W22

Y Combinator batch

The Story

Why images are still harder than spreadsheets

A question that turned into a company

Ask Chang She what LanceDB is really about and you get a complaint disguised as a mission: working with a table of numbers is easy, and working with embeddings, images, and video is not. Decades into the data era, that gap never closed. His whole company is an argument that it should.

Today he is co-founder and CEO of LanceDB, the company behind the open-source Lance columnar format and a database built for multimodal AI. Its users are the names attached to the current wave of generative models. Runway, Midjourney, and Character.ai store and search their training data on Lance, some of them across petabytes and tens of billions of vectors. In June 2025 the company raised a $30M Series A led by Theory Ventures, with CRV, Y Combinator, Databricks Ventures, and Runway among the backers.

The pitch is blunt about what it wants to unseat. "None of that really fit into the traditional data stack: Pandas, Spark, Parquet, even Arrow," She has said. That is a striking sentence coming from him, because he helped build the first name on that list.

The Multimodal Lakehouse

Alongside the Series A, LanceDB launched what She calls the Multimodal Lakehouse: one platform meant to hold every kind of AI data - text, embeddings, images, audio, video - and to serve every workload against it, from semantic search to feature engineering to model training. His objection to the current wave of tooling is that it is too small. "Vector databases tend to be very narrow solutions for a very narrow problem," he says. He is not trying to add a feature. He is trying to move the floor.

The reason is scale, and his favorite line for it has stuck: "We often say a trillion is the new billion. We have folks operating at roughly a thousand times the scale they were at just a year or two ago." AI agents, in his telling, will use however much capacity you hand them, so the honest move is to build for a size that looks absurd today.

Before the lakehouse: a builder who never quite left the data behind.

“In five years, ‘multimodal’ won’t even be a word anymore. It’ll just be data.”

— Chang She

Origins

Two roommates and a library

AQR, 2006, before "data scientist" was a job title

The origin is almost too neat. Around 2006, Chang She was a research associate at the hedge fund AQR Capital Management. His colleague and roommate was Wes McKinney. "At that time, data scientist wasn't really a job title," She recalls. One day McKinney walked over with something he had been building in Python and said, in effect, look at this.

That something became pandas. She was one of its earliest users, evangelizing it inside the firm before it was even open-sourced, and he is credited as a co-author of the library that a generation of analysts would later learn on. It is a rare thing to be present at the birth of a foundational tool. It is rarer still to spend the next twenty years trying to build the thing that comes after it.

He came to code sideways. At MIT he took degrees in electrical engineering and computer science and in political science. He began his career in quantitative finance and now describes himself as "a former quant researcher-trader turned developer of data science platforms and tools." The math training left a fingerprint on how he thinks: "In math, it's like you always try to reduce a problem to a previously known or solved state."

2001–06

MIT

Degrees in EECS and political science, including a master's in EECS.

2006–11

AQR Capital

Research associate. Roommate Wes McKinney shows him an early pandas.

2012

Lambda Foundry

Co-founds the commercial vehicle built around pandas.

2013–14

DataPad

Co-founder and CTO with McKinney as CEO. Acquired by Cloudera in 2014.

2014–18

Cloudera

Engineering manager, leading Cloudera Navigator.

2018–21

Tubi

VP of Engineering. Recommenders, ML-ops, experimentation - and the multimodal wall.

2022

LanceDB

Co-founds with CTO Lei Xu. Y Combinator W22.

2025

Series A

$30M led by Theory Ventures. Multimodal Lakehouse ships.

The Turn

The wall he hit at a streaming service

Where the idea for LanceDB actually came from

Between pandas and LanceDB there was a decade of shipping other people's infrastructure. DataPad, the Python-stack data product he co-founded with McKinney, was acquired by Cloudera in 2014. He stayed on to run engineering for Cloudera Navigator. Then came Tubi, the ad-supported streaming service, where he was VP of Engineering.

"I was VP of Engineering at Tubi, where I built a lot of the recommendation systems, ML-ops systems, and experimentation systems," he says. That is where the abstract complaint became a daily one. Recommenders live on embeddings. Streaming lives on video, images, audio, and subtitles. And every one of those objects had to be bent, awkwardly, into tools designed for rows and columns.

He had spent his career inside the tabular data stack. At Tubi he watched it fail to hold the shape of the data he actually had. The response, in 2022, was LanceDB, co-founded with Lei Xu, a former core contributor to HDFS who had led ML infrastructure at Cruise. They went through Y Combinator's W22 batch and launched the product publicly on May 1, 2023.

The wager underneath it: enough people were hitting the same wall that a better foundation would spread on its own. It did. By June 2025 the open-source project reported more than 20 million downloads, and the customer list had filled with the labs defining generative AI.

Why is working with embeddings, images, and video still so difficult, when compared to tabular data?

I've been building data and machine learning tooling for almost two decades at this point.

In His Words

The Chang She file

Lines from talks, interviews, and blog posts

Vector databases tend to be very narrow solutions for a very narrow problem.

A trillion is the new billion. Folks are operating at roughly a thousand times the scale they were at a year or two ago.

None of that really fit into the traditional data stack: Pandas, Spark, Parquet, even Arrow.

In math, you always try to reduce a problem to a previously known or solved state.

The Margins

Quirks, puns, and a forked keyboard

The details that do not fit in a pitch deck

Genghis, sort of

His handle is @changhiskhan - a pun on Genghis Khan that has quietly outlasted several of his companies.

Keyboard tinkerer

Among his GitHub repos sits a fork of an Advantage360 ergonomic mechanical keyboard config. The hardware itch runs deep.

Deadpan on the timeline

He posts satirical engineering jokes on X, once riffing that LanceDB should be rewritten in assembly for "bare metal performance."

Holds MIT degrees in electrical engineering and computer science and in political science.
Was an early pandas user before it was open-sourced, after McKinney showed it to him at AQR.
His GitHub carries an Arctic Code Vault Contributor badge from years of open-source work.
LanceDB customers search tens of billions of vectors and store petabytes of training data.

The Method

How a format quietly wins

Twenty million downloads without a shout

LanceDB did not arrive with a marketing blitz. It arrived as a file format and a library that solved a specific pain well enough that engineers passed it along. That is a strategy She has run before. The pandas library spread the same way, as did the open-source projects he shipped in the years between. Give people something that removes friction from their day and the adoption curve takes care of itself.

The pattern shows up in the numbers. By June 2025 the open-source project had crossed 20 million downloads, and the paying customers were not logos chasing a trend but teams operating at the edge of what the old tooling could bear. When Midjourney, Runway, and Character.ai standardize on your storage layer, it is because the alternative broke first.

His diagnosis of the industry's development experience is unsentimental. Machine learning engineers, he argues, are "often stuck with a subpar development experience," and "AI teams are spending most of their time dealing with low-level data infrastructure details." Lance is his attempt to hand those hours back. The Series A investor list reads like a vote from people who know the terrain: Theory Ventures led, with CRV, Y Combinator, Databricks Ventures, and Runway alongside, and angels including his old collaborator Wes McKinney.

There is a lesson buried in the DataPad chapter that seems to guide him still. A polished product can be acquired and absorbed; a foundational format embeds itself into how an entire field works. The second time around, he built the format first.

“AI teams are spending most of their time dealing with low-level data infrastructure details.”

— Chang She, TechCrunch, 2024

Adopted at scale

Runway. Midjourney. Character.ai. Petabytes of training data, tens of billions of vectors, one open format underneath.

The Aspiration

Building for a size that looks absurd today

Chang She's stated mission is to build "the most efficient and scalable data platform for AI applications." The bet is that the roughly twenty-year-old stack - the one he helped start - is buckling under AI workloads, and that narrow vector databases are a patch, not an answer. The fix he wants is a single lakehouse that holds every data type and serves every workload, batch and real-time, from search to training.

There is a tidy symmetry in it. He was in the room when pandas made messy data feel manageable for a generation of analysts. Now he is trying to do the same for the messier, richer data that machines learn from. Same instinct, larger canvas.

Watch

Chang She, on the record

Talks and interviews on YouTube

▶ Why the 20-year-old data stack is breaking under AI workloads

▶ Scaling AI Data Infrastructure: A Multimodal Approach

▶ Why He Walked Away From Parquet to Build LanceDB

▶ LanceDB's Origin: Willing to Build Another AI Tool

Chang She

He co-wrote the tool a generation learned data on. Then he set out to replace the ground it stands on.