Breaking
SPHINX RAISES $9.5M SEED LED BY LIGHTSPEED · BACKERS INCLUDE STEVE COHEN & NAVEEN RAO · KODIALAM: "THE DATA PEOPLE HAVE NOTHING" · EX-CITADEL QUANT BUILDS A CURSOR FOR DATA · SPHINX COPILOT SHIPS INTO JUPYTER & VSCODE · SEVEN-PERSON TEAM OUT OF QUEENS, NEW YORK
Founder · Sphinx

Rohan Kodialam

He left a quant seat on Wall Street because the data scientists, the ones doing the actual analysis, had nothing.

Co-Founder & CEO, Sphinx  //  New York

Rohan Kodialam, co-founder and CEO of Sphinx
The face of a man who thinks your spreadsheet deserves better AI.
Seed Round: $9.5M
Year Founded: 2025
Co-Founders: 2
Where It Started: MIT
The Pitch

A Cursor for the people who actually touch the data.

Rohan Kodialam noticed something on a trading floor that bothered him. The software engineers had Copilot finishing their code. The front-office traders had ChatGPT drafting their memos. And the data scientists, the ones turning raw numbers into decisions worth millions, had nothing built for them at all.

So in 2025 he and engineer Jamie Bloxham started Sphinx, an applied-AI research firm in Queens, New York, with a deliberately unglamorous mission: make AI good at data. Not language. Not code. Data. The messy, schema-less, half-documented stuff sitting in warehouses and Jupyter notebooks where nobody can quite remember what column 14 means.

Sphinx ships a copilot that lives inside Jupyter and VSCode, the places data teams already work. It autocompletes. It reasons. It explores hypotheses and hunts for the insight buried three joins deep. In September 2025 the company came out of stealth with $9.5 million in seed funding led by Lightspeed, with Bessemer Venture Partners, BoxGroup, K5 and Impatient Ventures along for the ride, plus angels who know a thing or two about data: Steve Cohen and Naveen Rao.

"AI is driving a paradigm shift for natural language and code, but traditional data has been left behind."
Rohan Kodialam, on why Sphinx exists
The Contrarian Bet

Code is a technical paper. Data is a poem.

Most founders chasing AI go where the demos look cleanest: chatbots, copilots for code, image generators. Kodialam went the other way, toward the part everyone else finds tedious. His argument is that data work and software work only look similar from a distance.

Code is literal. A function does what it says. Data is interpretive, ambiguous, full of context that lives in someone's head and nowhere else. He puts it as the difference between writing a poem and writing a technical paper. One has rules you can check. The other depends on what you meant.

That is exactly where today's language models stumble. "The moment you try to breach that boundary of language," he says, "you start to run into problems." Sphinx is his answer: agents trained with representation learning and reinforcement learning to interrogate data instead of hallucinating about it.

The Gap

Three tribes, one left out

Engineers got Copilot. Executives got ChatGPT. Data scientists got handed a generic chatbot and told to make it work.

The Build

Meet them where they live

Sphinx Copilot plugs into Jupyter and VSCode, with autocomplete and agentic reasoning rather than a new app to learn.

The Method

Representation + RL

Not prompt tricks. The bet is that teaching models the structure of data is the real unlock.

The Stakes

Decisions, not demos

Refined forecasts, optimized operations, insights that move a P&L. The boring, valuable end of AI.

"All the software engineers have Claude Code, all the front-office guys have ChatGPT, and the data people have nothing."

ROHAN KODIALAM · FAST COMPANY

The Long Way Here

From ski-rental algorithms to alpha generation.

Before the venture money and the launch posts, there was an MIT undergraduate in the Physics department, not Computer Science, who kept wandering into machine learning. As a SuperUROP scholar he worked on a delightfully specific problem: the classic "ski rental" dilemma, reframed for an age when a model can guess how long you'll keep skiing.

He stayed for a master's, joining MIT's Clinical ML group, where he co-authored research on predicting patient outcomes from time-series clinical data and a paper with the very good title "Deep Contextual Clinical Prediction with Reverse Distillation." At CSAIL his work turned to embedding complex hierarchical data into transformer architectures, teaching models to read the kind of structure that does not fit neatly into a sentence.

Then Wall Street. At Citadel he became a quantitative researcher specializing in alternative data, the satellite images and credit-card receipts that funds mine for edge. He went on to lead AI R&D building agentic models for alpha generation. It was there, surrounded by the best-tooled engineers and traders in the world, that the absence of anything for the data team became impossible to ignore.

Timeline
2018-2019: MIT SuperUROP scholar in Physics, advised by Alexander Rakhlin. Researches algorithms with machine-learned predictions.
2020: MEng with MIT's Clinical ML group. Publishes clinical prediction research, including "Reverse Distillation."
CSAIL: Embeds hierarchical data modalities into transformer architectures.
Citadel: Quant researcher on alternative data; later leads AI R&D for agentic alpha-generation models.
2025: Co-founds Sphinx with Jamie Bloxham. Becomes CEO.
Sep 2025: Sphinx launches from stealth with a $9.5M seed led by Lightspeed.
By The Numbers

A seven-person team, a heavyweight cap table.

Sphinx is small on purpose and well-backed by design. The $9.5M seed gives a tiny New York team the runway to chase a frontier that bigger labs have mostly skipped. The investor list reads like a who's who of people who understand both AI infrastructure and what data is actually worth.

Seed raised: $9.5M
Team size: 7 people
Lead investor: Lightspeed
Founded: 2025

Lead: Lightspeed. Led the round.

Funds: Bessemer, BoxGroup, K5, Impatient. Institutional backing.

Angel: Steve Cohen. The Point72 founder and data-hungry investor.

Angel: Naveen Rao. AI hardware and infra veteran.

In His Words

Lines worth keeping.

"AI is driving a paradigm shift for natural language and code, but traditional data has been left behind."

"The moment you try to breach that boundary of language, you start to run into problems."

"It's almost like the difference between writing a poem and writing a technical paper."

"I aim to unlock new paradigms for intelligent systems to learn from data, and to drive lasting value through informed decision-making across industries."

Watch

Frontier AI agents for data science.

Kodialam sat down with the SuperDataScience podcast (episode 938) to make the case for agents that interpret data, explore hypotheses, and surface the insights humans miss. The host described him as "an outstanding speaker building a revolutionary AI product."

SDS 938 on YouTube: Frontier AI Agents for Data Science
"The data people have nothing." The five words that became a company.
Margin Notes

Six things you didn't know.

1. His full name is Rohan Sundar Kodialam.

2. His MIT home department was Physics, not Computer Science.

3. Before quant finance he co-wrote clinical machine-learning papers predicting patient outcomes.

4. An early research project tackled the classic "ski rental" algorithm, upgraded with ML predictions.

5. He describes Sphinx as a "Cursor for data" that lives inside Jupyter and VSCode.

6. Sphinx launched as a seven-person team based in Queens, New York.


Software engineers got Copilot. Data scientists got Rohan.

Profile assembled from public sources: Fast Company, PR Newswire, BigDATAwire, SuperDataScience, Crunchbase, MIT SuperUROP & Clinical ML, and Sphinx's own launch post. Quotes reproduced as reported.