Robert Nishihara

The Story

The Grad Student Who Couldn't Stop Being Annoyed

In a UC Berkeley computer science lab sometime around 2016, Robert Nishihara kept running into the same wall. He was doing AI research - interesting work, the kind that could matter - but the actual research kept getting displaced by something more tedious: managing clusters, shuffling data, wrestling with distributed systems that weren't built for machine learning. He wasn't writing AI. He was wrangling infrastructure.

So he, labmate Philipp Moritz, and advisor Ion Stoica built something to fix that. They called it Ray. It was meant to be a research tool. It became the distributed computing engine that now runs inside OpenAI, Apple, Uber, Amazon, Pinterest, Canva, and Instacart. Instacart used it to train models on 100x more data. Canva cut cloud costs in half.

"We were doing AI research, but we were bottlenecked by the tooling. We found ourselves spending all of our time managing clusters, wrangling data, and solving distributed systems challenges." - Robert Nishihara

The Ray paper landed at USENIX OSDI 2018, one of the top systems conferences in computer science. It described a framework capable of handling over 1.8 million tasks per second. By then, the vision had expanded: what if you could make distributed computing invisible to the developer? What if Python programmers could just write Python, and the scaling happened underneath?

In 2019, Nishihara, Moritz, and Stoica left the lab to start Anyscale - the commercial company built around Ray. The timing was either very early or perfectly calibrated, depending on how you look at it. The AI infrastructure wave that now looks obvious wasn't obvious at all when they started. They were building for a future they had to make people believe in.

It worked. Anyscale raised $259.6M across multiple rounds, reaching a $1B+ valuation. But the more interesting inflection point came in July 2024, when Nishihara stepped back from the CEO seat - not under pressure, but at his own suggestion. At a board meeting roughly six months earlier, he had told the board that Anyscale needed a different kind of leader to push toward the next revenue milestone. Keerti Melkote, founder of Aruba Networks, became CEO. Nishihara moved into a product-focused role.

That move - a founder proactively nominating himself out of the top job because the company needed it - is not common. It's the kind of self-awareness that gets taught in business school but rarely practiced. For Nishihara, it was just arithmetic: he cared more about what Anyscale built than about what his title said.

Now he watches the AI compute landscape shift in real time. His thesis, expressed in a 2025 post on Twitter/X: the era in which compute primarily means training is ending. The next era spends compute on data - generation, annotation, curation. "I expect to see the money & compute spent on data processing grow to match and exceed the money on pre-training," he wrote. He's also watching reinforcement learning closely enough to hire aggressively for it at Anyscale.

He grew up in the Bay Area, skateboarding and thinking about mathematics - the kind of kid who found uncountable infinities genuinely interesting rather than abstractly impressive. He went to Harvard for math, then Berkeley for a PhD in computer science, working under Michael Jordan and Ion Stoica in the AMPLab and later the RISELab. Along the way he interned at Google, the NSA, Facebook, Microsoft Research, and Jane Street - five very different organizations that probably gave him five very different definitions of scale. He completed the SF Half Marathon and wrote about it like a person who had surprised himself: "A goal that once felt out of reach is now complete."

The thing Ray did - and the thing Anyscale continues to try to do - is collapse the distance between an idea in a researcher's head and code actually running across a cluster. That gap, invisible to most people, used to eat weeks of a researcher's time. Nishihara built infrastructure so that other people could think instead of wrangle. He's still at it.

Realizing AI's potential requires making capabilities incredibly reliable in the real world, and building out the underlying hardware and software infrastructure to enable them throughout every industry.

- Robert Nishihara

The Technology

Ray: Infrastructure for the AI Age

Ray is the open-source project that put Nishihara on the map - and kept him there. Here's what it actually does, and why it matters.

Core Framework

Ray Core

A general-purpose distributed computing framework for Python. Write ordinary Python functions and classes, mark them as remote, and Ray handles distributing the work across hundreds of machines. No rewrites, no infrastructure wrestling.
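The programming model described above, plain functions submitted as parallel tasks whose results are collected later, can be sketched with Python's standard library alone. This is an analogy on a single machine using threads, not Ray's API; the names `square` and `run_in_parallel` are hypothetical and chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x: int) -> int:
    """An ordinary Python function; nothing in it knows about parallelism."""
    return x * x


def run_in_parallel(inputs, max_workers: int = 4):
    """Submit one task per input and gather the results in input order.

    Assumption: a local thread pool stands in for a cluster scheduler.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each submit() returns a future immediately; work runs in the pool.
        futures = [pool.submit(square, x) for x in inputs]
        # result() blocks until the corresponding task has finished.
        return [f.result() for f in futures]


results = run_in_parallel(range(5))
print(results)
```

The point of the sketch is the shape of the code: the function body stays unchanged, and only the call site decides whether it runs locally or as a scheduled task.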

Training at Scale

Ray Train

Distributed training for ML models. Enables training across GPU clusters with fault tolerance built in. Instacart used it to train on 100x more data than before.

Hyperparameter Tuning

Ray Tune

Scalable hyperparameter optimization. Runs thousands of experiments in parallel across a cluster, so searches that used to take weeks can finish in hours.
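As a rough illustration of why parallel search helps, the sketch below evaluates every configuration in a small grid concurrently and keeps the best one. It uses only the standard library, not Ray Tune; the toy `objective` function, its optimum, and the grid values are assumptions made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def objective(lr: float, batch_size: int) -> float:
    """Hypothetical toy loss, smallest at lr=0.01 and batch_size=64."""
    return (lr - 0.01) ** 2 + ((batch_size - 64) / 1000) ** 2


def grid_search(lrs, batch_sizes, max_workers: int = 4):
    """Evaluate every (lr, batch_size) pair in parallel; return the best."""
    configs = list(product(lrs, batch_sizes))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves the order of configs, so losses line up with them.
        losses = list(pool.map(lambda cfg: objective(*cfg), configs))
    best_loss, best_config = min(zip(losses, configs))
    return best_config, best_loss


best_config, best_loss = grid_search([0.001, 0.01, 0.1], [32, 64, 128])
print(best_config, best_loss)
```

Because each trial is independent, wall-clock time shrinks roughly with the number of workers available, which is the property a cluster-scale tuner exploits.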

Model Serving

Ray Serve

Production model serving at scale. Handles real-time inference for LLMs and other models with auto-scaling, batching, and multi-model pipeline support.
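Batching, mentioned above, means grouping several pending requests so the model is invoked once per group rather than once per request. The following is a simplified, offline sketch of that idea in plain Python; it is not Ray Serve's implementation, and `make_batches`, `toy_model`, and `serve` are hypothetical names.

```python
from typing import Callable, List, Sequence


def make_batches(requests: Sequence[str], max_batch_size: int) -> List[List[str]]:
    """Split requests into consecutive batches of at most max_batch_size."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    return [
        list(requests[i:i + max_batch_size])
        for i in range(0, len(requests), max_batch_size)
    ]


def toy_model(batch: List[str]) -> List[int]:
    """Hypothetical stand-in for a model: returns the length of each input."""
    return [len(text) for text in batch]


def serve(
    requests: Sequence[str],
    model: Callable[[List[str]], List[int]],
    max_batch_size: int = 2,
) -> List[int]:
    """Run the model once per batch and return outputs in request order."""
    outputs: List[int] = []
    for batch in make_batches(requests, max_batch_size):
        outputs.extend(model(batch))
    return outputs


pending = ["hi", "hello", "hey", "greetings", "yo"]
batches = make_batches(pending, 2)
outputs = serve(pending, toy_model, 2)
print(batches, outputs)
```

A production server would also bound how long a request waits for its batch to fill; that timing logic is left out here to keep the sketch short.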

Data Processing

Ray Data

Scalable data loading and transformation pipelines for ML workloads. Integrates with Ray Train and Ray Tune for end-to-end ML pipelines.

Reinforcement Learning

RLlib

A composable and scalable RL library built on Ray. The paper describing it was among those Nishihara co-authored during his PhD research at Berkeley.

Scale & Impact

What Ray Actually Does in Production

100x

More training data for Instacart using Ray Train

50%

Cloud cost reduction at Canva after adopting Ray

1.8M+

Tasks per second - demonstrated Ray throughput

1,000+

Open-source contributors to the Ray ecosystem

Users include

OpenAI, Apple, Uber, Amazon, Pinterest, Canva, Instacart

Career Arc

From Berkeley Lab to $1B+ Startup

2009-2013

BA in Mathematics, Harvard University. Where the affinity for rigor and abstraction took formal shape.

2013-2019

PhD in Computer Science, UC Berkeley. AMPLab then RISELab. Advisors: Michael Jordan, Ion Stoica. Where Ray was born out of research frustration.

2017

Ray paper submitted to arXiv (December). Co-authored with Philipp Moritz, Ion Stoica, and others. Demonstrates scaling to millions of tasks per second.

2018

Ray presented at USENIX OSDI 2018. Gains major recognition in the systems research community.

2019

Co-founds Anyscale with Philipp Moritz and Ion Stoica to commercialize Ray. The first company built specifically to make distributed AI infrastructure accessible.

2022

Anyscale raises $99M Series C led by Addition and Intel Capital. Ray 2.0 announced at Ray Summit 2022. Valuation crosses $1B.

2024

Proactively recommends his own CEO transition. Keerti Melkote (founder, Aruba Networks) named CEO. Nishihara moves to product-focused co-founder role.

2025-2026

Active on the AI infrastructure frontier. Speaks at KubeCon, PyTorch Conference, HumanX. Anyscale ARR ~$111.9M. Pushing hard on RL and data-centric AI compute.

🎓

PhD, Computer Science

UC Berkeley | AMPLab / RISELab

2013 - 2019 | Advisors: Michael Jordan, Ion Stoica

📐

BA, Mathematics

Harvard University

2009 - 2013

In His Own Words

What He's Thinking About

We're entering an era where compute isn't primarily for training. It's for creating better data. I expect to see the money & compute spent on data processing (generation/annotation/curation) grow to match and exceed the money on pre-training.

Twitter/X, 2025

We're missing techniques for 'training-time reasoning.' Right now there's a lot of progress on inference-time reasoning, but when I think about how I learn, e.g., when reading a technical paper, it's very compute intensive.

Twitter/X

We were doing AI research, but we were bottlenecked by the tooling. We found ourselves spending all of our time managing clusters, wrangling data, and solving distributed systems challenges.

On why Ray was built

Reinforcement learning is a big investment area for us at Anyscale, and we're hiring actively for RL!

Twitter/X, 2025

Character Study

Six Things Worth Knowing

He grew up skateboarding in the San Francisco Bay Area and spent free time thinking about uncountable infinities. Two hobbies that don't obviously intersect - but both require patience with the incomprehensible.

His professional email has been rkn@ since his grad school days - that's the handle that stuck from Berkeley all the way to a $1B+ company. Not "robert." Not "nishihara." Just rkn.

He interned at five organizations before starting Anyscale: Google, the NSA, Facebook, Microsoft Research, and Jane Street. Software, security, social, research, and finance - in that order.

The Ray paper (arXiv:1712.05889) was submitted in December 2017 - when Nishihara was still a PhD student. The framework was already running at production scale before Anyscale existed.

He completed the SF Half Marathon and wrote about it with the energy of someone genuinely surprised they'd done it: "A goal that once felt out of reach is now complete."

At a board meeting in early 2024, he told the board that Anyscale needed a more experienced CEO to drive revenue. He then helped hire his own replacement. Keerti Melkote, founder of Aruba Networks, took the role in July 2024.

What's Next

The Bets He's Making Now

The clearest window into Nishihara's current thinking is his Twitter/X feed, where he posts with the precision of a researcher and the frequency of someone genuinely excited by the problem in front of him.

The main thesis: the AI compute paradigm is shifting from training to data. Pre-training dominated the first decade of the deep learning era. What comes next, in his view, is a massive buildout of compute dedicated to generating, curating, and annotating data. Synthetic data at scale. The infrastructure for that doesn't quite exist yet.

He's also bullish on reinforcement learning in ways that go beyond the current RL-from-human-feedback (RLHF) conversation. His interest seems to be in RL as a general learning paradigm - something closer to how humans actually learn when working through complex material, which he describes as "compute intensive" in ways that current training doesn't capture.

And he's watching the emergence of specialized inference engines closely - vLLM, SGLang, TensorRT-LLM - publishing observations on why LLM inference is structurally different from traditional model serving, and what that means for the infrastructure layer.

Anyscale's product roadmap, which he now owns, is presumably chasing all three threads simultaneously. The Ray ecosystem - open-source, 1,000+ contributors, 10,000+ organizations - gives him an unusually large antenna for where the industry is actually going.