The man who taught robots to solve Rubik's cubes - and AI teams to actually ship.
Researcher. Founder. Educator. Josh Tobin spent three years inside OpenAI, earned a PhD from UC Berkeley, raised $28M to solve the hardest problem in production ML, and built the course that taught a generation of engineers how to deploy models that don't fall apart the moment they meet real users.
Josh Tobin - Researcher, Founder, Educator
Josh Tobin does not fit one box. He was a management consultant at McKinsey before he was an AI researcher. He was an AI researcher at OpenAI before he was a founder. He was a founder before he was an educator. He is, at every stage, someone who showed up to the hardest version of a problem and did something useful with it.
Today, Tobin's name is most closely associated with the gap between AI research and production reality - the messy, expensive, often-ignored middle ground where models trained in comfortable labs meet users, data drift, and edge cases that no lab ever prepared them for. His newsletter, his teaching, and his startup all orbit the same problem: most ML teams can't see what's happening to their models once they ship. That invisibility is not an inconvenience. It's a product crisis.
The trajectory began at UC Berkeley, where Tobin did his PhD under Pieter Abbeel - one of the most respected names in robot learning. His dissertation, "Real-World Robotic Perception and Control Using Synthetic Data," was not an abstract exercise. It was a direct attack on a fundamental challenge: how do you train a neural network in simulation and trust it to work in the real world, where lighting is different, objects wobble, and the physics engine wasn't tuned by a committee?
His answer was domain randomization. Train the model in dozens, then hundreds, of randomized simulated environments - vary the textures, the lighting, the masses, the friction. Force the network to learn features that survive variation. Then deploy it into reality. The 2017 paper introducing this technique has now been cited over 600 times. It changed how robotics teams think about perception.
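The idea can be sketched in a few lines. This is a minimal illustration of the randomization loop, not the paper's implementation; every parameter name and range below is invented for illustration.

```python
import random

def sample_randomized_env():
    """Return one randomized simulator configuration.

    Each training episode draws fresh visual and physical parameters,
    so the policy cannot overfit to any single rendering of the world.
    (All names and ranges here are illustrative, not from the paper.)
    """
    return {
        # Visual randomization: textures, lighting, camera pose
        "texture_id": random.randrange(1000),
        "light_intensity": random.uniform(0.2, 2.0),
        "camera_jitter_deg": random.uniform(-5.0, 5.0),
        # Physics randomization: masses, friction
        "object_mass_kg": random.uniform(0.05, 0.5),
        "friction_coeff": random.uniform(0.5, 1.5),
    }

def train(run_episode, episodes=100):
    """Train across many randomized environments: one fresh
    configuration per episode, passed to the training step."""
    for _ in range(episodes):
        run_episode(sample_randomized_env())
```

The point is structural: because no two episodes look alike, the only features worth learning are the ones that survive variation, and the real world becomes just one more sample from the distribution.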
Probably the number one lesson of machine learning is that you get what you optimize for. If you can set up the system to optimize directly for the outcome you're looking for, the results are going to be much, much better.
- Josh Tobin

While finishing his PhD, Tobin was simultaneously a research scientist at OpenAI - a three-year overlap that put him at the center of some of the most technically ambitious robotics work anywhere in the world. The most visible output was the dexterous manipulation project: a robot hand trained in simulation that learned to solve a Rubik's cube. The video circulated across the ML world in 2019. Behind it was a pile of reinforcement learning, generative modeling, and the domain randomization techniques Tobin had been developing.
By 2020, Tobin had a new problem to solve. He and Vicki Cheung - who had headed infrastructure at OpenAI and was a founding engineer at Duolingo - co-founded Gantry. The premise was straightforward and underaddressed: most teams had no visibility into what their deployed models were doing. They couldn't tell if a model was drifting, returning garbage on edge cases, or silently degrading as the world changed around it. Gantry was built to close that gap - instrumenting ML applications, visualizing performance, and giving teams the tools to decide when and how to retrain.
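The kind of check such a tool performs can be sketched simply: compare a model input's production distribution against its training distribution and flag when they diverge. The snippet below uses a generic z-score heuristic for illustration; it is not Gantry's actual method or API.

```python
import statistics

def drift_score(train_values, prod_values):
    """How many training standard deviations the production mean
    has moved away from the training mean. A crude but readable
    stand-in for real drift statistics (PSI, KS tests, etc.)."""
    mu = statistics.mean(train_values)
    sigma = statistics.stdev(train_values)
    if sigma == 0:
        return 0.0
    return abs(statistics.mean(prod_values) - mu) / sigma

def is_drifting(train_values, prod_values, threshold=3.0):
    """Flag a feature whose production distribution has shifted
    more than `threshold` training standard deviations."""
    return drift_score(train_values, prod_values) > threshold
```

Run per feature on a schedule, a check like this turns "the model is silently degrading" into an alert a team can act on - the visibility gap Gantry set out to close.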
The company raised a $4.4M seed round and a $23.9M Series A led by Amplify Partners and Coatue. Greg Brockman, co-founder of OpenAI, was among the notable angels. Continual learning was the central concept: the idea that a model trained offline is not a finished product, it's a starting point. The world moves. Users change. Models need to keep up.
The quality of the data that you put into the model is probably the biggest determining factor in the quality of the model you get on the other side.
- Josh Tobin

Parallel to his research and company work, Tobin built something rarer: a genuinely useful educational program. Full Stack Deep Learning started as a course at UC Berkeley and became the first focused treatment of production ML engineering - the unglamorous discipline of getting models from notebooks into systems that run, monitor themselves, and don't embarrass you at 2am. The 2022 iteration refocused on building ML-powered products and included a bootcamp for large language models before LLM bootcamps were a genre.
There's a pattern here. Tobin consistently shows up a step ahead of the problem. Domain randomization before sim-to-real was mainstream. ML monitoring before MLOps was a job title. LLM engineering before every conference had a track for it. This is not a coincidence. It is the product of a particular kind of mind: someone who consults before they research, who researches before they build, who builds before they teach, and who teaches because they believe the field moves faster when more people understand the fundamentals.
His pre-PhD stint at McKinsey - unusual for a deep learning researcher - shows up in how he frames problems. Not "what is technically possible" but "what does the team actually need to make a decision." Not "can we train a better model" but "what does the organization need to trust it." That business-aware framing runs through his writing, his courses, and the product choices at Gantry.
The Instagram handle @joshingtobin - a pun on the slang for joking - and his association with the Upright Citizens Brigade comedy community suggest that behind the papers and the pitch decks, there's someone who takes the work seriously without taking himself too seriously. His newsletter, focused on ML infrastructure and ops, has found an audience among engineers who are done with theory and want to know what actually breaks in production.
What Tobin has built, across a career that defies neat categorization, is a set of tools - technical, educational, organizational - for the space between cutting-edge research and functioning systems. That space is hard. Most people who know how to live in it don't explain it well. Most people who explain it well haven't lived in it. Tobin has done both.
Models trained offline won't perform well for long in production because user behavior and world context changes continuously.
- Josh Tobin on Continual Learning

Train in chaos, deploy in reality. Tobin's 2017 technique randomizes simulated environments - textures, lighting, physics - to force neural networks to learn features that survive the messiness of the real world. 600+ citations later, it's standard in robotics research.
A model trained offline is not a finished product - it's a starting point. The world moves. Users change. Data drifts. Gantry was built on the premise that production AI needs infrastructure for ongoing improvement, not just one-time deployment.
Full Stack Deep Learning emerged from a simple observation: there was no serious curriculum for the work that happens after the model trains. Deploying, monitoring, debugging, retraining - the discipline that makes AI products function at scale.
Before architecture choices, before hyperparameter tuning, before any of the glamorous decisions - data quality is the biggest determinant of model quality. Tobin returns to this principle across his research, teaching, and products.
You get what you optimize for. The cardinal sin of production ML is optimizing for a proxy metric while the actual outcome drifts. Tobin's work keeps returning to the challenge of building systems that optimize for what you actually care about.
The dream of robotics: train cheaply in simulation, deploy in reality. Tobin made it work for deep neural networks. The insight was to make simulation deliberately imperfect - variable enough that the real world doesn't come as a surprise.
His Instagram handle is @joshingtobin - a pun on "just joshing" (joking). One of the more self-aware social handles in ML research.
Before becoming one of the most-cited names in sim-to-real robotics, he was a management consultant at McKinsey. The pivot is rare - and it shows in how practically he frames technical problems.
He helped engineer a robot hand that solved a Rubik's cube - a project so technically demanding it became a landmark demo for what reinforcement learning trained in simulation could achieve in the physical world.
Gantry's co-founder Vicki Cheung was both a founding engineer at Duolingo and the head of infrastructure at OpenAI. The founding team was arguably the strongest ML infrastructure pair anywhere at the time.
Full Stack Deep Learning launched a large language model bootcamp before "LLM bootcamp" was a recognizable category. Tobin's timing - in research, products, and education - consistently runs slightly ahead of the field.
Associated with the Upright Citizens Brigade comedy community - which may explain why his educational content is unusually clear and human compared to most technical teaching in the ML space.