Tagged Content
Everything on the platform tagged with reinforcement-learning.
AfterQuery is a San Francisco applied research lab that builds expert-generated datasets, benchmarks, and reinforcement-learning environments for the world's leading AI labs. The company recruits nearly 100,000 vetted domain experts - in finance, law, medicine, software, and beyond - to teach frontier models how specialists actually think.
Jason Warner is the co-founder and CEO of Poolside, a San Francisco-based frontier AI lab that builds proprietary foundation models for software development and has raised $626M, with a $3B+ valuation. Before Poolside, he was CTO of GitHub - where he launched Actions, Packages, and Codespaces and incubated what became GitHub Copilot - and then a Managing Director at Redpoint Ventures. A self-described 'average developer but excellent architect,' Warner is betting that reinforcement learning from code execution will make software the first domain where AI surpasses human-level intelligence.
Jonas Schneider is the Founder and CEO of Daedalus, an AI-powered precision manufacturing company headquartered in Karlsruhe, Germany, building the factory of the future. A former OpenAI technical lead and co-founder of its robotics team, Schneider left Silicon Valley in 2019 to solve a problem he lived firsthand: getting precision-manufactured parts takes months, and the world's most advanced machines sit idle 80% of the time. Daedalus deploys proprietary AI software across CNC shop floors to double machine utilization, catch defects in real time, and preserve tacit manufacturing knowledge before it disappears with retiring machinists. The company has raised $41.1M in total funding, including a $21M Series A led by NGP Capital in February 2024, and operates a 50,000-square-foot factory serving defense, medical devices, aerospace, and semiconductor clients.

Robert Nishihara is a co-founder of Anyscale, the company commercializing Ray - the open-source distributed computing framework powering AI workloads at OpenAI, Apple, Uber, and thousands of other organizations. A Harvard math grad and UC Berkeley PhD, he built Ray during his doctoral research to solve the tooling bottlenecks he personally experienced doing AI research. Under his leadership, Anyscale raised $259.6M and reached a $1B+ valuation; in mid-2024 he moved from CEO to a product-focused role.
Poolside is a San Francisco AI lab building foundation models purpose-built for software engineering. Founded in 2023 by former GitHub CTO Jason Warner and serial entrepreneur Eiso Kant, it trains its frontier models (malibu, point) with a technique called Reinforcement Learning from Code Execution Feedback - then deploys the whole stack inside customer environments for Global 2000 enterprises, financial institutions and the public sector.
Spencer Mateega is the 23-year-old Co-Founder and CEO of AfterQuery, a San Francisco-based applied research lab that captures expert professional knowledge and converts it into high-quality training data for AI foundation models. Founded in January 2025 and backed by Y Combinator's Winter 2025 cohort, AfterQuery raised a $30 million Series A at a $300 million valuation in April 2026, with revenues exceeding $100 million annualized. Mateega's philosophy - 'We teach machines how experts think' - drives a platform connecting roughly 100,000 domain professionals in finance, law, and software to frontier AI labs hungry for reasoning-rich data.
Tim Shi is the co-founder and former CTO of Cresta, the AI platform for contact centers that grew to over $100M ARR and raised $401M from Sequoia, a16z, and Greylock. A Tsinghua CS graduate who did early AI research at OpenAI alongside Andrej Karpathy - including the 'World of Bits' paper on web-based reinforcement learning agents - he co-founded Cresta in 2017 with Zayd Enam after both dropped out of Stanford's AI PhD program. In 2025, he co-founded Recursive Superintelligence, which emerged from stealth with $650M at a $4.65B valuation to build self-improving AI systems.
William Ross is the CEO and Co-Founder of Federato, an AI-native insurance technology company that has raised $180M including a $100M Series D led by Goldman Sachs. A Stanford-trained researcher who published wildfire modeling research at NeurIPS 2021, Ross left a career spanning IBM Watson and Venrock to build what the industry calls a 'RiskOps' platform - bringing portfolio-level intelligence to daily underwriting decisions across the $1 trillion+ P&C and specialty insurance market. Federato's revenues tripled year-over-year and the platform now powers insurers managing hundreds of millions in gross written premiums.
Nicolai 'Nic' Ouporov is Co-Founder and CEO of Fleet AI, a startup building reinforcement learning training environments - or 'RL gyms' - that let AI agents practice operating real software tools like Salesforce and Excel before deployment. Fleet raised ~$45M total and grew from $1M to $60M+ annualized revenue in under a year. Nic is also a 3x YoungArts Award winner in photography and visual arts, a former pre-professional ballet dancer trained at Boston Ballet and San Francisco Ballet, and a QuestBridge Scholar from Columbia University. He previously served as Founding Engineer at Respell (acquired by Salesforce in 2024) and published AI research at Stanford's Robotics and Embodied AI Lab.

John Mern is the Co-Founder and CEO of Terra AI, a Khosla Ventures-backed startup using generative AI and probabilistic modeling to transform how humans find and develop critical minerals and energy resources underground. A Stanford PhD in aerospace engineering and alumnus of Boeing Phantom Works and KoBold Metals, Mern built Terra AI's platform to cut mine development timelines in half - running geophysical simulations 125,000x faster than traditional methods, achieving 40% reductions in drilling costs, and helping partners unlock over $100 million in investments. His work sits at the intersection of deep reinforcement learning, geoscience, and the urgent global race to secure the copper, lithium, and cobalt the energy transition demands.
Itzik Gilboa is the CEO of minds.ai, a Santa Cruz-based AI company applying multi-agent reinforcement learning to semiconductor fabrication. With aerospace and materials science degrees from the Technion and decades of experience at Cypress, SanDisk, and Western Digital, he bridges the gap between chip-floor pragmatism and cutting-edge AI research. Under his leadership, minds.ai secured a multi-year partnership with GlobalFoundries to deploy its Maestro® platform across global fab operations, aiming to reduce cycle times, improve on-time delivery, and help fabs run at peak efficiency in a world where every nanosecond of lost throughput costs millions.

Sir Demis Hassabis CBE FRS is a British AI researcher, neuroscientist, chess prodigy, and game designer who co-founded DeepMind in 2010 and serves as CEO of Google DeepMind. He shared the 2024 Nobel Prize in Chemistry for AlphaFold's protein structure prediction, was knighted for services to AI, and is widely regarded as one of the most influential figures in the development of artificial general intelligence.

Wojciech Zaremba is a Polish-American AI researcher and co-founder of OpenAI, the company behind ChatGPT and GPT-4. A former International Mathematical Olympiad silver medalist from Kluczbork, Poland, he holds dual master's degrees from the University of Warsaw and École Polytechnique, and a PhD from NYU under Yann LeCun. At OpenAI he has led robotics (including the Dactyl hand that solved a Rubik's Cube one-handed), the Codex project powering GitHub Copilot, and the RLHF human feedback infrastructure that shaped ChatGPT's alignment. One of the few original co-founders still at the company after a decade, he is regarded as one of the most technically consequential and quietly influential figures in modern AI.

Nathan Lambert is a Senior Research Scientist and Post-Training Lead at the Allen Institute for AI (Ai2), where he leads open-source language model development on the OLMo and Tulu series. A UC Berkeley PhD, he previously led the RLHF team at Hugging Face, co-building the TRL library and the Zephyr model. He runs Interconnects AI, a Substack newsletter read by tens of thousands covering post-training, open models, and AI policy, and is the author of The RLHF Book (Manning Publications). With roughly 8,000 academic citations and a reputation for demystifying the hardest parts of modern AI, Lambert is one of the most trusted voices at the intersection of open-source AI research and public education.

Azalia Mirhoseini is an Iranian-born AI researcher, Stanford professor, and co-founder of Ricursive Intelligence - a frontier AI lab valued at $4 billion that uses AI to design better chips, which in turn train stronger AI. Best known for AlphaChip, the deep reinforcement learning system that now designs Google's TPUs and has compressed chip floorplanning from months to hours, she also co-invented the Mixture-of-Experts architecture underpinning GPT, Claude, and Gemini. With 20,000+ citations and a $335M-funded startup launched in under four months, she is closing the recursive loop between artificial intelligence and the hardware it runs on.