Reiner Pope

The Now

The Chip Architect Who Caught the Wave He Didn't See Coming

In November 2022, Reiner Pope packed up his desk at Google and walked out the door. His plan: build a chip company. What he didn't know - what nobody outside of OpenAI knew - was that one week later, a chat interface called ChatGPT would set off demand for AI compute unlike anything the world had seen. The timing wasn't prescient. It was something more interesting: it was right.

Pope is the co-founder and CEO of MatX, a Mountain View semiconductor startup designing chips specifically for large language models. Not chips that can run LLMs. Chips engineered from the substrate up with one question in mind: what does physics actually allow here? MatX has now raised approximately $604 million, including a $500M Series B closed in February 2026 and led by Jane Street and Leopold Aschenbrenner's Situational Awareness LP - two of the more technically rigorous backers in finance.

The company employs over 110 people. It has a chip in development. It is not shipping yet - Pope has set a target of late 2027 for production systems. But in an industry where "challenger to Nvidia" is a phrase dozens of companies use, MatX is one of the few whose founders spent years doing the very work they now claim they can do better.

MatX at a Glance

$604M

Total Funding Raised

110+

Employees

2,000+

Tokens/sec (large MoE)

2027

Target Production Ships

The Technology

An Uncomfortable Trade-off, Solved in Silicon

The central problem in LLM hardware is something Pope describes as an "uncomfortable trade-off between latency and throughput." You can have chips with fast SRAM memory - low latency, great for serving one request at a time, but expensive and limited in size. Or you can have HBM memory - high capacity, good for throughput, but slower for individual requests. Everyone picks a side. MatX is trying to have both.

The MatX One chip uses a hybrid architecture: model weights live in SRAM for fast, low-latency access - achieving the snappy response times of designs from companies like Groq or Cerebras. The key-value caches, which grow with context length, live in HBM - supporting extended context windows without the capacity constraints of pure SRAM systems. The result is a chip that Pope claims offers higher throughput than any announced competing product while simultaneously matching the latency of SRAM-first designs.
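To see why the two kinds of data suit different memories, it helps to compare how each one scales. The sketch below uses the standard key-value cache size estimate for a transformer (two vectors per token, per layer, per KV head). The model shape, batch size, and byte widths are hypothetical values chosen only for illustration; they are not MatX One specifications or any published configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical dense-transformer shape (illustrative, not a real product)."""
    n_layers: int
    n_kv_heads: int
    head_dim: int
    n_params: int


def weight_bytes(shape: ModelShape, bytes_per_param: int) -> int:
    # Weight footprint is fixed once the model is chosen.
    return shape.n_params * bytes_per_param


def kv_cache_bytes(shape: ModelShape, context_len: int, batch: int,
                   bytes_per_elem: int) -> int:
    # Factor of 2: one key vector and one value vector are stored
    # per token, per layer, per KV head.
    per_token = 2 * shape.n_layers * shape.n_kv_heads * shape.head_dim * bytes_per_elem
    return per_token * context_len * batch


if __name__ == "__main__":
    # Assumed shape: 80 layers, 8 KV heads of width 128, 70e9 parameters.
    shape = ModelShape(n_layers=80, n_kv_heads=8, head_dim=128,
                       n_params=70_000_000_000)
    w = weight_bytes(shape, bytes_per_param=1)
    print(f"weights: {w / 1e9:.1f} GB (independent of context length)")
    for ctx in (8_192, 131_072, 1_048_576):
        kv = kv_cache_bytes(shape, context_len=ctx, batch=16, bytes_per_elem=2)
        print(f"context {ctx:>9,}: KV cache {kv / 1e9:.1f} GB")
```

The weight footprint stays constant while the cache grows linearly with both context length and batch size, which is the asymmetry the hybrid design is described as exploiting.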

MatX One - Hybrid Memory Architecture

SRAM Layer

Stores model weights. Ultra-low latency. Enables fast first-token response. Like Groq/Cerebras.

HBM Layer

Stores KV cache. High capacity. Supports long context windows without the capacity limits of pure-SRAM designs.

+ splittable systolic array + fresh numerics + scale-out interconnect

Best latency + Best throughput

For transformer inference on large MoE and dense models - in a single chip

The chip is also built around a "splittable systolic array" - an architecture that retains the energy and area efficiency of large systolic arrays while achieving high utilization on smaller matrices with flexible shapes. MatX has also built custom scale-up and scale-out interconnects, and a programming model that gives developers direct control of the hardware rather than abstracting it away.
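The utilization argument can be illustrated with a toy padding model. This is not a description of MatX's actual design, which the text does not detail: it simply assumes that an operand tile must be padded up to whole passes of a square array, and that a "split" array behaves like several smaller independent arrays with enough independent tiles to keep all of them busy. The function name and the array sizes are hypothetical.

```python
import math


def tile_utilization(rows: int, cols: int, array_dim: int) -> float:
    """Fraction of multiply-accumulate cells doing useful work when a
    rows x cols tile is padded to whole array_dim x array_dim passes."""
    passes = math.ceil(rows / array_dim) * math.ceil(cols / array_dim)
    return (rows * cols) / (passes * array_dim * array_dim)


if __name__ == "__main__":
    # Assumed sizes: one 256x256 array versus the same silicon
    # treated as 128x128 sub-arrays.
    for rows, cols in ((256, 256), (96, 96), (160, 64)):
        whole = tile_utilization(rows, cols, array_dim=256)
        split = tile_utilization(rows, cols, array_dim=128)
        print(f"{rows}x{cols}: whole array {whole:.1%}, split array {split:.1%}")
```

Under these assumptions a large, well-shaped matrix fills either configuration, while small or oddly shaped tiles waste far fewer cells when the array can be split.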

What MatX deliberately doesn't target: small models, convolutional operations, recommendation systems. The focus is surgical. Frontier LLMs - training, reinforcement learning, prefill inference, decode inference. The labs spending the most on compute, trying to build the biggest systems. That's the customer.

"All of the frontier labs are spending tens of billions of dollars on compute. The rational choice is to do anything you can to get hardware costs down."

- Reiner Pope, CEO of MatX

The Path

From Haskell to TPUs to $600M

Pope grew up in Australia and studied mathematics at The Australian National University, graduating in 2011 with a Bachelor of Philosophy (Honours). He came into software through a route that most hardware engineers don't take: functional programming. He was, by his own description, a math enthusiast and Haskell programmer before he ever touched chip design.

That background - mathematical rigor, type-theoretic thinking, functional decomposition - runs through everything he's done since. Before Google, he spent years in software focusing on compilers, algorithms, and low-level systems. The kind of work where you're thinking carefully about what computations actually cost and why.

At Google, he joined Research as a Senior Staff Software Engineer in 2019 and spent roughly three years at the intersection of model training, hardware architecture, and compiler development. His role as Efficiency Lead for PaLM meant owning the software stack that made one of Google's most important language models actually run at production scale. He helped design Google's TPU v5e. He built what was at the time described as the world's fastest LLM inference software. And he co-authored the PaLM paper - which has gone on to accumulate over 9,400 Google Scholar citations.

Research

The Papers Behind the Product

PaLM: Scaling Language Modeling with Pathways

Journal of Machine Learning Research, Vol. 24 (240) - 2023

9,402 citations

PaLM 2 Technical Report

arXiv preprint arXiv:2305.10403 - 2023

2,459 citations

Efficiently Scaling Transformer Inference

Proceedings of Machine Learning and Systems, Vol. 5 - 2023

919 citations

SPIRe: Boosting LLM Inference Throughput with Speculative Decoding

arXiv preprint arXiv:2504.06419 - 2025

Co-authored at MatX

The Mind

First Principles as a Practice, Not a Slogan

Pope is one of a small number of people who have worked across the complete AI stack - from writing the software that runs on Google's TPUs to understanding the architectural decisions that shaped those chips in the first place. His co-founder Mike Gunter was the lead designer of the TPU hardware itself. Together, they cover every layer from silicon to model.

That depth shapes how he approaches the problem. He's written publicly about why frontier models are likely trained roughly 100x beyond Chinchilla-optimal compute - not because researchers are inefficient, but because when you account for the full lifecycle of a model (reinforcement learning, inference, continued fine-tuning), the economics look completely different from a naive "minimize training compute" framing. He deduces things about competitor systems by reading API pricing: long context getting expensive around 200K tokens means you're running into memory bandwidth walls. Prefill costing 5x less than decode means the systems are "tremendously memory bandwidth bottlenecked."
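The prefill-versus-decode observation follows from a simple roofline estimate. The sketch below counts only the weight matrices: each parameter contributes one multiply-add (2 FLOPs) per token processed, and is read from memory once per forward pass. It ignores attention and KV cache traffic, and the accelerator figures are hypothetical round numbers, not the specifications of any real chip.

```python
def weight_intensity(tokens_per_pass: int, bytes_per_param: float) -> float:
    """FLOPs performed per byte of weights read in one forward pass."""
    return 2.0 * tokens_per_pass / bytes_per_param


def ridge_point(peak_flops: float, mem_bandwidth: float) -> float:
    """Arithmetic intensity at which compute and memory time are equal."""
    return peak_flops / mem_bandwidth


def is_bandwidth_bound(intensity: float, peak_flops: float,
                       mem_bandwidth: float) -> bool:
    return intensity < ridge_point(peak_flops, mem_bandwidth)


if __name__ == "__main__":
    # Assumed accelerator: 1e15 FLOP/s peak, 3e12 bytes/s memory bandwidth.
    peak, bw = 1e15, 3e12
    print(f"ridge point: {ridge_point(peak, bw):.0f} FLOPs/byte")

    # Decode: one new token per sequence, so tokens_per_pass = batch size.
    decode = weight_intensity(tokens_per_pass=8, bytes_per_param=1.0)
    # Prefill: every prompt token goes through the weights in one pass.
    prefill = weight_intensity(tokens_per_pass=4096, bytes_per_param=1.0)

    for name, value in (("decode", decode), ("prefill", prefill)):
        bound = "bandwidth" if is_bandwidth_bound(value, peak, bw) else "compute"
        print(f"{name}: {value:.0f} FLOPs/byte -> {bound}-bound")
```

With these numbers decode sits far below the ridge point, so the arithmetic units wait on memory, while prefill sits well above it. That gap is one plausible reading of why decode tokens are priced higher than prefill tokens.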

He's also, characteristically, still writing technical blog posts. From the MatX offices in Mountain View, he's publishing essays on cuckoo hashing improvements for SIMD hash tables, the conditions under which hashed sorting beats hash tables, whether Strassen matrix multiplications would be useful in AI if data movement were free, and why neural networks and cryptographic ciphers share deep structural similarities. The math compulsion hasn't been swapped out for CEO-speak.

"The best iteration is doing it in your head before writing code."

- Reiner Pope

"We left Google one week before ChatGPT was released. We did not know it was coming."

"Most of chip design is actually software development in practice."

"AIs are most effective when there is a well-defined objective function."

"You need to be on par on five things and far ahead on at least one."

The Business

Betting Against CUDA Lock-in

MatX's argument isn't just technical. It's economic. The frontier AI labs - the handful of companies genuinely pushing the limits of model scale - are spending tens of billions on compute annually. They're also deeply locked into Nvidia's CUDA ecosystem because switching costs are enormous and supply is constrained. Pope thinks that equation is changing.

The labs with the largest compute budgets, he argues, have the longest planning horizons - three to five years out - and the most to gain from hardware that's genuinely better for their workloads. They're not buying GPUs out of affection. They're buying what works. If MatX can deliver a chip that's definitively superior for LLM training and inference, the economics become compelling regardless of ecosystem switching costs.

The $500M Series B, announced in February 2026, is the funding needed to get through tapeout and into production. Jane Street - not a typical venture backer - led the round alongside Situational Awareness LP, the fund founded by Leopold Aschenbrenner, who wrote one of the most-circulated memos on AGI timelines. Stripe co-founders Patrick and John Collison backed the round. Marvell Technology - an actual semiconductor company - also came in. These are backers who understand what they are funding.

MatX Funding Rounds

Series A

~$104M

Series B

$500M

Investors: Jane Street, Situational Awareness LP, Marvell Technology, Spark Capital, NFDG, Patrick Collison, John Collison, Nat Friedman, Daniel Gross, Triatomic Capital

Career Timeline

The Arc

2011

Graduated from The Australian National University with BPhil (Hons) in Mathematics

2013-2019

Software engineering focused on compilers, algorithms, and systems - building expertise in Haskell and low-level optimization

2019

Joined Google Research as Senior Staff Software Engineer; began work on LLM hardware/software efficiency and TPU architecture

2020-2022

Efficiency Lead for Google PaLM - built what was described at the time as the world's fastest LLM inference software; helped design the TPU v5e

Nov 2022

Left Google - co-founded MatX with Mike Gunter (former lead TPU hardware designer). ChatGPT launched one week later.

2023

PaLM paper published in the Journal of Machine Learning Research (9,400+ citations to date); MatX raises Series A (~$104M) at $300M+ valuation

2025

Continues publishing technical research; co-authors SPIRe speculative decoding paper from MatX

Feb 2026

MatX closes $500M Series B - total funding reaches ~$604M. Manufacturing through TSMC, target ship date late 2027

Watch

Reiner Pope on Video

Designing AI Chips From First Principles for LLMs

Semi Doped Podcast

MatX: Accelerating AI with Transformer-Optimized Chips

Cheeky Pint

Achievements

What's Already on the Board

📜 Co-authored PaLM paper - one of the foundational large language model research papers, with 9,400+ citations in Google Scholar
💻 Designed and implemented, as Efficiency Lead for PaLM, what was described at the time as the world's fastest LLM inference software
🧭 Helped conceive the Google TPU v5e chip architecture - the hardware powering much of Google's current AI infrastructure
📈 Co-founded MatX and raised $604M total including the largest single chip startup round of early 2026
📚 Co-authored the JAX scaling book (jax-ml.github.io/scaling-book/) - a reference for ML practitioners
👑 Built MatX from two founders to 110+ employees in under three years, targeting TSMC manufacturing at datacenter scale

The Person

Still a Mathematician at Heart

What's unusual about Pope among startup CEOs is his intellectual range. He codes in Rust - attracted by the type system and manual memory control. He prefers iteration in his head over iteration in code. He publishes technical essays that have no obvious marketing value, because he's thinking through the problems anyway and writing them down.

Recent posts from his blog at reiner.org include an exploration of why cuckoo hashing improves SIMD hash tables, whether hashed sorting typically beats hash tables in practice, whether Strassen matrix multiplications could be useful in AI if data movement costs were removed from the equation, and a structural analysis of the similarity between neural networks and cryptographic ciphers. These aren't PR exercises. They're what happens when someone who thinks in mathematics runs a chip company.

He's also genuinely direct about what MatX doesn't know yet. Manufacturing at datacenter scale "remains the biggest hurdle" for the company. He gives production target dates that acknowledge the difficulty involved. The chip is designed. The question now is execution - the unglamorous, grinding work of taking something from a great design to something shipping at scale in datacenters.

Pope has described his approach to competition simply: be on par with the best on all the important dimensions, and be far ahead on at least one. In the chip business, that's a harder challenge than it sounds. The history of Nvidia challengers is a graveyard of companies that solved part of the problem and missed the rest. Pope and Gunter's advantage is that they have, between them, done the actual work at every layer of the stack. They know where the bodies are buried because they were there when the graves were dug.
