The Architect Behind the World's Fastest Supercomputer Networks
Somewhere inside the machines running the most advanced AI models ever built, packets are racing through switches at 800 gigabits per second. Gilad Shainer designed the protocols that make that speed usable.
At NVIDIA, Shainer holds the title of Senior Vice President of Networking - but the scope of his work runs considerably deeper than the org chart suggests. When NVIDIA CEO Jensen Huang talks about "AI factories," those factories need wiring. Shainer is the person deciding what that wiring looks like, what speeds it runs at, and what software stack orchestrates the billions of messages that pass through it every second.
More than half the world's top 500 supercomputers run on NVIDIA's Quantum InfiniBand networking - a fact Shainer cited live at ISC 2025 in Hamburg. That isn't a coincidence of market share. It reflects two decades of architectural decisions, protocol design, and ecosystem building that Shainer has been accumulating since his days as a design engineer at Mellanox Technologies in 2001.
Mellanox - the Israeli networking company that pioneered InfiniBand as a serious high-performance interconnect - was acquired by NVIDIA for $6.9 billion in 2020. The acquisition was, in many ways, an acknowledgment that AI training at scale has the same demands as supercomputing: low latency, high bandwidth, zero jitter, and the ability to synchronize thousands of processors without losing a single clock cycle. Shainer had spent 19 years proving that was possible. He joined NVIDIA with the architecture already built.
"Today, the scale of the data center has shifted. In the era of AI, the data center itself has become the unit of computing. Instead of asking 'How many CPUs can I buy?' the question is now, 'How do I design a data center capable of running my workloads at maximum efficiency?'"
- Gilad Shainer, NVIDIA SVP Networking
From an Artificial Pancreas to 800Gb/s Switches
Shainer completed both his B.Sc. and M.Sc. in Electrical Engineering at the Technion - Israel Institute of Technology - graduating Cum Laude both times. The Technion is to Israel roughly what MIT is to the United States: it sits near the top of global engineering rankings, and its graduates account for an outsized share of Israel's technology exports.
During his master's research, Shainer worked on a project that had nothing to do with networking. The Artificial Pancreas initiative was an early biomedical engineering project exploring closed-loop insulin delivery systems. It was interdisciplinary work with real stakes - the kind of research that trains an engineer to think in terms of systems under load, feedback loops, and latency consequences. That instinct would translate, in unexpected ways, to the world of high-performance interconnects.
After completing his M.Sc. in 2001, Shainer joined Mellanox Technologies - then a small Israeli startup in Yokneam that was betting the company on a nascent interconnect standard called InfiniBand. The bet looked questionable at first. InfiniBand faced rival standards, skeptical hyperscalers, and an industry that largely preferred Ethernet because it was familiar. Mellanox - and Shainer - spent the better part of a decade proving the skeptics wrong, one supercomputer cluster at a time.
By the time Shainer rose to VP of Marketing in 2013, InfiniBand was the dominant fabric for HPC. The TOP500 list - the ranking of the world's most powerful supercomputers - was increasingly populated by machines wired with Mellanox's technology. The technical advantages were real: RDMA (Remote Direct Memory Access) let processors exchange data directly without CPU involvement, slashing latency. Adaptive routing prevented congestion hotspots. And a series of innovations that Shainer helped shepherd - including CORE-Direct collective offload and the SHARP protocol for in-network aggregation - moved computation itself inside the switch fabric, reducing the amount of data that had to travel across the network at all.
The Oscars of Innovation
"InfiniBand was designed from the ground up for synchronous, high-performance computing - with features like RDMA to bypass CPU jitter, adaptive routing, and congestion control. It's the gold standard for AI training at scale."
- Gilad Shainer
400 Organizations. One Mission. Built from Scratch.
In 2008, Shainer founded the HPC-AI Advisory Council. At the time, it was only an idea: a community where HPC practitioners from industry, academia, and government could exchange knowledge, benchmark tools, and push the field forward. There was no membership-fee model and no venture backing, just a conviction that the HPC community was undersupported and that information was siloed in ways that hurt everyone.
Seventeen years later, the council has more than 400 member organizations. It runs global workshops, produces benchmarking resources, and has become one of the primary networking venues for the people who build and operate the world's most demanding compute infrastructure. When AI workloads started displacing traditional HPC simulations on the same hardware, the council adapted - adding AI track programming before most organizations had realized the convergence was real.
Around the same time Shainer was building the council, he co-founded the ISC Student Cluster Competition - a global competition that puts student teams inside real HPC cluster hardware, challenging them to achieve the highest performance on a mix of real-world scientific workloads. The competition is now in its 13th year. Shainer still shows up. He is on the advisory board of the Winter Classic Invitational, a similar student competition, and runs the APAC HPC-AI University Competition annually. The pattern is not accidental. He genuinely believes that the future of the field runs through the people coming up now, and acts accordingly.
When the Data Center Becomes the Computer
There is a moment in every technology transition when the unit of analysis shifts. For decades, it was the processor. Then the server. Then the rack. Now, in Shainer's framing - which is increasingly also Jensen Huang's framing - the unit is the data center itself.
AI training at scale does not distribute workloads the way traditional cloud computing does. A GPT-scale training run does not tolerate stragglers: the GPUs synchronize at every step, so if one GPU in a cluster of ten thousand slows down, every other GPU waits. This is why the network is not infrastructure in the traditional sense - it is a performance-critical component, as important as the GPU itself. A cluster connected by a slow or jittery fabric can train models more slowly than a smaller cluster connected by a fast one. Shainer has been making this argument since before it was fashionable, and the AI industry has arrived at his position.
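The straggler argument can be made concrete with a few lines of arithmetic. The sketch below is a toy model, not a description of any NVIDIA software: it assumes each GPU is independently either "nominal" or slowed by a fixed factor on a given step, and that a synchronous step finishes only when the slowest GPU does. All function names and parameters are hypothetical.

```python
def sync_step_time(worker_times, comm_time):
    """A synchronous step ends when the slowest worker finishes, plus communication."""
    return max(worker_times) + comm_time


def prob_any_straggler(num_workers, p_slow):
    """Chance that at least one of num_workers independent workers is slow this step."""
    return 1.0 - (1.0 - p_slow) ** num_workers


def expected_step_time(num_workers, p_slow, base, slowdown, comm_time):
    """Expected step time under the toy model (assumption: a slow worker
    takes base * slowdown, every other worker takes base)."""
    p_any = prob_any_straggler(num_workers, p_slow)
    return base * (1.0 + (slowdown - 1.0) * p_any) + comm_time


if __name__ == "__main__":
    # Illustrative numbers only: a 0.1% per-GPU chance of a 2x slowdown.
    for n in (8, 1_000, 10_000):
        t = expected_step_time(n, p_slow=0.001, base=1.0, slowdown=2.0, comm_time=0.1)
        print(f"{n:>6} GPUs: P(any straggler)={prob_any_straggler(n, 0.001):.4f}, "
              f"expected step time={t:.3f}")
```

Under these assumptions, a hiccup that is rare on any single GPU becomes a near-certainty on every step once the cluster reaches ten thousand GPUs, which is the intuition behind treating jitter as a first-order performance problem.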
NVIDIA's response is a portfolio that now spans two networking paradigms. Quantum InfiniBand - built specifically for synchronous, tightly-coupled HPC and AI training workloads - powers the majority of the TOP500 list and the largest AI training clusters. Spectrum-X, announced in 2023, is NVIDIA's answer to customers who require Ethernet for operational or economic reasons: it is the first Ethernet fabric designed specifically for AI workloads, with hardware-level congestion management and adaptive routing that bring InfiniBand-competitive performance to an Ethernet architecture.
In 2025, Shainer announced Spectrum-XGS - a cross-domain networking capability that connects physically separate data centers into a single logical AI fabric. The concept is a "virtual mega-datacenter": rather than forcing all workloads into one physical building, Spectrum-XGS allows operators to pool compute across sites, connected at 800Gb/s and treated by the software stack as a unified resource. The implications for the economics of AI infrastructure are significant.
NVIDIA Quantum InfiniBand
World's first 800Gb/s InfiniBand platform. Powers 271 of the TOP500 supercomputers. RDMA, adaptive routing, hardware-level congestion control.
NVIDIA Spectrum-X
The first Ethernet fabric designed specifically for AI workloads. Brings InfiniBand-competitive performance to organizations requiring standard Ethernet.
SHARP Protocol
Performs data reductions inside the network switch itself. Eliminates the all-reduce bottleneck in distributed AI training without touching the GPU.
UCX Framework
Unified Communication X. Hardware-native, ultra-low-weight communication layer supporting MPI, SHMEM, and AI frameworks. 2019 R&D 100 winner.
Spectrum-XGS
Cross-domain networking that connects distributed data centers into a unified AI super-factory. Announced 2025. The "virtual mega-datacenter."
BlueField DPU
Data Processing Units that offload networking, storage, and security from the main CPU/GPU path. Critical for performance at AI-cluster scale.
"The oldest generation will determine the performance of the newest generation."
- Gilad Shainer, on the risks of mixing hardware generations in an AI cluster
An Executive Who Still Publishes in IEEE Micro
Most technology company SVPs stop publishing academic papers around the time they stop writing code. Shainer has not. His name appears on research published in IEEE Micro, ISC High Performance, IEEE Hot Interconnects, EuroMPI, and Springer venues - a publication record that is unusual for someone managing a multi-billion-dollar product portfolio.
His most recent IEEE Micro paper, published in 2025, covers Unified Collective Communication - a unified library for CPU, GPU, and DPU collectives that addresses one of the central performance bottlenecks in distributed AI training. Earlier work on SHARP was published in Supercomputing Frontiers and Innovations in 2017. The co-design architecture for exascale systems appeared in a Springer journal in 2013. There are 19 papers in IEEE Xplore alone, spanning computer networks, hardware architecture, and information systems.
The through-line in his research is the same as the through-line in his products: how do you move data between thousands of processors fast enough that the processors never have to wait? The answers turn out to involve doing more computation inside the network itself, reducing the volume of data that ever has to leave the switch.
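The idea of computing inside the network can be illustrated with a toy model. The sketch below is not the SHARP API or its wire protocol; it is a hypothetical simulation in which each "switch" sums the vectors arriving from below and forwards a single vector upward. For comparison it includes the standard per-host traffic figure for a host-based ring all-reduce, 2(N-1)/N times the vector size. The function names and the `radix` parameter are assumptions made for illustration.

```python
from typing import List

Vector = List[float]


def switch_reduce(child_vectors: List[Vector]) -> Vector:
    """Toy switch: element-wise sum of every vector arriving from below."""
    return [sum(vals) for vals in zip(*child_vectors)]


def in_network_allreduce(host_vectors: List[Vector], radix: int) -> Vector:
    """Reduce up a tree of toy switches, each with `radix` downlinks.
    The root's result is what would be sent back down to every host."""
    level = host_vectors
    while len(level) > 1:
        level = [switch_reduce(level[i:i + radix])
                 for i in range(0, len(level), radix)]
    return level[0]


def ring_allreduce_sent_per_host(num_hosts: int, vector_bytes: int) -> float:
    """Bytes each host sends in a ring all-reduce: 2 * (N - 1) / N * S."""
    return 2 * (num_hosts - 1) / num_hosts * vector_bytes


def in_network_sent_per_host(vector_bytes: int) -> int:
    """In the toy model each host sends its vector upward exactly once."""
    return vector_bytes


if __name__ == "__main__":
    grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(in_network_allreduce(grads, radix=2))
    print(ring_allreduce_sent_per_host(1024, 1_000_000))
    print(in_network_sent_per_host(1_000_000))
```

In this simplified accounting, a host in a large ring sends close to twice the vector size, while in the tree model it sends the vector once and the summation work moves into the switches. Real deployments involve details the sketch ignores, such as chunking, pipelining, and switch resource limits.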
25 Years, One Throughline
Seven Things Worth Knowing
- During his M.Sc. at Technion, Shainer worked on an artificial pancreas biomedical project - closed-loop insulin delivery research that had nothing to do with network switches. The experience of designing low-latency feedback systems for biological processes has some resonance with designing low-latency feedback systems for distributed computing.
- He has won the R&D 100 Award - often called the "Oscars of Innovation" - twice, for two completely different technologies, four years apart. Most researchers consider one lifetime win notable.
- The HPC-AI Advisory Council, which he founded in 2008 with essentially no resources, now has 400+ member organizations. He did not spin it out of a corporate program. He built it as a community from scratch.
- NVIDIA's InfiniBand networking now connects more than half the world's 500 most powerful supercomputers - a market position built over two decades of incremental technical advantage.
- His title changes depending on context. Conference materials and press releases call him "SVP Networking." LinkedIn lists him as "SVP Marketing." Both are accurate. His scope spans deep technical strategy and go-to-market leadership simultaneously.
- He co-founded the ISC Student Cluster Competition over 13 years ago and remains personally involved in mentoring student HPC teams. The competition runs annually and has launched careers in HPC across multiple continents.
- Shainer graduated Cum Laude in both his B.Sc. and M.Sc. at the Technion - one of the world's most rigorous technical universities - earning honors at both the undergraduate and graduate levels.
The Standards Bodies That Shape the Field
Shainer sits on or chairs multiple industry consortia that define how AI and HPC infrastructure is built at a standards level.