The quiet machine room behind the enterprise Kubernetes, GPU and AI platforms you actually use. Built so your platform team can stop saying no.
It is 2:14 a.m. somewhere in a bank. A data scientist hits "deploy" on a model that needs eight GPUs in three regions. Nothing breaks. Nobody pages anyone. The cluster spins up, the budget gets checked, the access policy gets stamped, the workload lands. That quiet, where there used to be a Slack thread on fire, is the product Rafay sells.
Rafay Systems is a 160-person enterprise software company in Sunnyvale that builds infrastructure orchestration and workflow automation for Kubernetes, GPUs and AI workloads. Customers do not buy "Kubernetes" from Rafay. They buy the boring, expensive part: governance, lifecycle, access, cost, security and self-service - the small layer that makes every other layer behave.
For a while, Kubernetes was treated like artisanal sourdough. Every team had its own starter, its own oven, its own opinions. The good ones produced excellent software. The rest produced incidents. Both produced a lot of YAML.
Enterprises eventually noticed they were paying a small army of senior engineers to hand-build something that was supposed to be a commodity. Worse: every cluster was a snowflake, every cloud was a dialect, every audit was a panic. AI made it more urgent. GPUs are expensive, scarce and somehow always idle in the wrong region.
The Rafay bet was simple, and slightly unfashionable for a 2017 startup: nobody actually wants to "do" Kubernetes. They want a platform.
Rafay was co-founded in 2017 by Haseeb Budhani, who had just sold his last company, Soha Systems, to Akamai, and Hanumantha (Hemanth) Kavuluru, who runs engineering. They started in adjacent territory - edge computing, low latency, 5G-flavored dynamic compute - which sounds dated until you remember that "AI inference at the edge" is the same problem with better PR.
The shift to a pure platform-engineering play was a quiet one. Less of a pivot, more of an admission: every enterprise customer was asking the same question. How do we let our developers consume Kubernetes without becoming Kubernetes experts? Rafay decided to answer it for a living.
Rafay sells what it calls a "platform-as-a-service for platform teams." That is a mouthful, so here is the shorter version: it is the control layer your developers never see and your auditors love.
The flagship. Orchestrates Kubernetes, GPUs and AI workloads across AWS, Azure, GCP and on-prem.
Lifecycle, governance, zero-trust access and policy at enterprise scale. The boring half that pays the bills.
Self-service GPUs for data scientists. Multi-tenant. Metered. Auditable. Available on AWS Marketplace.
Templated environments platform teams publish so developers stop filing tickets.
Zero-trust Kubernetes access control. Donated by Rafay to the CNCF Sandbox.
GPU and cluster-level cost tracking, plus the audit trail your CISO has been politely demanding.
Rafay does not run a flashy public roadshow. It runs in the back of regulated industries - banks, healthcare, telcos - where "demo magic" is a liability and the thing that matters is the audit log. The growth shows up in clusters under management more than in T-shirts at conferences.
Public references include MoneyGram, Guardant Health and S&P Global. Partner work runs through Accenture's AI Refinery, NVIDIA GPU programs and an AWS Strategic Collaboration Agreement. None of those names signs with a startup unless the demos have survived a procurement committee with a grudge.
1. The company's name eventually ate the product's name. The "Kubernetes Operations Platform" quietly became "the Rafay Platform." Rare, and slightly Wildean.
2. Paralus, Rafay's open-source zero-trust tool, was given away to the CNCF. A startup voluntarily handing the keys to a foundation is the kind of move VCs ask polite questions about.
3. The early pitch involved 5G and edge nodes. The current pitch involves GPUs and AI. Same problem - somebody else's compute, somebody else's region, your audit log - different decade.
If you ask Rafay what it does, the answer is a noun: a platform. If you ask it what it is for, the answer is a sentence: make modern infrastructure - Kubernetes, GPUs, AI runtimes - feel like a product inside the enterprise, not a science project.
That sentence has consequences. It means Rafay does not sell to developers; it sells to the team that gets paged when developers cannot ship. It means features get judged less on novelty and more on whether a Friday-night on-call engineer would forgive them. It means the company has spent more time on policy engines than on logos.
Enterprises are about to spend more on GPUs than they once spent on data centers. The H100s and B200s are scarce, expensive, and weirdly placed - some on-prem, some in clouds, some in colos in Reykjavik. Every CFO is going to ask the same question: are we actually using these? Most companies will not be able to answer.
Rafay's wager is that the answer to "are we using these" is the same answer as "can our developers consume them safely" - and that the layer between humans and silicon is worth being a real, paid product. The next five years of AI infrastructure spend will be a referendum on whether that wager pays.
Return to the opening scene. The data scientist did not file a ticket. The platform engineer did not get paged. The cluster came up, the policy held, the budget got logged. Somewhere a CISO got a clean audit trail. Somewhere a CFO saw a GPU that finally ran a workload instead of warming a room.
That is what Rafay sells. Not Kubernetes. Not AI. Quiet.