A Cambridge spin-out built voice agents that handle the calls customers used to dread. Marriott, Caesars and PG&E now route their hold music through them.
— filed from approximately every contact center in North America.
It's 9:14 a.m. A guest at a Caesars hotel calls to change a reservation. Twenty years ago she would have heard a four-bar loop of stock jazz, then a tired voice reading a script. This morning, she hears something else: a voice agent that greets her by name, finds her booking, moves it to the next weekend, asks if she wants the same room category, and confirms by text - all in under ninety seconds. No transfer. No "let me put you on a brief hold."
The agent on the other end is built by PolyAI, a London- and San Francisco-based company that has spent the last nine years quietly insisting that voice was not, as everyone in Silicon Valley kept saying, dead. It was just badly served.
"The phone is still the front door for most enterprises. They've just been embarrassed about who's been answering it."— a fair summary of PolyAI's pitch deck, paraphrased.
Walk into any large enterprise's contact center in 2017 and you'd find the same scene. Banks of headsets. Quarterly attrition that ran past 30%. A handful of IVR menus that no one had touched since Obama's second term. And underneath all of it, a budget line nobody in the C-suite wanted to read out loud.
The industry's response was to throw bodies at it. PolyAI's founders watched and concluded the bodies weren't really the problem. The conversations were. Most customer calls, they argued, are not bespoke crises. They are bookings, balances, billing, returns, password resets - the same eight questions in a thousand grammatical disguises.
"You don't need a human to read a confirmation number aloud. You need a human when something has actually gone wrong."— the unspoken thesis behind every PolyAI deployment.
Earlier voice bots had failed at this not because the idea was wrong but because the bots were wooden. They couldn't handle interruptions, accents, half-sentences, the way real people actually talk to strangers on the phone. The market quietly concluded voice AI was a punchline. PolyAI concluded the market had given up too early.
Nikola Mrkšić, Tsung-Hsien (Shawn) Wen and Pei-Hao (Eddy) Su met inside Cambridge's Dialogue Systems Group, the lab where Steve Young - one of the godfathers of statistical speech recognition - had been running experiments for decades. They wrote their dissertations on the same narrow, unglamorous question: how do you make a machine sound like it understood what you just said?
The kind of trio whose group chat is, statistically, mostly arXiv links.
In 2017 they incorporated PolyAI. The name was a nod to a then-unfashionable bet: build one architecture that could speak many languages, on many channels, for many industries. Most AI startups at the time wanted to own a vertical. PolyAI wanted to own a layer.
"Everyone else was building a feature. We thought we were building a switchboard."— roughly, the founding bet.
The funding history shows a pattern. Each round arrived after a wave of larger competitors had declared the problem solved, then quietly retreated from it. Voice in production is, it turns out, harder than voice in demos.
PolyAI's stack does three jobs that, in isolation, sound straightforward and, together, are not. It listens (speech recognition tuned for noisy phone lines). It thinks (a dialogue policy that decides whether to answer, ask, transfer, or quietly look something up). It speaks (a voice that doesn't telegraph "I am a robot" within the first three syllables).
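For readers who think in code, the middle job, the "thinks" step, can be sketched as a toy decision function. This is an illustration only, not PolyAI's implementation: the `Turn` fields, the `REQUIRED_SLOTS` table, the 0.6 confidence threshold and the order of the rules are all assumptions invented here, and a production policy is presumably far more sophisticated than a handful of if-statements.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Action(Enum):
    """The four moves the article lists for the dialogue policy."""
    ANSWER = "answer"
    ASK = "ask"
    TRANSFER = "transfer"
    LOOKUP = "lookup"


@dataclass
class Turn:
    """What the listening stage hands to the policy (all fields hypothetical)."""
    intent: str                                  # e.g. "change_reservation"
    confidence: float                            # recognizer confidence in [0, 1]
    slots: dict = field(default_factory=dict)    # details the caller has given
    record: Optional[dict] = None                # backend data, once fetched


# Hypothetical: the details each supported intent needs before acting.
REQUIRED_SLOTS = {"change_reservation": ["booking_id", "new_date"]}


def choose_action(turn: Turn, min_confidence: float = 0.6) -> Action:
    """Toy rule-based policy: pick one move for the current turn."""
    if turn.intent not in REQUIRED_SLOTS:
        return Action.TRANSFER    # out of scope: hand the call to a human
    if turn.confidence < min_confidence:
        return Action.ASK         # unsure what was said: ask the caller again
    missing = [s for s in REQUIRED_SLOTS[turn.intent] if s not in turn.slots]
    if missing:
        return Action.ASK         # a needed detail has not been given yet
    if turn.record is None:
        return Action.LOOKUP      # everything is known: fetch the booking
    return Action.ANSWER          # data in hand: respond to the caller
```

In this sketch the Caesars call would pass through `ASK` until the booking and the new date are known, then `LOOKUP`, then `ANSWER`. The hard part in production is everything the sketch leaves out: interruptions, accents, half-sentences and noisy lines.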
Then there is Agent Studio, launched in April 2025. It is a control panel - a place where someone who runs customer experience at a chain of restaurants, but who does not write Python, can adjust an agent's tone, add brand vocabulary, set guardrails, and watch what happens. It is also the part of the product that made PolyAI legible to procurement teams who had previously been baffled by AI vendors.
"The big unlock wasn't the model. It was giving non-engineers something to click."— the quiet truth of most enterprise AI in 2026.
Each bar is a year someone in the boardroom said "voice AI? again?" - and lost.
The customer list reads like an airport magazine: Marriott Hotels, Caesars Entertainment, PG&E, Volkswagen, FedEx, UniCredit, Hopper, OpenTable. They are the kinds of brands that get press for downtime and operational misses, which is why most of them speak about PolyAI carefully, or not at all. The deployments tend to surface only when someone notices that the wait time has, inexplicably, vanished.
Partial list. The other names mostly prefer their NDAs.
Behind the logos are integrations - the unglamorous plumbing. PolyAI ships with connectors for Twilio, Genesys, Five9, Amazon Connect, Mitel and Microsoft Azure, which is the kind of detail that wins enterprise contracts and bores everyone else. NVIDIA's investment in the December round is widely read as a bet on the inference economics underneath all of it.
Put plainly, PolyAI's bet is that the phone is not a legacy channel. It is the channel customers reach for when something matters, when typing won't do, when the chatbot has already failed them. The company exists, in its own framing, to make sure that the voice on the other end is patient, multilingual, awake at 3 a.m., and - critically - actually able to help.
"The goal isn't to replace humans. It's to stop asking humans to do the parts of the job that are machine-shaped."— a defensible reading of the PolyAI worldview.
It's an oddly humanist position for an AI company. It also happens to be the one that closes enterprise deals.
Back to the Caesars guest and her Tuesday morning call. She has hung up by now. She has gone back to her coffee. She will probably never think about the interaction again - which, if you've spent any time in this industry, is the highest compliment a contact-center conversation can earn. The forgettable call is the working call.
That is what PolyAI is selling. Not magic. Not "agents." Not a future in which everything is conversational and frictionless. Just a phone line that works the way customers always assumed phone lines were supposed to work. Quietly. Quickly. In the language they speak. At the hour they happen to be awake.
"In 2017 voice AI was a punchline. By 2026 it had become a procurement line item. PolyAI is most of the reason."— filed under: the things you don't notice until you do.
The next chapter is already half-written. NVIDIA is on the cap table for a reason. Agent Studio is being pitched to operators who used to buy headsets and headcount, and are now buying control panels. Half of all calls. Eighteen languages. One hundred-plus brands. The numbers are starting to compound, and PolyAI - patient, multilingual, awake at 3 a.m. - is still picking up.