A New York startup teaching machines to take your order in the loudest room in retail - the drive-thru.
THE SPEAKER BOX. An idling engine, a kid in the back seat, rain on the roof - the one interview room where nobody sits still and everybody talks at once. Incept built its whole company for this exact five feet of asphalt.
Here is a fact that sounds made up but isn't: the hard part of drive-thru AI was never the artificial intelligence. It was the microphone. For years, everyone kept announcing that voice AI was basically solved, and then you would pull up to a speaker box with a diesel engine two feet away and discover that it was not, in fact, solved.
Incept AI's entire business is a spread. Most AI order-takers, by the company's telling, complete about 83% of orders on their own and then tap out, handing the remaining 17% (roughly 2 of every 10 cars) to a human. That handoff is where the economics go sideways. A person has to stop what they are doing, put on a headset, and untangle an order the machine gave up on. Multiply that by every lane in every store, all day.
Incept says its system completes 95%+ of orders (the company has cited 97%+ in its own materials) without anyone stepping in. If that number holds at scale, the difference between 83 and 95 is not a rounding error. It is the difference between a novelty at the speaker box and a machine you can actually staff around.
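To put rough numbers on that gap, here is a minimal back-of-the-envelope sketch. The completion rates are the ones quoted above; the volume of 1,000 cars a day is a made-up round figure for illustration, not something Incept has reported, and `handoffs_per_day` is a hypothetical helper name.

```python
def handoffs_per_day(cars_per_day: int, completion_rate: float) -> int:
    """Cars that still need a human, given the share of orders the AI completes alone."""
    return round(cars_per_day * (1.0 - completion_rate))


# Hypothetical lane volume, chosen only to make the percentages concrete.
CARS_PER_DAY = 1000

for rate in (0.83, 0.95, 0.97):
    handoffs = handoffs_per_day(CARS_PER_DAY, rate)
    print(f"{rate:.0%} completion -> {handoffs} handoffs per {CARS_PER_DAY} cars")
```

Under that assumed volume, 83% leaves 170 cars for staff to rescue, 95% leaves 50, and 97% leaves 30. The headline numbers sit 12 points apart, but the workload that lands on a human falls by more than two thirds.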
The reason the company can chase that gap is a boring, unglamorous insight: the bottleneck is audio. Engine hum, acoustic echo, two people talking at once, wind across the mic. So instead of fine-tuning yet another language model, Incept built a neural audio engine to clean the sound before a foundation model ever tries to understand the words.
It is a very specific bet - that owning the messy physical layer is more durable than owning the model on top of it. The models, after all, are increasingly commodities. The noise is forever.
Figures are per Incept AI and industry commentary; no independent, audited benchmarks have been published.
"There are times I'm amazed the AI can hear what the guest is saying."
CEO and co-founder Umut Isik is an audio scientist by training. Before Incept he was an applied scientist at Amazon Web Services building software to fight background noise and echo - and, in an earlier chapter, he worked on drive-thru audio for McDonald's during an RFP that fed into Apprente (later acquired by McDonald's, with that unit subsequently sold to IBM). When someone keeps orbiting the same hard problem across three companies, that is usually a signal. He went and built the company to finish it.
He is joined by co-founder and Chief Revenue Officer Justin Foster, a veteran of quick-service restaurant technology and voice - including time at Presto Automation - who knows the buyers, the headsets, and the operational reality of a lunch rush. The pairing is deliberate: one founder who understands the sound, one who understands the store.
Umut Isik, CEO and co-founder. Audio scientist, ex-AWS applied scientist. Spent years on drive-thru noise before founding Incept.
Justin Foster, co-founder and Chief Revenue Officer. QSR and voice-tech veteran (incl. Presto Automation). Runs go-to-market and restaurant relationships.
Ben Fried, board member. Former Google CIO and Rally Ventures partner; joined Incept's board with the pre-seed round.
Incept's stack starts with the Incept Neural Engine - proprietary audio neural networks that strip out background noise, acoustic echo and crosstalk. Only then does cleaned speech get routed to foundation models (the company works with GPT, Gemini and DeepSeek rather than betting on a single one), which handle the conversation, the complex modifiers, and the upsell. The result plugs into the POS and headset systems restaurants already run.
Takes the order at the speaker box - natural conversation, complex modifiers, heavy noise - without transferring to staff.
Answers the restaurant phone and handles orders and FAQs over low-quality audio lines.
Audio networks that suppress noise, echo and crosstalk before any language model sees the words.
Dashboards on guest sentiment, order patterns and per-location performance from every interaction.
Runs limited-time offers and automated upsell prompts through the voice AI.
Routes to GPT, Gemini or DeepSeek - the LLM is swappable; the audio layer is the moat.
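The ordering of that stack, audio cleanup first and a swappable language model second, can be sketched in a few lines. This is an illustration of the shape only, not Incept's code: `noise_gate` is a crude threshold filter standing in for a neural denoiser, `fake_transcribe` and `echo_model` are hypothetical placeholders, and no real vendor API is called.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Audio = List[float]          # mono samples in [-1.0, 1.0]
Model = Callable[[str], str]  # any callable from transcript text to a reply


def noise_gate(samples: Audio, threshold: float = 0.05) -> Audio:
    """Stand-in for the audio layer: zero out samples quieter than the threshold.

    A production system would run a neural denoiser here; this only marks
    where that stage sits in the pipeline.
    """
    return [s if abs(s) >= threshold else 0.0 for s in samples]


def fake_transcribe(samples: Audio) -> str:
    """Hypothetical speech-to-text: any non-silent audio becomes one fixed order."""
    return "one cheeseburger no pickles" if any(samples) else ""


def echo_model(name: str) -> Model:
    """Hypothetical language model that just confirms what it was given."""
    return lambda text: f"[{name}] confirmed: {text}"


@dataclass
class OrderPipeline:
    clean: Callable[[Audio], Audio]
    transcribe: Callable[[Audio], str]
    models: Dict[str, Model]
    active: str  # key of the model currently in use

    def handle(self, samples: Audio) -> str:
        cleaned = self.clean(samples)           # 1. audio layer runs first
        text = self.transcribe(cleaned)         # 2. then speech-to-text
        return self.models[self.active](text)   # 3. then whichever model is selected


pipeline = OrderPipeline(
    clean=noise_gate,
    transcribe=fake_transcribe,
    models={"model_a": echo_model("model_a"), "model_b": echo_model("model_b")},
    active="model_a",
)

noisy = [0.01, 0.4, -0.3, 0.02]
reply_a = pipeline.handle(noisy)
pipeline.active = "model_b"  # swap the model; the audio stage is untouched
reply_b = pipeline.handle(noisy)
print(reply_a)
print(reply_b)
```

Swapping `active` changes which model answers without touching the cleanup stage, which is the design choice the company is betting on: the model slot is interchangeable, the audio layer is not.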
"AI voice agents have handled restaurant orders for years, but no provider has achieved 97%+ accuracy without human intervention."
Incept sells B2B to quick-service and fast-casual chains. The adoption trick is not the AI - it is the switching cost, so Incept integrates with Toast, Square, PAR and HME rather than asking operators to rip anything out. It works: a team of roughly four people landed a late-stage pilot with a restaurant chain of 1,000+ locations (reported elsewhere as 500+) within about a year of founding. Investors noticed. In February 2025 the company closed a $3M pre-seed led by Rally Ventures with participation from 10VC.
Incept AI founded in New York by Umut Isik and Justin Foster.
System launches and begins reaching 95%+ order completion in live restaurant environments.
Announces $3M pre-seed led by Rally Ventures with 10VC; former Google CIO Ben Fried joins the board.
Featured by Food On Demand and Hospitality Technology for its neural-audio approach to drive-thru noise.