Deepgram

The company you are already talking to.

A drive-thru in Ohio is taking your order. The voice is calm, a little upbeat, faintly recognizable as not-quite-human. It hears you mumble. It catches you mid-sentence when you change your mind from a milkshake to a malt. It does not, crucially, sound like a robot reading a script in 1998.

Somewhere in a data center, Deepgram is doing the heavy lifting. Speech in. Meaning out. Speech back. Repeat, 80 million times a day, across customer service, healthcare intake, meeting notes, dispatch lines, and a creeping number of consumer apps that have quietly grown ears.

The company is eleven years old, a fresh unicorn at $1.3 billion, and increasingly the answer to a question a lot of CTOs are now asking out loud: who is going to power the voice layer of AI? Deepgram would like that to be Deepgram.

Voice is the next great interface. The companies that own the models will own the economy on top of them. — Scott Stephenson, CEO & Co-founder

The problem they saw.

For most of the 2010s, speech recognition was a feature you tolerated rather than a product you loved. Cloud providers bolted aging acoustic models onto cleaner APIs and called it modern. Accuracy was decent for American English in a quiet room. Anywhere else - a call center floor, an accent the training set didn't love, three people talking over each other in a Zoom - it broke politely.

The market had agreed, quietly, that this was fine. Deepgram disagreed, loudly.

The pitch was almost stubbornly simple: train an end-to-end deep learning system on enormous quantities of audio, ship it as an API, and beat the incumbents on accuracy, latency, and price. Not one of those three. All three.

Fig. 1 — the legacy stack, c. 2015, asked nicely if it would please scale to a billion phone calls. It did not.

The founders' bet.

Three physicists walk into a startup. It is not the setup of a joke, although it should be.

Scott Stephenson, Adam Sypniewski, and Noah Shutty met at the University of Michigan, where Stephenson did a PhD in particle physics. His thesis work involved building a lab two miles underground - the kind of place where you go to detect particles that have, statistically, almost no interest in being detected. Dark matter. The universe's most reluctant subject.

Audio, by comparison, is generous. It is everywhere. It is messy. It is exactly the kind of signal problem a former dark matter physicist looks at and thinks, this is downhill.

All three Deepgram co-founders trained as physicists. The company's voice models are named Nova, Aura, and Flux. The lineage is not subtle. — YesPress editors

They went through Y Combinator in 2016 with a thesis the market thought was already settled - and the next ten years of accuracy benchmarks suggested otherwise.

What they ship.

Deepgram's product surface is, on paper, three letters plus a couple of Greek-looking names. In practice it is the plumbing for any company that wants to put a microphone in front of an AI and get a useful answer back.

STT: Nova-3
Real-time speech-to-text, including a Nova-3 Medical variant trained on clinical vocabulary. Marketed as the most accurate real-time STT model on the market.

TTS: Aura-2
Low-latency, production-grade text-to-speech. Sounds like a person who has had coffee but not too much.

Conversation: Flux
The voice-agent model that solved the most awkward problem in synthetic conversation: knowing when, and how, to be interrupted.

Agent API: Voice Agent API
One endpoint, voice in to voice out. STT, LLM reasoning, and TTS stitched into a single call so developers do not have to.

Understanding: Saga
Audio intelligence layer for summarization, intent and sentiment - the bits that turn a transcript into a decision.

Fig. 2 — five products, one premise: the call should sound less like 2014.
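For readers who think in code, the "one endpoint, voice in to voice out" idea can be sketched as three stages chained behind a single function. This is a toy illustration only: every name here (`voice_agent_turn`, `fake_stt`, `fake_llm`, `fake_tts`, `VoiceTurn`) is hypothetical, the "audio" is just UTF-8 text, and none of it reflects Deepgram's actual API or implementation.

```python
from dataclasses import dataclass
from typing import Callable

# Every name in this sketch is a hypothetical stand-in, not Deepgram's API.


@dataclass
class VoiceTurn:
    """One round trip: what was heard, what was decided, what was spoken."""
    transcript: str
    reply_text: str
    reply_audio: bytes


def fake_stt(audio: bytes) -> str:
    """Stand-in for speech-to-text: the 'audio' here is just UTF-8 text."""
    return audio.decode("utf-8").strip()


def fake_llm(transcript: str) -> str:
    """Stand-in for the reasoning step: a canned drive-thru reply."""
    if "malt" in transcript.lower():
        return "One malt, coming up."
    return "Sorry, could you repeat that?"


def fake_tts(text: str) -> bytes:
    """Stand-in for text-to-speech: encodes text instead of synthesizing audio."""
    return text.encode("utf-8")


def voice_agent_turn(
    audio: bytes,
    stt: Callable[[bytes], str] = fake_stt,
    llm: Callable[[str], str] = fake_llm,
    tts: Callable[[str], bytes] = fake_tts,
) -> VoiceTurn:
    """Chain the three stages so the caller makes a single call."""
    transcript = stt(audio)
    reply_text = llm(transcript)
    return VoiceTurn(transcript, reply_text, tts(reply_text))


if __name__ == "__main__":
    turn = voice_agent_turn(b"A milkshake... actually, make that a malt.")
    print(turn.reply_text)
```

The point of the shape, rather than the stubs, is that the caller never sees the seams: swapping any one stage for a real model would leave the calling code unchanged.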

What is unusual about all of this, in 2026, is that Deepgram trains its models end to end rather than fine-tuning somebody else's open checkpoints. That choice was contrarian in 2015 and is increasingly contrarian again now that everyone is wrapping somebody else's API. It is also why the company can charge a fraction of incumbent prices and still operate a real business.

The arc, in nine entries.

2015: Three University of Michigan physicists found Deepgram. The plan: end-to-end deep learning for speech.

2016: Y Combinator winter batch. Seed round closes.

2018: Series A led by Wing VC. The enterprise pipeline starts to look like a real business.

2021: Series B - $72M, led by Tiger Global and Madrona. Voice AI quietly becomes a category.

2022: $47M Series B extension. Hiring binge.

2024: Voice Agent API launches with a public drive-thru demo. Engineers everywhere take notice.

2025: Nova-3 Medical ships via AWS Bedrock. Healthcare voice gets a real provider.

Jan 2026: $130M Series C at $1.3B valuation, led by AVP. Twilio, ServiceNow, SAP and Citi Ventures join.

Jan 2026: Acquires OfOne to push voice agents deeper into drive-thru and physical-world deployments.

The proof.

It is one thing to claim better accuracy. It is another to win the kinds of customers who measure that claim with stopwatches and statisticians.

NASA, Spotify, Twilio, Citibank, ServiceNow, SAP, Granola, Genesys

The 1,300-customer logo wall does the heavy lifting in pitch decks. The Series C, though, did something more telling: it brought Twilio, ServiceNow, SAP, and Citi Ventures onto the cap table as strategic investors. That is what it looks like when the companies that would normally build voice AI in-house decide it is cheaper, faster, and more accurate to just route the audio to Austin.

Deepgram in numbers

// figures self-reported and from press, May 2026

Total raised: $245M+
Series C: $130M
Valuation: $1.3B
Customers: 1,300+
Team: ~210
Founded: 2015

Fig. 3 — the receipts. Bar lengths are relative within this chart, not across the universe of voice AI funding (which would require a wider page).

The companies that would normally build voice AI in-house are now writing checks to Deepgram instead. That is a tell. — YesPress editors

Why they say they are doing this.

Mission statements in voice AI tend toward the grand and the unfalsifiable. Deepgram's is mercifully concrete: make every voice heard and understood by machines. Translation: a janitor in Manila and a cardiologist in Boston should get the same accuracy on the same hardware at the same price.

That phrasing is not entirely altruistic. The companies that can transcribe and respond to every voice - not just the ones that show up in the training data - capture the rest of the market. Doing the right thing happens to be the most defensible thing. The Venn diagram is a single circle.

The culture

The company runs lean for its scale. Roughly 210 people, hubs in San Francisco and Austin, a remote-friendly engineering culture descended from research labs. Internal tooling skews toward research dashboards rather than product analytics. Notion, Linear, GitHub, Slack. The model training infrastructure is genuinely impressive and almost nobody outside the company gets to see it.

Why it matters tomorrow.

Here is the bet, in case you would like to take the other side of it.

Every consumer app eventually becomes a conversation. Every enterprise workflow that touches a human eventually has a voice agent in it. Every meeting becomes a searchable transcript. Every call center hires fewer humans and more APIs. The platform that owns the underlying voice models gets to be a kind of toll booth on that traffic.

OpenAI, Google, Microsoft, ElevenLabs, AssemblyAI - the field is crowded and well-funded. Deepgram's argument is the only one a serious infrastructure company can make: we own the models, we serve them efficiently, we are not anyone's feature.

Speech-to-text used to be a feature. Deepgram is making it an economy. — YesPress editors

The drive-thru, revisited.

Back to the drive-thru. The line moves faster than it used to. The order takes fewer corrections. The person inside the speaker - if that is still the right word - hears your accent without flinching and politely waits while a toddler in the backseat changes her mind about cheese.

Eleven years ago, this conversation did not happen. Not because nobody wanted it to. Because the technology to do it well did not exist, and the incumbents had quietly agreed not to build it.

Three physicists thought that was a strange thing to agree about. They built it instead.

— END OF FILE —

Find them.

Website: deepgram.com
LinkedIn: /company/deepgram
Twitter / X: @DeepgramAPI
GitHub: github.com/deepgram
YouTube: @Deepgram
Facebook: /deepgram
Discord: discord.gg/deepgram
Y Combinator: /companies/deepgram

Watch & listen

Demo: Voice Agent API - drive-thru demo
Demo: Deepgram + AWS - healthcare voice
Tutorial: Voice Agent software walkthrough
Interview: AIMinds with Scott Stephenson
Interview: RedMonk Conversation - Stephenson
Press: Series C announcement
