The Linguist Who Rewired the Phone Call
When you dial Domino's to place an order and the voice on the other end sounds oddly... normal - not robotic, not aggressively cheerful, just present - there's a decent chance Lily Clifford built that voice. She is CEO and co-founder of Rime, a San Francisco company that has quietly become the invisible voice layer for some of the busiest phone lines in America. Over 100 million conversations a month. Restaurant ordering, healthcare triage, telecom support, enterprise IVR. All of it running on models trained on conversations captured in a recording studio Lily's team built from scratch, filled with people talking over each other the way people actually do.
Lily came to this with a background most tech founders don't have: a deep, genuine obsession with how humans speak differently depending on where they're from, who they're talking to, and what they're trying to get away with socially. That's sociophonetics - the study of speech as a social act. It was her PhD research at Stanford. She left before finishing. Not because she failed at academia, but because she found something more urgent to build.
I ended up dropping out because I wanted to hack on speech synthesis models - specifically for customer support.
- Lily Clifford, CEO, Rime

The thesis that launched Rime is counterintuitive enough that most product teams would have killed it in a committee: a slightly bored AI voice outperforms a peppy, polished one. In real enterprise deployments, voices that sounded genuinely human - with the occasional flat affect, the slight hesitation before a proper noun, the breath before a long sentence - converted better, contained more calls, and got fewer hang-ups than the over-articulated, studio-perfect alternatives everyone else was shipping.
Three Co-Founders, One Recording Studio, and a Bet on Messy Speech
Lily graduated from Pitzer College in 2014 with a linguistics degree. She spent the next several years moving deeper into the academic study of speech - eventually landing at Stanford's linguistics PhD program, where she was researching how social and demographic factors shape the way people speak. People in Texas and California don't just have different accents; they have different speech rhythms, intonation patterns, and expectations about what an authoritative voice sounds like. She was studying all of that when it clicked: AI voices were failing not because the models were bad, but because the data was wrong.
Training on audiobook recordings - the industry standard - produced voices that sounded like someone reading aloud. Deliberate. Articulate. Fundamentally unlike a conversation. Lily's research instinct said the fix was obvious: get real conversational data. Full-duplex speech. People interrupting each other. Laughing mid-sentence. Trailing off. Nobody in the market was building that dataset because it was hard and expensive and the industry had convinced itself audiobooks were good enough.
In 2022, Lily cold-recruited two people with credentials that made no obvious sense together: Brooke Larson, a PhD linguist who had been building voice for Amazon's Alexa, and Ares Geovanos, a Stanford-trained engineer who had been working on brain-computer interfaces at UCSF - technology to help people who had lost the ability to speak. She pitched them on a company built not on the premise that AI voices should be perfect, but that they should be real.
They rented space in San Francisco, built a recording studio, and started capturing spontaneous, full-duplex conversations - the kind where people talk at the same time, lose their train of thought, and laugh in the wrong places. That dataset became the foundation for everything Rime has built since.
Linguistics as Infrastructure
Most voice AI companies talk about latency and naturalness as if they're in tension. Rime treats them as an integrated problem with a linguistic solution. The models are trained to understand not just what a word is but what a word does in context - how a list sounds different from a question, how regional pronunciation shifts when someone is tired, why the same sentence lands differently in an IVR versus a customer support chat.
Lily calls the approach "linguistics as a service" - making it easy for developers to adjust pronunciation and speech patterns without needing to know the International Phonetic Alphabet. And "demographics as a service" - because enterprise customers shouldn't need to become voice casting directors to get a voice that sounds right for their audience. People hear demographic cues in voices unconsciously, and they respond to them. A Texas-based customer service operation and a New York-based financial services firm need different things from an AI voice, even if neither of them can articulate exactly why.
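One way to picture the "no IPA required" idea is a plain-text respelling table that gets substituted into the text before synthesis. This is a hypothetical sketch of the concept only - the `apply_overrides` helper and the override table are illustrative, not Rime's actual API:

```python
import re

# Hypothetical sketch: developers supply plain-English respellings,
# and whole words are swapped out before the text reaches the TTS model.
# Nothing here is Rime's real interface; it just illustrates the idea.

def apply_overrides(text, overrides):
    """Replace whole words with caller-supplied respellings (case-insensitive)."""
    def sub(match):
        return overrides.get(match.group(0).lower(), match.group(0))
    return re.sub(r"[A-Za-z']+", sub, text)

overrides = {"pecan": "puh-KAHN", "caramel": "CAR-muhl"}  # illustrative respellings
print(apply_overrides("A pecan caramel sundae", overrides))
# -> "A puh-KAHN CAR-muhl sundae"
```

The design point is the same one Lily makes: the developer writes "puh-KAHN," not an IPA string, and the pronunciation layer handles the rest.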
Rime's Product Arsenal
- Mist - Next-gen TTS model family trained on massive conversational speech dataset; powers real-time voice applications
- Arcana - Expressive AI voice model; designed for emotional range and nuanced delivery
- Mist v2 - Fastest customizable TTS with sub-200ms end-to-end latency
- On-premises deployment - Only next-gen voice AI available on-prem; critical for healthcare & finance
- 200+ distinct voices - Including demographically specific voices and custom cloning via API
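Latency claims like "sub-200ms end-to-end" are usually stated as a percentile over many requests rather than a single lucky run. A minimal sketch of how a buyer might sanity-check such a claim, using invented sample numbers (not Rime measurements):

```python
# Check a latency budget against the p95 of time-to-first-audio samples.
# The sample values are illustrative, not measurements of any real system.

def percentile(samples, pct):
    """Nearest-rank percentile of a list of latency samples (ms)."""
    ordered = sorted(samples)
    k = max(0, min(len(ordered) - 1, round(pct / 100 * len(ordered)) - 1))
    return ordered[k]

def meets_budget(samples, budget_ms=200, pct=95):
    """True if the pct-th percentile time-to-first-audio is under budget."""
    return percentile(samples, pct) < budget_ms

ttfa_ms = [142, 155, 163, 171, 149, 188, 158, 176, 167, 151]  # illustrative
print(percentile(ttfa_ms, 95), meets_budget(ttfa_ms))
# -> 188 True
```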
The technical differentiation shows up in the compliance stack too. Rime achieved SOC 2 Type II certification and HIPAA compliance, and it's the only next-generation TTS provider offering on-premises deployment. For regulated industries - healthcare, finance, government - that's not a feature. It's the only way they can legally deploy voice AI at all.
The Thing Nobody Warned Her About
Lily will tell you she came in as a researcher and a builder. She built the models. She hired the linguists. She ran the recording sessions. What she didn't expect was that the hardest part of building a voice AI company wouldn't be the voice AI - it would be sales.
One of the best lessons I've learned as a first-time founder is that there's no substitute for just diving in and doing the hard work when learning something new. I'm talking about sales.
- Lily Clifford, X (formerly Twitter)

In the early days at Rime, the instinct was to build the best model and collect the best data and let the technology speak for itself. The customers didn't show up on their own. Lily spent months calling enterprise prospects cold, learning what questions they actually had about voice AI deployment, learning how to talk about millisecond latency in terms of call containment rates and revenue per answered call. The scientific precision of her research background translated, eventually, into a very specific sales methodology: show the data, name the number, anchor on the customer's actual problem.
Twenty-plus enterprise clients later - including chains that together handle tens of millions of food orders annually - the methodology is working. The company hit $1.1M in revenue with a 10-person team in 2025. It was a lean, technical team that figured out the commercial side by necessity, not by plan.
Lily on Voice, Startups, and Getting It Right
- "People in Texas sound different from people in California, and we all pick up on these cues, consciously or not."
- "Voice AI should be as rich, diverse, and expressive as the people it serves."
- "We wanted voices that sounded like a friend, not a voice actor."
- "You can increase your IVR call containment rate from 85% to 95% just by making the bot sound better."
The Global Accent Problem and the Speech-to-Speech Future
The demand Lily is most focused on right now is India. The linguistic diversity of the subcontinent - hundreds of languages and thousands of dialects, each with its own phonology, its own intonation rules, its own sociolinguistic expectations - represents exactly the kind of problem Rime's linguistic-first approach is built for. Getting an AI voice right for Tamil-speaking customers in Chennai is a fundamentally different challenge than getting it right for English-speaking customers in Atlanta, and most voice AI companies are treating both problems as if they're the same thing. Rime isn't.
Further out, Lily's vision for the technology goes past text-to-speech entirely. The current paradigm - convert speech to text, process text, convert text back to speech - introduces latency and loses prosodic information at every step. The next phase, she believes, is direct speech-to-speech: models that process the acoustic signal directly, preserving the full richness of how something was said rather than just what was said. That's not a 2025 product. But it's the research direction.
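The latency cost of the cascaded pipeline is additive, which a few lines of arithmetic make concrete. The stage figures below are invented round numbers for illustration, not measurements of any real system:

```python
# Illustrative latency arithmetic: the cascaded pipeline the text describes
# (speech -> text -> processing -> speech) versus a direct speech-to-speech
# model. Stage numbers are made-up round figures, not real benchmarks.

cascaded_ms = {
    "speech_to_text": 300,   # wait for endpointing, then transcribe
    "text_processing": 400,  # dialog logic runs on the transcript
    "text_to_speech": 200,   # synthesize the reply audio
}

# A direct model has one stage - and because it never flattens audio to
# text, prosody (pitch, timing, emphasis) survives end to end.
direct_ms = {"speech_to_speech": 450}

def total_latency(stages):
    """Sequential stages: their latencies simply add."""
    return sum(stages.values())

print(total_latency(cascaded_ms), total_latency(direct_ms))
# -> 900 450
```

Even with generous per-stage numbers, the cascade pays the sum of every hop; the direct model pays once, which is the core of the argument for speech-to-speech.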
For now, the trajectory is straightforward: more enterprise customers, more languages, deeper integration into the contact center stack. Rime is hosting industry events in San Francisco. It is building in public. It is doing the work of convincing enterprise software buyers - some of the most skeptical, contract-heavy, procurement-dependent buyers in technology - that voice AI is ready for their most sensitive customer interactions. Given who's already bought in, the argument is getting easier to make.