The voice on the other end of the line is fake. The breathing, the small laugh, the way it pronounces your last name on the second try - that part is Rime.
It is raining in Cleveland. A Domino's franchise is taking calls faster than its three teenagers behind the counter can answer. The phone rings, someone picks up, asks for extra cheese, gets the order wrong once, corrects themselves, laughs. The voice on the other end laughs back. The customer never asks if they are talking to a person. That ambiguity, that small unbothered transaction, is the product Rime sells.
Rime is a San Francisco voice AI lab that builds text-to-speech models for enterprise call centers. Its software handles more than 100 million phone conversations a month for brands you have already done business with. The company is 28 people. It raised a $5.5 million seed in May 2025. And it spent the previous three years arguing, quietly and stubbornly, that the way the rest of the industry was building synthetic voices was wrong.
Lily Clifford was supposed to finish a PhD in computational linguistics at Stanford. She did not. She dropped out because, in her telling, she wanted to hack on speech synthesis - specifically for customer support, the unglamorous part of voice where everybody had decided "good enough" was good enough. She co-founded Rime in 2022 with Brooke Larson, a PhD linguist who had worked on Amazon's Alexa, and Ares Geovanos, a Stanford engineer who had been around enough product launches to know which corners not to cut.
Lily Clifford - sociolinguistics-trained Stanford dropout. Argues that real speech includes hesitation, and synthetic speech that pretends otherwise sounds, frankly, hostile.
Brooke Larson - PhD linguist who has already shipped voice at planetary scale. Brought the rigor that turns "vibes" into a model card.
Ares Geovanos - Stanford engineer, product veteran. The person who makes sure the API responds before the customer notices it has not.
Rime ships two enterprise models. Mist v2 is the workhorse - deterministic, fast, predictable. You teach it how to pronounce "Worcestershire" through the API once, and it does not forget on call number eight million. Arcana v2 is the showpiece. Forty-plus voices across English, Spanish, French and German, trained on actual customer service interactions instead of polished audiobooks. It breathes. It laughs. It can code-switch mid-sentence without sounding like it tripped on the carpet.
Mist v2 - built for high-volume production where pronunciation accuracy must be guaranteed across millions of calls. ~225ms time-to-first-audio on Together AI dedicated endpoints. The model your CFO loves.
Arcana v2 - expressive, conversational, multilingual. 40+ voices. On-prem available. Captures breathing, laughter, disfluencies - the small human noise that turns a transcript into a conversation.
Released in 2025. A practical speech model built on real-world conversational data. Rime gave it away. The strategic logic is interesting and we'll let you draw it.
Tuned for enterprise speed and concurrency. The model deployed when the answer to "expressive or fast?" is "yes."
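For the curious, the "teach it once" promise behind Mist v2 can be pictured as a plain lookup table applied to text before it is synthesized. The sketch below is an illustration only: `LEXICON`, `apply_lexicon`, and the phonetic respelling are invented for this example and are not Rime's API or phoneme format.

```python
import re

# Hypothetical lexicon. The key and the respelling are illustrative
# assumptions, not Rime's API or its phoneme notation.
LEXICON = {
    "worcestershire": "WUUS-ter-sher",
}

# Matches runs of letters and apostrophes, i.e. rough "words".
_WORD = re.compile(r"[A-Za-z']+")


def apply_lexicon(text: str, lexicon: dict = LEXICON) -> str:
    """Replace every word found in the lexicon with its stored respelling.

    The lookup is case-insensitive and has no randomness, so the same
    input always produces the same output.
    """
    def swap(match):
        word = match.group(0)
        return lexicon.get(word.lower(), word)

    return _WORD.sub(swap, text)


if __name__ == "__main__":
    print(apply_lexicon("Extra Worcestershire sauce, please."))
```

The point of the design is the absence of chance: a dictionary lookup gives the same answer on the first call and the eight-millionth, which is what "deterministic" is meant to buy in a call center.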
A simplified picture of what makes Rime different from the dozen other TTS labs raising seed rounds this year. The pattern, in one chart:
FIG. 2 - YesPress estimate, based on public materials. Rime's brand is not built for TikTok demos. It is built for procurement.
Replace the menu maze with a voice agent that listens, confirms, and routes - in under 200 milliseconds.
The exact deployment Domino's and Wingstop are running. Take orders. Upsell. Handle the awkward pause.
Appointment confirmations, refill calls, post-discharge check-ins. HIPAA-compliant, on-prem option available.
Wrap Rime around any LLM. The model talks like a person; the agent thinks like a colleague.
English, Spanish, French, German - with code-switching. The voice that doesn't make customers translate themselves.
Build voices tuned to your brand. The pronunciation you define once is the pronunciation you ship forever.
In 2022, Clifford, Larson and Geovanos start the company in San Francisco. The thesis: voice AI built for the call center, not the podcast.
The deterministic, low-latency TTS model that becomes Rime's enterprise workhorse.
Expressive voices for the consumer-grade demos; an open-source model for the developer community.
Unusual Ventures leads the $5.5 million seed round in May 2025, with Founders You Should Know, Cadenza and a long list of operator angels.
Models go multilingual, ship on-prem, and land on Together AI dedicated endpoints.
The naming is not an accident. Rime is the frost that forms when freezing mist settles on a surface. The company likes its metaphors load-bearing.
Most TTS models learn from audiobooks. Rime learned from real customer-service audio - the place where actual humans actually hesitate.
VentureBeat reported that Rime's TTS boosted sales 15% for major brands. The cheese-pull moment for voice AI.
Brooke Larson's PhD is in linguistics. Most voice AI startups would consider that a Wednesday lunch hire. Rime made her a co-founder.
Back to Cleveland. The franchise is still taking calls faster than three teenagers can answer. But now most of them are answered. The voice on the other end remembers how to say the customer's street name. It laughs at the right moment - a small, slightly nervous laugh, because it was trained on someone who was, once, slightly nervous. The order goes in. The pizza shows up. Nobody mentions the AI, because nobody noticed.
That is the product. Not the demo on Twitter. Not the celebrity voice clone. The thing you don't think about, working in the background, for 100 million conversations a month. Rime is not trying to make you marvel at voice AI. It is trying to make you forget you were listening to it.