The OpenAI-backed AI tutor that has 10 million people talking to their phones - and starting to understand what comes back.
It is a Tuesday night somewhere in Seoul, and a 34-year-old marketing manager is arguing with her phone about the merits of a hotel breakfast. She is doing it in English. Nobody is grading her. Nobody is watching. The phone, patient as a saint, keeps asking follow-up questions.
This is what Speak looks like in the wild. Not a classroom. Not a textbook. Not a streak counter blinking guilt at you from a lock screen. A conversation, in your own time, with an AI that does not get bored. Speak's bet, more or less from day one, has been that this is what language learning actually is - and that almost no other app was treating it that way.
In December 2024, that bet earned the San Francisco company a $1 billion valuation and a $78 million Series C led by Accel, with OpenAI's Startup Fund, Khosla Ventures and Y Combinator coming back for more. The company has raised about $162 million in total. It is now, formally, a unicorn. Which is a fun word for what is really a much stranger thing: a fluency machine.
For most of the world, the problem with learning a language has never been finding lessons. It has been finding someone to talk to. The internet is awash in grammar drills, vocabulary apps, and animated owls dispensing tough love. Speaking, the actual point of the exercise, is the part everyone quietly skips. It is the part that is embarrassing.
This is especially brutal in markets where English is a career-defining skill but native speakers are rare and expensive. South Korea. Japan. Taiwan. Places where adults spend years on flashcards and emerge unable to order coffee with confidence. Speak's earliest growth was not in San Francisco. It was on the other side of the Pacific, where users described the app, in app store reviews, as the thing that finally let them practice without flinching.
Speak co-founders Connor Zwick and Andrew Hsu noticed something else, too. Speaking to a person is not just useful; it is psychologically loaded. People freeze. They go quiet. They self-edit. A machine that listens without judging removes a category of friction that humans, unfortunately, can never quite shake.
Connor Zwick sold his first company, the Coco Controller, while still at Harvard. Andrew Hsu started college at twelve. They met through the Thiel Fellowship, Peter Thiel's program that pays smart young people to skip school. They built Speak the way you'd expect: slowly, strangely, and from the speech engine up.
Speak teaches you a language by making you speak it. Lessons are short, scenario-driven, and built around the assumption that you will respond out loud. There is a structured curriculum for beginners through advanced learners, and there is an open-ended AI tutor that will roleplay anything: a job interview, a doctor's visit, a fight about whose turn it is to take out the trash.
The speech recognition is the thing. Most language apps that promise speaking practice grade you, kindly, on whether you got close enough. Speak listens the way a tutor listens. It catches pronunciation drift. It notices when you use a tense that technically works but no one would actually say. Then it gives you the line a native speaker would have used, and asks you to try again.
Scenario-based units that build from basic exchanges to nuanced conversation, with instant speech feedback after every line.
Open-ended chat. Pick a topic - or let it pick one. Speak roleplays the other side of the conversation and corrects on the fly.
Enterprise programs for companies and schools that want measurable speaking outcomes, not seat-time.
Speak has been downloaded more than ten million times. The average session runs somewhere between ten and twenty minutes, which is, not coincidentally, the range behavior researchers tend to describe as the sweet spot for habit formation. The price - around twenty dollars a month or ninety-nine a year - is the part the company stopped apologizing for once it had data on retention.
Source: company announcements and Crunchbase. The Series C nearly doubled Speak's total raised in a single sitting (from about $84 million to about $162 million) - and doubled its valuation in six months.
The customers are not only consumers. Speak's enterprise arm sells into companies that need their employees to operate in English - a quiet, large, very boring market that becomes interesting when you remember it pays in five-year contracts. Schools and universities are next. The pitch is uncomplicated: students will use this. The hard part of language education is not pedagogy. It is engagement.
OpenAI's Startup Fund backed Speak as its first investment in 2022 and led the Series B. Since then, Speak has been an early partner on new speech models and tools, including Whisper and the realtime API. The relationship is closer than most. Speak is one of the few consumer companies OpenAI has placed a meaningful, recurring bet on.
Zwick has been remarkably consistent on this. The goal is not to be a better Duolingo. The goal is to build the best way to get to actual spoken fluency in any language, available to anyone with a phone. The reason this is newly possible is that AI tutors no longer have to be scripted. They can listen, adapt, correct, and have a real conversation. That changes the unit economics of education in ways that the legacy textbook industry is still pretending not to notice.
Speak's bigger ambition - the one tucked behind the product roadmap - is to be the AI tutor people grow up with. Start with English. Expand to the rest. Make the tutor good enough that the question of whether you can afford one becomes, eventually, embarrassing to ask.
Education has always been gated by access to good teachers. The brutal truth of language learning, in particular, is that one-on-one tutoring works and nothing else really does - and one-on-one tutoring is unaffordable for almost everyone on earth. Speak is the most credible attempt yet at the obvious workaround: a tutor that scales because it is software, but that listens because it is software built around speech.
Whether this is good for the world depends on whether you think more humans speaking to each other across more languages is good for the world. The investors, evidently, have made up their minds. The 10 million people on the app, in 30 countries, talking out loud on the bus, have made up theirs.
Back in Seoul, the marketing manager finishes her conversation about hotel breakfasts. The app suggests she try a more idiomatic phrase next time - the one a native speaker would have used. She rolls her eyes, taps the microphone again, and tries it. Nobody is grading her. Nobody is watching. She is, in the most literal possible sense, learning to speak.