"Turn any image into a FaceTime call."
A frontier lab building expressive, from-scratch talking avatars - the visual face for every voice agent, generated live and personalized to whoever is watching.
The pitch, in one frame. Four faces - a redhead, a bow-tied teddy bear, an anime swordsman, and a woman mid-call - each one born from a single still image. Look at the woman on the right: the message box, the mute button, the plant behind her. It is not a video you watch. It is a video that watches back.
Here is a fact about AI avatars that almost nobody in the industry will say out loud: most of them are bad. Stiff. Uncanny. A mouth flapping over a frozen face, like a puppet operated by someone who has never seen a human before.
LemonSlice's co-founder, Lina Colucci, said it out loud. Existing avatar solutions, in her words, "add negative value - they are creepy and stiff." This is an unusual thing for a founder to say, because the standard move is to insist your category is enormous and only getting better. Colucci's move is more interesting: she agrees the category is broken, and then she explains that LemonSlice is not really in it.
The distinction is technical, and it matters. Most avatar tools take a photo and slide a lip-sync layer over the top - the picture stays put, the mouth moves. LemonSlice's model, Lemon Slice-2, generates every pixel from scratch. It is a large-scale video diffusion transformer, roughly 20 billion parameters, in the same family as OpenAI's Sora and Google's Veo 3. The difference is specialization: Sora makes cinematic clips of anything; Lemon Slice-2 makes talking humans, in real time.
Real time is the whole game. A pre-rendered avatar video is a finished object - you make it once, you watch it forever. LemonSlice is chasing something stranger and harder: video that is generated on the fly, at conversational speed, personalized to whoever happens to be watching. Feed it a single image and it produces something like a FaceTime call with a person who does not exist, streaming at around 20 frames per second on a single GPU, complete with facial emotion, hand gestures, and whole-body movement.
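To put the "around 20 frames per second" figure in working terms, the short Python sketch below converts a frame rate into an average per-frame time budget and counts the frames a spoken reply would need. It is illustrative arithmetic only: the function names are hypothetical, the three-second reply is an assumed example, and nothing here describes LemonSlice's actual pipeline.

```python
def frame_budget_ms(fps: float) -> float:
    """Average time available to produce one frame at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def frames_for_reply(seconds: float, fps: float) -> int:
    """Number of frames needed to cover a reply of the given duration."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return round(seconds * fps)


if __name__ == "__main__":
    fps = 20            # approximate streaming rate quoted in the article
    reply_seconds = 3   # assumed length of one spoken reply (illustrative)
    print(f"{frame_budget_ms(fps):.0f} ms per frame on average")
    print(f"{frames_for_reply(reply_seconds, fps)} frames for a {reply_seconds}-second reply")
```

At 20 frames per second the average budget works out to 50 milliseconds per frame, and a three-second reply needs 60 generated frames, which is why sustaining that rate on one GPU is the hard part.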
And it does not have to be a person. The demos include a photorealistic woman, an anime swordsman with red eyes, and a teddy bear in a bow tie. The model animates a cartoon as readily as a human, which is the sort of detail that sounds like a gimmick until you realize it is a product roadmap - every brand mascot, every game character, every drawn face becomes something you can talk to.
"Existing avatar solutions add negative value - they are creepy and stiff." - Lina Colucci, Co-founder & CEO, LemonSlice
A ~20-billion-parameter video diffusion transformer specialized for talking humans. It generates every pixel from scratch - full-emotion faces, hand gestures, expressive movement - rather than layering lip-sync over a static image.
Turn a single image into a live video conversation. Works for photorealistic humans and cartoon characters alike, streaming at ~20fps on one GPU. The avatar can pull from external knowledge bases so its answers stay accurate, not just convincingly lip-synced.
Give any voice agent or chatbot a face via API or a single line of embed code. Customize backgrounds, styling and appearance; voice is powered by ElevenLabs. The hard part hides behind copy-paste.
Content moderation guards against unauthorized face and voice cloning from day one - the permission slip a generative-video product needs to exist in the real world.
All video eventually becomes interactive - generated on the fly, personalized to whoever is watching.
The founding team is eight people, three of whom hold doctorates and all of whom grew up making video and music. It is a strange resume for a foundation-model company, and it is exactly the point: LemonSlice is the work of lifelong creators making the tools they always wished existed.
Brazilian-born ballerina, musician and vlogger. PhD across MIT and Harvard. Previously co-founded a profitable ML company that generated $10M in revenue.
ML engineering and product. Co-founded Edge Analytics (12 engineers, ~$3M/year). MIT PhD track in ML algorithms for medical imaging.
10+ years in AI/ML and synthetic media. Co-founded Edge Analytics. PhD in Bioengineering from Stanford.
LemonSlice puts a talking face on interactions that used to be text boxes: education and language learning, e-commerce guides, corporate training, customer service, homework help, even mental-health support. The buyers are developers building AI agents; the audience is everyone those agents talk to. Exact user numbers aren't public - the company is early - but the surface area is wide by design.
LemonSlice competes with the avatar and talking-video field - and argues it's playing a different sport by generating from scratch in real time.