He took a physics brain wired for free-electron lasers and pointed it at your living room. The result: an AI that redesigns a space from a single photo.
Upload one photograph of a room - empty, cluttered, dated, whatever - and a few seconds later it comes back restaged, refurnished, reimagined. That is the trick Xiao Zhang has spent half a decade refining at Collov Labs, and it is also a thesis disguised as a product.
Most AI companies in 2026 want you to type. Zhang thinks that is a dead end. "The next interface for AI won't be text or chat-based," he says. "It will be the camera." Collov is the argument made flesh: the input is not a sentence describing a sofa, it is a picture of the place the sofa will live.
The company, headquartered at 370 Convention Way in Redwood City, runs two consumer products - Collov AI and CozyAI - on top of a stack that blends diffusion models, spatial reasoning, and multi-step agentic workflows. The pitch to real-estate agents is brutally simple: a listing that once took weeks and a moving truck to stage can now be staged virtually in seconds. The pitch to homeowners is even simpler. If you can photograph a room, you can stage it.
In April 2026 the bet got a vote of confidence: a $23 million Series A led by a group of investors including Brightway Future Capital, Taihill Venture, MindWorks Capital, and Matrix Partners. On the same day, Collov announced a dedicated research lab to push visual intelligence further - toward AI that does not just describe a scene but reasons about it and acts on it.
The numbers behind the round are not vanity metrics. More than a million users. Over 300 companies. And a distinction that matters more than either: Collov's visual model became one of the first consumer AI applications baked directly into Qualcomm's on-device systems - the AI moving off the server and into the thing in your hand.
What that integration signals is a direction of travel. Cloud AI is convenient until it is not - until latency, privacy, or the simple cost of a round trip to a data center starts to bite. Putting a visual model on the device itself is a different kind of claim: that seeing and reasoning about a scene should happen where the camera already lives. For a company whose entire thesis rests on the lens, on-device is not a feature. It is the natural habitat.
It also reframes who Collov is for. The early story was real estate - virtual staging for agents who needed a vacant listing to feel like a home. That market is still there, and it is large; staging has always been one of the quiet levers that moves a property faster and higher. But the camera-first idea does not stop at listings. It points at anyone holding a phone in a room they want to change, which is a considerably larger room than real estate alone.
"The next interface for AI won't be text or chat-based. It will be the camera." - Xiao Zhang, CEO & Co-Founder, Collov Labs
Before there was a furniture-staging AI, there was a free-electron laser. Zhang's doctoral work at Stanford was about machine learning for FEL optimization - teaching algorithms to tune one of the more temperamental instruments in experimental physics. It is the kind of credential that does not obviously lead to redesigning kitchens, which is exactly the point.
He arrived there from Peking University, where he studied physics and did undergraduate research between 2012 and 2016. At Stanford he was not just in the lab. He served as VP of the Association of Chinese Students and Scholars and co-presided over the Chinese Entrepreneurs Organization - the organizing instinct of a founder showing up years before the company did.
When he and his co-founders launched Collov around 2021, the obvious question came fast: why not just build another wrapper around a large language model, like everyone else? They went the harder direction and taught AI to see space instead. The physicist's reflex - model the system, respect its geometry - turned out to travel surprisingly well from particle beams to throw pillows.
The co-founding bench is deep. CTO Casey Zhou worked on generative AI at TikTok and on multimodal models at Berkeley's BAIR Lab. Chief Scientist Rex Ying is a Yale professor and lead author of PinSage, the graph neural network Pinterest put into production. Zhang's job is to point that firepower at a single conviction.
There is a tidy irony in a physicist running a design company. Interior design is the discipline of taste, of the unmeasurable - and Zhang's instinct is to measure it anyway. A free-electron laser does not care about your feelings, and neither, in the end, does a diffusion model. Both reward the person who can describe a system precisely enough to optimize it. The bet underneath Collov is that "what looks right in this room" is a system like any other, and that the right model can learn its rules.
The leadership philosophy reads less like a design studio and more like a lab notebook. Find product-market fit through relentless customer input, not intuition. Hire subject-matter experts and keep the org lean. Set transparent KPIs so the whole company is aligned on the same scoreboard. Make decisions with discipline and strategic focus. He has been candid that scaling a deep-tech company forced his own management style to evolve - and equally candid that the unglamorous part, the fundraising, is exactly as hard as it looks. "Fundraising is not that easy," he has said, with the flatness of someone who has done it more than once.
Collov's funding has climbed in steps. The latest round nearly tripled the early Series A figure, and total capital raised now sits around $59 million.
Virtual staging is one of those ideas that sounds small until you do the arithmetic. A vacant listing photographs cold. Buyers struggle to imagine furniture they cannot see, and a home that sits is a home that drops its price.
Traditional staging answers that with rented furniture, a crew, and a calendar measured in weeks. Collov answers it with a photograph and a few seconds of compute. The agent uploads the empty room; the model returns it furnished, lit, and styled - and can return it five more ways before the coffee cools. The friction that used to live between a listing and its best self mostly evaporates.
That is the wedge, and it explains the early customer base of real-estate professionals. But Zhang has always treated staging as the demonstration rather than the destination. The same machinery that adds a sofa can remove one, swap a floor, repaint a wall, or reason about how light falls across a rug. Each of those is a small proof that the model understands not just objects but the space they sit in - the part that is genuinely hard.
Recognition followed the conviction. In 2022 Zhang landed on the Forbes Global Chinese Top 100, a list of technology entrepreneurs - validation that arrived years before the $23M round and the on-device milestone. The company he had been told to turn into an LLM wrapper was, instead, becoming a case study in doing the harder thing on purpose.
The technologies under the hood are unglamorous in the way good infrastructure usually is, and the company keeps its workflow lean. What is not unglamorous is the ambition: to be the firm that figures out what AI does once it can see. That is the whole game, and Zhang has organized his life around it with the single-mindedness of someone who genuinely believes the camera is about to eat the keyboard.
He runs a deep-tech company the way a physicist runs an experiment: hire the actual experts, keep the team lean, set transparent KPIs, and let customer feedback - not vibes - decide what product-market fit looks like. The discipline is the moat.
While the rest of the industry was racing to wrap language models, Zhang spent his time on the harder, less fashionable problem of getting machines to understand physical space. The camera-first thesis only sounds obvious now that he has spent years making it work.
April 2026 - Collov Labs closes a $23M Series A and stands up a dedicated research lab focused on visual intelligence systems - scene understanding, visual reasoning, and turning camera input into real-world action.
The frontier - The lab's mandate is the part Zhang cares about most: not AI that talks about the world, but AI that looks at it and does something. The phone camera as the keyboard of the next era.
The throughline - Trace it backward and the career has an odd coherence. Peking University physics. A Stanford PhD spent teaching machines to optimize a laser. A startup that teaches machines to optimize a room. A research lab built to teach machines to optimize their grasp of the physical world itself. It is the same problem at four different scales, and Zhang has been working on it the whole time - just with progressively more interesting test cases.
Make the camera the primary way people use AI. Build systems that see a scene, reason about it, and act - on consumer devices, at consumer scale. The living room was just the first room.
Sources: Stanford Tech Review, Collective Genius, Pulse 2.0, Axios Pro, MindWorks Capital, HousingWire, CRETI, The Org.