A search bar for everything you can't read.
Somewhere in a media company's data lake right now, a producer is typing the words "red sneaker, beach, slow motion" into Slack and asking a colleague if anyone remembers the clip. Twelve people get pinged. Three hours go by. The deadline does not move. This is the part of the modern internet that doesn't get talked about much - the trillions of frames of video and images that businesses technically own but cannot meaningfully find.
Coactive AI exists for that producer. The San Jose company, founded in 2021 and now sitting on $44 million in venture capital, has built what it calls a Multimodal Application Platform. In plain English, it is a system that watches your videos, looks at your images, listens to your audio, and turns the whole pile into something you can query - the way you would query a spreadsheet.
The internet went visual. The tools didn't.
The shift happened gradually and then all at once. By the early 2020s, the bulk of new internet content was video and images. TikTok, Reels, streaming platforms, retail product pages, security footage, telehealth recordings - the bytes were arriving faster than anyone could label them. The standard playbook was to hire humans to write tags, or to plug in a generic computer-vision model and hope it knew the difference between a baseball cap and a beanie.
Neither worked at scale. Manual tagging is slow, expensive, and inconsistent. Off-the-shelf models misfire on edge cases and have no way to learn your specific business vocabulary. Meanwhile, the executive dashboard says you have a "content library." What you actually have is a basement full of unlabeled boxes.
Coactive's bet was that the missing piece wasn't a better model - it was an entire stack purpose-built for multimodal data. Embeddings, indexing, search, tagging, analytics, governance. All of it. Glued together so you stop building it yourself.
Two MIT alums who refused to stop asking "why is this so hard?"
Cody Coleman and Will Gaviria Rojas met as undergraduates at MIT through the Interphase Edge program. Coleman went on to Stanford, where his PhD work focused on data-efficient deep learning - figuring out how to train good models without drowning in labels. Gaviria Rojas spent time at eBay as a data scientist. Both had seen, from different angles, the same gap: enterprise teams were trying to do machine learning on visual content with infrastructure designed for text.
They started Coactive in 2021. The initial check came from Andreessen Horowitz and Bessemer Venture Partners - $10.4 million to figure out whether the thesis would hold. It did. Greycroft followed. Then in May 2024, Emerson Collective and Cherryrock Capital co-led a $30 million Series B that pushed the company's valuation to roughly $200 million.
What it actually does when you turn it on.
Coactive's platform ingests video, image, and audio - the messy, time-aligned kind that most data warehouses were never designed to hold. It generates embeddings, builds an index, and exposes both a search interface and an analytics layer. From there, four things become possible that were impractical before.
First, you can search across visual, audio, and transcript signals at the same time. Type a natural-language query and get back specific moments inside specific clips, not vague references to whole files. Second, you can auto-tag at scale - and the tags can be your own taxonomy, not a generic ImageNet vocabulary. Third, you can run analytics that connect content attributes to user behavior. Which thumbnail variant retained the audience? Which scenes correlate with drop-off? Fourth - and quietly the most useful - you can do content moderation that actually catches what humans miss.
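To make the first of those concrete, here is a minimal sketch of embedding-based moment search. It is an illustration of the general technique, not Coactive's API or implementation. Everything in it is an assumption made for the example: the `embed` function is a toy bag-of-words stand-in for a real multimodal embedding model, and the clip IDs, timestamps, and segment descriptions are invented.

```python
import math
from dataclasses import dataclass

# Hypothetical toy vocabulary. A real system would use a learned multimodal
# embedding model over frames, audio, and transcripts, not word counts.
VOCAB = ["red", "sneaker", "beach", "slow", "motion", "city", "night", "dog"]


def embed(text):
    """Toy stand-in for an embedding model: word counts over VOCAB."""
    words = text.lower().replace(",", " ").split()
    return [float(words.count(term)) for term in VOCAB]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class Segment:
    clip_id: str    # which file the moment lives in
    start_s: float  # segment start, in seconds
    end_s: float    # segment end, in seconds
    vector: list    # embedding of the segment's content


def build_index(described_segments):
    """Embed each (clip_id, start_s, end_s, description) tuple into a Segment."""
    return [Segment(c, s, e, embed(d)) for c, s, e, d in described_segments]


def search(index, query, top_k=3):
    """Return the top_k (score, segment) pairs, best match first."""
    q = embed(query)
    scored = [(cosine(q, seg.vector), seg) for seg in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Invented example data: three time-aligned segments across two clips.
INDEX = build_index([
    ("clip_017", 12.0, 18.5, "red sneaker on beach in slow motion"),
    ("clip_017", 40.0, 44.0, "dog running on beach"),
    ("clip_203", 3.0, 9.0, "city street at night"),
])

best_score, best = search(INDEX, "red sneaker, beach, slow motion", top_k=1)[0]
print(f"{best.clip_id} {best.start_s}-{best.end_s}s")  # prints "clip_017 12.0-18.5s"
```

The point of indexing segments rather than whole files is visible in the result: the query returns a clip ID plus a time range, which is what "specific moments inside specific clips" amounts to in practice.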
[Chart: The precision lift]
[Timeline: The story so far]
The companies that picked up the phone.
The customer roster, like most enterprise AI rosters, is held quietly. What's public is the shape: large media and entertainment companies, retailers, technology platforms, and at least one real estate operation. The pattern is consistent - whoever has the most visual data, and the most acute pain about not understanding it, becomes a Coactive customer first.
One digital agency, replacing a legacy creative intelligence model, jumped from 59 percent precision to over 95 percent. That is not the kind of number you bury in a footnote. That is the kind of number that gets the platform written into the next year's procurement plan.
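For readers who want the arithmetic behind that jump: precision is the share of predicted tags that turn out to be correct, true positives divided by true positives plus false positives. The sketch below uses hypothetical counts of 1,000 predicted tags, chosen only to reproduce the 59 percent and 95 percent figures; the agency's actual volumes are not public, and 95 percent is the floor of "over 95 percent."

```python
def precision(true_positives, false_positives):
    """Precision = TP / (TP + FP): the fraction of predicted tags that are correct."""
    predicted = true_positives + false_positives
    if predicted == 0:
        raise ValueError("precision is undefined when nothing was predicted")
    return true_positives / predicted


# Hypothetical counts per 1,000 predicted tags (not from the source).
before = precision(true_positives=590, false_positives=410)
after = precision(true_positives=950, false_positives=50)

print(f"{before:.0%} -> {after:.0%}")  # prints "59% -> 95%"
print(f"wrong tags per 1,000 predictions: {410} -> {50}")
```

Seen this way, the lift is less about 36 percentage points and more about the cleanup burden: under these illustrative counts, wrong tags fall from 410 to 50 per thousand, roughly an eightfold reduction in what a human has to correct.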
Partnerships matter here, too. The AWS Strategic Collaboration Agreement is not a press release - it is a distribution channel. Databricks and Microsoft round out the technology partners. The platform plays well with the stack that enterprises already have, which is the only way enterprise software actually gets adopted.
Make the largest dataset on earth useful.
The framing Coactive uses is deliberately practical. Not "AGI for video." Not "the Google of vibes." The pitch is closer to: this is the world's biggest dataset, and it is sitting on hard drives nobody can query. We will build the operating system for it.
That phrasing matters because it sets the scope. Coactive is not chasing consumer chat assistants or trying to win at general-purpose model benchmarks. It is solving an enterprise integration problem - taking visual data and turning it into rows, columns, signals, and search results. Boring, in the way that AWS S3 is boring. Foundational, in the way that AWS S3 is foundational.
The producer types the query. The clip is on the screen.
Go back to the opening scene. Producer. Slack. Red sneaker, beach, slow motion. Three hours. Twelve coworkers. Missed deadline. In a world where Coactive is running quietly inside that media company's content stack, the same query takes about four seconds. The clip surfaces. The producer keeps working. Nobody gets pinged.
That is the small, specific change Coactive is building toward - and it generalizes. Retailers find the product image that converts. Streamers find the moment that retains. Moderation teams find the frame they were never going to catch by hand. The work that used to live in human heads moves into systems. The systems learn the specific vocabulary of the business they serve.
None of this is dramatic. It is just useful. Useful enough that the procurement plans are getting rewritten, the AWS marketplace is now a distribution layer, and a small team in San Jose is, with a kind of unglamorous discipline, building the infrastructure that makes "search your video library" finally mean something. The blind spot is closing. Slowly. And then, the way these things go, all at once.