BREAKING - Coactive AI closes $30M Series B at ~$200M valuation | $44M total raised since 2021 | AWS Strategic Collaboration Agreement signed | One digital agency: 59% to 95%+ precision in one platform swap | MIT News features Coactive on visual content understanding | Backed by a16z, Bessemer, Greycroft, Emerson Collective, Cherryrock
Filed: San Jose, CA - the building looks unremarkable. The software inside does not.
Company File - YesPress

Coactive AI

The multimodal platform teaching enterprise software to actually look at the videos and pictures it owns.

Founded: 2021 | Location: San Jose, CA | Stage: Series B, $44M raised | Category: AI

A search bar for everything you can't read.

Somewhere in a media company's data lake right now, a producer is typing the words "red sneaker, beach, slow motion" into Slack and asking a colleague if anyone remembers the clip. Twelve people get pinged. Three hours go by. The deadline does not move. This is the part of the modern internet that doesn't get talked about much - the trillions of frames of video and images that businesses technically own but cannot meaningfully find.

Coactive AI exists for that producer. The San Jose company, founded in 2021 and now sitting on $44 million in venture capital, has built what it calls a Multimodal Application Platform. In plain English, it is a system that watches your videos, looks at your images, listens to your audio, and turns the whole pile into something you can query - the way you would query a spreadsheet.

Most businesses have a massive blind spot. They don't know what's happening in their visual data. - Coactive AI's pitch, paraphrased only slightly
Above: the line the founders use in pitch meetings. It works because it is true.

The internet went visual. The tools didn't.

The shift happened gradually and then all at once. By the early 2020s, the bulk of new internet content was video and image. TikTok, Reels, streaming platforms, retail product pages, security footage, telehealth recordings - the bytes were arriving faster than anyone could label them. The standard playbook was to hire humans to write tags, or to plug in a generic computer-vision model and hope it knew the difference between a baseball cap and a beanie.

Neither worked at scale. Manual tagging is slow, expensive, and inconsistent. Off-the-shelf models hallucinate at the edges and refuse to learn your specific business vocabulary. Meanwhile, the executive dashboard says you have a "content library." What you actually have is a basement full of unlabeled boxes.

Coactive's bet was that the missing piece wasn't a better model - it was an entire stack purpose-built for multimodal data. Embeddings, indexing, search, tagging, analytics, governance. All of it. Glued together so you stop building it yourself.

Your content catalog has a search bar. It just doesn't work. - the founding observation, in eleven words

Two MIT alums who refused to stop asking "why is this so hard?"

Cody Coleman and Will Gaviria Rojas met as undergraduates at MIT through the Interphase Edge program. Coleman went on to Stanford, where his PhD work focused on data-efficient deep learning - figuring out how to train good models without drowning in labels. Gaviria Rojas spent time at eBay as a data scientist. Both had seen, from different angles, the same gap: enterprise teams were trying to do machine learning on visual content with infrastructure designed for text.

They started Coactive in 2021. The initial check came from Andreessen Horowitz and Bessemer Venture Partners - $10.4 million to figure out whether the thesis would hold. It did. Greycroft followed. Then in May 2024, Emerson Collective and Cherryrock Capital co-led a $30 million Series B that pushed the company's valuation to roughly $200 million.

Coleman and Gaviria Rojas have impressive track records at eBay, Google, Facebook, and Pinterest. - Greycroft, on why they wrote a check
Roughly translated: these are the people who built the visual-content stacks at companies whose visual content actually matters.

What it actually does when you turn it on.

Coactive's platform ingests video, image, and audio - the messy, time-aligned kind that most data warehouses cannot really hold. It generates embeddings, builds an index, and exposes both a search interface and an analytics layer. From there, four things become possible that were not really possible before.

First, you can search across visual, audio, and transcript signals at the same time. Type a natural-language query and get back specific moments inside specific clips, not vague references to whole files. Second, you can auto-tag at scale - and the tags can be your own taxonomy, not a generic ImageNet vocabulary. Third, you can run analytics that connect content attributes to user behavior. Which thumbnail variant retained the audience? Which scenes correlate with drop-off? Fourth - and quietly the most useful - you can do content moderation that actually catches what humans miss.
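The first capability, finding specific moments rather than whole files, can be sketched in a few lines. This is a toy illustration, not Coactive's implementation or API: the file names, timestamps, `Segment` class, and `toy_embed` function are all hypothetical, and a bag-of-words count vector over stand-in descriptions takes the place of the learned multimodal embeddings a real system would compute from the pixels and audio themselves.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Segment:
    clip: str        # hypothetical file name
    start_s: float   # segment start, in seconds
    end_s: float     # segment end, in seconds
    vector: Counter  # toy embedding for this segment


def toy_embed(text: str) -> Counter:
    """Stand-in for a learned encoder: a bag-of-words count vector."""
    return Counter(text.lower().replace(",", " ").split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(index: list, query: str, top_k: int = 2) -> list:
    """Rank every indexed segment against the query and keep the best few."""
    q = toy_embed(query)
    ranked = sorted(index, key=lambda seg: cosine(q, seg.vector), reverse=True)
    return ranked[:top_k]


# Each entry is a time-aligned slice of a clip, not a whole file.
# The descriptions are assumptions standing in for what a model would perceive.
index = [
    Segment("promo_a.mp4", 12.0, 15.5, toy_embed("red sneaker beach slow motion waves")),
    Segment("promo_a.mp4", 40.0, 44.0, toy_embed("city street night traffic")),
    Segment("promo_b.mp4", 3.0, 7.0, toy_embed("blue sneaker gym treadmill")),
]

hits = search(index, "red sneaker, beach, slow motion")
for seg in hits:
    print(f"{seg.clip} [{seg.start_s}s to {seg.end_s}s]")
```

The point of the sketch is the shape of the result: the index stores segments with start and end times, so a query comes back as a moment inside a clip. Swapping the toy vectors for real embeddings changes the quality of the ranking, not the structure of the lookup.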

The precision lift, in one chart

Source: Coactive case study, leading digital agency, creative intelligence replacement
Legacy model: 59% precision
Coactive: 95%+ precision
Time saved: ~5 hrs/run
A boring chart that quietly upended someone's quarterly planning meeting.
No metadata or tags required. - five-word product promise, harder than it sounds

The story so far

2021: Cody Coleman and Will Gaviria Rojas found Coactive in San Jose.
2022: $10.4M raised from Andreessen Horowitz and Bessemer Venture Partners.
2023: Multimodal Application Platform launches commercially. First enterprise media and retail customers go live.
2024 (May): $30M Series B co-led by Emerson Collective and Cherryrock Capital. Valuation ~$200M.
2024: Strategic Collaboration Agreement signed with AWS for image and video analytics.
2025: MIT News profiles the company on how machines learn to understand visual content.

The companies that picked up the phone.

The customer roster, like most enterprise AI rosters, is held quietly. What's public is the shape: large media and entertainment companies, retailers, technology platforms, and at least one real estate operation. The pattern is consistent - whoever has the most visual data, and the most acute pain about not understanding it, becomes a Coactive customer first.

One digital agency, replacing a legacy creative intelligence model, jumped from 59 percent precision to over 95 percent. That is not the kind of number you bury in a footnote. That is the kind of number that gets the platform written into the next year's procurement plan.
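It helps to see what that jump means in practice. Precision is the share of returned results that are actually correct: true positives divided by true positives plus false positives. The case study does not say how precision was measured or how large the sample was, so the batch of 100 returned results below is a hypothetical figure chosen only to make the arithmetic visible.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of returned results that are actually correct."""
    return true_positives / (true_positives + false_positives)


def wrong_results(returned: int, precision_pct: int) -> int:
    """Number of incorrect results in a batch, given precision as a whole percent."""
    return returned * (100 - precision_pct) // 100


RETURNED = 100  # hypothetical batch size, not from the case study

legacy_wrong = wrong_results(RETURNED, 59)  # legacy model at 59% precision
new_wrong = wrong_results(RETURNED, 95)     # 95% is the floor of the reported "95%+"

print(f"Legacy: {legacy_wrong} wrong out of {RETURNED}")
print(f"New:    {new_wrong} wrong out of {RETURNED}")
print(f"Check:  {precision(RETURNED - legacy_wrong, legacy_wrong):.2f}")
```

Under that assumption, a reviewer goes from discarding 41 bad results in every 100 to discarding 5 at most, roughly an eightfold drop in wasted checks.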

Total funding: $44M
Series B valuation: $200M
Team size: 63
Year founded: 2021

Partnerships matter here, too. The AWS Strategic Collaboration Agreement is not a press release - it is a distribution channel. Databricks and Microsoft round out the technology partners. The platform plays well with the stack that enterprises already have, which is the only way enterprise software actually gets adopted.

A 59-to-95 jump is the kind of number that ends meetings, not starts them. - the case study, told plainly

Make the largest dataset on earth useful.

The framing Coactive uses is deliberately practical. Not "AGI for video." Not "the Google of vibes." The pitch is closer to: this is the world's biggest dataset, and it is sitting on hard drives nobody can query. We will build the operating system for it.

That phrasing matters because it sets the scope. Coactive is not chasing consumer chat assistants or trying to win at general-purpose model benchmarks. It is solving an enterprise integration problem - taking visual data and turning it into rows, columns, signals, and search results. Boring, in the way that AWS S3 is boring. Foundational, in the way that AWS S3 is foundational.

An enterprise-grade operating system for visual content. - Bessemer Venture Partners, summarizing the thesis

The producer types the query. The clip is on the screen.

Go back to the opening scene. Producer. Slack. Red sneaker, beach, slow motion. Three hours. Twelve coworkers. Missed deadline. In a world where Coactive is running quietly inside that media company's content stack, the same query takes about four seconds. The clip surfaces. The producer keeps working. Nobody gets pinged.

That is the small, specific change Coactive is building toward - and it generalizes. Retailers find the product image that converts. Streamers find the moment that retains. Moderation teams find the frame they were never going to catch by hand. The work that used to live in human heads moves into systems. The systems learn the specific vocabulary of the business they serve.

None of this is dramatic. It is just useful. Useful enough that the procurement plans are getting rewritten, the AWS marketplace is now a distribution layer, and a small team in San Jose is, with a kind of unglamorous discipline, building the infrastructure that makes "search your video library" finally mean something. The blind spot is closing. Slowly. And then, the way these things go, all at once.
