The distributed-data company with a stubbornly simple idea: don't move the data. Move the compute to it.
Expanso's entire pitch is about the trip you don't take - the petabytes that never leave home, and the cloud bill that never arrives.
Here is a thing about data that is true and slightly absurd: the most expensive part of a data pipeline is often not the analysis, the storage, or the fancy AI model at the end. It is the moving. Enterprises generate data all over the place - factory sensors, retail stores, hospital systems, log streams from thousands of servers - and then, at considerable cost, they ship all of it to a central cloud so a computer can look at it, decide most of it was junk, and throw away as much as 90% of it.
You are, in other words, paying a toll to haul cargo across the country so that a warehouse can tell you it was trash. Expanso, a Seattle company founded in 2022, looked at this arrangement and asked the reasonable question: what if we did the sorting before we paid for the truck?
That is the whole idea, and Expanso has given it a name - “Compute Over Data.” Instead of moving data to where the compute is, you move the compute to where the data already sits. A lightweight agent runs at the source, filters and transforms and governs the data in place, and then forwards only the part that matters. The data that stays put is cheaper (no egress), safer (no travel means fewer places to leak), and faster to act on.
None of this would matter if it were just a slide. What makes Expanso worth a dossier is that the thing works, it is open source at its core, and the people building it have done this kind of plumbing before - at the scale where getting it wrong is very visible.
Data → Cloud → Compute. Copy everything from every source into one central place, pay for the transfer and the storage, then run the job. Simple to reason about, painful on the invoice, and awkward when the data was never supposed to leave the building.
Compute → Data → Result. Send the job to where the data lives. Filter, transform and govern at the source. Only the useful, compliant slice travels onward. Less movement, lower cost, and data sovereignty by default.
Compute Over Data: bring the processing to where the data lives, rather than moving the data to the cloud first.
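To make the arithmetic of the two patterns concrete, here is a toy sketch in Python that simply counts how many bytes travel under each approach. Everything in it is hypothetical: the `Reading` record, the `is_useful` threshold rule, and the byte sizes are invented for illustration, and none of it reflects Expanso's or Bacalhau's actual API. The filter is set up so that 90% of records are discarded at the source.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """A hypothetical sensor record; size_bytes stands in for its wire size."""
    sensor_id: str
    value: float
    size_bytes: int


def is_useful(r: Reading, threshold: float) -> bool:
    # Hypothetical rule: only readings above the threshold matter downstream.
    return r.value > threshold


def centralize(readings: list[Reading]) -> int:
    # Data -> Cloud -> Compute: every record is shipped before any filtering,
    # so the bytes moved equal the full dataset.
    return sum(r.size_bytes for r in readings)


def compute_over_data(readings: list[Reading], threshold: float) -> tuple[list[Reading], int]:
    # Compute -> Data -> Result: filter where the data lives,
    # then ship only the records that survive.
    kept = [r for r in readings if is_useful(r, threshold)]
    return kept, sum(r.size_bytes for r in kept)


if __name__ == "__main__":
    # 100 made-up readings with values 0..99, each 1,000 bytes.
    readings = [Reading(f"s{i}", float(i), 1_000) for i in range(100)]
    moved_all = centralize(readings)
    kept, moved_edge = compute_over_data(readings, threshold=89.5)
    print(moved_all, moved_edge, len(kept))  # prints: 100000 10000 10
```

With these invented numbers, filtering at the source moves one tenth of the bytes. Real savings depend entirely on how selective the source-side filter is, which is why the vendor figures below are best read as directional.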
There are two audiences who almost never agree - the finance team that hates the cloud bill, and the compliance team that hates data leaving the building. Expanso's approach happens to satisfy both at once. When two opposing parties both win, you have usually found real leverage rather than a marketing line.
The company reports figures in the neighborhood of 10x faster pipeline deployment and, through its Red Hat OpenShift integration, cost reductions of 50-70%. Treat those as vendor numbers - directional, not gospel. The underlying logic, though, is hard to argue with: the cheapest byte to process is the one you never had to move.
Bacalhau, the open-source distributed compute engine that runs jobs where data lives. Its public demo network has processed more than 1.5 million jobs for partners including the University of Maryland, BOINC and the New Atlantis Foundation. (The name is Portuguese for salted cod - a thing valuable enough to preserve and distribute.)
Expanso's commercial platform offers lightweight edge agents, policy-based governance across thousands of sources, 100+ connectors to Snowflake, Databricks, Splunk, Datadog and Elastic, self-healing pipelines, and full data-lineage tracking for compliance, PII and GDPR handling included.
Its users are enterprises and institutions with scattered data: universities, research networks, telcos and, per the company, some of the world's largest defense organizations. The common thread is data too big, too sensitive, or too regulated to move comfortably.
If distributed systems have a résumé, David Aronchick, Expanso's founder and CEO, has a strong one. He was the first non-founding product manager on Kubernetes, co-founded Kubeflow at Google, and later ran open-source machine learning at Microsoft. Expanso is what happens when someone who spent years watching enterprises struggle with scale decides the problem worth solving is not another orchestrator - it is the data itself. The founding team includes alumni of Google, AWS and Microsoft.
In November 2023 - not an easy moment to raise anything - Expanso announced a $7.5 million seed round led by General Catalyst and Hetz Ventures, with Array Ventures joining. One account described the fundraising conditions of the period as “crazy.” Raising into a bad market is a mild signal that investors believed the problem was real rather than fashionable. In 2024, Samsung Next added a strategic investment.
Control Your Data. Everywhere.
Video: search “Bacalhau Compute Over Data” on YouTube for conference talks and product demos featuring David Aronchick.