Somewhere in a data center right now, a $30,000 GPU is doing nothing. It can swallow terabits of data per second. Instead it waits - idle most of the time - while a tired CPU somewhere upstream unpacks files one at a time and feeds it through a straw. Spiral, an 18-person company in New York, exists because that picture is absurd. Its entire reason for being is to take the straw away.
That image is the whole company in a breath. For fifty years, databases were built for people. People read dashboards. People run a query, sip coffee, read twelve rows. The systems underneath - Postgres, then the big-data lakehouses - were tuned, sensibly, for that rhythm. The trouble is that the main reader stopped being a person. It became a model in training, demanding millions of images a second, and nobody had bothered to redesign the warehouse for a customer who never blinks.
The Third Age of Data
Spiral likes to tell history in three acts. In the First Age, humans put data in and humans took data out - the Postgres era, human-scale on both ends. In the Second Age, machines started writing at enormous volume but people still did the reading - the lakehouse era of Snowflake and Databricks. We are now in the Third Age, where machines do both the writing and the reading. The inputs are machine-scale; so, finally, are the outputs.
It is a tidy story, and like all tidy stories it is mostly a setup for the punchline: the tools we use were built for Act Two. The incumbents, Spiral argues, are "bolting new marketing onto old architectures" - relational tables, schema-bound warehouses, batch ETL pipelines optimized for human dashboards, now asked to feed GPUs they were never designed to serve. AI data is messy, multimodal, and arrives in awkward sizes. Legacy formats handle it the way a tuxedo handles a swim.
There is a particular spot where the old systems fall apart, and Spiral has a name for it: the uncanny valley between 1 kilobyte and 25 megabytes. Too big to be a tidy database row, too small to be a happy file on disk. An embedding, an image, a short clip of video. Most systems are good at the very small or the very large and miserable in the middle - which is, inconveniently, exactly where modern AI data lives.
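To make the valley concrete, here is a small Python sketch that sorts single values into three bands using the 1 kilobyte and 25 megabyte boundaries above. The binary units (1 KiB = 1,024 bytes), the band names and the example sizes are illustrative assumptions, not Spiral's definitions. The point is simply that one float32 embedding with 1,536 dimensions already lands in the middle.

```python
# Band boundaries taken from the "uncanny valley" described above.
# Assumption: binary units (1 KiB = 1024 bytes, 1 MiB = 1024**2 bytes).
KIB = 1024
MIB = 1024 * KIB

VALLEY_LOW = 1 * KIB    # below this, a value fits comfortably in a table row
VALLEY_HIGH = 25 * MIB  # above this, a value is happiest as its own file


def size_band(num_bytes: int) -> str:
    """Classify a single value by size: 'row', 'valley', or 'file'."""
    if num_bytes < 0:
        raise ValueError("size must be non-negative")
    if num_bytes < VALLEY_LOW:
        return "row"
    if num_bytes <= VALLEY_HIGH:
        return "valley"
    return "file"


# Hypothetical example values; a 1536-d float32 embedding is 1536 * 4 = 6144 bytes.
examples = {
    "int64 user id": 8,
    "1536-d float32 embedding": 1536 * 4,
    "1 MiB image": 1 * MIB,
    "10 MiB video clip": 10 * MIB,
    "2 GiB raw video": 2 * 1024 * MIB,
}

for name, size in examples.items():
    print(f"{name:>26}: {size:>13,} bytes -> {size_band(size)}")
```

Only the first and last examples escape the middle band; everything an AI pipeline tends to care about sits inside it.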
Three engineers and a hunch
The bet was placed by three people who had spent years inside one of the most demanding data platforms on earth. Will Manning, Rob Kruszewski, and Nick Gates met building infrastructure for Palantir Foundry, with stints at Citadel - places where data at impossible scale is a Tuesday, not a moonshot. They had felt the straw personally. So in 2024 they did the unglamorous, slightly reckless thing: they decided not to optimize the existing stack but to start at the file format, the lowest layer, and rebuild upward.
Will Manning
Rob Kruszewski
Nick Gates
Starting at the file format is a bit like deciding to fix traffic by reinventing the wheel. Ambitious. Possibly mad. But the logic holds: if every byte your GPU eventually sees has to pass through the format, then the format is where the bottleneck either lives or dies. Fix it there and everything above gets faster for free. So they wrote Vortex.
Vortex, and the database on top of it
Vortex
A next-generation columnar file format and compression toolkit, written largely in Rust. Compression on par with Parquet, but with 10-20x faster scans, roughly 5x faster writes, and up to 100-200x faster random access. Its party trick: GPU-native decompression that streams data from object storage straight into GPU memory, with no CPU middleman. Spiral donated it to the Linux Foundation.
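Why would a file format be so much faster at random access? One common ingredient in modern columnar formats is lightweight encoding, where a single value can be decoded where it sits instead of inflating a whole compressed page first. The Python sketch below is a toy that assumes frame-of-reference plus bit-packing as the example encoding. It is not Vortex's code or API, and it does not reproduce the benchmarks quoted above. It contrasts reading one value from a bit-packed column with reading one value from a zlib-compressed block, which must be fully decompressed first.

```python
import struct
import zlib
from typing import List, Tuple


def for_bitpack(values: List[int]) -> Tuple[int, int, bytes]:
    """Frame-of-reference + bit-packing: store (value - min) in a fixed bit width."""
    if not values:
        raise ValueError("need at least one value")
    base = min(values)
    deltas = [v - base for v in values]
    width = max(max(deltas).bit_length(), 1)  # bits per value
    buf = bytearray((width * len(values) + 7) // 8)
    for i, d in enumerate(deltas):
        start = i * width
        for b in range(width):  # write bit b of delta d, little-endian bit order
            if (d >> b) & 1:
                pos = start + b
                buf[pos // 8] |= 1 << (pos % 8)
    return base, width, bytes(buf)


def bitpacked_get(base: int, width: int, buf: bytes, i: int) -> int:
    """Random access: read only the few bytes that hold value i."""
    start = i * width
    first, last = start // 8, (start + width - 1) // 8
    chunk = int.from_bytes(buf[first:last + 1], "little")
    return base + ((chunk >> (start % 8)) & ((1 << width) - 1))


def zlib_get(blob: bytes, i: int) -> int:
    """Block compression: the whole block is inflated just to read one value."""
    raw = zlib.decompress(blob)
    return struct.unpack_from("<q", raw, i * 8)[0]


# Illustrative column: 10,000 int64 values clustered near one million.
values = [1_000_000 + (i * 37) % 500 for i in range(10_000)]

base, width, packed = for_bitpack(values)
blob = zlib.compress(struct.pack(f"<{len(values)}q", *values))

print(width, len(packed))  # 9 11250: nine bits per value instead of 64
print(bitpacked_get(base, width, packed, 1234) == values[1234])
print(zlib_get(blob, 1234) == values[1234])
```

The trade-off is that bit-packing only pays off when values cluster, which is why columnar formats typically pick among several encodings per chunk of a column rather than using one everywhere.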
Spiral Database
Built on Vortex and object-store native from day one. It promises GPU-saturating throughput, unified governance across every data type, and "fearless" permissioning - granular, time-bounded, audited. One API handles everything from a tiny embedding to a massive video file, including the awkward middle where other systems quietly give up.
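Spiral does not spell out how its permissioning works, so what follows is only a hypothetical Python sketch of what "granular, time-bounded, audited" can mean in practice. Each grant is scoped to a resource prefix and an action and carries an expiry, and every check, allowed or denied, is written to an audit log. The class names, the prefix-matching rule, and the example principal and paths are all invented for illustration and are not Spiral's API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Tuple


@dataclass(frozen=True)
class Grant:
    principal: str        # who (hypothetical job or user name)
    resource_prefix: str  # what: granular, scoped to a path prefix
    action: str           # e.g. "read"
    expires_at: datetime  # time-bounded: the grant stops working after this


@dataclass
class PermissionChecker:
    grants: List[Grant] = field(default_factory=list)
    # Audited: (time, principal, resource, action, allowed) for every check.
    audit_log: List[Tuple[datetime, str, str, str, bool]] = field(default_factory=list)

    def allow(self, principal: str, resource_prefix: str, action: str,
              ttl: timedelta, now: datetime) -> None:
        self.grants.append(Grant(principal, resource_prefix, action, now + ttl))

    def check(self, principal: str, resource: str, action: str, now: datetime) -> bool:
        allowed = any(
            g.principal == principal
            and g.action == action
            and resource.startswith(g.resource_prefix)
            and now < g.expires_at
            for g in self.grants
        )
        self.audit_log.append((now, principal, resource, action, allowed))
        return allowed


t0 = datetime(2025, 1, 1, tzinfo=timezone.utc)
acl = PermissionChecker()
acl.allow("trainer-job-42", "datasets/images/", "read", timedelta(hours=1), t0)

soon = t0 + timedelta(minutes=5)
ok_now = acl.check("trainer-job-42", "datasets/images/cat.jpg", "read", soon)
ok_other = acl.check("trainer-job-42", "datasets/audio/clip.wav", "read", soon)
ok_later = acl.check("trainer-job-42", "datasets/images/cat.jpg", "read",
                     t0 + timedelta(hours=2))
print(ok_now, ok_other, ok_later, len(acl.audit_log))
```

Passing `now` explicitly keeps the check deterministic and easy to test. A production system would read a trusted clock instead.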
Giving away your core technology sounds like a strange way to build a business, until you remember that Parquet became the default precisely because it was free, neutral, and everywhere. Spiral is playing the same card. Vortex lives at the LF AI & Data Foundation as an incubation project, racking up around 3,000 GitHub stars and 90-odd releases, with integrations for Arrow, DataFusion, DuckDB, Spark, Pandas, and Polars. The format wins hearts; the managed platform pays rent.