Distributional is the enterprise AI testing platform that turns the slippery, non-deterministic behavior of large language models into something an engineering team can actually measure, alert on, and fix.
Somewhere, an agent is misbehaving. Not crashing. Not throwing an exception. Just - drifting. Yesterday it summarized contracts in three crisp paragraphs. Today it adds a fourth, and the fourth one is slightly wrong. Nobody on the platform team will notice for nine days. Distributional notices in nine minutes.
The company sells what nobody quite had a name for two years ago: a test suite for software that does not return the same answer twice. Their thesis is unglamorous and overdue. Generative AI broke the unit test. The deterministic assertion - assertEqual(output, "expected") - was an artifact of a deterministic world. Probabilistic software needs probabilistic tests. Distributional builds them.
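To make the contrast with assertEqual concrete, here is a minimal sketch of a test that asserts on a distribution of outputs rather than on one output. Everything in it is an assumption made for illustration: fake_summarizer is a hypothetical stand-in for a real model call, and the thresholds are invented. It is not Distributional's SDK.

```python
import random
import statistics


def fake_summarizer(rng: random.Random) -> str:
    """Hypothetical stand-in for an LLM call: usually 3 paragraphs, sometimes 2 or 4."""
    n_paragraphs = rng.choice([3, 3, 3, 3, 2, 4])
    return "\n\n".join(f"Paragraph {i + 1}." for i in range(n_paragraphs))


def paragraph_count(text: str) -> int:
    """Count non-empty paragraphs separated by blank lines."""
    return len([p for p in text.split("\n\n") if p.strip()])


def check_summary_length_distribution(samples: int = 200, seed: int = 0) -> dict:
    """Sample the summarizer many times and assert on aggregate behavior."""
    rng = random.Random(seed)
    counts = [paragraph_count(fake_summarizer(rng)) for _ in range(samples)]
    mean = statistics.mean(counts)
    share_over_three = sum(c > 3 for c in counts) / samples

    # No single output is "expected"; the assertions bound the distribution.
    # Both thresholds are illustrative assumptions, not recommended values.
    assert 2.5 <= mean <= 3.5, f"mean paragraph count drifted: {mean:.2f}"
    assert share_over_three <= 0.30, f"too many long summaries: {share_over_three:.0%}"
    return {"mean": mean, "share_over_three": share_over_three}


if __name__ == "__main__":
    print(check_summary_length_distribution())
```

The individual outputs are allowed to differ from run to run. What the test pins down is how often they differ, and by how much.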
Scott Clark co-founded SigOpt in 2014 - a Bayesian optimization platform for tuning models when nobody used the word "model" in polite company. Intel bought it in 2020. Clark stayed, ran a 200-person AI software organization, watched the generative wave arrive, and noticed a pattern: enterprises wanted to ship LLMs, but they had no idea how to prove the things were behaving.
So in September 2023, he started Distributional with six co-founders. By February 2024, Andreessen Horowitz and Operator Collective had written an $11M seed check. By October, Two Sigma Ventures led a $19M Series A. Total raised in under a year: $30M. Total time spent explaining what AI testing means: still ongoing.
Co-founder & CEO. Cornell math, PhD in applied mathematics from Cornell. Software lead in Yelp's ad-targeting era. Co-founder of SigOpt. Former VP/GM at Intel. Now: making AI behave.
Two Sigma Ventures (lead, Series A). Andreessen Horowitz. Operator Collective. Oregon Venture Fund. Essence VC. Alumni Ventures. Plus the angels you'd expect when an Intel alum starts shipping.
The product is a platform - self-hosted in your VPC, multi-tenant SaaS, or single-tenant. A Python SDK plugs it into your CI, your orchestrator, your data store, your alerting. What it provides, in plain English, is three things working together.
Collect production traces. Augment the data. Write tests on distributions, not single outputs. Alert when the distribution shifts. Triage. Resolve.
Collaborate on tests, analyze results, calibrate thresholds, capture audit trails, and produce the governance reports your compliance team has been quietly asking about.
A preference-learning loop tunes data augmentation, test selection, and calibration. The test suite gets smarter as the model changes - which it will.
Drift used to mean a feature distribution shifting in training data - a problem MLOps tools handled reasonably well. Generative AI inflicted a more uncomfortable variant: the model still works, the prompts still land, and the outputs are still grammatical - but they have started, somehow, to be worse. A little more hedged. A little less accurate. A little more inclined to invent regulatory citations.
You cannot detect this with logs. You cannot detect it with unit tests. You can detect it by treating model behavior as a distribution and watching how that distribution moves over time. That is, essentially, the company's whole pitch, and so far the Fortune 500 has been picking up the phone.
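One common way to watch a distribution move is a two-sample Kolmogorov-Smirnov comparison between a baseline window and a current window of some numeric property of the outputs. The sketch below is an assumption about how such a check could look, written with the standard library only; the source does not say which statistics Distributional uses, and the function names and the alpha default are hypothetical.

```python
import math
from bisect import bisect_right


def ks_statistic(baseline: list[float], current: list[float]) -> float:
    """Largest gap between the two empirical CDFs (two-sample KS statistic)."""
    a, b = sorted(baseline), sorted(current)
    largest_gap = 0.0
    # Empirical CDFs only change at observed values, so checking those suffices.
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


def has_drifted(baseline: list[float], current: list[float], alpha: float = 0.05) -> bool:
    """Flag drift when the KS statistic exceeds the large-sample critical value."""
    n, m = len(baseline), len(current)
    # Standard asymptotic approximation; rough for small samples.
    critical = math.sqrt(-math.log(alpha / 2) / 2) * math.sqrt((n + m) / (n * m))
    return ks_statistic(baseline, current) > critical


if __name__ == "__main__":
    # Hypothetical summary lengths in words, for illustration only.
    last_week = [100.0 + i for i in range(20)]
    today = [140.0 + i for i in range(20)]
    print(has_drifted(last_week, last_week))  # same window compared with itself
    print(has_drifted(last_week, today))      # every summary got longer
```

Any numeric property of an output can be fed through the same check: length, refusal rate, citation count, a quality score from an evaluator.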
Companies with operational or reputational risk attached to generative AI applications - which, by 2026, is most of them.
VPC self-host, single-tenant VPC, or multi-tenant SaaS. Pick the option that lets your security review go home on time.
Scott Clark and six co-founders incorporate Distributional after Clark leaves Intel.
Andreessen Horowitz and Operator Collective back the bet on a new category.
Two Sigma Ventures leads. Platform launches publicly. The category has a name.
Headcount grows toward 25; research and platform engineering deepen in parallel.
Somewhere, an agent is misbehaving. Distributional is watching the distribution.
Wire Distributional into your CI. Define behavioral tests. Get paged when production drifts from the baseline you signed off on.
Use Distributional's (DBNL's) adaptive analytics on production logs to surface where the agent's behavior changed - and which tool, prompt, or model version is responsible.
Run side-by-side behavioral tests when migrating from one model to another. Catch the regressions a benchmark will miss.
Audit trails, test outcomes, calibration history - the artifacts a compliance review actually wants to see.
But this time the platform team knows. A test fired at 09:14 - the distribution of contract-summary lengths shifted by 1.8 standard deviations from baseline. By 09:23 someone is in the dashboard, watching which production traces tripped the alert. By 10:01 the team has rolled back the prompt change that did it. The fourth paragraph, the slightly-wrong one, never makes it to the next customer.
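A number like "1.8 standard deviations from baseline" can be read in more than one way. One plausible reading, sketched here as an assumption, is the gap between the current mean and the baseline mean, measured in baseline standard deviations. The data, the function names, and the 1.5 alert threshold are all hypothetical.

```python
import statistics


def shift_in_baseline_sigmas(baseline: list[float], current: list[float]) -> float:
    """Difference of means, expressed in units of the baseline standard deviation."""
    baseline_mean = statistics.mean(baseline)
    baseline_sigma = statistics.pstdev(baseline)
    if baseline_sigma == 0:
        raise ValueError("baseline has no spread; the shift is undefined")
    return (statistics.mean(current) - baseline_mean) / baseline_sigma


def should_alert(baseline: list[float], current: list[float], threshold: float = 1.5) -> bool:
    """Fire when the shift, in either direction, reaches the threshold (assumed 1.5)."""
    return abs(shift_in_baseline_sigmas(baseline, current)) >= threshold


if __name__ == "__main__":
    # Hypothetical contract-summary lengths in words.
    signed_off = [90, 110, 90, 110]  # mean 100, standard deviation 10
    this_morning = [118, 118]        # mean 118
    print(shift_in_baseline_sigmas(signed_off, this_morning))
    print(should_alert(signed_off, this_morning))
```

With these made-up numbers the shift comes out at about 1.8 baseline standard deviations, which clears the assumed threshold and fires the alert.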
That is the scene Distributional is selling. Not the absence of failure - generative AI will fail - but the closing of the gap between the moment the model misbehaves and the moment a human can do something about it. The deterministic test suite was an invention of the 1990s. The probabilistic one is being written now, in San Francisco, by a small team with a Greek letter or two on the whiteboard.
The agent never stops drifting. The point is what happens next.