Positron AI

It is a Tuesday in Reno, the temperature outside is doing whatever Reno temperatures do, and inside a windowless room a small black box is quietly servicing somebody's chatbot. The box is not a GPU. The bill it produces will not require a venture round to cover. That, in a sentence, is Positron AI.

The company is 62 people, mostly engineers, working out of an unfashionable corner of Nevada. It does not have a flashy office and it does not pretend to. What it has is a working inference appliance called Atlas, a Series B that closed in February 2026 at a $1B-plus valuation, and a clearly enunciated grudge against the economics of running large language models on graphics cards.

Stop GPUs from bankrupting everyone trying to deploy AI. - Positron's stated mission, lightly paraphrased

Section I: The Problem They Saw

For about three years, the AI conversation has been a single word - training - dressed up in different fonts. Training is the loud part. Training is what sells the stock. Training is, however, a one-time party. Inference is the mortgage. Every chat message, every translation, every code completion is a fresh withdrawal from the same expensive checking account, and the account is denominated in Nvidia GPUs.

Positron's three co-founders - Thomas Sohmers, Barrett Woodside, and Edward Kmett - looked at that arrangement in early 2023 and concluded, with the unsentimental clarity that engineers occasionally exhibit, that someone was going to build inference-specific silicon and that someone might as well be them. Sohmers had been a Thiel Fellow at seventeen and had been chasing inference silicon ever since. Kmett brought applied mathematics. Woodside brought the rare ability to ship.

The Pitch, Compressed: Training is the showroom. Inference is the road. Cars that win at one rarely win at the other. Positron is the road car.

Section II: The Founders' Bet

The bet had three parts, and the third was the controversial one. First: design hardware that runs only transformer-style inference, not everything Nvidia has historically had to support. Second: stuff it with memory, because inference is memory-bound the way a Manhattan walk-up is square-foot-bound. Third - and this is the part that makes compiler people clutch their pearls - skip the compiler stack entirely. Map models directly to silicon. The phrase the company uses is "zero compiler." The phrase competitors use is less printable.
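The memory-bound claim can be made concrete with a back-of-the-envelope sketch. It assumes a dense model, one request at a time, and that every weight is read from memory once per generated token; it ignores the KV cache and compute time. The function name and all the numbers are hypothetical illustrations, not specifications of any Positron or Nvidia part.

```python
def decode_tokens_per_second(params_billion, bytes_per_param, bandwidth_gb_s):
    """Rough upper bound on single-stream decode speed.

    Assumption: generating one token requires reading every weight once,
    so throughput is capped at memory bandwidth / model size in bytes.
    """
    model_bytes = params_billion * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / model_bytes


# Hypothetical inputs: 70B parameters, 2 bytes per weight, 1000 GB/s of bandwidth.
rate = decode_tokens_per_second(70, 2, 1000)
print(f"{rate:.2f} tokens/s upper bound")

# Doubling the bandwidth doubles the bound; adding arithmetic units does not move it.
faster = decode_tokens_per_second(70, 2, 2000)
print(f"{faster:.2f} tokens/s upper bound")
```

Under these assumptions the ceiling depends only on how fast weights can be streamed out of memory, which is why the second part of the bet is about memory rather than raw compute.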

Two years later, the bet had a name. Atlas. It is an eight-card appliance, air-cooled, that drops into a normal data-center rack and runs models up to 500 billion parameters without anyone reaching for a coolant manifold. Positron says it built it in roughly 18 months. People who build silicon for a living tend to raise an eyebrow at that number, and then look at the box, and then lower the eyebrow.

Training is a one-time party. Inference is the mortgage. - The argument, in one sentence

Section III: The Product, In Plain English

Atlas is the kind of object that does not photograph well. It is a rectangle. It has fans. It does not glow, unless a marketing team is involved. What it does, according to Positron's own published numbers and the customers willing to confirm them on background, is run an inference workload at roughly one-third the end-to-end latency of a comparable H100-based system, and at roughly four times the performance per watt. It is also, somewhat conspicuously, fabricated in the United States, which in 2026 is no longer an aesthetic preference but a procurement requirement at a non-trivial number of accounts.

The bigger swing comes next. Asimov is Positron's first custom silicon - tape-out planned for late 2026, production in early 2027 - and it is designed around an idea that sounds simple and is not. Put two terabytes of memory on the accelerator. Not two terabytes of memory in the rack. Two terabytes per chip. Nvidia's upcoming Rubin part is expected to land around 384 GB. Positron is aiming for roughly six times that, and claiming five times the tokens-per-watt on core workloads.
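The capacity arithmetic is easy to check. The sketch below uses only figures that appear in this article (384 GB for Rubin, "two terabytes" and "~2.3 TB" for Asimov) and assumes decimal units. The helper name and the bytes-per-parameter values are illustrative assumptions, and the calculation counts weights only, with no KV cache or activations.

```python
RUBIN_GB = 384             # figure the article cites as expected for Rubin
ASIMOV_GB_ROUNDED = 2000   # "two terabytes", taken as 2000 GB (assumption)
ASIMOV_GB_CHART = 2300     # the "~2.3 TB" figure, taken as 2300 GB (assumption)


def weights_capacity_billion(memory_gb, bytes_per_param):
    """Billions of parameters whose weights alone fit in memory_gb."""
    return memory_gb / bytes_per_param


for label, gb in (("2.0 TB", ASIMOV_GB_ROUNDED), ("2.3 TB", ASIMOV_GB_CHART)):
    print(f"{label} is {gb / RUBIN_GB:.1f}x the 384 GB figure")

# Hypothetical precisions: 2 bytes per weight (16-bit) and 1 byte per weight (8-bit).
for bytes_per_param in (2, 1):
    small = weights_capacity_billion(RUBIN_GB, bytes_per_param)
    large = weights_capacity_billion(ASIMOV_GB_ROUNDED, bytes_per_param)
    print(f"{bytes_per_param} B/param: {small:.0f}B vs {large:.0f}B parameters")
```

The "roughly six times" figure lines up with the ~2.3 TB number; a flat two terabytes works out closer to five times.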

~3x - Lower latency vs H100
~4x - Perf-per-watt
2TB - Memory per Asimov chip
18mo - Atlas concept to ship

A Short, Fast Company History

Spring 2023 - Founded by Sohmers, Woodside, Kmett
Feb 2025 - Mitesh Agrawal joins as CEO; $23.5M seed
Jul 2025 - $51.6M Series A. Atlas ships.
Feb 2026 - $230M Series B at $1B+ valuation
2027 - Asimov silicon & Titan system arrive

Section IV: The Proof

The performance claims would be easier to dismiss if the cap table were less convincing. The Series B was co-led by ARENA Private Wealth, Jump Trading, and Unless, with strategic checks from Qatar Investment Authority, Arm, and Helena. Existing backers - Valor Equity Partners, Atreides Management, DFJ Growth, 1517 - stayed in. Arm, in particular, does not casually write strategic checks into companies that compete with its single most important customer.

On the customer side, Positron is publicly associated with Cloudflare and a quiet list of cloud and network infrastructure operators. The remaining names are not yet on a slide. Most inference deals do not get press releases. The buyer is usually a VP of Infrastructure with a spreadsheet and a token bill they cannot defend in their next budget review.

Performance, According to Positron

Atlas vs H100, evaluated transformer inference workloads

Latency (lower = better): H100 baseline vs Atlas, ~3x faster
Perf / watt: H100 vs Atlas, ~4x better
Memory / chip (next-gen): Rubin 384 GB vs Asimov ~2.3 TB

Self-reported by Positron. Independent benchmarks are still scarce. Treat as a claim, not a verdict.

Section V: The Mission, Without Mystique

Mitesh Agrawal, the CEO who took the wheel in February 2025, scaled Lambda from $500K to a $500M annualized run rate before he arrived. He talks about Positron the way a logistics person talks about logistics. The mission is not poetic. It is: efficient and affordable inference, so the cost of running AI does not eat the value of using it. Everything else - the Asimov roadmap, the made-in-America fabrication, the air-cooled form factor - is in service of that one budget line.

There is something faintly old-fashioned about a company in 2026 making a hardware product and asking people to plug it in. The current style of AI startup prefers to wrap intelligence in an API and charge by the token. Positron sells the box that the API runs on. It is the unfashionable end of the value chain, which is exactly where the durable margins tend to live.

The company sells the box. Everyone else sells what comes out of it. - A useful frame for understanding Positron

Section VI: Why It Matters Tomorrow

Two things will happen in the next 24 months regardless of who wins. Inference demand will keep growing on a curve that does not respect anyone's capex plan, and power-per-rack will become the dominant constraint in data-center design. A company whose pitch is "more memory, fewer watts, no compiler, and we made it here" is well-positioned for both of those facts.

Whether Positron actually displaces Nvidia at scale, or merely peels off the inference layer of the market, or gets acquired by someone who needs that capability in a hurry, is unknown. What is knowable now is that the company has shipped, raised, and named a roadmap that points somewhere the incumbent does not currently sit. That alone is rare.

Back to Reno. The black box is still in the windowless room. It is still servicing somebody's chatbot. The fans are still going. The bill, at the end of the month, will be smaller than it would have been. The chatbot will not know the difference. The CFO will.

The Short Version: Positron is what you build when you take inference seriously as a separate business from training, refuse to be polite about it, and have $305M to spend on being right.