Breaking

Guide Labs open-sources Steerling-8B - the first large-scale inherently interpretable LLM ◆ Every token traced to training data, context & human concepts ◆ $9M seed led by Initialized Capital ◆ ~90% capability of larger models, fully interpretable ◆ Y Combinator W24 - built in San Francisco ◆ Weights & code released under Apache 2.0

Company Dossier ◆ Interpretable AI ◆ San Francisco, CA ◆ Est. 2023

Guide Labs

The AI company that decided the black box was a design choice - and then chose otherwise.

8B

Parameters, open

~90%

Capability of larger models

$9M

Seed round

FIG. 1 - The Guide Labs mark, photographed against studio white. A logo is a promise. This one says: you are allowed to look inside. Most of its industry would rather you didn't.

The Story

A model that hands you the receipt

There is a quiet, load-bearing lie at the center of modern artificial intelligence, and it goes roughly like this: nobody knows why the model did that. It is said with a shrug, the way you'd explain weather. The largest language models are black boxes, the reasoning goes, and the black box is simply the price you pay for the intelligence. Guide Labs, a San Francisco company founded in 2023, exists to argue that this is not a law of physics. It's a habit. And habits can be broken by anyone willing to do the engineering.

The company builds interpretable foundation models - AI systems designed from the start so that a human can trace an output back to its causes. Their pitch is not that they can explain a model after the fact, which is the usual industry move and, per their founder's own academic work, frequently unreliable. Their pitch is that the model is interpretable by construction. The difference matters the way it matters whether a bank shows you the math on your loan denial or just says "computer says no." One of those is auditable. The other is a shrug wearing a lab coat.

"Training interpretable models is no longer a sort of science. It's now an engineering problem."

That line, from CEO Julius Adebayo, is the whole thesis compressed into a sentence. It sounds modest. It is not. Declaring that something has crossed from "open research question" into "thing we can reliably build" is one of the more consequential claims a founder can make, because it changes what you're allowed to expect. If interpretability is science, you wait. If it's engineering, you ship - and then someone has to explain why they didn't.

The Product

Steerling-8B, and the anatomy of a glass box

In February 2026, Guide Labs open-sourced Steerling-8B: an 8-billion-parameter language model whose weights and code are released under Apache 2.0 on Hugging Face and GitHub. The headline capability is that every token the model produces can be traced back to three things - the input context, the specific training data that shaped it, and a set of human-understandable concepts. It is, in the company's framing, the first large-scale inherently interpretable language model. You get the answer and the receipt in the same breath.

Under the hood it does two unusual things at once. It generates text via masked diffusion rather than the usual left-to-right prediction, filling in tokens by confidence rather than in reading order. And it inserts a "concept layer" that decomposes the model's internal representations into explicit, inspectable pathways. That decomposition is the part worth staring at.
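To make "filling in tokens by confidence rather than in reading order" concrete, here is a toy sketch of confidence-ordered unmasking, the core loop of masked-diffusion decoding. It is a simplification under stated assumptions, not Steerling-8B's sampler: the `toy_predict` denoiser is hypothetical and ignores context, and real samplers typically commit several tokens per step and may re-mask low-confidence ones.

```python
from typing import Callable, List, Optional, Tuple

MASK = None  # marks a position the model has not committed yet

# Hypothetical denoiser interface: given the partially filled sequence, return a
# (token, confidence) guess for every position. A real model would produce these
# from a network conditioned on the committed tokens; here the caller supplies it.
Predictor = Callable[[List[Optional[str]]], List[Tuple[str, float]]]


def unmask_by_confidence(length: int, predict: Predictor, per_step: int = 1):
    """Fill an all-masked sequence, committing the most confident positions first.

    Returns the finished sequence and the order in which positions were filled.
    """
    seq: List[Optional[str]] = [MASK] * length
    order: List[int] = []
    while any(tok is MASK for tok in seq):
        guesses = predict(seq)
        masked = [i for i, tok in enumerate(seq) if tok is MASK]
        # Rank the still-masked positions by confidence, highest first.
        masked.sort(key=lambda i: guesses[i][1], reverse=True)
        for i in masked[:per_step]:
            seq[i] = guesses[i][0]
            order.append(i)
    return seq, order


# Toy predictor: fixed guesses and confidences that ignore the context.
target = ["the", "model", "shows", "its", "work"]
confidence = [0.6, 0.9, 0.3, 0.5, 0.8]


def toy_predict(seq: List[Optional[str]]) -> List[Tuple[str, float]]:
    return list(zip(target, confidence))


text, order = unmask_by_confidence(len(target), toy_predict)
# order → [1, 4, 0, 3, 2]: "model" is committed first and "shows" last
```

The finished text reads left to right, but it was written in order of certainty, which is the property the paragraph above describes.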

FIG. 2 - How Steerling-8B splits its own mind: three pathways, one residual.

Known concepts (supervised, ~33K): traceable, labeled.

Discovered concepts (learned, ~100K): self-organized.

Residual (the honest part): not yet explained.

The residual is the tell. A credible interpretability story isn't "we understand everything." It's "here is exactly the part we don't."
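A minimal way to picture the three pathways is a linear split of one hidden vector: a weighted sum of known-concept directions, a weighted sum of discovered-concept directions, and whatever is left over. This is an illustrative assumption, not Guide Labs' published architecture; the concept names, directions, and activation values are hypothetical, and in a real model the activations would come from learned encoders rather than being handed in.

```python
from typing import Dict, List

Vector = List[float]


def weighted_sum(directions: Dict[str, Vector], acts: Dict[str, float], dim: int) -> Vector:
    """Sum activation * direction over a set of concepts."""
    out = [0.0] * dim
    for name, direction in directions.items():
        a = acts.get(name, 0.0)
        out = [o + a * d for o, d in zip(out, direction)]
    return out


def decompose(hidden, known, known_acts, discovered, discovered_acts):
    """Split a hidden state into known-concept, discovered-concept and residual parts."""
    dim = len(hidden)
    known_part = weighted_sum(known, known_acts, dim)
    discovered_part = weighted_sum(discovered, discovered_acts, dim)
    # The residual is whatever the two concept pathways leave unexplained.
    residual = [h - k - d for h, k, d in zip(hidden, known_part, discovered_part)]
    return known_part, discovered_part, residual


def unexplained_fraction(hidden: Vector, residual: Vector) -> float:
    """Squared norm of the residual relative to the squared norm of the hidden state."""
    return sum(r * r for r in residual) / sum(h * h for h in hidden)


hidden = [2.0, 1.0, 0.5]
known = {"finance": [1.0, 0.0, 0.0]}          # hypothetical labeled concept
discovered = {"concept_17": [0.0, 1.0, 0.0]}  # hypothetical learned concept
known_part, discovered_part, residual = decompose(
    hidden, known, {"finance": 2.0}, discovered, {"concept_17": 0.8})
share = unexplained_fraction(hidden, residual)
```

The point of the sketch is the last line: because the residual is kept explicitly, you can report how much of a representation the concepts do not cover instead of claiming they cover all of it.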

The practical payoff is a set of things you can't do with an ordinary model. You can suppress or amplify a specific concept at inference time, no retraining required - which means alignment stops being a matter of generating thousands of safety examples and retraining, and starts being a dial you can turn. You can pull training-data provenance for any generated chunk. And Guide Labs says all of this costs surprisingly little: Steerling-8B reaches roughly 90% of the capability of larger, less interpretable models while training on less data than those models. If that holds up, the long-assumed tax on transparency is a lot smaller than the field priced it at.
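The "dial" can be sketched in a few lines, assuming a hidden state that is rebuilt as a residual plus a sum of concept activations times concept directions. Steering then means scaling one activation before rebuilding, with no change to the weights. The `recompose` and `steer` helpers and the concept names are hypothetical illustrations, not Steerling-8B's API.

```python
from typing import Dict, List

Vector = List[float]


def recompose(directions: Dict[str, Vector], acts: Dict[str, float], residual: Vector) -> Vector:
    """Rebuild a hidden state as residual + sum of activation * concept direction."""
    out = list(residual)
    for name, direction in directions.items():
        a = acts.get(name, 0.0)
        out = [o + a * d for o, d in zip(out, direction)]
    return out


def steer(acts: Dict[str, float], concept: str, factor: float) -> Dict[str, float]:
    """Return a copy of the activations with one concept scaled.

    factor > 1 amplifies the concept, 0 < factor < 1 dampens it, 0 removes it.
    The input dict and any model weights are left untouched.
    """
    steered = dict(acts)
    steered[concept] = steered.get(concept, 0.0) * factor
    return steered


directions = {"toxicity": [1.0, 0.0], "formality": [0.0, 1.0]}  # hypothetical concepts
acts = {"toxicity": 0.9, "formality": 0.4}
residual = [0.1, 0.1]

before = recompose(directions, acts, residual)
after = recompose(directions, steer(acts, "toxicity", 0.0), residual)
```

Only the steered concept's contribution changes between `before` and `after`; the other concept and the residual pass through as they were, which is what makes this a targeted intervention rather than a retraining run.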

What You Can Do With It

For the rooms where "trust me" doesn't fly

Guide Labs aims at the domains where a black box isn't a shrug - it's a compliance problem. Lending. Medicine. Drug discovery. Places where someone eventually has to answer, out loud, why the model did that.

Audit

Trace the output

Follow any generated chunk back to the training data and concepts that produced it. Provenance you can put in front of a regulator.

Steer

Turn a concept up or down

Amplify or suppress specific concepts at inference time - alignment without a retraining cycle.

Debug

Find where it went wrong

Inspect known and discovered concepts to locate errors, spurious signals, and bias instead of guessing at a black box.

Build

Start from open weights

Steerling-8B ships under Apache 2.0. Read the code, run the model, look inside the concept layer yourself.
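The Audit card promises that a generated chunk can be traced back to training data, but this dossier does not describe how Steerling-8B does it. The sketch below swaps in a simple, well-known stand-in: nearest-neighbour attribution that ranks training documents by cosine similarity between concept-activation vectors. It is not presented as Guide Labs' method; the document IDs, vectors, and the `top_sources` helper are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def top_sources(chunk: Vector, corpus: Dict[str, Vector], k: int = 2) -> List[Tuple[str, float]]:
    """Rank training documents by similarity to a generated chunk's concept vector."""
    scored = [(doc_id, cosine(chunk, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical concept-activation vectors for three training documents.
corpus = {
    "doc_loans_001": [0.9, 0.1, 0.0],
    "doc_cardiology_042": [0.0, 0.2, 0.9],
    "doc_credit_007": [0.7, 0.3, 0.1],
}
chunk = [0.8, 0.2, 0.0]  # concept vector for one generated chunk
result = top_sources(chunk, corpus)
```

Similarity search only says which documents look alike in concept space; it is a cheap proxy, and a system that claims true provenance would need to tie attribution to how the training data actually shaped the weights.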

The Founders

Researchers who wouldn't let a footnote go

The team carries an academic pedigree - PhDs from MIT, MILA, and the University of Maryland, with, by their own count, more than 20 years of combined interpretable-ML experience and over two dozen papers at top ML conferences. The founding thread runs through Julius Adebayo, who, while at MIT, co-authored a widely cited 2018 paper arguing that AI's popular explanation methods weren't actually reliable. Most people file a result like that under "known limitations." He filed it under unfinished business.

Co-founder & CEO

Julius Adebayo

ML researcher, MIT PhD. Co-author of the influential 2018 saliency-map critique. Set out to build models interpretable by design rather than explained after the fact.

Co-founder

Fulton Wang

A veteran of the interpretable-ML research community, part of the founding team behind Guide Labs' approach.

Chief Science Officer

Aya Abdelsalam Ismail

Leads the science behind Guide Labs' interpretable foundation models, including the concept-decomposition work in Steerling-8B.

Funding & Backers

$9M to make AI show its work

In 2024 Guide Labs closed a $9M seed round led by Initialized Capital to advance large-scale interpretable language models. The company came up through Y Combinator's Winter 2024 batch, and the cap table reads like a bet on interpretability as a business, not just a research agenda.

Initialized Capital (lead) ◆ Tectonic Ventures ◆ Y Combinator ◆ Lombardstreet Ventures ◆ E14 Fund ◆ Pioneer Fund ◆ Jonathan Frankle ◆ Kulveer Taggar ◆ + angels

2023

Founded

W24

Y Combinator

SF

Headquarters

B2B

Business model

Apache 2.0

Steerling license

Latest Updates

The short, fast paper trail

Feb 2026

Open-sourced Steerling-8B - an interpretable 8B LLM that traces every token to context, training data, and human-understandable concepts. Weights and code under Apache 2.0.

Feb 2026

Published research on aligning and controlling Steerling-8B via concept-level steering - alignment without retraining.

Nov 2024

Closed a $9M seed round led by Initialized Capital to build large-scale interpretable language models.

Winter 2024

Participated in Y Combinator's Winter 2024 batch.