Hamel Husain · ML Engineer & Founder of Parlance Labs · Co-author of "Evals for AI Engineers" (O'Reilly 2025) · 25+ years building machine learning systems · His CodeSearchNet work helped birth GitHub Copilot · Taught 3,000+ engineers how to actually measure AI · Writes at hamel.dev · @hamelhusain on X
Hamel Husain - ML Engineer and Founder of Parlance Labs


ML ENGINEER & AI EVAL PIONEER

Hamel Husain

The man who helped invent GitHub Copilot - now teaching the world to stop guessing about AI.

Independent ML consultant. Founder of Parlance Labs. Co-author of O'Reilly's "Evals for AI Engineers." Before all that: GitHub, Airbnb, a brief detour through law school, and one of the most-shared papers in AI engineering history.

Founder · Engineer · Author · Educator · Open Source
Field-Tested
25+ Years in ML
35+ AI Products Advised
3K+ Course Students
5.3K nbdev Stars
01

The Engineer Who Wouldn't Stay Put

Profile

Most people who spend a decade in machine learning drift toward abstraction - toward conferences, keynotes, and the slow academic orbit. Hamel Husain went the other way. He spent 25+ years getting closer to the problem, not farther from it. Credit risk models at a bank. Management consulting across industries. A stint in law school that gave him a prose style most engineers never develop. Then DataRobot, Airbnb, and GitHub - each step adding a different layer of how messy real-world ML actually is.

At GitHub, his team built CodeSearchNet - a semantic code search system that became foundational research OpenAI drew on for code understanding. It was the geological layer beneath what became GitHub Copilot, the product now used by millions of developers. Hamel was there before the brand existed, doing the unglamorous work of figuring out how language models understand code.

After GitHub, he did not join a hot startup or take a staff role at a FAANG. He founded Parlance Labs and went independent - advising companies that had built AI prototypes but couldn't figure out why they stopped working after the demo. It turns out that's most companies. The gap between "works in a notebook" and "works reliably for users" is where careers end. Hamel made it his business to bridge that gap.

Keep it simple. Start simple, and every time you increase sophistication, you have to explain why.

- Hamel Husain

The core of his message is almost subversive in an industry obsessed with benchmarks: most teams don't know if their AI product actually works. They feel good when a demo goes well. They feel bad when a customer complains. In between, there's nothing - no measurement, no systematic analysis, no feedback loop. What Hamel calls "vibe checks" masquerading as evaluation.

He has spent years building the antidote: practical, field-tested methods for evaluating LLM systems - not academic abstractions, but specific tools and processes for specific people shipping real products. His co-authored O'Reilly book "Evals for AI Engineers" codifies everything he learned advising those 35+ products. It covers error analysis, LLM-as-a-judge, synthetic data, production monitoring, and how to build data flywheels that actually drive iteration.

There is also a rare biographical wrinkle that explains his unusual clarity: he went to law school. He enrolled in 2008, left in 2010 when he realized engineering was the only thing he wanted to do, but came away with something most ML engineers never acquire - the ability to construct and test an argument on paper. His writing reflects it. Clear, precise, structured. No hedging.

The Copilot Connection
Building What Became Copilot

In 2019, Hamel's team at GitHub published the CodeSearchNet Challenge - a benchmark for semantic code search using large language models. The paper established how transformers could understand code at scale. OpenAI adopted the research. GitHub built on the foundation. The result was GitHub Copilot, which launched in 2021 and has since become one of the most widely used AI coding assistants in the world. Hamel worked at the geological layer - the invisible foundation underneath the product that users never see.

02

Why Evals? Why Now?

Mission

There is a moment in every AI product's life where the team realizes the demo is not the product. The demo works because someone picked the prompts carefully. The product has to work for everyone, always, even when users do weird things. That moment - the "why is it worse now?" moment - is where most AI teams find themselves completely unprepared.

Hamel's singular contribution to the field is a practical answer to that question. Not a framework. Not an abstraction. A systematic approach: define what good looks like, collect examples where it fails, build a judge that can tell the difference, and iterate until the numbers move. Do this across 35+ products and you stop being surprised. You stop guessing. You start engineering.
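That loop - define good, collect failures, build a judge, iterate until the numbers move - can be sketched in a few lines. This is an illustrative skeleton, not code from his book or course: the names (`Example`, `judge`, `run_eval`) and the toy rule-based judge are invented here, and in a real system the judge would typically be an LLM call graded against a rubric.

```python
from dataclasses import dataclass

@dataclass
class Example:
    query: str      # user input captured from a real trace
    output: str     # what the AI system produced
    note: str = ""  # why it failed, recorded during error analysis

def judge(example: Example) -> bool:
    """Stand-in judge. A real LLM-as-a-judge would prompt a model
    with a rubric; here we encode one failure mode surfaced by
    error analysis: answers that deflect with boilerplate."""
    return "I'm just an AI" not in example.output

def run_eval(examples: list[Example]) -> float:
    """Score a batch of traces - the number you watch as you iterate."""
    passed = sum(judge(e) for e in examples)
    return passed / len(examples)

traces = [
    Example("How do I reset my password?", "Go to Settings > Security."),
    Example("Cancel my order", "I'm just an AI, I can't help with that.",
            note="deflection instead of using the cancel-order tool"),
]
print(f"pass rate: {run_eval(traces):.0%}")  # prints: pass rate: 50%
```

The point of the skeleton is the shape, not the judge: failures become labeled examples, examples become a score, and the score tells you whether a change helped.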

Field Note

After co-publishing "What We Learned from a Year of Building with LLMs" in 2024 - one of the most widely read pieces O'Reilly has run on practical LLM engineering - Hamel began getting approached by teams who had the same story: "We showed it to executives and it worked perfectly. Three weeks later customers started complaining and we don't know why." The pattern was universal. The root cause was always the same: no measurement. The piece became a catalyst for an entire movement toward rigorous AI evaluation.

What He Teaches
Evals For Engineers
  • Error analysis that finds actual problems, not imaginary ones
  • LLM-as-a-judge - when it works, when it doesn't
  • Synthetic data generation for edge cases
  • Production monitoring that actually catches regressions
  • Building data flywheels that make models better over time
  • Selecting the right eval tooling for your stack
Open Source
Tools He Built
  • nbdev - literate programming with Jupyter (5,300+ stars)
  • fastpages - Jupyter-based blogging platform (3,500+ stars)
  • CodeSearchNet - code search benchmark (2,400+ stars)
  • ghapi - Python client for the GitHub API
  • fastcore - foundational fastai utilities
  • Inspect AI - LLM evals library contributions
03

Career Timeline

History
2003
Begins career in ML - credit risk models at a bank, straight out of college (Georgia Tech)
2005-08
Management consulting across multiple industries - learns how organizations actually make decisions under uncertainty
2008-10
Law school - left after two years, but kept the writing skills and the ability to argue from evidence
2014
Joins DataRobot in Boston - works alongside Kaggle grandmasters and open-source library maintainers
2016
Joins Airbnb as Data Scientist - contributes to the BigHead ML platform for model serving
2017-22
GitHub: builds CodeSearchNet (precursor to Copilot), integrates ML tools with GitHub Actions, contributes to fastai ecosystem
2022
Founds Parlance Labs - independent AI consulting focused on evaluation, observability, and real-world LLM performance
2025
Publishes "Evals for AI Engineers" (O'Reilly) and "A Field Guide to Rapidly Improving AI Products" - two of the most practical LLM guides available
04

By The Numbers

Achievements
01
CodeSearchNet - Co-authored the LLM benchmark that became foundational research for GitHub Copilot's code understanding
02
3,000+ students from 500+ companies taught through Maven's "AI Evals for Engineers & PMs" course (with Shreya Shankar)
03
"Evals for AI Engineers" - O'Reilly book (2025) covering the entire evaluation lifecycle for production AI systems
04
nbdev - 5,300+ GitHub stars for his literate programming tool that makes Jupyter notebooks production-grade
05
"What We Learned from a Year of Building with LLMs" - co-authored the most-shared O'Reilly guide on practical LLM engineering (2024)
06
35+ AI products advised at Parlance Labs - developing a rare empirical view of what actually makes LLM systems improve in production
05

What He Actually Believes

Philosophy

Hamel's worldview on AI engineering can be summarized in one sentence: if you're not measuring it, you're guessing. In an industry that has elevated demo culture to an art form, this is practically a provocation. He is not against vision or ambition - he is against the professional habit of mistaking impressive outputs for reliable systems.

He is equally skeptical of unnecessary complexity. His principle - "start simple, and every time you increase sophistication, you have to explain why" - sounds obvious until you watch most AI teams sprint immediately to fine-tuning when a better prompt would have worked. Hamel has seen this pattern across dozens of products. The solution is almost always boring. The measurement infrastructure is almost always absent.

He also believes ML is a team sport in a way most engineers resist hearing. Great ML products need data engineers, DevOps, infrastructure, design, and UX working together. The lone data scientist running everything is a fantasy that produces mediocre systems. Real production ML requires organizational muscle as much as technical skill.

"Machine learning often isn't right because organizational maturity isn't developed enough to derive value."

"Applying ML is very much a team sport - it requires data engineers, DevOps, infrastructure, design, and UX expertise."

"I try to figure out how to avoid as many meetings as possible... learn something new every day."

"Evals are how you know if your AI product actually works - not vibes, not demos, not gut feelings."

06

The fastai Chapter

Open Source

Alongside his GitHub work, Hamel became a core contributor to the fastai ecosystem, working with Jeremy Howard - the man who arguably did more than anyone to democratize deep learning. The partnership shaped Hamel's entire approach to software engineering. Howard's philosophy - that tools should lower barriers, that opinionated design is a kindness, that the best abstraction is one that teaches you something - runs through everything Hamel has built since.

He contributed to fastcore, helped establish CI/CD practices for fastai projects, and created fastpages, an easy blogging platform with first-class Jupyter Notebook support. When most of the ML world was still treating notebooks as toys, Hamel was building infrastructure to treat them as first-class artifacts. nbdev, another project he championed, took that idea further - using notebooks as the source of truth for code, documentation, and tests simultaneously.

These are not flashy products. They are the kind of infrastructure that only matters if you care about the experience of the person doing the work. Hamel cares about that.

Personality
How He Works
  • Pragmatic - simplicity over sophistication unless you can argue otherwise
  • Anti-hype - calls out AI theater and demo-ware directly
  • Meeting-averse - values deep work, focused time
  • Voracious reader - crosses domains, connects ideas across fields
  • Open-source native - default mode is share everything
  • Team-oriented - explicitly anti-lone-genius narrative
07

Parlance Labs & The Consulting Thesis

Company

Parlance Labs is not a large firm. It is Hamel and a small team of practitioners - including Shreya Shankar, a researcher whose work on human-computer interaction for LLM workflows has been adopted by tools like LangSmith, and John Berryman, a search veteran who co-authored "Relevant Search" and worked on GitHub Copilot. The firm's thesis is specific: companies that have moved past early prototypes and need repeatable ways to measure, diagnose, and improve real-world AI performance.

The typical client is not starting from scratch. They have a product. It works sometimes. They cannot explain why it fails, cannot reproduce failures systematically, and have no mechanism for knowing whether a change made things better or worse. Parlance Labs gives them that mechanism. After 35+ engagements, the patterns are well-mapped. The interventions are known. The work is surgical.

Hamel's consulting email - hamel@parlance-labs.com - is public. The bar is high: he works with teams that have already built something and need to make it reliable, not teams looking for someone to build the prototype. The distinction matters to him.

Fast Facts
Law school dropout (by choice) - two years of legal training gave him prose clarity that stands out in a field full of jargon-heavy writing
hamelsmu has been his handle across GitHub, Medium, and social platforms since early in his career - rare consistency in a world of rebrands
CodeSearchNet repo has 2,400+ GitHub stars - for a research benchmark dataset, which is not a typical star magnet
Trained by Jeremy Howard - his work on fastai with Howard shaped his philosophy that good tooling is teaching, not just automation
hamel.dev runs on Quarto - he uses the developer tooling he advocates for, which is rarer than it should be in the "eat your own cooking" department
2,500+ GitHub followers despite being primarily known for writing and teaching, not shipping consumer products with viral growth loops
08

Stories Worth Telling

Anecdotes
The Copilot Layer

When GitHub Copilot launched in 2021 and became the defining AI coding tool of the decade, Hamel was not on the Copilot team. He had done the earlier work - the research layer that answered the question "can language models understand code well enough to be useful?" His CodeSearchNet challenge proved they could, in 2019. Two years later, OpenAI and GitHub built on that foundation. Most engineers who shaped a product that significant would be making sure everyone knew. Hamel mostly talked about evals.

The Law School Arc

In 2008, Hamel enrolled in law school mid-career. He left in 2010. Not because he failed - because engineering won. But the two years left a mark. If you read his blog posts or his O'Reilly pieces carefully, you notice something: they are structured like briefs. Claim, evidence, implication. He does not bury the lead. He does not use weasel words. He says what he means and explains why. For an ML field that often reads like it was written by a committee trying not to commit to anything, this is distinctive.

The Jeremy Howard Effect

Working with Jeremy Howard on fastai was not just a technical collaboration - it changed how Hamel thinks about software. Howard's position is that good software should teach you something by the way it's designed, that barriers to entry are a design failure, and that documentation is not separate from code but the same thing. Hamel absorbed all of this. His tool design choices, his insistence on practical examples over theory, his preference for working code over white papers - all of it traces back to that collaboration.

09

What He's Building Toward

Aspirations

Hamel's goal is not to become the loudest voice in AI discourse. He does not appear to be building a personal brand in the conventional sense. What he is doing - methodically, relentlessly - is raising the floor. He wants a world where every engineer and PM building AI products has access to the tools and knowledge to measure whether they are actually working. Where "it seemed fine in the demo" is not considered a valid sign-off. Where the gap between AI prototype and AI product has a known map.

His newsletter, his courses, his O'Reilly books, his open-source tools - they are all aimed at the same problem from different angles. The long arc of Hamel Husain's career bends toward rigor. He is not trying to make AI more hyped. He is trying to make it more trustworthy.

That is, as missions go, considerably harder than building the next demo. And considerably more useful.

AI is easy to demo and hard to trust. The hard part is the part that matters.

- Hamel Husain, distilled