HAMEL HUSAIN - PARLANCE LABS / ML ENGINEER
The man who helped invent GitHub Copilot - now teaching the world to stop guessing about AI.
Independent ML consultant. Founder of Parlance Labs. Co-author of O'Reilly's "Evals for AI Engineers." Before all that: GitHub, Airbnb, a brief detour through law school, and one of the most-shared papers in AI engineering history.
Most people who spend a decade in machine learning drift toward abstraction - toward conferences, keynotes, and the slow academic orbit. Hamel Husain went the other way. He spent more than two decades getting closer to the problem, not farther from it. Credit risk models at a bank. Management consulting across industries. A stint in law school that gave him a prose style most engineers never develop. Then DataRobot, Airbnb, and GitHub - each step adding a different layer of how messy real-world ML actually is.
At GitHub, his team built CodeSearchNet - a semantic code search system whose research OpenAI later drew on for code understanding. It was the geological layer beneath what became GitHub Copilot, the product now used by millions of developers. Hamel was there before the brand existed, doing the unglamorous work of figuring out how language models understand code.
After GitHub, he did not join a hot startup or take a staff role at a FAANG. He founded Parlance Labs and went independent - advising companies that had built AI prototypes but couldn't figure out why they stopped working after the demo. It turns out that's most companies. The gap between "works in a notebook" and "works reliably for users" is where careers end. Hamel made it his business to bridge that gap.
Keep it simple. Start simple, and every time you increase sophistication, you have to explain why.
- Hamel Husain

The core of his message is almost subversive in an industry obsessed with benchmarks: most teams don't know if their AI product actually works. They feel good when a demo goes well. They feel bad when a customer complains. In between, there's nothing - no measurement, no systematic analysis, no feedback loop. What Hamel calls "vibe checks" masquerading as evaluation.
He has spent years building the antidote: practical, field-tested methods for evaluating LLM systems - not academic abstractions, but specific tools and processes for specific people shipping real products. His co-authored O'Reilly book "Evals for AI Engineers" codifies what he learned advising 35+ products. It covers error analysis, LLM-as-a-judge, synthetic data, production monitoring, and how to build data flywheels that actually drive iteration.
There is also a rare biographical wrinkle that explains his unusual clarity: he went to law school. He enrolled in 2008, left in 2010 when he realized engineering was the only thing he wanted to do, but came away with something most ML engineers never acquire - the ability to construct and test an argument on paper. His writing reflects it. Clear, precise, structured. No hedging.
Hamel's career reads like a map of every domain adjacent to machine learning. He started building credit risk models at a bank in 2003, pivoted to management consulting to learn how organizations actually make decisions, spent two years in law school (left voluntarily - engineering called louder), then rejoined tech through DataRobot alongside Kaggle grandmasters before landing at Airbnb and GitHub. Each phase added something the others couldn't. Consulting gave him business framing. Law school sharpened his writing. Startups gave him urgency. Big tech gave him scale.
In 2019, Hamel's team at GitHub published the CodeSearchNet Challenge - a benchmark for semantic code search using large language models. The paper established how transformers could understand code at scale. OpenAI adopted the research. GitHub built on the foundation. The result was GitHub Copilot, which launched in 2021 and has since become one of the most widely-used AI coding assistants in the world. Hamel was there at the geology - the invisible layer underneath the product that users never see.
There is a moment in every AI product's life where the team realizes the demo is not the product. The demo works because someone picked the prompts carefully. The product has to work for everyone, always, even when users do weird things. That moment - the "why is it worse now?" moment - is where most AI teams find themselves completely unprepared.
Hamel's singular contribution to the field is a practical answer to that question. Not a framework. Not an abstraction. A systematic approach: define what good looks like, collect examples where it fails, build a judge that can tell the difference, and iterate until the numbers move. Do this across 35+ products and you stop being surprised. You stop guessing. You start engineering.
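The loop described above can be sketched in a few lines. Everything here is illustrative, not Hamel's actual tooling: the `judge` function is a stand-in for an LLM-as-a-judge call, which in practice would prompt a model with a rubric defining "what good looks like" and parse its verdict.

```python
from dataclasses import dataclass

@dataclass
class Example:
    """One collected trace: a user query and the system's output."""
    query: str
    output: str
    passed: bool = False

def judge(example: Example) -> bool:
    # Placeholder judge. A real LLM-as-a-judge would send the
    # query/output pair plus a rubric to a model API; here we use
    # a trivial heuristic so the sketch is self-contained.
    return "error" not in example.output.lower()

def evaluate(examples: list[Example]) -> float:
    """Score a batch of traces; this is the number you iterate against."""
    for ex in examples:
        ex.passed = judge(ex)
    return sum(ex.passed for ex in examples) / len(examples)

# Failure and success examples collected from real usage:
traces = [
    Example("refund policy?", "Refunds are issued within 14 days."),
    Example("reset password", "Error: tool call failed."),
]
score = evaluate(traces)
print(f"pass rate: {score:.0%}")  # prints "pass rate: 50%"
```

The point of the sketch is the shape, not the heuristic: failures become a dataset, the judge makes pass/fail repeatable, and the pass rate turns "is it better now?" into a measurement instead of a feeling.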
After co-publishing "What We Learned from a Year of Building with LLMs" in 2024 - one of the most widely read pieces O'Reilly has run on practical LLM engineering - Hamel began getting approached by teams who had the same story: "We showed it to executives and it worked perfectly. Three weeks later customers started complaining and we don't know why." The pattern was universal. The root cause was always the same: no measurement. The piece became a catalyst for an entire movement toward rigorous AI evaluation.
Hamel's worldview on AI engineering can be summarized in one sentence: if you're not measuring it, you're guessing. In an industry that has elevated demo culture to an art form, this is practically a provocation. He is not against vision or ambition - he is against the professional habit of mistaking impressive outputs for reliable systems.
He is equally skeptical of unnecessary complexity. His principle - "start simple, and every time you increase sophistication, you have to explain why" - sounds obvious until you watch most AI teams sprint immediately to fine-tuning when a better prompt would have worked. Hamel has seen this pattern across dozens of products. The solution is almost always boring. The measurement infrastructure is almost always absent.
He also believes ML is a team sport in a way most engineers resist hearing. Great ML products need data engineers, DevOps, infrastructure, design, and UX working together. The lone data scientist running everything is a fantasy that produces mediocre systems. Real production ML requires organizational muscle as much as technical skill.
"Machine learning often isn't right because organizational maturity isn't developed enough to derive value."
"Applying ML is very much a team sport - it requires data engineers, DevOps, infrastructure, design, and UX expertise."
"I try to figure out how to avoid as many meetings as possible... learn something new every day."
"Evals are how you know if your AI product actually works - not vibes, not demos, not gut feelings."
Alongside his GitHub work, Hamel became a core contributor to the fastai ecosystem, working with Jeremy Howard - the man who arguably did more than anyone to democratize deep learning. The partnership shaped Hamel's entire approach to software engineering. Howard's philosophy - that tools should lower barriers, that opinionated design is a kindness, that the best abstraction is one that teaches you something - runs through everything Hamel has built since.
He contributed to fastcore, helped establish CI/CD practices for fastai projects, and created fastpages, an easy blogging platform with first-class Jupyter Notebook support. When most of the ML world was still treating notebooks as toys, Hamel was building infrastructure to treat them as first-class artifacts. nbdev, another project he championed, took that idea further - using notebooks as the source of truth for code, documentation, and tests simultaneously.
These are not flashy products. They are the kind of infrastructure that only matters if you care about the experience of the person doing the work. Hamel cares about that.
Parlance Labs is not a large firm. It is Hamel and a small team of practitioners - including Shreya Shankar, a researcher whose work on human-computer interaction for LLM workflows has been adopted by tools like LangSmith, and John Berryman, a search veteran who co-authored "Relevant Search" and worked on GitHub Copilot. The firm's thesis is specific: companies that have moved past early prototypes and need repeatable ways to measure, diagnose, and improve real-world AI performance.
The typical client is not starting from scratch. They have a product. It works sometimes. They cannot explain why it fails, cannot reproduce failures systematically, and have no mechanism for knowing whether a change made things better or worse. Parlance Labs gives them that mechanism. After 35+ engagements, the patterns are well-mapped. The interventions are known. The work is surgical.
Hamel's consulting email - hamel@parlance-labs.com - is public. The bar is high: he works with teams that have already built something and need to make it reliable, not teams looking for someone to build the prototype. The distinction matters to him.
When GitHub Copilot launched in 2021 and became the defining AI coding tool of the decade, Hamel was not on the Copilot team. He had done the earlier work - the research layer that answered the question "can language models understand code well enough to be useful?" His CodeSearchNet challenge proved they could, in 2019. Two years later, OpenAI and GitHub built on that foundation. Most engineers who shaped a product that significant would be making sure everyone knew. Hamel mostly talked about evals.
In 2008, Hamel enrolled in law school mid-career. He left in 2010. Not because he failed - because engineering won. But the two years left a mark. If you read his blog posts or his O'Reilly pieces carefully, you notice something: they are structured like briefs. Claim, evidence, implication. He does not bury the lede. He does not use weasel words. He says what he means and explains why. For an ML field that often reads like it was written by a committee trying not to commit to anything, this is distinctive.
Working with Jeremy Howard on fastai was not just a technical collaboration - it changed how Hamel thinks about software. Howard's position is that good software should teach you something by the way it's designed, that barriers to entry are a design failure, and that documentation is not separate from code but the same thing. Hamel absorbed all of this. His tool design choices, his insistence on practical examples over theory, his preference for working code over white papers - all of it traces back to that collaboration.
Hamel's goal is not to become the loudest voice in AI discourse. He does not appear to be building a personal brand in the conventional sense. What he is doing - methodically, relentlessly - is raising the floor. He wants a world where every engineer and PM building AI products has access to the tools and knowledge to measure whether they are actually working. Where "it seemed fine in the demo" is not considered a valid sign-off. Where the gap between AI prototype and AI product has a known map.
His newsletter, his courses, his O'Reilly books, his open-source tools - they are all aimed at the same problem from different angles. The long arc of Hamel Husain's career bends toward rigor. He is not trying to make AI more hyped. He is trying to make it more trustworthy.
That is, as missions go, considerably harder than building the next demo. And considerably more useful.
AI is easy to demo and hard to trust. The hard part is the part that matters.
- Hamel Husain, distilled