Vicki Boykis - ML Engineer and Normcore Tech Newsletter Author
FOUNDING ML ENGINEER / NORMCORE TECH

Vicki Boykis

The engineer who writes. The writer who ships.
ML Engineering · Embeddings · Normcore Tech · Rec Systems · Open Source

Founding ML engineer, newsletter author, conference organizer, and one of the most honest voices in applied machine learning. She builds systems that work in production - and writes about why most don't.

70+ pages on embeddings
3B vectors queried
15h of Normconf (2022)
10+ years in ML & data

VB, aka veekaybee

The Engineer Who Actually Explains Things

Right now, Vicki Boykis is doing what she does best: building recommendation systems at an early-stage startup while writing about the messy, nondeterministic, occasionally infuriating reality of modern ML. She is a founding engineer at a Philadelphia-area startup working on personalization and information retrieval - the kind of work that keeps the internet pointing you at things you might actually want, rather than things an algorithm decided you should want.

She came to this through a route that would raise eyebrows in certain circles and earn knowing nods in others. Economics degree from Penn State, MBA from Temple, a side trip through community college CS courses in 2016 when she decided that data work without real programming chops was just guesswork in a spreadsheet. No CS degree. No machine learning master's. Just the economist's disposition toward skepticism and the programmer's compulsion to actually test the hypothesis.

What distinguishes Boykis in a field crowded with people writing about AI is that she writes about what the work actually is. Not what it aspires to be. Not what the press release says. The Normcore Tech newsletter - running on Substack, read by engineers, managers, and skeptical observers across the industry - applies what she calls "humanism, nuance, context, rationality, and a little fun" to the practice of data and ML. The BuzzFeed method, she calls it: hit them with the memes, then sneak in the serious content.

My strategy is a little bit like BuzzFeed - hit them with the memes and then sneak in serious content in between.

- Vicki Boykis

The serious content landed. In 2023, she published "What Are Embeddings?" - a 70-plus page technical deep dive that became a reference document across the ML community. Not a blog post. Not a tweet thread. A proper, footnoted, thoroughly explained account of what embeddings are, how they work, and why understanding them matters for anyone building systems with language models. It spread on Hacker News. It spread through Slack channels. Engineers sent it to colleagues with the message: "this is the one."

At Automattic and Tumblr between 2020 and 2022, she worked on recommendation systems for WordPress.com and Tumblr - the kind of work that shapes what content millions of people see without ever knowing a system made that choice. One experimental recommendation surfaced a ten-minute video of a potato in a microwave. The job of a recommendation engineer is partly to prevent the potato video from being the first thing a new user sees - and partly to acknowledge that some users want exactly that.

After Automattic, she went to Mozilla.ai, where she led the design and open-sourcing of Lumigator - a self-hosted Python application for evaluating large language models using offline metrics. The premise: if you're going to deploy an LLM, you should be able to compare it against alternatives using real metrics on real tasks, not vibes. Lumigator starts with summarization and grows from there. It is the kind of tool that reflects Boykis's orientation toward production reality over benchmark theater.
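Lumigator's actual metrics pipeline isn't reproduced here, but the premise - score each candidate model's output against a reference on a real task like summarization - can be illustrated with a hand-rolled ROUGE-1 F1 score. This is a minimal sketch; the model names and texts are invented for the example:

```python
from collections import Counter

def rouge1_f(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: unigram overlap between a candidate summary and a reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # shared word counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Compare two hypothetical model outputs against the same reference summary.
reference = "the cat sat on the mat"
outputs = {
    "model_a": "the cat sat on a mat",
    "model_b": "a feline was seated",
}
scores = {name: rouge1_f(text, reference) for name, text in outputs.items()}
print(scores)
```

Production evaluation tooling uses more robust metric implementations and many examples per task, but the shape is the same: same input, same reference, comparable numbers instead of vibes.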

I'm not looking to trick you. I'm looking to have a conversation with you and to see if I can work with you, and that's it.

- Vicki Boykis, on engineering interviews

The conference she organized in 2022 - Normconf, held December 15, fifteen straight hours of virtual talks - captured something real about the state of the field. The explicit agenda was "all the stuff that matters in data and machine learning but doesn't get the spotlight." No keynote about the next frontier. No breathless announcements. Talks about the things practitioners actually wrestle with: cleaning data, writing documentation, communicating with stakeholders, not breaking production on a Friday. Fifteen hours of it. People watched all of it.

In 2023, she built Viberary - a semantic search engine that recommends books by vibe rather than by genre, author, or title. "Rainy afternoon melancholy with a surprising ending." "Dense prose that rewards rereading." Sentence Transformers, an MS MARCO-trained model, and a genuine conviction that the aesthetic texture of a book is a legitimate search query. The project also became a public teaching object: Boykis documented the architecture, the decisions, the tradeoffs, and the things that didn't work.
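The core mechanic of a vibe search is nearest-neighbor retrieval: embed the query, embed the books, rank by cosine similarity. The sketch below stands in tiny hand-made vectors for a real encoder such as a Sentence Transformers model - all vectors and titles here are invented for illustration:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy 3-dimensional "embeddings"; a real system would call an encoder model
# on the book descriptions and store the vectors in a vector index.
book_vectors = {
    "Moody rain-soaked literary novel": [0.9, 0.1, 0.0],
    "Fast-paced techno-thriller":       [0.1, 0.9, 0.2],
    "Dense philosophical essays":       [0.7, 0.0, 0.7],
}
query_vector = [0.8, 0.05, 0.1]  # pretend embedding of "rainy afternoon melancholy"

ranked = sorted(book_vectors.items(),
                key=lambda kv: cosine(query_vector, kv[1]),
                reverse=True)
print(ranked[0][0])  # nearest book by cosine similarity
```

The point of embeddings is that this ranking captures meaning rather than keywords: the query and the top result need not share a single word.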

Her blog at vickiboykis.com operates as what she calls a "machine learning garden" - a growing, annotated map of her own learning. Annual retrospectives document what she built, read, learned, and changed her mind about. In February 2026 she published a piece on querying three billion vectors. These are not think pieces about the future of AI. They are detailed, personal accounts of working in a technical field and trying to understand it a little better each year.

The GitHub username is veekaybee - a phonetic rendering of her initials, V.B. The move from pyenv to uv made it into a blog post. The experience of learning nondeterministic LLM behavior in 2024, getting acquainted with Ray and vLLM and Llamafile, understanding GGUF format and FastAPI and OpenAPI-compatible APIs - all documented, publicly, because the point is not to appear to have always known. The point is to actually learn, and to make the learning legible to whoever comes next.

She has keynoted PyCon Italia in Florence, presented at PyData Amsterdam, and co-authored a widely-read piece on what ML engineering actually is with Gergely Orosz in The Pragmatic Engineer newsletter. She has published in Increment Magazine. She wrangles a kindergartner and a toddler in her off hours. She once taught herself Hebrew over a summer because she was embarrassed about not understanding the language on a visit to Israel. She planned to go into international relations. She ended up querying three billion vectors.

The field needs more people who can do the work and explain what they did. Boykis is one of the few who genuinely does both - with the economist's instinct to question the model, the engineer's discipline to test it, and the writer's insistence on saying something true.

machine-learning ml-engineering embeddings recommendation-systems llms python normcore-tech newsletter open-source information-retrieval mlops semantic-search

Key Projects

What Are Embeddings?
Technical Reference

A 70+ page deep dive into embeddings - from fundamentals to production use cases - that became one of the most-shared ML references of 2023. Free and open for anyone to read.

Viberary
Semantic Search

A book recommendation engine that searches by aesthetic vibe rather than genre or keyword. Uses Sentence Transformers and an MS MARCO-trained model. Built and documented publicly as a learning object.

Lumigator
LLM Evaluation

A self-hosted, open-source Python application for evaluating and comparing large language models using offline metrics. Built at Mozilla.ai. Starts with summarization use cases.

Normconf
Community Event

An unconventional 15-hour virtual ML/data conference held December 15, 2022. Focused on the unglamorous, practical, essential work that data conferences usually skip.

Normcore Tech
Newsletter

A Substack newsletter covering ML engineering and data culture with dry wit and genuine rigor. Humanism, nuance, context, rationality, and a little fun - since before "AI" was a marketing term.

vickiboykis.com
ML Garden

A public learning log and technical blog. Annual retrospectives, production ML insights, deep dives on embeddings, vectors, inference, and the messy reality of shipping models.

In Her Own Words

"My strategy is a little bit like BuzzFeed - hit them with the memes and then sneak in serious content in between."

"I'm not looking to trick you. I'm looking to have a conversation with you and to see if I can work with you, and that's it."

"Python is as versatile as a Swiss Army knife and it's possible to teach yourself coding and data science."

Twitter was a place where she made friends, learned to be a serious programmer, worked out her best ideas, got leads for jobs, started a newsletter, and most importantly, had fun.

01. Created Lumigator - an open-source, self-hosted LLM evaluation framework at Mozilla.ai, giving teams a real alternative to benchmark theater.

02. Authored "What Are Embeddings?" - a 70+ page technical reference that became required reading for engineers entering the LLM space.

03. Built Viberary - a semantic book recommendation engine powered by Sentence Transformers, fully documented as a public teaching resource.

04. Organized Normconf (December 15, 2022) - 15 hours of talks on the real, unglamorous work behind data and ML systems.

05. Keynote speaker at PyCon Italia 2024 in Florence and presenter at PyData Amsterdam 2023 - recognized as a leading practitioner voice.

06. Co-authored "What is ML Engineering?" with Gergely Orosz (The Pragmatic Engineer) - a defining piece on the discipline read across the industry.

07. Runs Normcore Tech - a newsletter on ML engineering culture with thousands of subscribers, known for hitting memes first and serious content second.

08. Published in Increment Magazine and The Pragmatic Engineer, bringing a production-first lens to mainstream technical publishing.

09. Queried 3 billion vectors - and wrote about it. The work and the writing, inseparable.

Field Dispatches

Taught herself Hebrew the summer before college after feeling embarrassed about not understanding the language during visits to Israel. A pattern: identify a gap, fill it yourself, don't wait for a course.
Planned to go into international relations. Ended up querying three billion vectors. The economics degree was a compromise between English and statistics - a compromise that, in hindsight, turned out to be exactly right.
Took community college CS courses in 2016 with no prior CS background. No master's program, no bootcamp - just the coursework, plus the practice, plus the internet. Now she's a founding engineer.
At Tumblr, helped surface an experimental recommendation: a ten-minute video of a potato in a microwave. The job of a recommendation engineer is to understand why users want things - including, occasionally, the potato.
Switched from pyenv to uv for Python environment management in 2024. Wrote about it. Blogging the small decisions alongside the large ones is the whole methodology.
Spent 2024 inside the nondeterministic properties of LLMs - learning Ray, vLLM, Llamafile, GGUF format, FastAPI, OpenAPI-compatible APIs. Published the retrospective in January 2025 so others could skip the guessing.

Timeline

2012-2015
MBA at Temple University - building the business context for technical work
2015-2017
Senior Business Intelligence Analyst at Comcast - analyzing data to drive product decisions
2016
Community college CS coursework - self-directed, practical, no degree required
2017-2019
Senior ML Engineer at Duo Security - deepening ML and recommendation systems expertise
2020-2022
ML Engineer at Automattic/Tumblr - building recommendation systems for WordPress.com and Tumblr, surfacing content to millions
2022
Founded and organized Normconf - 15 hours of practical ML talks on December 15
2023
Published "What Are Embeddings?" (70+ pages) and launched Viberary, the vibe-based book search engine
2023-2024
Senior ML Engineer at Mozilla.ai - led design and open-sourcing of Lumigator, an LLM evaluation framework
2024-now
Founding ML Engineer at an early-stage startup - recommendation systems, personalization, information retrieval

What Makes Her Tick

🔬
Rigorous Skeptic
Tests assumptions. Checks the model. Doesn't trust benchmarks that don't reflect production.
✍️
Systems Explainer
The best writing about ML systems comes from people who build them. She does both.
🎭
Dry Wit
Memes first. Serious content second. Potato microwave video in the recommendations.
🌱
Self-Taught
No CS degree. Community college in 2016. Learning publicly since then, every year documented.
🏗️
Practical Builder
Builds real things: Viberary, Lumigator, Normconf, 70-page papers. Production over theory.
🤝
Community First
Organized a 15-hour conference because the community needed one. Open-sources the work.
📚
Economist's Lens
Sees systems as systems. Costs, incentives, tradeoffs - not just the technical spec.
💡
Humanist
Nuance, context, and the human dimension of data work. Not just what the model outputs.
Share This Profile