The GPU Whisperer - Seoul → Berkeley → San Francisco • Inference at Scale
He wrote the paper that taught the world how to run AI at scale. Now he's building the company that makes it pay.
In 2022, a paper landed at OSDI - one of computing's most selective systems conferences. It was called ORCA: A Distributed Serving System for Transformer-Based Generative Models. It introduced a technique called continuous batching. Within two years, every major LLM inference engine on the planet - vLLM, TensorRT-LLM, Hugging Face Text Generation Inference - had adopted some version of it. The paper came from the Seoul National University lab of a professor named Byung-Gon Chun. Most people in the industry know the technique. Far fewer know the name behind it.
Call him Gon. That's what colleagues do. The full name - Byung-Gon - carries the weight of a career built across four institutions and three continents before FriendliAI ever existed. Berkeley for the PhD in 2007. Intel Research Berkeley for the postdoc years. Yahoo Research and Microsoft Silicon Valley for the industry stints. A visiting post at Facebook in Menlo Park in 2016. Then back to Korea, to Seoul National University, to run a lab and teach operating systems. And then, from that lab, the paper that changed everything.
The greatest challenge lies in scaling workloads from prototypes to production, where costs, latency, and GPU management complexity often stall deployment.
- Byung-Gon Chun, at The AI Conference

The insight behind continuous batching is deceptively simple. Traditional LLM serving treated a batch of requests as a single unit: the whole batch had to finish before new requests could be admitted, so GPUs sat partially idle whenever short requests completed early, wasting compute on a resource that costs a fortune per hour. Continuous batching schedules at the level of individual decoding iterations, slotting new requests into the batch mid-stream and filling those gaps dynamically. In some workloads, the throughput improvement exceeds 10x. It sounds obvious in retrospect. Most important ideas do.
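To make the mechanism concrete, here is a minimal toy sketch of iteration-level scheduling in Python - not FriendliAI's engine or the ORCA codebase, just the scheduling idea. The names (`Request`, `decode_step`, `serve`) and the single-token decode stand-in are illustrative assumptions.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt: str
    max_new_tokens: int
    tokens: list = field(default_factory=list)  # tokens generated so far

    def finished(self) -> bool:
        # Real engines also stop on an end-of-sequence token.
        return len(self.tokens) >= self.max_new_tokens

def decode_step(batch):
    """Stand-in for one forward pass: emit one token for every active request."""
    for req in batch:
        req.tokens.append("<tok>")

def serve(waiting: deque, max_batch_size: int = 8):
    """Iteration-level scheduling: the batch is re-formed at every decoding step,
    so a newly arrived request never waits for the current batch to drain."""
    running = []
    while waiting or running:
        # Admit new requests into free batch slots between iterations,
        # instead of leaving them to wait until every running request finishes.
        while waiting and len(running) < max_batch_size:
            running.append(waiting.popleft())
        decode_step(running)                                   # one token per active request
        running = [r for r in running if not r.finished()]     # retire completed requests

# Toy usage: a short and a long request share the batch without blocking each other.
serve(deque([Request("short prompt", 4), Request("long prompt", 64)]))
```

The gap-filling is the whole trick: the moment a short request finishes, its batch slot goes to the next waiting request at the very next iteration instead of sitting empty until the longest request in the batch completes.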
What Chun recognized next was the gap between the idea and the reality. The concept was out there. Implementing it reliably at enterprise scale - with the scheduling, caching, fault tolerance, and monitoring that actual production demands - was a completely different problem. That gap is what FriendliAI was built to close.
FriendliAI was founded in 2021, still technically running out of SNU, with an initial $6M seed from Capstone Partners. The company's early product was called PeriFlow - a name that pointed at the fluid, continuous nature of the scheduling approach underneath. By 2023, the company had moved its headquarters to California (first Redwood City, then San Francisco), rebranded to FriendliAI, and was doing real business with real enterprise clients. The name shift was deliberate: friendli as in approachable, low-friction, enterprise-safe.
The traction followed the credibility. ScatterLab - makers of the Luda Lee 2.0 conversational AI that topped Korean app store rankings - used FriendliAI's platform to deploy a model 17 times larger at comparable speeds and costs. LG Electronics became a client. Upstage, one of Korea's most respected AI labs, signed on. By mid-2025, FriendliAI had 25 to 30 large enterprise customers and revenue on track for 6 to 7 times its 2024 level. In August 2025, the company closed a $20M seed extension led again by Capstone, with Sierra Ventures, Alumni Ventures, KDB, and KB Securities joining the round.
80% to 90% of GPUs are dedicated to inference, with only the remainder used for training.
- Byung-Gon Chun, CEO, FriendliAI

That statistic matters because it reframes where the real compute costs in AI actually sit. Training grabs the headlines - $100M+ runs for frontier models, data center buildouts measured in gigawatts. But inference is where the bills land, every day, at every scale. Every chatbot query, every coding assistant suggestion, every enterprise search result - that's inference. And Chun's bet is that the company with the best inference engine wins that market.
FriendliAI's current platform covers an almost absurd range of models - 550,000 text, vision, audio, and multimodal models deployable directly from Hugging Face in one click. The engine claims 2x or better inference speed versus standard deployments, with enterprise-grade compliance: SOC 2 Type II and HIPAA. The pitch to enterprises is not just speed - it's the removal of the operational complexity that keeps AI pilots from becoming production deployments. GPU provisioning, auto-scaling, failure recovery, performance monitoring - all managed.
The academic pedigree that grounds the whole operation is formidable. Chun's research during his SNU years collected awards from every major AI lab - Google, Microsoft, Amazon, Facebook. He received the Microsoft Research Faculty Fellowship in 2014, the first Asian researcher based in Asia to get it. The ACM SIGOPS Hall of Fame Award followed in 2020. The EuroSys Test of Time Award in 2021. Earlier, his collaboration with Microsoft produced REEF - the Retainable Evaluator Execution Framework - which the Apache Software Foundation elevated to Top Level Project status, a recognition typically reserved for mature, production-critical infrastructure.
The man collects milestones the way other academics collect citations. But what distinguishes Chun is the rare combination of credibility in both directions: rigorous enough for OSDI, practical enough for LG Electronics. A researcher who ships. A CEO who can still read a systems paper and immediately see where the math breaks down in production. FriendliAI is the institutional expression of that combination.
The ambition is stated plainly: make inference a commodity, the way compute and storage became commodities. Not a black box that requires a team of PhD engineers to operate, but a utility - reliable, measurable, cheap enough to make the GPU tax irrelevant. The $250B AI inference market that Sierra Ventures projects for 2030 is the backdrop. Chun's bet is that the company with the deepest technical foundation and the broadest model coverage wins the enterprise slice of it.
He remains an Associate Professor at Seoul National University - on leave, technically, though FriendliAI has long since consumed the foreground. The lab that produced continuous batching still exists. The insight that powered it is now running inside virtually every serious LLM deployment in the world. And the CEO of the company that turned it into a business still answers to Gon.
Inducted into computing's most prestigious systems research recognition for sustained contributions to operating and networked systems.
Awarded for research that proved its lasting significance a decade after publication - the conference's highest retrospective honor.
First Asian researcher based in Asia to receive this fellowship, awarded to faculty making breakthrough contributions to their field.
The Retainable Evaluator Execution Framework, co-developed with Microsoft, became an Apache Software Foundation Top Level Project - a mark of production-critical software maturity.
Grants from Google (2020), Amazon ML Research (2018), and Facebook Caffe2 (2017) - collectively validating the relevance of his systems work to industry production challenges.
Published the paper that introduced continuous batching to LLM inference, now standard in every major framework including vLLM, TensorRT-LLM, and Hugging Face TGI.
Every AI application eventually hits the same wall. The model works in the prototype. The demo runs fine on a single GPU. Then it goes to production, and suddenly there are 10,000 concurrent users, inference costs 3x the model cost, latency spikes unpredictably, and the engineering team is spending more time managing GPU clusters than building features. That wall is what FriendliAI was built to knock down.
The platform operates on three layers. The inference engine itself - built on the ORCA continuous batching foundation, extended with speculative decoding, custom GPU kernels, and smart caching. A deployment layer that handles GPU provisioning, auto-scaling, failure recovery, and load balancing without requiring the customer to touch a YAML file. And a model coverage layer that connects directly to Hugging Face, making 550,000+ models deployable in a single click.
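Speculative decoding, one of the engine-layer techniques named above, is easiest to picture as a draft-and-verify loop. The sketch below is a generic illustration of that loop, not FriendliAI's implementation; the `toy_draft` and `toy_target` stand-ins and the exact-match acceptance rule are simplifying assumptions (production systems accept or reject draft tokens based on the target model's probabilities).

```python
def speculative_decode(target_model, draft_model, prompt_tokens, k=4, max_new=8):
    """Draft-and-verify loop: a small draft model proposes k tokens cheaply,
    the large target model checks them all in one forward pass, and only the
    agreed-upon prefix (plus one target token) is kept."""
    tokens = list(prompt_tokens)
    while len(tokens) - len(prompt_tokens) < max_new:
        draft = draft_model(tokens, k)         # k cheap proposal tokens
        target = target_model(tokens, draft)   # k + 1 target tokens from one expensive pass
        accepted = 0
        while accepted < k and draft[accepted] == target[accepted]:
            accepted += 1
        # Keep the verified prefix, then the target's token at the first mismatch
        # (or its bonus token if everything matched), so every expensive pass
        # still yields at least one new token.
        tokens.extend(draft[:accepted])
        tokens.append(target[accepted])
    return tokens

# Toy stand-ins so the sketch runs: both "models" just count upward, so most
# draft proposals are accepted and each expensive pass emits several tokens.
def toy_draft(tokens, k):
    return [len(tokens) + i for i in range(k)]

def toy_target(tokens, draft):
    return [len(tokens) + i for i in range(len(draft) + 1)]

print(speculative_decode(toy_target, toy_draft, [0, 1, 2]))
```

The payoff is the same one the engine layer is chasing: when the cheap draft is usually right, the expensive model's cost is amortized across several tokens per forward pass instead of one.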
The enterprise pitch is concrete: up to 90% GPU cost reduction, industry-leading inference speeds, SOC 2 Type II and HIPAA compliance, 99.99% uptime SLA. These numbers are not marketing - they emerge from the same optimization techniques that Chun's lab spent years developing. When a customer like ScatterLab deploys a 17x larger model at comparable cost and speed, that is continuous batching, speculative decoding, and kernel optimization working in concert.
FriendliAI is not yet profitable. Chun says so directly. The focus has been on scaling efficiently while maintaining strong gross margins. The $20M seed extension goes toward North American and Asian go-to-market expansion, software development, and GPU procurement for the cloud service. The newest initiative, InferenceSense, aims to monetize idle enterprise GPUs by routing inference workloads to them - turning the compute slack in corporate data centers into revenue.