Blog: BLIS — Evolving llm-d at simulation speed by mtoslalibu · Pull Request #344 · llm-d/llm-d.github.io

mtoslalibu · 2026-06-05T20:04:34Z

Summary

Adds blog post introducing BLIS, the llm-d simulator
Covers AI-native evolution of llm-d policies, prefill/decode disaggregation results, and capacity planning via simulation
Includes 6 figures and author entries

Authors

Mert Toslali, Dipanwita Guhathakurta, Srinivasan Parthasarathy, Jing Chen, Nick Masluk, Vishakha Ramani, Michael Kalantar, Asser Tantawi, Fabio Oliveira, Carlos Costa

netlify · 2026-06-05T20:04:39Z

✅ Deploy Preview for elaborate-kangaroo-25e1ee ready!

Name	Link
🔨 Latest commit	`f89b95e`
🔍 Latest deploy log	https://app.netlify.com/projects/elaborate-kangaroo-25e1ee/deploys/6a2330094982e30007515e44
😎 Deploy Preview	https://deploy-preview-344--elaborate-kangaroo-25e1ee.netlify.app

Signed-off-by: Mert Toslali <toslali@ibm.com>

chcost

Strong core thesis, but the post needs to align better with the technical tone of other published llm-d blogs. See inline comments for specific suggestions.

chcost · 2026-06-05T21:21:43Z

+
+### Capacity planning
+
+Before you deploy any LLMs for any purpose, you need answers:


This is the most technically novel result in the post and it deserves substantially more depth. Applying Drift Plus Penalty (Lyapunov optimization) to P/D disaggregation decisions is the novel part. A few things I think we can add:
- Why does DPP work here? What queue stability / penalty tradeoff is it optimizing?
- What model, hardware, workload, and QPS were used? The "2-20x" range is too wide without explaining what drives the variance.
- You mention "regimes where each policy stands out". That is the interesting finding. When does always-local win? When does EDPP dominate?
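For reference on the first question, the generic drift-plus-penalty formulation is sketched below. This is the textbook form, not necessarily what the post implements: the queues $Q_i(t)$, arrivals $a_i(t)$, service $b_i(t)$, penalty $p(t)$, and weight $V \ge 0$ are placeholder symbols, and how they map onto prefill/decode queues and a latency or cost penalty is what the post would need to state.

```latex
% Queue dynamics (generic; symbols are placeholders, not taken from the post)
Q_i(t+1) = \max\{\, Q_i(t) - b_i(t),\ 0 \,\} + a_i(t)

% Quadratic Lyapunov function and its one-slot conditional drift
L(t) = \tfrac{1}{2} \sum_i Q_i(t)^2
\qquad
\Delta(t) = \mathbb{E}\!\left[\, L(t+1) - L(t) \mid Q(t) \,\right]

% Drift-plus-penalty rule: in every slot, pick the action that minimizes
\Delta(t) + V \, \mathbb{E}\!\left[\, p(t) \mid Q(t) \,\right]
```

In the standard analysis, a larger $V$ brings the time-average penalty within $O(1/V)$ of optimal while the bound on average queue backlog grows as $O(V)$. Stating which side of that tradeoff the reported results sit on would help explain the variance.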

chcost · 2026-06-05T21:21:43Z

+
+BLIS has two jobs. It helps llm-d evolve faster, and it helps users plan deployments before spending GPU time. Let's start with the bigger one.
+
+### AI-native evolution of llm-d


"High-fidelity" needs a number. Vidur reports <9% error. AIConfigurator reports 6-12% MAPE. What is BLIS's fidelity vs real hardware? Without this, the claim is an unsubstantiated adjective. Even a single validation benchmark (e.g., BLIS predicted X TTFT at Y QPS, real cluster measured Z, error was N%) would ground this.
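A minimal sketch of the kind of validation number meant here, assuming paired per-configuration values of simulator-predicted and cluster-measured latency are available. The function name `mape` and the sample values are illustrative placeholders only, not BLIS results or part of any BLIS API.

```python
def mape(predicted, measured):
    """Mean absolute percentage error between predicted and measured values.

    Both arguments are equal-length sequences of positive numbers, for
    example TTFT in milliseconds at a series of QPS levels.
    """
    if len(predicted) != len(measured) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    if any(m == 0 for m in measured):
        raise ValueError("measured values must be non-zero")
    total = sum(abs(p - m) / abs(m) for p, m in zip(predicted, measured))
    return 100.0 * total / len(predicted)


# Placeholder numbers chosen only to exercise the function.
example_predicted = [110.0, 90.0]
example_measured = [100.0, 100.0]
example_error = mape(example_predicted, example_measured)
```

Reporting this per metric (TTFT, ITL, throughput) alongside the model, hardware, and workload used would make the figure comparable to the numbers cited for Vidur and AIConfigurator.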

chcost · 2026-06-05T21:21:43Z

+- The router picks which vLLM instance handles each request. It looks at prefix cache hits, queue depth, and KV use.
+- Each instance decides which requests to batch together right now.
+- The KV cache has to find room. Old blocks may need to make space.
+- The autoscaler watches load and brings new instances up if needed.


A note on tone: our published blogs (predicted-latency, v0.5) establish technical credibility through precise, declarative prose, stating problems with data and letting the evidence carry the argument. This section uses hypothetical framing ("Imagine 500 requests...") and colloquial language ("Things look fine, then suddenly they don't", "scan a hundred settings before lunch") that reads differently from the rest of our blogs. I'd recommend adopting the same engineering-report tone: direct, confident, evidence-first. Also, the llm-d audience doesn't need to be walked through what distributed serving is.

Consider cutting this section entirely and weaving the essential points (many interacting knobs, hard to predict on paper) into the intro.

chcost · 2026-06-05T21:21:43Z

+
+For the full story, see [our earlier post on the admission controller loop](https://ai-native-systems-research.github.io/ai-native-systems-research/blog/2026/05/13/from-simulation-to-production-how-an-ai-native-pipeline-discovered-a-better-admission-controller-for-llm-d/).
+
+#### When to disaggregate prefill and decode


"TTFT p90 was about 30x faster in the tail" on what model? What hardware? What workload shape? What QPS? "About 30x" is imprecise.

The predicted-latency blog gives a table with exact numbers on Qwen3-480B across 13 servers with 8xH200s. Every quantitative claim in an llm-d blog should be accompanied by the configuration that produced it. This is especially important for a simulator blog: if the claim comes from simulation, the reader needs to know the simulation parameters to judge the result.

chcost · 2026-06-05T21:21:43Z

+  - [Why simulate before you scale](https://inference-sim.github.io/inference-sim/latest/blog/2026/03/05/why-simulate-before-you-scale/)
+  - [The physics of high-fidelity distributed inference platform simulation](https://medium.com/modeling-distributed-inference/the-physics-of-high-fidelity-distributed-inference-platform-simulation-28fe27b59da2)
+- **The admission controller story in full:** [From simulation to production](https://ai-native-systems-research.github.io/ai-native-systems-research/blog/2026/05/13/from-simulation-to-production-how-an-ai-native-pipeline-discovered-a-better-admission-controller-for-llm-d/)
+- **The upcoming BLIS proposal for llm-d**


The post should acknowledge limitations. What can't BLIS model? Where does the performance model break down? What workloads give poor fidelity?

The predicted-latency blog explicitly acknowledges Scenario C where its approach only matches (not beats) the baseline, and explains why. This kind of honesty builds trust with a technical audience. A post that presents everything as a win reads as advocacy rather than engineering.

mtoslalibu requested review from Gregory-Pereira, chcost, clubanderson, davidgs, jjasghar, petecheslock, robertgshaw2-redhat and smarterclayton as code owners June 5, 2026 20:04

mtoslalibu force-pushed the blog/blis-simulator branch from 9ca5adc to 87b56fe Compare June 5, 2026 20:14

Add blis blogpost

f89b95e

Signed-off-by: Mert Toslali <toslali@ibm.com>

mtoslalibu force-pushed the blog/blis-simulator branch from 87b56fe to f89b95e Compare June 5, 2026 20:22

chcost reviewed Jun 5, 2026


mtoslalibu wants to merge 1 commit into llm-d:main from mtoslalibu:blog/blis-simulator


3 participants

