[api][routing] Pluggable in-chat LLM routing (ChatModelRouter + RoutingStrategy)#852
Open
purushah wants to merge 2 commits into
…ngStrategy)

Add a drop-in chat model that selects which underlying model serves each request, then delegates to it. The router is a CHAT_MODEL resource, so an agent points at it by name with no runtime, event, or agent-definition change.

Selection is a pluggable SPI (`RoutingStrategy`), decomposed into orthogonal concerns:

- RoutingStrategy: pure selection (request -> candidate name). Returning null means "abstain / no opinion".
- FallbackPolicy: optional; try remaining candidates on error.
- CachingStrategy: optional bounded-LRU memoization of the decision per conversation, so an expensive strategy (e.g. an LLM judge) runs once per conversation rather than once per tool-call round.

Built-in strategies:

- RuleBasedRoutingStrategy: deterministic keyword/regex rules + default.
- LlmRoutingStrategy: a small "judge" model picks the candidate from each candidate's name/description (RouteLLM-style).

Bring-your-own strategies are first-class: implement RoutingStrategy and reference it by fully-qualified class name; loaded via the thread context classloader (cluster-safe). ML/learned routing is supported the same way.

Routing-miss semantics: a strategy that abstains (null) or names a non-candidate degrades to the configured `default` candidate (validated at construction; defaults to the first candidate) rather than failing the request. The LLM judge distinguishes a transient failure (abstain -> not cached, retried next round) from an unparseable reply (deterministic default).

Security: an LLM/ML routing decision is a hint, not an authority. The user's message is sent to the judge, so cost/privilege/safety must not be gated solely on it (prompt-injection risk). This is documented on the strategy.

Includes an example (LlmRoutingAgentExample) and unit tests covering rule selection, judge parsing (whole-token match, no substring mis-routing), stickiness, fallback, caching (incl. abstain-not-cached), and bring-your-own.

Also mirror the RULE_BASED/LLM ResourceName constants on the Python side (ResourceName.RoutingStrategy.Java) and register RoutingStrategy in the cross-language ResourceName parity check.
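
The routing-miss semantics in this commit message can be pictured with a minimal sketch. `SketchRoutingStrategy` and `RoutingMissSketch` are hypothetical stand-ins, not the PR's actual `RoutingStrategy` or `ChatModelRouter` signatures (which this page does not show); requests and candidates are reduced to plain strings for illustration.

```java
import java.util.List;

// Hypothetical stand-in for the selection SPI: request -> candidate name, null = abstain.
interface SketchRoutingStrategy {
    String select(String request, List<String> candidates);
}

// Hypothetical helper showing only the degrade-to-default step; caching, fallback
// and delegation to the chosen model are deliberately left out.
final class RoutingMissSketch {
    private final List<String> candidates;
    private final String defaultCandidate;

    RoutingMissSketch(List<String> candidates, String defaultCandidate) {
        if (candidates == null || candidates.isEmpty()) {
            throw new IllegalArgumentException("at least one candidate is required");
        }
        this.candidates = List.copyOf(candidates);
        // An unset default falls back to the first candidate.
        this.defaultCandidate =
                defaultCandidate == null ? this.candidates.get(0) : defaultCandidate;
        // The default is validated once, at construction.
        if (!this.candidates.contains(this.defaultCandidate)) {
            throw new IllegalArgumentException("default is not a candidate: " + defaultCandidate);
        }
    }

    String resolve(SketchRoutingStrategy strategy, String request) {
        String choice = strategy.select(request, candidates);
        // Abstain (null) or a non-candidate name is a routing miss: use the default
        // instead of failing the request.
        if (choice == null || !candidates.contains(choice)) {
            return defaultCandidate;
        }
        return choice;
    }
}
```

Under these assumptions, a strategy that returns `null` or an unknown name resolves to the default candidate, while a valid candidate name passes through unchanged.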
weiqingy
reviewed
Jun 16, 2026
…at test

Review follow-ups from @weiqingy on the routing PR:

- ChatModelRouter.open(): document the load-bearing invariant the no-op relies on: a routed candidate is lazily open()-ed by ResourceCache.getResource() on first resolution, so its connection is non-null before chat() runs.
- CachingStrategy / LlmRoutingStrategy: soften "runs once per conversation" to "typically once" and document that memoization is best-effort (a concurrent first-touch on the same key can double-compute; synchronized map, last-writer-wins, benign, so no locking).
- RoutingCandidate: reject an empty name (not just null). An empty name has no resolvable resource and would make LlmRoutingStrategy.parseChoice's whole-token match over-match arbitrary boundaries (mis-route).
- Tests: add ChatModelRouterTest cases pinning the open-before-chat invariant (candidate resolved through an opening ResourceContext, mirroring ResourceCache; plus the negative case proving it is load-bearing), and RoutingCandidateTest for the null/empty name guards.

39 routing tests pass; spotless:check clean under JDK 17.
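
To illustrate why an empty candidate name is dangerous for whole-token matching, here is a minimal sketch. It is not the PR's `LlmRoutingStrategy.parseChoice`; the class name, the set of characters treated as part of a name, and the "ambiguous reply returns null" rule are all assumptions made for this example.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of whole-token matching over a judge model's reply.
final class WholeTokenMatchSketch {
    // Assumption: candidate names consist of letters, digits, '-' and '_'.
    private static final String NAME_CHAR = "[A-Za-z0-9_-]";

    // Returns the one candidate that appears as a whole token in the reply,
    // or null when no candidate or more than one candidate matches.
    static String parseChoice(String reply, List<String> candidateNames) {
        String match = null;
        for (String name : candidateNames) {
            if (name == null || name.isEmpty()) {
                // An empty name could match at arbitrary boundaries in the reply,
                // so it is rejected up front.
                throw new IllegalArgumentException("candidate name must be non-empty");
            }
            // The name must not be preceded or followed by another name character,
            // so "gpt-4" does not match inside "gpt-4o-mini".
            Pattern wholeToken = Pattern.compile(
                    "(?<!" + NAME_CHAR + ")" + Pattern.quote(name) + "(?!" + NAME_CHAR + ")");
            if (wholeToken.matcher(reply).find()) {
                if (match != null) {
                    return null; // ambiguous reply: treated as unparseable in this sketch
                }
                match = name;
            }
        }
        return match;
    }
}
```

With candidates `gpt-4` and `gpt-4o-mini`, a reply of `gpt-4o-mini` selects only the longer name, because the shorter one is followed by another name character.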
Collaborator
Thanks for the follow-ups, @purushah. My comments are resolved. I'll leave the final call to the maintainers.
Author
Thanks @weiqingy, really appreciate you taking the time to go through the design and call out the subtle cases. Your comments were very helpful, especially around the open() invariant and the caching wording. Glad the follow-ups addressed them.
What is the purpose of the change
Adds a drop-in chat model that routes each request to the best underlying model, then delegates to it. The router is a `CHAT_MODEL` resource, so an agent points at it by name with no change to the runtime, events, or agent definition.

This is the in-chat selector (which LLM serves a single `chat()` call). A DataStream-level content-based agent router (branching records across agent operators) is a separate, follow-up concern.

Brief change log
- `RoutingStrategy`: pluggable selection SPI (request -> candidate name). Selection is a pure concern; returning `null` means "abstain / no opinion".
- `ChatModelRouter`: orchestrates select → (optional cache) → validate → delegate. A strategy that abstains (`null`) or names a non-candidate is a routing miss and degrades to the configured `default` candidate (validated at construction; defaults to the first candidate) rather than failing the request.
- `FallbackPolicy`: optional; try remaining candidates on error.
- `CachingStrategy`: optional bounded-LRU memoization of the decision per conversation, so an expensive strategy (e.g. an LLM judge) typically runs once per conversation, not once per tool-call round. Memoization is best-effort: a concurrent first touch on the same key can compute twice. Abstentions (`null`) are never cached.
- `RuleBasedRoutingStrategy`: deterministic keyword/regex rules + default.
- `LlmRoutingStrategy`: a small "judge" model picks the candidate from each candidate's name/description (RouteLLM-style). Distinguishes a transient judge failure (abstain → retried next round, uncached) from an unparseable reply (deterministic default). Parses by whole-token match (no substring mis-routing, e.g. `gpt-4o-mini` won't match a `gpt-4` candidate).
- Bring-your-own strategies are first-class: implement `RoutingStrategy` and reference it by fully-qualified class name; loaded via the thread context classloader (cluster-safe). ML/learned routing is supported the same way.
- Includes an example (`LlmRoutingAgentExample`) and unit tests.

Verifying this change
This change adds tests and can be verified as follows:
- Unit tests under `api/.../chat/model/routing/` covering rule selection, judge parsing (whole-token match), stickiness across tool-call rounds, fallback, caching (incl. abstain-not-cached), routing-miss degrade-to-default, and bring-your-own loading. All pass; `spotless:check` clean (JDK 17).

Does this pull request potentially affect one of the following parts:
- `org.apache.flink.agents.api.chat.model.routing` package (additive; no existing API changed).
- … `CHAT_MODEL` resource resolved by name)

Security note
An LLM/ML routing decision is a hint, not an authority: the user's message is sent to the judge model, so a routing decision is susceptible to prompt injection. Cost/privilege/safety controls must not be gated solely on it. This is documented on `LlmRoutingStrategy`.

Documentation

- doc-needed
- doc-not-needed
- doc-included
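
For reference, the caching semantics listed in the change log (bounded LRU, per-conversation key, abstentions never cached, best-effort under concurrency) could look roughly like the sketch below. `DecisionCacheSketch`, its `select` method, and the use of a string conversation key are hypothetical; the PR's `CachingStrategy` may be structured differently.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of a bounded, best-effort cache of routing decisions.
final class DecisionCacheSketch {
    private final Map<String, String> cache;

    DecisionCacheSketch(int maxEntries) {
        // An access-ordered LinkedHashMap drops its least-recently-used entry
        // once the size exceeds maxEntries.
        Map<String, String> lru = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries;
            }
        };
        // Individual get/put calls are synchronized; the get-then-put sequence is not.
        this.cache = Collections.synchronizedMap(lru);
    }

    String select(String conversationKey, Supplier<String> strategy) {
        String cached = cache.get(conversationKey);
        if (cached != null) {
            return cached;
        }
        // No lock is held here, so two concurrent first touches on the same key may
        // both run the strategy; the last writer wins.
        String decision = strategy.get();
        if (decision != null) {
            cache.put(conversationKey, decision);
        }
        // A null decision (abstain) is returned but never stored, so the strategy
        // is asked again on the next round.
        return decision;
    }
}
```

Skipping the lock keeps the hot path cheap at the cost of an occasional duplicate strategy call, which matches the "best-effort" wording in the review follow-up.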