[api][routing] Pluggable in-chat LLM routing (ChatModelRouter + RoutingStrategy) by purushah · Pull Request #852 · apache/flink-agents

purushah · 2026-06-15T20:37:13Z

What is the purpose of the change

Adds a drop-in chat model that routes each request to the best underlying model, then delegates to it. The router is a CHAT_MODEL resource, so an agent points at it by name with no change to the runtime, events, or agent definition.

This is the in-chat selector (which LLM serves a single chat() call). A DataStream-level content-based agent router (branching records across agent operators) is a separate, follow-up concern.

Brief change log

RoutingStrategy — pluggable selection SPI (request -> candidate name). Selection is a pure concern; returning null means "abstain / no opinion".
ChatModelRouter — orchestrates select → (optional cache) → validate → delegate. A strategy that abstains (null) or names a non-candidate is a routing miss and degrades to the configured default candidate (validated at construction; defaults to the first candidate) rather than failing the request.
FallbackPolicy — optional: try remaining candidates on error.
CachingStrategy — optional bounded-LRU memoization of the decision per conversation, so an expensive strategy (e.g. an LLM judge) runs once per conversation, not once per tool-call round. Abstentions (null) are never cached.
Built-in strategies:
- RuleBasedRoutingStrategy — deterministic keyword/regex rules + default.
- LlmRoutingStrategy — a small "judge" model picks the candidate from each candidate's name/description (RouteLLM-style). Distinguishes a transient judge failure (abstain → retried next round, uncached) from an unparseable reply (deterministic default). Parses by whole-token match (no substring mis-routing, e.g. gpt-4o-mini won't match a gpt-4 candidate).
Bring-your-own strategies are first-class: implement RoutingStrategy and reference it by fully-qualified class name; loaded via the thread context classloader (cluster-safe). ML/learned routing is supported the same way.
Adds LlmRoutingAgentExample and unit tests.
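
The select → validate → delegate flow and the routing-miss rule described above can be illustrated with a minimal sketch. The types below (`SketchRoutingStrategy`, `SketchRouter`) are hypothetical, simplified stand-ins and not the signatures added by this PR. Requests and replies are plain strings, and the optional cache and fallback steps are omitted.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical, simplified SPI: maps a request to a candidate name, or null to abstain. */
interface SketchRoutingStrategy {
    String select(String userMessage);
}

/** Hypothetical stand-in for the router: select -> validate -> delegate. */
final class SketchRouter {
    private final Map<String, Function<String, String>> candidates; // name -> model
    private final SketchRoutingStrategy strategy;
    private final String defaultName;

    SketchRouter(
            Map<String, Function<String, String>> candidates,
            SketchRoutingStrategy strategy,
            String defaultName) {
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("at least one candidate is required");
        }
        // The default falls back to the first candidate and is validated up front.
        String resolved =
                defaultName != null ? defaultName : candidates.keySet().iterator().next();
        if (!candidates.containsKey(resolved)) {
            throw new IllegalArgumentException("default is not a candidate: " + resolved);
        }
        this.candidates = new LinkedHashMap<>(candidates);
        this.strategy = strategy;
        this.defaultName = resolved;
    }

    /** Returns the name of the candidate that will serve the request. */
    String route(String userMessage) {
        String choice = strategy.select(userMessage);
        // An abstention (null) or an unknown name is a routing miss: degrade to the default.
        return (choice != null && candidates.containsKey(choice)) ? choice : defaultName;
    }

    /** Delegates the request to the routed candidate. */
    String chat(String userMessage) {
        return candidates.get(route(userMessage)).apply(userMessage);
    }
}

final class RouterSketchDemo {
    public static void main(String[] args) {
        Map<String, Function<String, String>> models = new LinkedHashMap<>();
        models.put("small", msg -> "small:" + msg);
        models.put("large", msg -> "large:" + msg);
        // Abstains (returns null) unless the message asks for a proof.
        SketchRoutingStrategy strategy = msg -> msg.contains("prove") ? "large" : null;
        SketchRouter router = new SketchRouter(models, strategy, null);
        System.out.println(router.chat("prove it")); // large:prove it
        System.out.println(router.chat("hi"));       // small:hi (abstain -> default)
    }
}
```

Passing the candidates as an insertion-ordered map is an assumption of the sketch; it is what makes "defaults to the first candidate" well defined here.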

Verifying this change

This change adds tests and can be verified as follows:

Unit tests under api/.../chat/model/routing/ covering rule selection, judge parsing (whole-token match), stickiness across tool-call rounds, fallback, caching (incl. abstain-not-cached), routing-miss degrade-to-default, and bring-your-own loading. All pass; spotless:check clean (JDK 17).
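
The whole-token behaviour that the judge-parsing tests cover can be illustrated as follows. This is a sketch under stated assumptions, not the PR's `parseChoice`: the method name `pickCandidate`, whitespace tokenisation, and the stripping of surrounding punctuation are all choices made for this example.

```java
import java.util.List;

final class WholeTokenMatchSketch {
    private WholeTokenMatchSketch() {}

    /**
     * Returns the first token of the judge reply that equals a candidate name,
     * or null if no candidate is named. Hypothetical helper for illustration.
     */
    static String pickCandidate(String judgeReply, List<String> candidateNames) {
        if (judgeReply == null) {
            return null;
        }
        for (String raw : judgeReply.trim().split("\\s+")) {
            // Strip punctuation around the token (quotes, trailing period), keep inner '-' and '.'.
            String token = raw.replaceAll("^[^A-Za-z0-9]+|[^A-Za-z0-9]+$", "");
            // Equality, not substring: "gpt-4o-mini" never matches a "gpt-4" candidate.
            if (!token.isEmpty() && candidateNames.contains(token)) {
                return token;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> names = List.of("gpt-4", "gpt-4o-mini");
        System.out.println(pickCandidate("I would pick gpt-4o-mini.", names)); // gpt-4o-mini
        System.out.println(pickCandidate("gpt-4-turbo", names));               // null
    }
}
```

Skipping empty tokens also means an empty candidate name can never match in this sketch.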

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): no
The public API: yes — adds the org.apache.flink.agents.api.chat.model.routing package (additive; no existing API changed).
The serializers: no
The runtime per-record code paths: no (router is a CHAT_MODEL resource resolved by name)
Anything that affects deployment or recovery: no — preserves exactly-once / keyed-state / checkpoint semantics (no new operator, no nested invocation).

Security note

An LLM/ML routing decision is a hint, not an authority — the user's message is sent to the judge model, so a routing decision is susceptible to prompt injection. Cost/privilege/safety controls must not be gated solely on it. This is documented on LlmRoutingStrategy.

Documentation

New public package is documented via javadoc on each type. Built-in strategies, the abstain/routing-miss contract, and the bring-your-own extension point are described on the SPI.
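
For readers unfamiliar with the bring-your-own extension point, loading a strategy by fully-qualified class name through the thread context classloader might look roughly like this. The `Strategy` interface, the `AlwaysFirst` class, and the `load` helper are hypothetical names for this sketch; the no-arg-constructor requirement and the fallback to the sketch's own classloader when no context classloader is set are assumptions, not behaviour confirmed by the PR.

```java
final class StrategyLoaderSketch {
    private StrategyLoaderSketch() {}

    /** Hypothetical stand-in for the RoutingStrategy SPI. */
    interface Strategy {
        String select(String request);
    }

    /** A user-supplied strategy; this sketch requires a public no-arg constructor. */
    public static final class AlwaysFirst implements Strategy {
        public AlwaysFirst() {}

        @Override
        public String select(String request) {
            return "first";
        }
    }

    /** Instantiates the named class via the thread context classloader. */
    static Strategy load(String className) {
        ClassLoader contextLoader = Thread.currentThread().getContextClassLoader();
        // Assumption: fall back to this class's loader if no context loader is set.
        ClassLoader loader =
                contextLoader != null
                        ? contextLoader
                        : StrategyLoaderSketch.class.getClassLoader();
        try {
            Class<?> raw = Class.forName(className, true, loader);
            // asSubclass rejects classes that do not implement the SPI.
            return raw.asSubclass(Strategy.class).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot load routing strategy: " + className, e);
        }
    }

    public static void main(String[] args) {
        Strategy strategy = load(AlwaysFirst.class.getName());
        System.out.println(strategy.select("hello")); // first
    }
}
```

On a cluster, user code is usually visible to the context classloader but not necessarily to the framework's own loader, which is the usual motivation for resolving user classes this way.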

Documentation

doc-needed
doc-not-needed
doc-included

…ngStrategy)

Add a drop-in chat model that selects which underlying model serves each request, then delegates to it. The router is a CHAT_MODEL resource, so an agent points at it by name with no runtime, event, or agent-definition change.

Selection is a pluggable SPI (`RoutingStrategy`), decomposed into orthogonal concerns:
- RoutingStrategy: pure selection (request -> candidate name). Returning null means "abstain / no opinion".
- FallbackPolicy: optional; try remaining candidates on error.
- CachingStrategy: optional bounded-LRU memoization of the decision per conversation, so an expensive strategy (e.g. an LLM judge) runs once per conversation rather than once per tool-call round.

Built-in strategies:
- RuleBasedRoutingStrategy: deterministic keyword/regex rules + default.
- LlmRoutingStrategy: a small "judge" model picks the candidate from each candidate's name/description (RouteLLM-style).

Bring-your-own strategies are first-class: implement RoutingStrategy and reference it by fully-qualified class name; loaded via the thread context classloader (cluster-safe). ML/learned routing is supported the same way.

Routing-miss semantics: a strategy that abstains (null) or names a non-candidate degrades to the configured `default` candidate (validated at construction; defaults to the first candidate) rather than failing the request. The LLM judge distinguishes a transient failure (abstain -> not cached, retried next round) from an unparseable reply (deterministic default).

Security: an LLM/ML routing decision is a hint, not an authority. The user's message is sent to the judge, so cost/privilege/safety must not be gated solely on it (prompt-injection risk). This is documented on the strategy.

Includes an example (LlmRoutingAgentExample) and unit tests covering rule selection, judge parsing (whole-token match, no substring mis-routing), stickiness, fallback, caching (incl. abstain-not-cached), and bring-your-own.

Also mirror the RULE_BASED/LLM ResourceName constants on the Python side (ResourceName.RoutingStrategy.Java) and register RoutingStrategy in the cross-language ResourceName parity check.

weiqingy

Thanks for taking this on, @purushah. A few questions inline.

@weiqingy

…at test

Review follow-ups from @weiqingy on the routing PR:
- ChatModelRouter.open(): document the load-bearing invariant the no-op relies on: a routed candidate is lazily open()-ed by ResourceCache.getResource() on first resolution, so its connection is non-null before chat() runs.
- CachingStrategy / LlmRoutingStrategy: soften "runs once per conversation" to "typically once" and document that memoization is best-effort (a concurrent first-touch on the same key can double-compute; synchronized map, last-writer-wins, benign, so no locking).
- RoutingCandidate: reject an empty name (not just null). An empty name has no resolvable resource and would make LlmRoutingStrategy.parseChoice's whole-token match over-match arbitrary boundaries (mis-route).
- Tests: add ChatModelRouterTest cases pinning the open-before-chat invariant (candidate resolved through an opening ResourceContext, mirroring ResourceCache; plus the negative case proving it is load-bearing), and RoutingCandidateTest for the null/empty name guards.

39 routing tests pass; spotless:check clean under JDK 17.
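
The best-effort memoization described in this follow-up (synchronized map, last-writer-wins, abstentions never cached) can be sketched as below. `DecisionCacheSketch` and its string-keyed signature are hypothetical; the real CachingStrategy may key and store decisions differently.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical bounded-LRU cache of routing decisions, keyed by conversation id. */
final class DecisionCacheSketch {
    private final Map<String, String> decisions;
    private final Function<String, String> delegate; // request -> candidate name, or null

    DecisionCacheSketch(int maxEntries, Function<String, String> delegate) {
        this.delegate = delegate;
        // Access-ordered LinkedHashMap evicts the least recently used entry once full.
        this.decisions =
                Collections.synchronizedMap(
                        new LinkedHashMap<String, String>(16, 0.75f, true) {
                            @Override
                            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                                return size() > maxEntries;
                            }
                        });
    }

    String select(String conversationId, String request) {
        String cached = decisions.get(conversationId);
        if (cached != null) {
            return cached;
        }
        // No lock is held here: two concurrent first touches may both compute.
        String choice = delegate.apply(request);
        if (choice != null) {
            // Last writer wins; abstentions (null) are never stored, so they are retried.
            decisions.put(conversationId, choice);
        }
        return choice;
    }
}
```

Each individual `get` and `put` is synchronized, but the check-then-compute sequence is not atomic, which is why the strategy runs "typically once" rather than exactly once per conversation.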

weiqingy · 2026-06-30T20:00:20Z

Thanks for the follow-ups, @purushah. My comments are resolved. I'll leave the final call to the maintainers.

purushah · 2026-07-01T15:46:20Z

Thanks @weiqingy — really appreciate you taking the time to go through the design and call out the subtle cases. Your comments were very helpful, especially around the open() invariant and the caching wording. Glad the follow-ups addressed them.

github-actions Bot added the doc-label-missing, fixVersion/0.3.0, and priority/major labels on Jun 15, 2026

purushah force-pushed the routing-pr branch from b69af3b to e4e62ed on June 15, 2026 21:53

github-actions Bot added the doc-needed label and removed the doc-label-missing label on Jun 15, 2026

weiqingy reviewed Jun 16, 2026

wenjin272 added the fixVersion/0.4.0 label and removed the fixVersion/0.3.0 label on Jun 17, 2026

purushah wants to merge 2 commits into apache:main from purushah:routing-pr
