[api][routing] Pluggable in-chat LLM routing (ChatModelRouter + RoutingStrategy) #852

Open
purushah wants to merge 2 commits into apache:main from purushah:routing-pr
@purushah purushah commented Jun 15, 2026

What is the purpose of the change

Adds a drop-in chat model that routes each request to the best underlying model, then delegates to it. The router is a CHAT_MODEL resource, so an agent points at it by name with no change to the runtime, events, or agent definition.

This is the in-chat selector (which LLM serves a single chat() call). A DataStream-level content-based agent router (branching records across agent operators) is a separate, follow-up concern.

Brief change log

  • RoutingStrategy — pluggable selection SPI (request -> candidate name). Selection is a pure concern; returning null means "abstain / no opinion".
  • ChatModelRouter — orchestrates select → (optional cache) → validate → delegate. A strategy that abstains (null) or names a non-candidate is a routing miss and degrades to the configured default candidate (validated at construction; defaults to the first candidate) rather than failing the request.
  • FallbackPolicy — optional: try remaining candidates on error.
  • CachingStrategy — optional bounded-LRU memoization of the decision per conversation, so an expensive strategy (e.g. an LLM judge) runs once per conversation, not once per tool-call round. Abstentions (null) are never cached.
  • Built-in strategies:
    • RuleBasedRoutingStrategy — deterministic keyword/regex rules + default.
    • LlmRoutingStrategy — a small "judge" model picks the candidate from each candidate's name/description (RouteLLM-style). Distinguishes a transient judge failure (abstain → retried next round, uncached) from an unparseable reply (deterministic default). Parses by whole-token match (no substring mis-routing, e.g. gpt-4o-mini won't match a gpt-4 candidate).
  • Bring-your-own strategies are first-class: implement RoutingStrategy and reference it by fully-qualified class name; loaded via the thread context classloader (cluster-safe). ML/learned routing is supported the same way.
  • Adds LlmRoutingAgentExample and unit tests.
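
The select → validate → delegate flow in the change log can be sketched as follows. This is a minimal illustration, not the PR's code: `SketchRoutingStrategy` and `SketchRouter` are hypothetical names, models are stood in for by plain `Function<String, String>` calls, and the optional cache step is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for the RoutingStrategy SPI: request -> candidate name, null = abstain.
interface SketchRoutingStrategy {
    String select(String request);
}

// Hypothetical router: select -> validate -> delegate; a routing miss degrades to the default.
final class SketchRouter {
    private final Map<String, Function<String, String>> candidates; // name -> model call
    private final SketchRoutingStrategy strategy;
    private final String defaultName;
    private final boolean fallbackOnError;

    SketchRouter(LinkedHashMap<String, Function<String, String>> candidates,
                 SketchRoutingStrategy strategy, String defaultName, boolean fallbackOnError) {
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("at least one candidate is required");
        }
        this.candidates = new LinkedHashMap<>(candidates); // keeps insertion order
        this.strategy = strategy;
        // The default is validated up front and falls back to the first candidate.
        this.defaultName =
                defaultName != null ? defaultName : this.candidates.keySet().iterator().next();
        if (!this.candidates.containsKey(this.defaultName)) {
            throw new IllegalArgumentException("default is not a candidate: " + this.defaultName);
        }
        this.fallbackOnError = fallbackOnError;
    }

    // Abstention (null) or an unknown name is a routing miss, not an error.
    String resolve(String request) {
        String choice = strategy.select(request);
        return (choice != null && candidates.containsKey(choice)) ? choice : defaultName;
    }

    String chat(String request) {
        String chosen = resolve(request);
        List<String> order = new ArrayList<>();
        order.add(chosen);
        if (fallbackOnError) {
            // Fallback policy of this sketch: try the remaining candidates in insertion order.
            for (String name : candidates.keySet()) {
                if (!name.equals(chosen)) {
                    order.add(name);
                }
            }
        }
        RuntimeException last = null;
        for (String name : order) {
            try {
                return candidates.get(name).apply(request);
            } catch (RuntimeException e) {
                last = e; // remember the failure and move on to the next candidate
            }
        }
        throw last; // every attempted candidate failed
    }
}
```

Keeping `resolve` separate from `chat` reflects the "selection is a pure concern" wording: the strategy only names a candidate, and validation plus delegation stay in the router.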

Verifying this change

This change adds tests and can be verified as follows:

  • Unit tests under api/.../chat/model/routing/ covering rule selection, judge parsing (whole-token match), stickiness across tool-call rounds, fallback, caching (incl. abstain-not-cached), routing-miss degrade-to-default, and bring-your-own loading. All pass; spotless:check clean (JDK 17).

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API: yes — adds the org.apache.flink.agents.api.chat.model.routing package (additive; no existing API changed).
  • The serializers: no
  • The runtime per-record code paths: no (router is a CHAT_MODEL resource resolved by name)
  • Anything that affects deployment or recovery: no — preserves exactly-once / keyed-state / checkpoint semantics (no new operator, no nested invocation).

Security note

An LLM/ML routing decision is a hint, not an authority — the user's message is sent to the judge model, so a routing decision is susceptible to prompt injection. Cost/privilege/safety controls must not be gated solely on it. This is documented on LlmRoutingStrategy.

Documentation

  • New public package is documented via javadoc on each type. Built-in strategies, the abstain/routing-miss contract, and the bring-your-own extension point are described on the SPI.

Documentation

  • doc-needed
  • doc-not-needed
  • doc-included

@github-actions github-actions Bot added the doc-label-missing, fixVersion/0.3.0, and priority/major labels Jun 15, 2026
…ngStrategy)

Add a drop-in chat model that selects which underlying model serves each
request, then delegates to it. The router is a CHAT_MODEL resource, so an
agent points at it by name with no runtime, event, or agent-definition change.

Selection is a pluggable SPI (`RoutingStrategy`), decomposed into orthogonal
concerns:

- RoutingStrategy — pure selection (request -> candidate name). Returning null
  means "abstain / no opinion".
- FallbackPolicy — optional: try remaining candidates on error.
- CachingStrategy — optional bounded-LRU memoization of the decision per
  conversation, so an expensive strategy (e.g. an LLM judge) runs once per
  conversation rather than once per tool-call round.
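
A bounded-LRU, per-conversation memoizer of this kind could look roughly like the sketch below. `SketchDecisionCache` is a hypothetical name, the conversation key is assumed to be a plain string, and the strategy is passed in as a `Function`; the PR's actual `CachingStrategy` may be structured differently.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical memoizer: remembers a non-null routing decision per conversation key.
final class SketchDecisionCache {
    private final Map<String, String> decisions;

    SketchDecisionCache(final int maxEntries) {
        // An access-ordered LinkedHashMap drops the least recently used entry
        // as soon as the configured bound is exceeded.
        this.decisions = Collections.synchronizedMap(
                new LinkedHashMap<String, String>(16, 0.75f, true) {
                    @Override
                    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                        return size() > maxEntries;
                    }
                });
    }

    String select(String conversationId, String request, Function<String, String> strategy) {
        String cached = decisions.get(conversationId);
        if (cached != null) {
            return cached; // later tool-call rounds reuse the first decision
        }
        // Best-effort: two threads touching a new key at once may both compute.
        String decision = strategy.apply(request);
        if (decision != null) {
            decisions.put(conversationId, decision); // abstentions (null) are never cached
        }
        return decision;
    }

    int cachedCount() {
        return decisions.size();
    }
}
```

Skipping the `put` for a null decision is what lets a strategy that abstained in one round be asked again in the next.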

Built-in strategies:

- RuleBasedRoutingStrategy — deterministic keyword/regex rules + default.
- LlmRoutingStrategy — a small "judge" model picks the candidate from each
  candidate's name/description (RouteLLM-style).
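
To illustrate what whole-token matching of a judge reply buys over a substring search, here is a small sketch. `SketchJudgeParser` and its tokenization rule (letters, digits, `.`, `_` and `-` form a token) are assumptions made for this example, not the parsing code in `LlmRoutingStrategy`.

```java
import java.util.List;

// Hypothetical parser: picks the first reply token that equals a candidate name.
final class SketchJudgeParser {
    static String parseChoice(String reply, List<String> candidateNames) {
        // Split on anything that cannot be part of a model name (assumption of this sketch).
        for (String raw : reply.split("[^A-Za-z0-9._-]+")) {
            // Drop sentence punctuation such as a trailing full stop.
            String token = raw.replaceAll("^[.]+|[.]+$", "");
            if (candidateNames.contains(token)) {
                return token;
            }
        }
        return null; // no candidate named as a whole token
    }
}
```

With candidates `gpt-4` and `gpt-4o-mini`, the reply `I pick gpt-4o-mini.` resolves to `gpt-4o-mini`; a substring check would also have found `gpt-4` inside it.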

Bring-your-own strategies are first-class: implement RoutingStrategy and
reference it by fully-qualified class name; loaded via the thread context
classloader (cluster-safe). ML/learned routing is supported the same way.
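
Loading a user-supplied strategy by class name through the thread context classloader can be sketched like this. `ByoStrategy`, `SketchStrategyLoader` and `LongPromptStrategy` are hypothetical names for illustration; the real SPI is `RoutingStrategy`, and its loading code may differ.

```java
// Hypothetical stand-in for the RoutingStrategy SPI.
interface ByoStrategy {
    String select(String request);
}

final class SketchStrategyLoader {
    // Instantiates a strategy from its fully-qualified class name.
    static ByoStrategy load(String className) {
        // Prefer the thread context classloader, which on a cluster is typically
        // the one that can see user code; fall back to this class's loader.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = SketchStrategyLoader.class.getClassLoader();
        }
        try {
            Class<? extends ByoStrategy> type =
                    Class.forName(className, true, loader).asSubclass(ByoStrategy.class);
            return type.getDeclaredConstructor().newInstance(); // needs a no-arg constructor
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("cannot load routing strategy " + className, e);
        }
    }
}

// Example user strategy: long prompts go to "large", everything else abstains.
final class LongPromptStrategy implements ByoStrategy {
    public LongPromptStrategy() {}

    @Override
    public String select(String request) {
        return request.length() > 200 ? "large" : null;
    }
}
```

A call such as `SketchStrategyLoader.load(LongPromptStrategy.class.getName())` returns a ready-to-use instance, while a class that does not implement the interface is rejected with an `IllegalArgumentException`.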

Routing-miss semantics: a strategy that abstains (null) or names a
non-candidate degrades to the configured `default` candidate (validated at
construction; defaults to the first candidate) rather than failing the
request. The LLM judge distinguishes a transient failure (abstain -> not
cached, retried next round) from an unparseable reply (deterministic default).
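
The two judge outcomes described above can be told apart along these lines. This is a sketch under stated assumptions: `SketchJudgeStrategy` is a hypothetical name, the judge call is modeled as a `Function`, any `RuntimeException` is treated as transient, and the reply is matched exactly rather than by whole token to keep the example short.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical judge-backed strategy showing the two distinct failure outcomes.
final class SketchJudgeStrategy {
    private final Function<String, String> judge; // hypothetical call into the judge model
    private final List<String> candidates;
    private final String defaultName;

    SketchJudgeStrategy(Function<String, String> judge, List<String> candidates, String defaultName) {
        this.judge = judge;
        this.candidates = candidates;
        this.defaultName = defaultName;
    }

    String select(String request) {
        String reply;
        try {
            reply = judge.apply(request);
        } catch (RuntimeException transientFailure) {
            // Abstain: a null decision is not cached, so the next round asks the judge again.
            return null;
        }
        String trimmed = reply == null ? "" : reply.trim();
        // An unparseable reply yields the default deterministically (and may be cached).
        return candidates.contains(trimmed) ? trimmed : defaultName;
    }
}
```

Returning null only for the transient case keeps a temporary judge outage from pinning a conversation to the default.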

Security: an LLM/ML routing decision is a hint, not an authority — the user's
message is sent to the judge, so cost/privilege/safety must not be gated
solely on it (prompt-injection risk). This is documented on the strategy.

Includes an example (LlmRoutingAgentExample) and unit tests covering rule
selection, judge parsing (whole-token match, no substring mis-routing),
stickiness, fallback, caching (incl. abstain-not-cached), and bring-your-own.

Also mirror the RULE_BASED/LLM ResourceName constants on the Python side
(ResourceName.RoutingStrategy.Java) and register RoutingStrategy in the
cross-language ResourceName parity check.
@github-actions github-actions Bot added doc-needed and removed doc-label-missing labels Jun 15, 2026

@weiqingy weiqingy left a comment

Collaborator

Thanks for taking this on, @purushah. A few questions inline.

Comment thread python/flink_agents/api/resource.py
…at test

Review follow-ups from @weiqingy on the routing PR:

- ChatModelRouter.open(): document the load-bearing invariant the no-op relies
  on — a routed candidate is lazily open()-ed by ResourceCache.getResource() on
  first resolution, so its connection is non-null before chat() runs.
- CachingStrategy / LlmRoutingStrategy: soften "runs once per conversation" to
  "typically once" and document that memoization is best-effort (a concurrent
  first-touch on the same key can double-compute; synchronized map, last-writer-
  wins, benign — so no locking).
- RoutingCandidate: reject an empty name (not just null) — an empty name has no
  resolvable resource and would make LlmRoutingStrategy.parseChoice's whole-token
  match over-match arbitrary boundaries (mis-route).
- Tests: add ChatModelRouterTest cases pinning the open-before-chat invariant
  (candidate resolved through an opening ResourceContext, mirroring ResourceCache;
  plus the negative case proving it is load-bearing), and RoutingCandidateTest
  for the null/empty name guards.
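
The name guard on a candidate, and the reason an empty name is dangerous for boundary-based matching, can be sketched as below. `SketchCandidate` is a hypothetical shape, and the `\b`-based helper is only one way a whole-token match might be written; neither is taken from the PR's `RoutingCandidate` or `parseChoice`.

```java
import java.util.Objects;
import java.util.regex.Pattern;

// Hypothetical candidate: a resource name plus a description shown to the judge.
final class SketchCandidate {
    final String name;
    final String description;

    SketchCandidate(String name, String description) {
        Objects.requireNonNull(name, "name must not be null");
        if (name.isEmpty()) {
            // An empty name resolves to no resource and would match at any token boundary.
            throw new IllegalArgumentException("name must not be empty");
        }
        this.name = name;
        this.description = description == null ? "" : description;
    }

    // Illustrative word-boundary match, used here only to show the empty-name hazard.
    static boolean matchesAtWordBoundaries(String reply, String name) {
        return Pattern.compile("\\b" + Pattern.quote(name) + "\\b").matcher(reply).find();
    }
}
```

With this helper, an empty name matches any reply that contains a word character, because the pattern collapses to a bare boundary; rejecting the name at construction removes that case before parsing ever runs.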

39 routing tests pass; spotless:check clean under JDK 17.
@wenjin272 wenjin272 added fixVersion/0.4.0 and removed fixVersion/0.3.0 labels Jun 17, 2026
@weiqingy

Collaborator

Thanks for the follow-ups, @purushah. My comments are resolved. I'll leave the final call to the maintainers.

purushah commented Jul 1, 2026

Author

Thanks @weiqingy — really appreciate you taking the time to go through the design and call out the subtle cases. Your comments were very helpful, especially around the open() invariant and the caching wording. Glad the follow-ups addressed them.

Labels

doc-needed, fixVersion/0.4.0, priority/major

3 participants