Strip orphan tool_calls from chat memory in streaming tool-call turns #6356
Open
jewoodev wants to merge 1 commit into
When stream() is combined with
ToolCallingAdvisor.streamToolCallResponses(true), the MessageAggregator
folds the tool-call round and the recursive post-tool text round into a
single AssistantMessage(text + tool_calls). MessageChatMemoryAdvisor.after()
then persisted that message verbatim. Its tool_calls are orphaned in
memory (there is no following ToolResponseMessage), so replaying the
history on the next turn can fail on OpenAI-compatible backends with HTTP
400 ("An assistant message with 'tool_calls' must be followed by tool
messages").
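The folding described above can be sketched in isolation. This is a minimal illustration, not the Spring AI implementation: `ToolCall`, `AssistantMsg` and `FoldSketch` are hypothetical stand-ins, and the real aggregator works on a reactive stream rather than a list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; these are NOT the Spring AI types.
record ToolCall(String id, String name, String arguments) {}

record AssistantMsg(String text, List<ToolCall> toolCalls) {}

final class FoldSketch {

    // Folds every chunk into one message the way an aggregator built for a
    // single response stream would: text is concatenated and tool calls are
    // collected, with no notion of round boundaries.
    static AssistantMsg fold(List<AssistantMsg> chunks) {
        StringBuilder text = new StringBuilder();
        List<ToolCall> calls = new ArrayList<>();
        for (AssistantMsg chunk : chunks) {
            if (chunk.text() != null) {
                text.append(chunk.text());
            }
            calls.addAll(chunk.toolCalls());
        }
        return new AssistantMsg(text.toString(), List.copyOf(calls));
    }

    // Builds a two-round stream (tool-call round, then post-tool text round)
    // and folds it into a single message.
    static AssistantMsg demo() {
        List<AssistantMsg> stream = List.of(
            // Round 1: the model requests a tool call and emits no text.
            new AssistantMsg("", List.of(
                new ToolCall("call_1", "getWeather", "{\"city\":\"Seoul\"}"))),
            // Round 2: text chunks produced after the tool has run.
            new AssistantMsg("It is sunny ", List.of()),
            new AssistantMsg("in Seoul.", List.of()));
        return fold(stream);
    }
}
```

The folded result carries both the final text and the round-1 tool call. Because the tool response never appears in the stream, that tool call has nothing following it once the message is stored.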
Fix: sanitize each assistant message before it is persisted as
cross-turn history. For an aggregated/base assistant message that still
carries text, keep the text, metadata and media and drop only the
tool_calls; a pure tool-call frame with no text is an intra-turn
artifact and is dropped entirely. The tool-call round-trip is intra-turn
structure owned by the tool-calling loop, not long-term conversation
history.
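The rule in the fix can be sketched as a small pure function. This is an illustration under stated assumptions, not the code in this commit: `ToolCall`, `AssistantMsg` and `SanitizeSketch` are hypothetical stand-ins for the Spring AI types, media is modelled as plain strings, and treating blank text as "no text" is an assumption.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-ins for illustration; these are NOT the Spring AI types.
record ToolCall(String id, String name, String arguments) {}

record AssistantMsg(String text, Map<String, Object> metadata,
                    List<String> media, List<ToolCall> toolCalls) {}

final class SanitizeSketch {

    // Returns the message to persist, or empty when nothing should be stored.
    static Optional<AssistantMsg> sanitize(AssistantMsg msg) {
        // No tool calls: store the message unchanged.
        if (msg.toolCalls().isEmpty()) {
            return Optional.of(msg);
        }
        // Pure tool-call frame (assumption: blank text counts as no text):
        // drop it entirely, even if it carries metadata or media.
        if (msg.text() == null || msg.text().isBlank()) {
            return Optional.empty();
        }
        // Text plus tool calls: keep text, metadata and media, drop the calls.
        return Optional.of(
            new AssistantMsg(msg.text(), msg.metadata(), msg.media(), List.of()));
    }
}
```

Returning an `Optional` keeps the two outcomes ("store a cleaned copy" and "store nothing") explicit for the caller; the actual change may express this differently.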
Adds unit tests covering MessageChatMemoryAdvisor persistence behavior:
- tool_calls are stripped while the assistant text is kept
- text, metadata and media are preserved when tool_calls are stripped
- a pure tool-call frame is dropped even when it carries metadata/media
- an assistant message with no tool_calls is stored unchanged
- a unit-level reproduction through the real ChatClientMessageAggregator
folding path (mocked StreamAdvisorChain) asserting replay-clean memory
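What "replay-clean" means in the last test can be approximated by a check over the stored history. This is a sketch under assumptions: `Msg` and `ReplayCheck` are hypothetical, and the rule is inferred from the quoted HTTP 400 error text rather than taken from any backend's actual validation logic.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a stored chat message; NOT a Spring AI type.
// toolCallIds is used by assistant messages, toolCallId by tool messages.
record Msg(String role, String text, List<String> toolCallIds, String toolCallId) {}

final class ReplayCheck {

    // Returns true when every assistant message that carries tool calls is
    // immediately followed by tool messages answering each of its call ids.
    static boolean isReplayClean(List<Msg> history) {
        for (int i = 0; i < history.size(); i++) {
            Msg m = history.get(i);
            if (!"assistant".equals(m.role()) || m.toolCallIds().isEmpty()) {
                continue;
            }
            Set<String> pending = new HashSet<>(m.toolCallIds());
            // Consume the run of tool messages that directly follows.
            for (int j = i + 1; j < history.size(); j++) {
                Msg next = history.get(j);
                if (!"tool".equals(next.role())) {
                    break;
                }
                pending.remove(next.toolCallId());
            }
            if (!pending.isEmpty()) {
                return false; // orphan tool call: replay would be rejected
            }
        }
        return true;
    }
}
```

A history holding the folded `text + tool_calls` message fails this check, while the same history after sanitizing passes it.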
Closes spring-projects#6340
Signed-off-by: jewoodev <jewoos15@naver.com>
With stream() + ToolCallingAdvisor.streamToolCallResponses(true) + MessageChatMemoryAdvisor, a tool-calling turn persists an AssistantMessage that carries both the final text and the intermediate tool_calls. Replaying that history on the next turn fails on OpenAI-compatible backends with HTTP 400 ("An assistant message with 'tool_calls' must be followed by tool messages").

MessageChatMemoryAdvisor (order HIGHEST_PRECEDENCE + 200) sits outside ToolCallingAdvisor (+300), so with streamToolCallResponses(true) the multi-round flux (the tool-call round concatWith the recursive post-tool text round) reaches the memory advisor and is folded through a single ChatClientMessageAggregator. The aggregator is built to fold a single response stream into one AssistantMessage, so it merges the two rounds into AssistantMessage(text + tool_calls), and after() persisted that orphan verbatim (there is no following ToolResponseMessage in memory).

This change sanitizes each assistant message before it is persisted as cross-turn history, as a framework default. For an aggregated/base assistant message that still carries text, it keeps the text, metadata and media and drops only the tool_calls; a pure tool-call frame with no text is an intra-turn artifact and is dropped entirely. The tool-call round-trip is intra-turn structure owned by the tool-calling loop, not long-term conversation history.

Verified with a unit-level reproduction that folds a two-round stream through the real ChatClientMessageAggregator and asserts the persisted memory contains no orphan tool_calls, plus unit tests pinning the text/metadata/media preservation and the pure-frame drop:

./mvnw test -pl spring-ai-client-chat -am -Dtest=MessageChatMemoryAdvisorTests -Dsurefire.failIfNoSpecifiedTests=false

Closes #6340