Skip to content

fix(core): fix list hash stability for equivalent messages#1364

Open
Weilong-Qin wants to merge 13 commits into
agentscope-ai:mainfrom
Weilong-Qin:fix/unstable-computehash
Open

fix(core): fix list hash stability for equivalent messages#1364
Weilong-Qin wants to merge 13 commits into
agentscope-ai:mainfrom
Weilong-Qin:fix/unstable-computehash

Conversation

@Weilong-Qin

@Weilong-Qin Weilong-Qin commented May 9, 2026

Copy link
Copy Markdown

AgentScope-Java Version

1.0.13-SNAPSHOT

Description

Fixes #1357
ListHashUtil.computeHash was using each list item’s hashCode() directly.
As a result, two separately constructed but semantically identical message lists could produce different hash values. It leads to unnecessary full rewrites in session persistence paths.
To fix it, hash each item based on its JSON serialization.

Checklist

Please check the following items before code is ready to be reviewed.

  • Code has been formatted with mvn spotless:apply
  • All tests are passing (mvn test)
  • Javadoc comments are complete and follow project conventions
  • Related documentation has been updated (e.g. links, examples, etc.)
  • Code is ready for review

@Weilong-Qin Weilong-Qin requested review from a team and Copilot May 9, 2026 09:26
@CLAassistant

CLAassistant commented May 9, 2026

Copy link
Copy Markdown

CLA assistant check
All committers have signed the CLA.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes unstable list hashing used by session persistence by switching ListHashUtil.computeHash from element hashCode() to hashing a JSON-serialized representation of sampled elements, aiming to prevent unnecessary full rewrites when lists are logically equivalent.

Changes:

  • Update ListHashUtil.computeHash to derive each sampled element hash from JsonUtils.getJsonCodec().toJson(item).
  • Add a unit test asserting that two separately constructed but equivalent List<Msg> instances produce the same hash.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
agentscope-core/src/main/java/io/agentscope/core/session/ListHashUtil.java Switch per-item hashing from hashCode() to JSON-serialization-based hashing for better stability across equivalent objects.
agentscope-extensions/agentscope-extensions-session-mysql/src/test/java/io/agentscope/core/session/mysql/MysqlSessionTest.java Add regression test to validate stable hashing for equivalent message lists.

@luowanghaoyun

Copy link
Copy Markdown
Contributor

Per-item toJson in computeHash can be expensive (allocations + serialization) on frequent saves. Would it make sense to switch to a cheaper stable fingerprint (e.g. value-based hashCode on persisted State types, or a small dedicated fingerprint API) instead of JSON here?

@Weilong-Qin

Copy link
Copy Markdown
Author

Per-item toJson in computeHash can be expensive (allocations + serialization) on frequent saves. Would it make sense to switch to a cheaper stable fingerprint (e.g. value-based hashCode on persisted State types, or a small dedicated fingerprint API) instead of JSON here?

OK, I will try to override the hashCode and equals methods based on attribute values for the relevant implementation classes.

@LearningGp LearningGp left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ToolUseBlock.metadata stores Gemini's byte[] thoughtSignature via METADATA_THOUGHT_SIGNATURE. However, Map.equals() delegates to the elements' equals() methods, and in Java, byte[].equals() and byte[].hashCode() rely on reference equality (identity-based).

As a result, Gemini-generated messages containing tool calls will still produce different hashes after being deserialized or reconstructed. Therefore, the original bug remains unfixed in the Gemini execution path.

@Weilong-Qin

Copy link
Copy Markdown
Author

Thanks.

I updated METADATA_THOUGHT_SIGNATURE to follow the OpenAI path and store a Base64-encoded String instead of a byte[]. This keeps the internal representation stable and allows Map.equals to compare values correctly.

ToolUseBlock now normalizes any byte[] under that key into a Base64 String, GeminiResponseParser stores the signature in that format, and GeminiMessageConverter decodes it back to byte[] only when calling the Gemini API.

Does this approach look reasonable?

@AgentScopeJavaBot AgentScopeJavaBot added bug Something isn't working area/core/agent Agent runtime, pipeline, hooks, plan area/core/memory Memory, session, state area/core/model Model providers and formatters labels May 28, 2026

@LearningGp LearningGp left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@LearningGp

Copy link
Copy Markdown
Member
image

# Conflicts:
#	agentscope-core/src/main/java/io/agentscope/core/message/ToolUseBlock.java
#	agentscope-core/src/main/java/io/agentscope/core/model/ChatUsage.java
@Weilong-Qin Weilong-Qin force-pushed the fix/unstable-computehash branch from 2bc3080 to bec4729 Compare June 4, 2026 10:28

@AgentScopeJavaBot AgentScopeJavaBot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Review

This PR adds value-based equals()/hashCode() to Msg, most ContentBlock subtypes (TextBlock, ImageBlock, AudioBlock, VideoBlock, ThinkingBlock, ToolUseBlock, ToolResultBlock), their dependency types (Base64Source, URLSource, ChatUsage, OpenAIReasoningDetail), and normalizes ToolUseBlock metadata to convert legacy byte[] thought signatures to Base64 strings for stable serialization round-trips. It also fixes the Gemini formatter and response parser to align with the new Base64 representation, and includes a small bug fix in ToolCallParam.Builder's copy constructor (context -> runtimeContext). The changes are well-tested with comprehensive unit tests covering value equality, round-trip serialization, and hash stability scenarios. The overall approach of providing value-based equality at the message/block layer to stabilize ListHashUtil.computeHash() is sound and solves the root cause correctly.

(inline comments could not be attached — line numbers fell outside PR hunks. See archived report.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/core/agent Agent runtime, pipeline, hooks, plan area/core/memory Memory, session, state area/core/model Model providers and formatters bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]:Unstable computeHash causes unnecessary MysqlSession rewrites

6 participants