fix(core): fix list hash stability for equivalent messages #1364
Weilong-Qin wants to merge 13 commits into
Pull request overview
Fixes unstable list hashing used by session persistence by switching ListHashUtil.computeHash from element hashCode() to hashing a JSON-serialized representation of sampled elements, aiming to prevent unnecessary full rewrites when lists are logically equivalent.
Changes:
- Update `ListHashUtil.computeHash` to derive each sampled element hash from `JsonUtils.getJsonCodec().toJson(item)`.
- Add a unit test asserting that two separately constructed but equivalent `List<Msg>` instances produce the same hash.
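A minimal sketch of the idea behind this change, assuming a list hash that folds one fingerprint per item together. `ListHashSketch`, `PlainMsg`, and the hand-rolled `toCanonical` function are hypothetical names: `toCanonical` stands in for `JsonUtils.getJsonCodec().toJson(item)`, and the real `ListHashUtil` samples elements rather than hashing every one, which this sketch does not reproduce.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical helper: folds one fingerprint per item into a list hash.
// Assumption: every item is hashed; the real ListHashUtil samples elements instead.
final class ListHashSketch {
    private ListHashSketch() {}

    static <T> int computeHash(List<T> items, Function<? super T, String> canonicalForm) {
        int hash = items.size();
        for (T item : items) {
            // String.hashCode() depends only on the characters, so equal canonical
            // strings give equal hashes no matter which objects produced them.
            hash = 31 * hash + canonicalForm.apply(item).hashCode();
        }
        return hash;
    }
}

// Hypothetical message type with no equals()/hashCode() overrides,
// so its own hashCode() is identity-based.
final class PlainMsg {
    final String role;
    final String text;

    PlainMsg(String role, String text) {
        this.role = role;
        this.text = text;
    }
}

class ListHashDemo {
    public static void main(String[] args) {
        // Stand-in for JsonUtils.getJsonCodec().toJson(item); no escaping, demo only.
        Function<PlainMsg, String> toCanonical =
                m -> "{\"role\":\"" + m.role + "\",\"text\":\"" + m.text + "\"}";

        // Two separately constructed but equivalent lists.
        List<PlainMsg> a = List.of(new PlainMsg("user", "hi"), new PlainMsg("assistant", "hello"));
        List<PlainMsg> b = List.of(new PlainMsg("user", "hi"), new PlainMsg("assistant", "hello"));

        System.out.println(ListHashSketch.computeHash(a, toCanonical)
                == ListHashSketch.computeHash(b, toCanonical)); // true
    }
}
```

The trade-off is visible here: every call serializes every hashed item, which is the allocation cost the review comment below raises.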
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| agentscope-core/src/main/java/io/agentscope/core/session/ListHashUtil.java | Switch per-item hashing from hashCode() to JSON-serialization-based hashing for better stability across equivalent objects. |
| agentscope-extensions/agentscope-extensions-session-mysql/src/test/java/io/agentscope/core/session/mysql/MysqlSessionTest.java | Add regression test to validate stable hashing for equivalent message lists. |
Per-item `toJson` in `computeHash` can be expensive (allocations + serialization) on frequent saves. Would it make sense to switch to a cheaper stable fingerprint (e.g. value-based `hashCode` on persisted State types, or a small dedicated fingerprint API) instead of JSON here?
OK, I will try to override the `hashCode` and `equals` methods based on attribute values for the relevant implementation classes.
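A minimal sketch of what value-based `equals()`/`hashCode()` overrides could look like for a message and its content blocks. `TextBlockSketch` and `MsgSketch` are hypothetical types with illustrative fields, not the real `TextBlock`/`Msg` APIs.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical content block; not the real TextBlock API.
final class TextBlockSketch {
    private final String text;

    TextBlockSketch(String text) {
        this.text = text;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof TextBlockSketch)) return false;
        return Objects.equals(text, ((TextBlockSketch) o).text);
    }

    @Override
    public int hashCode() {
        // Uses exactly the field equals() compares, keeping the equals/hashCode contract.
        return Objects.hashCode(text);
    }
}

// Hypothetical message holding a role and a list of blocks.
final class MsgSketch {
    private final String role;
    private final List<TextBlockSketch> content;

    MsgSketch(String role, List<TextBlockSketch> content) {
        this.role = role;
        this.content = List.copyOf(content); // immutable copy; rejects null elements
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof MsgSketch)) return false;
        MsgSketch other = (MsgSketch) o;
        // List.equals compares elements with TextBlockSketch.equals.
        return Objects.equals(role, other.role) && content.equals(other.content);
    }

    @Override
    public int hashCode() {
        // List.hashCode combines element hashCodes, so this is value-based too.
        return Objects.hash(role, content);
    }
}
```

With these overrides, `List.hashCode()` over separately built but equal messages yields the same value without any serialization. On Java 16+, declaring such types as records would generate equivalent `equals()`/`hashCode()` automatically.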
LearningGp left a comment
ToolUseBlock.metadata stores Gemini's byte[] thoughtSignature via METADATA_THOUGHT_SIGNATURE. However, Map.equals() delegates to the elements' equals() methods, and in Java, byte[].equals() and byte[].hashCode() rely on reference equality (identity-based).
As a result, Gemini-generated messages containing tool calls will still produce different hashes after being deserialized or reconstructed. Therefore, the original bug remains unfixed in the Gemini execution path.
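The pitfall can be reproduced with plain JDK types. In this sketch the key `thoughtSignature` is an illustrative stand-in for whatever `METADATA_THOUGHT_SIGNATURE` resolves to.

```java
import java.util.Arrays;
import java.util.Map;

class ByteArrayEqualityDemo {
    public static void main(String[] args) {
        byte[] sig1 = {1, 2, 3};
        byte[] sig2 = {1, 2, 3}; // same contents, different array object

        // Arrays inherit Object.equals, which compares references.
        System.out.println(sig1.equals(sig2));         // false
        // Arrays.equals compares element by element.
        System.out.println(Arrays.equals(sig1, sig2)); // true

        // Map.equals compares values with equals(), so identity semantics leak through.
        Map<String, Object> m1 = Map.<String, Object>of("thoughtSignature", sig1);
        Map<String, Object> m2 = Map.<String, Object>of("thoughtSignature", sig2);
        System.out.println(m1.equals(m2));             // false
    }
}
```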
54f6b8d to 32c03f1
Thanks. I updated
Does this approach look reasonable?
# Conflicts:
#	agentscope-core/src/main/java/io/agentscope/core/message/ToolUseBlock.java
#	agentscope-core/src/main/java/io/agentscope/core/model/ChatUsage.java
2bc3080 to bec4729
AgentScopeJavaBot left a comment
🤖 AI Review
This PR adds value-based equals()/hashCode() to Msg, most ContentBlock subtypes (TextBlock, ImageBlock, AudioBlock, VideoBlock, ThinkingBlock, ToolUseBlock, ToolResultBlock), their dependency types (Base64Source, URLSource, ChatUsage, OpenAIReasoningDetail), and normalizes ToolUseBlock metadata to convert legacy byte[] thought signatures to Base64 strings for stable serialization round-trips. It also fixes the Gemini formatter and response parser to align with the new Base64 representation, and includes a small bug fix in ToolCallParam.Builder's copy constructor (context -> runtimeContext). The changes are well-tested with comprehensive unit tests covering value equality, round-trip serialization, and hash stability scenarios. The overall approach of providing value-based equality at the message/block layer to stabilize ListHashUtil.computeHash() is sound and solves the root cause correctly.
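One way to restore value semantics, as the review describes, is to store the signature as a Base64 string instead of a `byte[]`. The helper below is a minimal sketch: `MetadataNormalizer.normalize` is a hypothetical name, not the PR's code, and it converts every `byte[]` value, whereas the PR reportedly targets only the legacy thought-signature entry.

```java
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not the PR's actual normalization code.
final class MetadataNormalizer {
    private MetadataNormalizer() {}

    // Returns a copy in which every byte[] value is replaced by its Base64 string.
    // Assumption: all byte[] values are converted, not just the thought signature.
    static Map<String, Object> normalize(Map<String, Object> metadata) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : metadata.entrySet()) {
            Object value = e.getValue();
            if (value instanceof byte[]) {
                out.put(e.getKey(), Base64.getEncoder().encodeToString((byte[]) value));
            } else {
                out.put(e.getKey(), value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> a = normalize(Map.<String, Object>of("thoughtSignature", new byte[] {1, 2, 3}));
        Map<String, Object> b = normalize(Map.<String, Object>of("thoughtSignature", new byte[] {1, 2, 3}));
        System.out.println(a.get("thoughtSignature"));    // AQID
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```

Strings compare by content, so two normalized maps built from equal bytes are equal and hash equally; `Base64.getDecoder().decode(...)` recovers the original bytes if a caller needs them.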
(inline comments could not be attached — line numbers fell outside PR hunks. See archived report.)

AgentScope-Java Version
1.0.13-SNAPSHOT
Description
Fixes #1357
`ListHashUtil.computeHash` was using each list item's `hashCode()` directly. As a result, two separately constructed but semantically identical message lists could produce different hash values, which leads to unnecessary full rewrites in session persistence paths.
To fix it, hash each item based on its JSON serialization.
Checklist
Please check the following items before code is ready to be reviewed.
- `mvn spotless:apply`
- `mvn test`