Agent: fix graph hash generation for multi-subgraph models and unify hash utilities by luotao1 · Pull Request #716 · PaddlePaddle/GraphNet

luotao1 · 2026-05-20T07:08:48Z

PR Category

Feature Enhancement

Description

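Background

The original _generate_graph_hash handled only single-graph models (those with model.py in the root directory). It skipped multi-subgraph models (directory layout subgraph_0/, subgraph_1/, ...) entirely, so graph_hash.txt was never generated for them. Specific impact:

- Multi-subgraph models have no graph_hash.txt
- is_duplicate_sample() relies on graph_hash.txt for deduplication and returns False when the file is missing, so identical computation-graph structures are extracted repeatedly
- For fine-tuned variants (which share the same base architecture), this wastes a large amount of compute

In addition, the subgraph deduplication script gen_hash_and_dedup.py carried its own inline SHA256 implementation instead of using the library-wide graph_net.hash_util.get_sha256_hash, which made it harder to maintain.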

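Changes

1. Fix hash generation and deduplication for multi-subgraph models (`graph_net_agent.py`)

Unified handling of single-graph and multi-subgraph models: new _get_subgraph_dirs() helper method
- Single-graph model: returns [sample_dir]
- Multi-subgraph model: returns the sorted list [subgraph_0, subgraph_1, ...]
- Removes all hardcoded subgraph_xxx globs
_generate_graph_hash: loops over the result of _get_subgraph_dirs
- Each subgraph directory gets the sha256 of its own model.py, written to graph_hash.txt in that directory
- A single-graph model is simply a loop of one iteration, so it needs no special case
is_duplicate_sample: collects hashes through _get_subgraph_dirs
- Defines an inner _collect_hashes() function that walks all subgraph directories and collects their graph_hash.txt contents
- Single-graph and multi-subgraph models are both compared as frozensets, removing the duplicated branch
_fix_model_name: also reuses _get_subgraph_dirs, removing its hardcoded glob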
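The flow described above can be sketched roughly as follows. This is a minimal illustration and not the code merged in this PR: the function names mirror the description, but their signatures, the `sha256_of_text` stand-in for graph_net.hash_util.get_sha256_hash, and the `make_sample` fixture are all assumptions made for the example.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def sha256_of_text(text: str) -> str:
    # Hypothetical stand-in for graph_net.hash_util.get_sha256_hash (signature assumed).
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def get_subgraph_dirs(sample_dir: Path) -> list[Path]:
    # Single-graph sample: model.py sits in the root, so the root is the only "subgraph".
    if (sample_dir / "model.py").is_file():
        return [sample_dir]
    # Multi-subgraph sample: sorted subgraph_* directories (lexicographic order).
    return sorted(p for p in sample_dir.glob("subgraph_*") if p.is_dir())


def generate_graph_hash(sample_dir: Path) -> None:
    # Write one graph_hash.txt next to each model.py.
    for d in get_subgraph_dirs(sample_dir):
        model_py = d / "model.py"
        if model_py.is_file():
            (d / "graph_hash.txt").write_text(sha256_of_text(model_py.read_text()))


def collect_hashes(sample_dir: Path) -> frozenset[str]:
    # Gather every available per-directory hash into an order-independent set.
    hashes = set()
    for d in get_subgraph_dirs(sample_dir):
        hash_file = d / "graph_hash.txt"
        if hash_file.is_file():
            hashes.add(hash_file.read_text().strip())
    return frozenset(hashes)


def is_duplicate_sample(sample_dir: Path, existing_dirs: list[Path]) -> bool:
    new_hashes = collect_hashes(sample_dir)
    if not new_hashes:
        return False  # no hash files: cannot claim a duplicate
    return any(new_hashes == collect_hashes(d) for d in existing_dirs)


def make_sample(root: Path, name: str, sources: list[str]) -> Path:
    # Hypothetical test fixture: one source -> single-graph layout, several -> subgraph_N layout.
    sample = root / name
    if len(sources) == 1:
        sample.mkdir(parents=True)
        (sample / "model.py").write_text(sources[0])
    else:
        for i, src in enumerate(sources):
            sub = sample / f"subgraph_{i}"
            sub.mkdir(parents=True)
            (sub / "model.py").write_text(src)
    generate_graph_hash(sample)
    return sample


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    graphs = ["def f(x): return x + 1", "def g(x): return x * 2"]
    base = make_sample(root, "base", graphs)
    finetuned = make_sample(root, "finetuned", graphs)  # same graph structure
    other = make_sample(root, "other", ["def h(x): return x - 1"])
    finetuned_is_dup = is_duplicate_sample(finetuned, [base])
    other_is_dup = is_duplicate_sample(other, [base])
    base_hash_files = sorted(
        p.relative_to(base).as_posix() for p in base.rglob("graph_hash.txt")
    )

print(finetuned_is_dup, other_is_dup, base_hash_files)
```

Because both layouts go through the same directory list, the single-graph case needs no separate branch: it is a frozenset with one element.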

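2. Unify the hash utility (`gen_hash_and_dedup.py`)

Remove the inline get_sha256_hash implementation
Reuse graph_net.hash_util.get_sha256_hash, keeping the script consistent with the Agent extraction pipeline and other modules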
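To illustrate why a single shared helper matters, here is a small sketch in which two call sites route through one function and therefore always agree on the digest. `get_sha256_hash_sketch` is a hypothetical stand-in: the real graph_net.hash_util.get_sha256_hash is not shown in this PR, and its signature (a string in, a hex digest out) is assumed here. The two caller functions are also illustrative names.

```python
import hashlib


def get_sha256_hash_sketch(content: str) -> str:
    # Hypothetical stand-in for the shared helper; signature is an assumption.
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def agent_side_hash(model_source: str) -> str:
    # Illustrative call site standing in for the Agent extraction pipeline.
    return get_sha256_hash_sketch(model_source)


def dedup_script_hash(model_source: str) -> str:
    # Illustrative call site standing in for gen_hash_and_dedup.py.
    return get_sha256_hash_sketch(model_source)


source = "def forward(x):\n    return x\n"
digest_agent = agent_side_hash(source)
digest_script = dedup_script_hash(source)
empty_digest = get_sha256_hash_sketch("")  # well-known SHA-256 of the empty input

print(digest_agent == digest_script)
```

With one implementation, a hash written by the script can be compared directly against a hash written during extraction.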

Verification

Ran the deduplication analysis on the historical successful samples in /home/luotao02/workspace/success_backup_20260519:

python graph_net/agent/scripts/gen_hash_and_dedup.py /path/to/workspace

Output:

Found 24430 model.py files under /path/to/workspace
Step 1 - Generate graph_hash.txt:
  Total model.py: 24430
  Generated/Updated: 24430
  Failed: 0
Step 2 - Deduplication analysis:
  Total subgraphs: 24430
  Unique graphs: 503
  Duplicate groups: 344
  Subgraphs involved in duplication: 24271
  Can be removed (keeping one per group): 23927
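The Step 2 figures can be cross-checked with a little arithmetic. The sketch below first shows one plausible way to derive such statistics from a list of hashes (the `dedup_stats` function and its field names are assumptions, not the script's actual code), then checks that the reported numbers are mutually consistent.

```python
from collections import Counter


def dedup_stats(hashes: list[str]) -> dict[str, int]:
    # Hypothetical helper: group subgraphs by hash and summarize duplication.
    counts = Counter(hashes)
    dup_sizes = [n for n in counts.values() if n > 1]
    return {
        "total": len(hashes),
        "unique": len(counts),
        "duplicate_groups": len(dup_sizes),
        "involved": sum(dup_sizes),
        "removable": sum(dup_sizes) - len(dup_sizes),  # keep one per group
    }


# Toy input: hash "a" appears 3 times, "b" twice, "c" once.
toy = dedup_stats(["a", "a", "a", "b", "b", "c"])

# Figures reported in the output above.
total, unique, groups, involved, removable = 24430, 503, 344, 24271, 23927

singletons = unique - groups                      # hashes that occur exactly once
covers_all = involved + singletons == total       # every subgraph is counted once
keeps_one_per_group = involved - groups == removable
leaves_unique = total - removable == unique       # what remains after removal

print(toy, singletons, covers_all, keeps_one_per_group, leaves_unique)
```

Under these definitions, 159 of the 503 unique graphs occur once, and the remaining 344 account for all 24271 duplicated subgraphs.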

- _generate_graph_hash: generate per-subgraph hashes (subgraph_N/graph_hash.txt) instead of a single top-level hash, avoiding false collisions
- is_duplicate_sample: use frozenset of subgraph hashes for multi-subgraph models, preventing rglob false matches on per-subgraph hash files
- Single-graph model logic unchanged (root graph_hash.txt)

Replace inline SHA256 implementation with the canonical get_sha256_hash from graph_net.hash_util, consistent with the agent extraction pipeline and other modules.

paddle-bot · 2026-05-20T07:09:18Z

Thanks for your contribution!

- Add _get_subgraph_dirs() helper: returns [sample_dir] for single-graph or [subgraph_0, subgraph_1, ...] for multi-subgraph models
- Refactor _fix_model_name, _generate_graph_hash, is_duplicate_sample to use the helper, eliminating hardcoded subgraph_xxx globs
- is_duplicate_sample now collects hashes from all subgraphs uniformly via frozenset comparison, regardless of model layout

luotao1 added 2 commits May 20, 2026 14:43

Agent: reuse graph_net.hash_util in gen_hash_and_dedup.py

31a5e89


Xreki approved these changes May 20, 2026


luotao1 merged commit ce9453d into PaddlePaddle:develop May 20, 2026
3 checks passed

luotao1 deleted the hash branch May 20, 2026 08:36

luotao1 merged 3 commits into PaddlePaddle:develop from luotao1:hash
