Agent: reorganize workspace directory structure #714

Merged
luotao1 merged 2 commits into PaddlePaddle:develop from luotao1:cleanup-dirs
May 19, 2026

luotao1 (Collaborator) commented May 19, 2026

PR Category

Feature Enhancement

Description

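Agent: reorganize the workspace directory structure

Background

After a batch extraction run, model directories and result files are mixed together in the workspace root:

  • Successful and failed model directories are mixed together
  • Result JSON files are scattered in various places
  • Subgraph outputs clutter the root directory

This makes later analysis, reruns, and file management inconvenient.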

Changes

1. graph_net/agent/utils/workspace_manager.py

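Adds three directory attributes:

  • success_dir: workspace/success/, holds model samples that were extracted successfully
  • failed_dir: workspace/failed/, holds model directories whose extraction failed
  • logs_and_lists_dir: workspace/logs_and_lists/, holds result JSON files and model lists

_ensure_directories() creates these three directories automatically during initialization.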
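
The directory layout above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class name `WorkspaceManagerSketch`, the `workspace_root` parameter, and the constructor shape are assumptions; only the three attribute names, their paths, and `_ensure_directories()` come from the description.

```python
from pathlib import Path


class WorkspaceManagerSketch:
    """Hypothetical stand-in for the workspace manager described above."""

    def __init__(self, workspace_root):
        # workspace_root is an assumed parameter name.
        self.workspace_root = Path(workspace_root)
        # Attribute names and subdirectory names follow the PR description.
        self.success_dir = self.workspace_root / "success"
        self.failed_dir = self.workspace_root / "failed"
        self.logs_and_lists_dir = self.workspace_root / "logs_and_lists"
        self._ensure_directories()

    def _ensure_directories(self):
        # Create each directory if it is missing; existing ones are left alone.
        for directory in (
            self.success_dir,
            self.failed_dir,
            self.logs_and_lists_dir,
        ):
            directory.mkdir(parents=True, exist_ok=True)
```

Using `exist_ok=True` keeps repeated initialization idempotent, which matters when the same workspace is reused across runs.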

2. graph_net/agent/graph_net_agent.py
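  • extract_sample() now tracks sample_dir and, once extraction finishes, automatically moves it to the matching directory:
    • Validation passed / deduplication hit → success/
    • Validation failed / abnormal exit → failed/ (overwritten if the directory already exists)
  • Adds a _move_sample() helper that wraps shutil.move and the overwrite logic
  • is_duplicate_sample() now scans both success_dir and samples_dir, so previously successful samples are correctly matched by the deduplication check
  • Keeps the _is_llm_fixable_error() method, which decides whether an error is worth an LLM retry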
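
A rough sketch of the move-with-overwrite and multi-directory duplicate check described above. The function signatures, the `overwrite` flag, and the name-based duplicate match are all assumptions made for illustration; the actual `_move_sample()` and `is_duplicate_sample()` in this PR may differ (for example, deduplication may compare content rather than directory names).

```python
import shutil
from pathlib import Path


def move_sample(sample_dir, target_root, overwrite=False):
    """Move sample_dir under target_root and return the new location."""
    sample_dir = Path(sample_dir)
    target_root = Path(target_root)
    target_root.mkdir(parents=True, exist_ok=True)
    destination = target_root / sample_dir.name
    if destination.exists():
        if not overwrite:
            raise FileExistsError(f"{destination} already exists")
        # Drop the stale copy so the move replaces it instead of nesting.
        shutil.rmtree(destination)
    shutil.move(str(sample_dir), str(destination))
    return destination


def is_duplicate_sample(sample_name, *search_dirs):
    """Assumed name-based check: is there a sample directory with this name?"""
    return any((Path(d) / sample_name).is_dir() for d in search_dirs)
```

Removing the existing destination first matters because `shutil.move` into an existing directory would place the source inside it rather than replace it.
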
3. graph_net/agent/parallel_extract.py

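The default output path of the batch extraction result JSON changes from the workspace root to:

workspace/logs_and_lists/parallel_extract_<timestamp>.json

Result files are archived in one place, with no manual cleanup needed.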
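
One way such a default path could be built is sketched below. The helper name `default_result_path`, its parameters, and the `%Y%m%d_%H%M%S` timestamp format are assumptions; the PR description only specifies the `logs_and_lists/` directory and the `parallel_extract_<timestamp>.json` pattern.

```python
from datetime import datetime
from pathlib import Path


def default_result_path(workspace_root, now=None):
    """Build workspace/logs_and_lists/parallel_extract_<timestamp>.json."""
    # Allow injecting the time so the result is reproducible in tests.
    now = now or datetime.now()
    # The timestamp format is an assumption, not taken from the PR.
    timestamp = now.strftime("%Y%m%d_%H%M%S")
    return (
        Path(workspace_root)
        / "logs_and_lists"
        / f"parallel_extract_{timestamp}.json"
    )
```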

4. graph_net/agent/graph_extractor/subprocess_graph_extractor.py

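Isolates extraction output in the samples/ subdirectory:

  • Sets GRAPH_NET_EXTRACT_WORKSPACE=workspace/samples/ so results are not scattered across the workspace root
  • _get_workspace_path() returns the samples/ subdirectory by default
  • Fixes the buildup of redundant model directories in the workspace root after repeated extractions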
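
The environment setup above might look roughly like this. The helper `build_extract_env` and its parameters are hypothetical; only the `GRAPH_NET_EXTRACT_WORKSPACE` variable name and the `samples/` subdirectory come from the description.

```python
import os
from pathlib import Path


def build_extract_env(workspace_root, base_env=None):
    """Return a copy of the environment that points extraction at samples/."""
    # Copy so the parent process environment is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    samples_dir = Path(workspace_root) / "samples"
    env["GRAPH_NET_EXTRACT_WORKSPACE"] = str(samples_dir)
    return env
```

The returned mapping can be passed as `env=` to `subprocess.run`, so only the child extraction process sees the redirected workspace.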

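Verification

Tested with a single model (sshleifer/tiny-gpt2) in CPU mode:

workspace/
├── failed/           ← empty (no failures)
├── generated/        ← generated run_model.py scripts
├── logs/             ← Agent logs
├── logs_and_lists/   ← result JSON
├── models/           ← downloaded model files
├── samples/          ← extraction output (temporary)
└── success/          ← final successful samples
    └── sshleifer_tiny-gpt2

Result: 1 model, 100% success rate, and the sample was correctly moved to success/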

luotao1 added 2 commits May 19, 2026 19:27
- workspace_manager.py: add success_dir, failed_dir, logs_and_lists_dir
- graph_net_agent.py: auto-move samples to success/ or failed/ after extraction
- parallel_extract.py: output JSON to logs_and_lists/ instead of workspace root
- Set GRAPH_NET_EXTRACT_WORKSPACE=workspace/samples/ in subprocess env
- _get_workspace_path() defaults to samples/ subdir
- Prevents clutter in workspace root from redundant model directories

paddle-bot Bot commented May 19, 2026

Thanks for your contribution!

@luotao1 luotao1 merged commit 2e5a40e into PaddlePaddle:develop May 19, 2026
3 checks passed
@luotao1 luotao1 deleted the cleanup-dirs branch May 20, 2026 01:42