
feat: Convert, compare, and publish manual-explanation HTML, plus production operating procedures (#281) #323

Open
taminororo wants to merge 14 commits into develop from feat/kanba/281-manual-generate

@taminororo taminororo commented Jun 18, 2026


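Overview

Issue #281, "Implement conversion and comparison code for manual generation." This PR provides the code that converts
a manual exported from Google Docs into a mobile-optimized, self-contained HTML file (card format) and verifies that
the original text is preserved, together with the production operating procedure.

  • Conversion: an AI route (Claude Agent SDK with the card-strict prompt and its text-immutability policy) plus a deterministic route (pandoc AST, no AI required)
  • Comparison: deterministic mechanical verification that never calls an AI (pandoc + difflib)
  • Operations: generate → verify → publish to GitHub Pages → department-head review on Slack → revision loop → final check by the executive team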

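Main changes in this PR (latest commits)

| Type | Description |
|---|---|
| fix | Fixed a bug where images were not base64-embedded in HTML generated with card-strict (hardened replace_placeholders and added --embed-only) |
| feat | Added verify_report.card-strict.md, a text-check report for department heads, to the mechanical verification (it folds away structural noise and presents, in Japanese, only the body-text differences that need review) |
| feat | Support for a regeneration loop that injects department heads' revision requests into the prompt via instructions.md |
| feat | publish.py, which publishes generated HTML to a separate repository (GitHub Pages) |
| docs | manual-slide-operations.md, a production runbook centered on Slack threads |

In addition, earlier commits already implemented the generation engine, mechanical verification, deterministic conversion, three prompt variants, and automation stubs.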

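Operating model (what goes on develop)

  • Only the "tools" go on develop: the generation, verification, and publishing scripts and the runbooks.
  • Not committed: the manuals' source HTML, generated HTML, images, verification reports (verify_*), and instructions.md.
    These are branch-dependent generated artifacts and also contain personal information (staff names and private mobile numbers),
    so they are uniformly excluded via .git/info/exclude (local only). Generated HTML is published to a separate (private) repository.
  • verify_*.txt|md and instructions.md were added to the existing pattern (docs/manuals/*/*.html was already excluded),
    and the commits that had added verify text files, which were tracked in violation of this convention, were removed from history.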

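Verification

  • Images: the two broken manuals (のぼり広告 and 幼稚園WARS) were restored with --embed-only; file sizes returned to the MB range and there are 0 un-embedded <img> tags.
  • Verification report: generated for all 8 manuals. For のぼり広告, the "needs review" body-text differences were reduced from 13 to 5, and structural noise was separated out as not requiring review.
  • Publishing: confirmed end to end, using a throwaway git repository, that the file is placed at manuals/NN/index.html and the URL is printed.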

Closes #281

🤖 Generated with Claude Code

https://claude.ai/code/session_01HtcZLCKZk7zMvqiDAntY2y

Summary by CodeRabbit

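  • Documentation

    • Added the specification and operating procedure for the explanatory-manual auto-generation pipeline. Smartphone-friendly HTML slides can now be generated automatically from Google Docs.
  • New features

    • Automated the manual generation process. A quality verification feature checks the consistency of the generated content.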

taminororo and others added 14 commits May 31, 2026 11:09
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reads the prompt from .md files and generates text with tools disabled. Supports card-strict.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
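Deterministic verification that shows the actual state through a character-based retention rate and a breakdown of additions and losses, rather than item counts.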

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
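
To illustrate what a character-based retention rate could look like, here is a minimal sketch built on difflib. The function name `char_retention` and the exact formula (matched characters divided by source length) are assumptions for illustration; the PR's own compute_fidelity may be defined differently.

```python
from difflib import SequenceMatcher


def char_retention(source: str, generated: str) -> float:
    """Fraction of source characters that also appear, in order, in generated.

    Hypothetical helper: not the PR's compute_fidelity.
    """
    if not source:
        return 1.0  # nothing could be lost from an empty source
    matcher = SequenceMatcher(None, source, generated, autojunk=False)
    # get_matching_blocks() ends with a zero-size sentinel, which adds nothing.
    kept = sum(block.size for block in matcher.get_matching_blocks())
    return kept / len(source)
```

A rate computed this way drops in proportion to how much text was lost, which a simple count of differing items does not capture.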
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Processes one manual end to end, driven by the spreadsheet; the watcher polls as a resident process.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
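When the LLM did not use the {{filename}} placeholder and instead emitted a bare file name or a relative path in
<img src>, replace_placeholders could not pick it up and the image was not base64-encoded
(this happened frequently on the card-strict route, leaving images missing in のぼり広告 and 幼稚園WARS).

- Hardened replace_placeholders: in addition to {{}}, any <img> whose src basename matches a local image
  is deterministically base64-embedded (added _resolve_image_src).
  A src that is already data: passes through idempotently. This no longer depends on the LLM honoring placeholders.
- Added an --embed-only mode: re-embeds only the images of existing HTML without calling the LLM
  (for deterministic recovery of broken output).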

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01HtcZLCKZk7zMvqiDAntY2y
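
The basename-matching idea described in this commit can be sketched roughly as follows. This is not the PR's replace_placeholders or _resolve_image_src; the function name `embed_local_images`, the regular expression, and the `images` mapping (file name to raw bytes) are all assumptions made for illustration.

```python
import base64
import mimetypes
import os
import re

# Assumes double-quoted src attributes; a real implementation may need more cases.
IMG_SRC_RE = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', re.IGNORECASE)


def embed_local_images(html_text: str, images: dict[str, bytes]) -> str:
    """Replace <img src> values whose basename matches a local image with a data: URI."""

    def _swap(match: re.Match) -> str:
        src = match.group(2)
        if src.startswith("data:"):
            return match.group(0)  # already embedded: pass through (idempotent)
        # Accept both {{name}} placeholders and plain or relative paths.
        name = os.path.basename(src.strip("{}"))
        data = images.get(name)
        if data is None:
            return match.group(0)  # unknown image: leave the original src alone
        mime = mimetypes.guess_type(name)[0] or "application/octet-stream"
        encoded = base64.b64encode(data).decode("ascii")
        return f"{match.group(1)}data:{mime};base64,{encoded}{match.group(3)}"

    return IMG_SRC_RE.sub(_swap, html_text)
```

Because the function never calls a model and leaves data: sources untouched, running it again over already-repaired HTML changes nothing, which is the property an --embed-only style recovery relies on.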
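A mechanism for running the regeneration loop (department head's revision requests → regeneration) without rewriting the shared prompt.
If instructions.md exists in the manual directory, its content is injected at the end of the user prompt as
"additional and corrective instructions for this run (highest priority)". This supports the workflow of copying
revision requests from Slack into a local file and regenerating.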

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01HtcZLCKZk7zMvqiDAntY2y
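
A minimal sketch of this kind of injection, assuming a hypothetical `build_user_prompt` helper and an English header string; the actual wording and function names in generate_slide.py are not shown in this PR page.

```python
from pathlib import Path

# Hypothetical header text; the real prompt wording is not shown here.
INSTRUCTIONS_HEADER = "## Additional instructions for this run (highest priority)"


def build_user_prompt(base_prompt: str, manual_dir: str) -> str:
    """Append instructions.md to the user prompt when the file exists and is non-empty."""
    path = Path(manual_dir) / "instructions.md"
    if not path.is_file():
        return base_prompt  # no per-manual instructions: shared prompt is used as-is
    extra = path.read_text(encoding="utf-8").strip()
    if not extra:
        return base_prompt
    return f"{base_prompt.rstrip()}\n\n{INSTRUCTIONS_HEADER}\n{extra}\n"
```

Keeping the per-run instructions in a file next to the manual means the shared prompt files stay unchanged between regeneration rounds.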
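Outputs verify_report.card-strict.md, a Japanese summary that department heads (non-engineers) can read as-is in a Slack thread.
It does not verify design; it shows only omissions, alterations, and additions in the text.

The deterministic comparison logic (categorize/compute_fidelity) is unchanged. A classification layer was added that
separates structural noise from body-text differences before output:
- Heading numbering (2 手順), bracketed figcaptions ([部品一覧]), and trailing sequence numbers / table-of-contents page numbers are "formatting differences"
- Bullet markers (・↔-) are absorbed by fold_cosmetic as notation variants
- <summary> toggle labels of collapsible sections are removed, element and all, in the HTML pre-processing stage
- Figure numbers (図N) are "image/figure differences"; fragments containing no CJK characters (such as paths originating from the source Doc) are set aside as outside the body text
- Sentences that were reordered and actually exist on the other side are classified as "no review needed"

As a result, only the substantive body-text differences that department heads need to see remain (for のぼり広告, needs-review items went from 13 to 5).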

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01HtcZLCKZk7zMvqiDAntY2y
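A helper tool that places slide_claude.card-strict.html at manuals/NN/index.html in a separate (private) repository,
commits it, and prints the public URL. Manuals are identified by number (inferred automatically from the leading part
of the directory name, or specified explicitly with --number). The destination path and URL are externalized through
SEEFT_PAGES_REPO / SEEFT_PAGES_BASE_URL and are not hard-coded in the repository. It runs on the standard library only.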

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01HtcZLCKZk7zMvqiDAntY2y
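Describes, with concrete commands and role assignments, the Slack-thread-centered flow that the SeeFT owner runs by hand
rather than through spreadsheet automation (generate → verify → publish to Pages → post to Slack → department-head review →
instructions.md regeneration loop → final check by the executive team). Added a note at the top of automation-design.md
stating that current operations follow this runbook and that resident spreadsheet monitoring is on hold for now.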

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01HtcZLCKZk7zMvqiDAntY2y
coderabbitai Bot commented Jun 18, 2026

📝 Walkthrough

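Adds a new pipeline that generates card-format, self-contained HTML manuals from Google Docs via Pandoc. It includes three LLM prompt variants, a generation script based on the Claude Agent SDK, an AI-free mechanical fidelity verification script, a deterministic conversion script based on the Pandoc AST, a set of automation orchestrators integrated with Google Sheets/Drive, and the related documentation.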

Changes

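Slide generation core (prompts, generation, verification, deterministic conversion)

| Layer / File(s) | Summary |
|---|---|
| LLM prompt definitions (base / card / card-strict): .claude/manual-prompt.md, .claude/manual-prompt-card.md, .claude/manual-prompt-card-strict.md | manual-prompt.md defines the SYSTEM_PROMPT and USER_PROMPT templates; manual-prompt-card.md adds card-format overrides (scroll snap disabled, FAB TOC, UI pattern selection rules, example TOC overlay JS/CSS); manual-prompt-card-strict.md adds the text-immutability policy (no typo fixes, no synonym substitution, no full-width/half-width conversion, and so on) as the top-level constraint. |
| generate_slide.py, HTML generation with the Claude SDK: scripts/claude-slide/generate_slide.py, scripts/claude-slide/pyproject.toml | Uses the Claude Agent SDK with subscription authentication and implements pandoc HTML→Markdown conversion, base64 image embedding, placeholder replacement, the --embed-only recovery mode, and instructions.md injection. The prompt variant is switched with --prompt default/card/card-strict. |
| verify_slide_mechanical.py, AI-free mechanical fidelity verification: scripts/claude-slide/verify_slide_mechanical.py | Performs pandoc plain conversion, navigation DOM removal, table-border noise removal, and NFKC normalization; categorizes additions, omissions, alterations, and notation variants using multiset differences and SequenceMatcher; computes the character-based retention rate compute_fidelity; and generates two files, a detailed report and a prose report for department heads. |
| deterministic-slide/convert.py, deterministic conversion via the Pandoc AST: scripts/deterministic-slide/convert.py, scripts/deterministic-slide/pyproject.toml | Parses Google Docs HTML via the Pandoc JSON AST, resolves Google redirects, converts phone numbers to tel links, embeds images as base64, and deterministically generates, without an LLM, SeeFT-template HTML containing per-H2 <details> cards, a TOC, and a lightbox. |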

Automation pipeline (Sheets/Drive integration, orchestration, distribution)

| Layer / File(s) | Summary |
|---|---|
| sheets_client.py / drive_client.py, data access layer: scripts/automation/sheets_client.py, scripts/automation/drive_client.py, scripts/automation/pyproject.toml | Implements SheetsClient (CsvBackend / SheetsBackend), which handles CSV and the Sheets API through the same interface based on the 19-column COLUMNS schema, and drive_client, which fetches a Google Doc as a zip via the Drive Export API and extracts it into source.html and images/. OAuth tokens are shared under ~/.config/seeft-pipeline/. |
| process_one.py / uploader.py, single-manual generation orchestrator: scripts/automation/process_one.py, scripts/automation/uploader.py | Implements the end-to-end flow: verify stage 3 completion → download from Drive → launch generate_slide.py as a subprocess → upload → write the done/error status back to Sheets. uploader is a stub implementation that returns a file:// URL. |
| watcher.py / publish.py, polling scheduler and GitHub Pages distribution: scripts/automation/watcher.py, scripts/automation/publish.py | watcher.py polls the Sheets status, extracts first-generation and regeneration targets, and launches process_one.py as a subprocess. publish.py copies the generated HTML to manuals/<N>/index.html in the Pages repository, runs git commit/push, and prints the URL. |
| Automation pipeline documentation: docs/proposals/agent-sdk-usage.md, docs/proposals/manual-proposal-v4-slides/automation-design.md, docs/proposals/manual-slide-pipeline.md, docs/proposals/manual-slide-operations.md, docs/proposals/verify-mechanical-improvements.html | Covers Claude Agent SDK usage patterns, authentication, and migration steps; the Phase 2 spreadsheet-driven automation design, OAuth setup, and CSV schema; the overall pipeline specification; the Slack-centered operating procedure; and a measurement report on mechanical verification improvements. |

Sequence Diagram(s)

sequenceDiagram
  rect rgba(100, 149, 237, 0.5)
    Note over PM,watcher: Phase 2 automation flow
    PM->>Google Sheets: Set HTML生成ステータス = 実行中 (running)
    watcher->>Google Sheets: Poll (find_pending_rows)
    Google Sheets-->>watcher: List of target manual names
    watcher->>process_one: Launch subprocess
  end
  rect rgba(144, 238, 144, 0.5)
    Note over process_one,generate_slide: Generate one manual
    process_one->>Google Drive: export_doc_as_html_zip(doc_id)
    Google Drive-->>process_one: source.html + images/ (zip)
    process_one->>generate_slide: uv run generate_slide.py --prompt card-strict
    generate_slide->>Claude Agent SDK: query(system_prompt, user_prompt, md_content)
    Claude Agent SDK-->>generate_slide: TextBlock stream
    generate_slide-->>process_one: slide_claude.card-strict.html
  end
  rect rgba(255, 165, 0, 0.5)
    Note over process_one,Google Sheets: Write back results
    process_one->>uploader: upload(html_path, key)
    uploader-->>process_one: file:// URL
    process_one->>Google Sheets: Write back completion / URL / last-generated timestamp
  end

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Suggested reviewers

  • uchida189

Poem

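🐰 The rabbit hops through the forest of manuals
From Google Doc to HTML in a single bound
Cards line up and the FAB shines
The text stays as it is, not a character changed
AI and machine together guard the quality
Bound by the vow of .card-strict
✨ The pipeline is complete, here we go!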

🚥 Pre-merge checks | ✅ 4 | ❌ 1

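❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 53.19% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The PR title, "Convert, compare, and publish manual-explanation HTML, plus production operating procedures," clearly summarizes the main changes of the PR (implementing the conversion, comparison, and publishing pipeline and establishing the operating procedure) and corresponds directly to the overall purpose of the PR. |
| Description check | ✅ Passed | The PR description organizes the overview and main changes in table form and explains the operating model, excluded items, and verification results in detail, providing thorough content that corresponds to the template's required "Overview" and "Notes" sections. |
| Linked Issues check | ✅ Passed | Issue #281 does not list concrete checkbox-style requirements and states only the high-level goal of "implementing manual generation code," but the PR's raw_summary and description confirm a concrete implementation (Google Docs → card-style HTML conversion, mechanical verification, and an established operating procedure) that matches the intent of the issue. |
| Out of Scope Changes check | ✅ Passed | All changes fall within the scope of Issue #281 (implementing the manual generation pipeline): only clearly related work such as prompt definitions, generation/verification/publishing scripts, and the operations runbook; no unrelated feature additions or fixes were found. |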


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 14

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

🟡 Minor comments (9)
scripts/deterministic-slide/convert.py-307-310 (1)

307-310: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

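Section IDs collide for headings with the same title, which breaks TOC navigation.

Because the result of slugify() is used as-is, the id is duplicated when the same H2 title appears more than once. Links to the later cards break, so a suffix must be appended on duplicates.

💡 Suggested fix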
 def split_into_sections(blocks: list, images_b64: dict) -> tuple[dict, list]:
@@
     sections: list = []
+    used_ids: dict[str, int] = {}
@@
             if level <= 2:
@@
+                base_id = slugify(title_text, len(sections) + 1)
+                used_ids[base_id] = used_ids.get(base_id, 0) + 1
+                section_id = base_id if used_ids[base_id] == 1 else f"{base_id}-{used_ids[base_id]}"
                 current = {
-                    "id": slugify(title_text, len(sections) + 1),
+                    "id": section_id,
                     "title": title_text,
                     "title_html": title_html,
                     "html_parts": [],
                 }

Also applies to: 350-356

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/deterministic-slide/convert.py` around lines 307 - 310, The slugify()
function generates identical IDs for duplicate heading titles, causing ID
collisions in the generated HTML which breaks table of contents navigation. To
fix this, implement duplicate tracking by maintaining a dictionary or counter of
previously generated slugs, and when slugify() returns a slug that has already
been used, append a numeric suffix (like "-2", "-3", etc.) to make each ID
unique. This tracking mechanism should be implemented at the call site where
slugify() is used (referenced in lines 350-356) rather than inside the slugify()
function itself, so each invocation checks against all previously generated IDs
before returning the final slug.
scripts/deterministic-slide/convert.py-54-60 (1)

54-60: ⚠️ Potential issue | 🟡 Minor

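Provide an explicit error message when pandoc is not installed.

With the current code, a missing pandoc fails with a FileNotFoundError stack trace, so it takes time to identify the cause. Adding an up-front check that prints a clear error message is safer operationally.

Suggested fix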
 import argparse
 import base64
 import html
 import json
 import mimetypes
 import os
 import re
+import shutil
 import subprocess
 import sys
 from urllib.parse import parse_qs, urlparse

 def load_ast(html_path: str) -> dict:
     """pandoc -t json で HTML を AST 化。"""
+    if shutil.which("pandoc") is None:
+        raise RuntimeError("pandoc が見つかりません。PATH を確認してください。")
     result = subprocess.run(
         ["pandoc", "-f", "html", "-t", "json", html_path],
         capture_output=True, text=True, check=True,
     )
     return json.loads(result.stdout)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/deterministic-slide/convert.py` around lines 54 - 60, The load_ast
function will fail with an unhelpful FileNotFoundError when pandoc is not
installed. Add a pre-check before the subprocess.run call to verify that pandoc
is available on the system, and if it is not found, raise a clear and
descriptive error message explaining that pandoc needs to be installed. This
provides users with immediate clarity about what is missing rather than
requiring them to debug a stack trace.

Source: Linters/SAST tools

scripts/automation/process_one.py-145-145 (1)

145-145: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Unnecessary f-strings are flagged by Ruff F541

Lines 145, 151, 198, and 200 are f-strings without placeholders. Replace them with ordinary strings to clear the lint error.

🔧 Suggested fix
-        print(f"ABORT: Google Doc URL が空です", file=sys.stderr)
+        print("ABORT: Google Doc URL が空です", file=sys.stderr)
@@
-        print(f"  ステータス → 実行中")
+        print("  ステータス → 実行中")
@@
-            print(f"  スプシ更新完了")
+            print("  スプシ更新完了")
@@
-        print(f"=== 完了 ===")
+        print("=== 完了 ===")

Also applies to: 151-151, 198-198, 200-200

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/automation/process_one.py` at line 145, Remove the unnecessary
f-string prefixes from the print statements at lines 145, 151, 198, and 200.
These strings contain no placeholder variables (like {variable}), so they should
be converted to regular strings by removing the f prefix. Replace f"..." with
"..." for each of these print statements to resolve the Ruff F541 lint error.

Source: Linters/SAST tools

docs/proposals/agent-sdk-usage.md-11-21 (1)

11-21: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language specifier to the code fence.

This block has no language specified, so it trips markdownlint (MD040). Specify one explicitly, such as text.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/proposals/agent-sdk-usage.md` around lines 11 - 21, The code block
starting at the section comparing Anthropic SDK and Claude Agent SDK is missing
a language identifier on the opening code fence, which violates the markdownlint
MD040 rule. Add a language identifier (such as `text`) to the opening triple
backticks of this code block to explicitly specify the language type and resolve
the linting issue.

Source: Linters/SAST tools

docs/proposals/manual-slide-operations.md-22-42 (1)

22-42: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language specifier to the flowchart's code fence.

This fence has no language specified and is flagged by MD040. Specifying text is sufficient.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/proposals/manual-slide-operations.md` around lines 22 - 42, The code
fence containing the flowchart diagram does not have a language specifier and is
flagged by the MD040 linting rule. Add the language identifier `text`
immediately after the opening triple backticks (```) to explicitly specify the
fence language and resolve the linting violation.

Source: Linters/SAST tools

docs/proposals/manual-proposal-v4-slides/automation-design.md-19-19 (1)

19-19: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the code fences that lack a language specifier consistently.

MD040 is raised in multiple places. Adding text / bash / toml or similar to each fence keeps lint stable.

Also applies to: 72-72, 94-94, 149-149, 163-163

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/proposals/manual-proposal-v4-slides/automation-design.md` at line 19,
Multiple code fences throughout the file lack language identifiers, causing
MD040 linter violations at lines 19, 72, 94, 149, and 163. Locate each of these
unmarked code fence opening markers (the triple backticks with no language
specified) and add appropriate language identifiers such as `text`, `bash`,
`toml`, or other relevant languages based on the content enclosed within each
fence. This will ensure consistent markdown linting across the document.

Source: Linters/SAST tools

scripts/automation/publish.py-55-57 (1)

55-57: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

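Add validation so that --number accepts only numeric values.

Currently any string is accepted, which can break the manuals/<number>/ convention. It is safer to allow only digits for this argument.

🔧 Suggested fix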
 def derive_number(manual_dir: str, explicit: str | None) -> str:
     """マニュアル番号を決める。--number 優先、無ければディレクトリ名先頭の数字。"""
     if explicit:
+        if not explicit.isdigit():
+            raise SystemExit("ERROR: --number は数字のみ指定してください。")
         return explicit.zfill(2)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/automation/publish.py` around lines 55 - 57, The `explicit` variable
representing the `--number` argument is not validating that it contains only
numeric characters, which could violate the operational convention of
`manuals/<number>/` directories. Add validation before using `explicit.zfill(2)`
to check if the value contains only digits, and raise an appropriate error (such
as ValueError or exit with an error message) if the validation fails. This
ensures that only valid numeric values are accepted for the number parameter.
scripts/automation/watcher.py-52-55 (1)

52-55: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

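Exclude empty manual names in the first_gen extraction as well.

The 再生成依頼 side has an empty-value guard, but the 実行中 side does not. A row with an empty name triggers a pointless launch attempt on every scan.

🔧 Suggested fix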
     # トリガー 1: 実行中 かつ 最終生成日時 が空
     for row in client.find_rows_by_status("HTML生成ステータス", "実行中"):
-        if not row.get("最終生成日時"):
-            pending.append((row.get("マニュアル名"), "first_gen"))
+        name = row.get("マニュアル名")
+        if not row.get("最終生成日時") and name:
+            pending.append((name, "first_gen"))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/automation/watcher.py` around lines 52 - 55, The code in the loop
iterating through find_rows_by_status with status "実行中" lacks a guard check for
empty manual names when appending to pending. Add a condition to verify that
row.get("マニュアル名") is not empty or None before appending the tuple to pending,
similar to the guard that already exists on the "再生成依頼" side. This will prevent
meaningless task executions for rows with missing manual names.
scripts/automation/sheets_client.py-317-320 (1)

317-320: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

The CSV fallback crashes on an empty file.

reader[0] is referenced unconditionally, so an empty CSV raises IndexError. Add a guard that returns an empty result for empty input.

🔧 Suggested fix
         with open(self.backend.csv_path, newline="", encoding="utf-8") as f:  # type: ignore[attr-defined]
             reader = list(csv.reader(f))
+        if not reader:
+            return rows
         header = reader[0]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/automation/sheets_client.py` around lines 317 - 320, The code in the
method reading from self.backend.csv_path accesses reader[0] unconditionally to
get the header, which will cause an IndexError if the CSV file is empty. Add a
guard clause right after creating the reader list to check if reader is empty,
and if so, return an appropriate empty result (such as an empty list or
similar). Only proceed to access header = reader[0] and check the status_column
after confirming the reader contains data.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/proposals/manual-proposal-v4-slides/automation-design.md`:
- Around line 178-200: The status CSV schema table in section "6. ステータス CSV
スキーマ(19 列)" currently defines 19 columns, but the actual implementation in
scripts/automation/sheets_client.py uses the COLUMNS constant which defines 25
columns. Since this document is marked as the single source of truth, you must
either update the table to include all 25 columns that match the COLUMNS
definition in sheets_client.py, or explicitly mark this documentation as
legacy/outdated. Ensure column order, names, types, and descriptions align with
the actual implementation to prevent column reference failures during
re-enablement.

In `@scripts/automation/drive_client.py`:
- Around line 82-83: The token.json file is being created with default
permissions, which could allow other users to read the sensitive OAuth
credentials and refresh token. After writing the credentials to the file at
TOKEN_PATH using creds.to_json() in the open statement, immediately change the
file permissions to 0600 (read/write for owner only) using os.chmod on
TOKEN_PATH to restrict access.
- Around line 67-71: The credential validation logic in the token loading
section does not check whether the existing credentials have the required
DRIVE_SCOPES, only whether they are valid and not expired. This causes failures
when a token with insufficient scopes is reused. Add a check using the
has_scopes() method on the creds object to validate that it possesses all
required DRIVE_SCOPES. Modify the condition `if not creds or not creds.valid:`
to also include a check for missing scopes, so that if scopes are insufficient,
the code falls back to re-authentication instead of proceeding with insufficient
permissions.
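
For the token-permission item above, one possible shape is sketched below. The helper name `write_private_file` is hypothetical, and the sketch assumes a POSIX file system; it creates the file with mode 0600 from the start rather than tightening it afterwards, so there is no window in which the token is readable by others.

```python
import os


def write_private_file(path: str, content: str) -> None:
    """Write content so that only the owning user can read or modify it (POSIX)."""
    # Create with 0600 up front; the mode argument only applies to new files.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(content)
    os.chmod(path, 0o600)  # also tightens a file that already existed
```

In drive_client.py this would presumably be called with TOKEN_PATH and creds.to_json(), but that wiring is an assumption.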

In `@scripts/automation/process_one.py`:
- Around line 143-146: The direct call to .strip() on the result of
row.get("Google Doc URL") at line 143 will crash with an AttributeError if the
value is None (when the key doesn't exist or returns None). Before calling
.strip(), ensure the value is normalized to a string by either providing a
default empty string to the get() method or by checking if the value is None
first. Then call .strip() on the guaranteed string value before performing the
validation check against doc_url.
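
The None-safe normalization described here can be as small as the following sketch; `get_doc_url` is a hypothetical helper name, and only the "Google Doc URL" key comes from the review comment.

```python
def get_doc_url(row: dict) -> str:
    """Return the trimmed Google Doc URL, or "" when the cell is missing or None."""
    # `or ""` covers both a missing key and an explicit None value.
    return (row.get("Google Doc URL") or "").strip()
```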

In `@scripts/automation/publish.py`:
- Around line 119-121: The git commit operation fails when there are no changes
to commit, causing the script to exit abnormally even in successful scenarios.
After the git add call with rel_path, add a conditional check to verify that
there are staged differences before executing the git commit operation. Use a
git command (such as checking git diff --cached or git status) to determine if
there are actual staged changes, and only proceed with the commit if differences
exist, allowing the script to handle no-change scenarios gracefully.
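
One way to implement the staged-change check is `git diff --cached --quiet`, which exits with 0 when nothing is staged and 1 when there are staged differences. The function names and the injectable `run` parameter below are assumptions for illustration, not publish.py's actual structure.

```python
import subprocess


def has_staged_changes(repo_dir: str, run=subprocess.run) -> bool:
    """True when the index differs from HEAD, i.e. there is something to commit."""
    result = run(["git", "-C", repo_dir, "diff", "--cached", "--quiet"])
    if result.returncode not in (0, 1):
        raise RuntimeError(f"git diff failed with exit code {result.returncode}")
    return result.returncode == 1


def commit_if_needed(repo_dir: str, message: str, run=subprocess.run) -> bool:
    """Commit only when something is staged; return whether a commit was made."""
    if not has_staged_changes(repo_dir, run):
        return False  # re-publishing identical HTML is not an error
    run(["git", "-C", repo_dir, "commit", "-m", message], check=True)
    return True
```

Passing `run` as a parameter keeps the logic testable without a real repository.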

In `@scripts/automation/sheets_client.py`:
- Around line 213-215: The range_name variable in the Sheets API call is
hardcoded to read only up to row 200 (A1:Y200), which causes any data beyond row
200 to be permanently skipped by the read_row and find_rows_by_status methods.
Replace the hardcoded row limit of 200 with a dynamic approach that determines
the actual number of rows to read. Keep the column range fixed at A:Y but
calculate or query the actual row count dynamically, such as by using the
getMetadata method or determining the last non-empty row, so that all rows in
the sheet are accessible for processing.

In `@scripts/automation/uploader.py`:
- Around line 45-48: The `key` variable is used directly to construct file paths
without validation, creating a path traversal vulnerability where values
containing `../` or absolute paths could write outside the target directory.
Since `key` comes from external input (Sheets values), sanitize it before
constructing the target path. At both locations (around line 46 where
target_filename is created and around line 66-67), apply path normalization to
the `key` by using os.path.basename() or by filtering out dangerous characters
like slashes and dots to ensure only a safe filename is used in the path
construction with os.path.join().
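
A sketch of the kind of sanitization the comment asks for, assuming a hypothetical `safe_target_path` helper: it reduces the key to its final path component and then confirms that the resolved path still sits inside the target directory.

```python
import os


def safe_target_path(base_dir: str, key: str, suffix: str = ".html") -> str:
    """Build a path under base_dir from an untrusted key, rejecting traversal."""
    # Treat backslashes as separators too, then keep only the last component.
    name = os.path.basename(key.replace("\\", "/")).strip()
    if name in ("", ".", ".."):
        raise ValueError(f"unsafe key: {key!r}")
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, name + suffix))
    # Defense in depth: the resolved target must remain inside base.
    if os.path.commonpath([base, target]) != base:
        raise ValueError(f"key escapes target directory: {key!r}")
    return target
```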

In `@scripts/automation/watcher.py`:
- Around line 95-100: The exception handling in the loop that calls
run_process_one() catches failures but does not propagate them to the exit code,
causing main() to always return 0 even when processing fails. Track whether any
errors occurred during iteration through the pending list (by setting a flag
when an exception is caught in the except block), then modify scan_once() to
return a status indicating failure rather than just the count, and update main()
to check this return value and exit with a non-zero code when errors have been
detected. This ensures the automation infrastructure can properly detect and
report abnormal execution.
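
The failure-propagation pattern could look like the sketch below. The signatures are assumptions: the real scan_once() and main() in watcher.py take different arguments, and `run_one` stands in for run_process_one().

```python
import sys
from typing import Callable, Iterable


def scan_once(pending: Iterable[tuple[str, str]],
              run_one: Callable[[str, str], None]) -> tuple[int, int]:
    """Process every pending item; return (processed, failed)."""
    processed = failed = 0
    for name, trigger in pending:
        processed += 1
        try:
            run_one(name, trigger)
        except Exception as exc:  # keep going, but remember the failure
            failed += 1
            print(f"ERROR: {name} ({trigger}): {exc}", file=sys.stderr)
    return processed, failed


def main(pending, run_one) -> int:
    """Return a process exit code: non-zero when any item failed."""
    _, failed = scan_once(pending, run_one)
    return 1 if failed else 0
```

A caller would then end with `sys.exit(main(...))` so that cron or a supervisor sees the non-zero status.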

In `@scripts/claude-slide/generate_slide.py`:
- Around line 217-223: The code extracts the is_error field from the
ResultMessage object in the elif isinstance(message, ResultMessage) block but
does not evaluate this error status before proceeding with the save operation.
Add a check immediately after extracting the usage dictionary to verify that
is_error is False, and if it is True, raise an exception or return an error to
prevent failure responses from being treated as successes. Apply this same error
checking logic to the other location mentioned (around lines 297-305) where
similar ResultMessage handling occurs.
- Around line 84-90: The current implementation uses os.listdir() which returns
files in non-deterministic order, causing the selection of the first matching
HTML file to be unpredictable across different runs or environments. To fix
this, modify the file selection logic to collect all HTML files that don't start
with "slide" into a list, sort that list to ensure deterministic ordering (such
as alphabetically), and then select the first item from the sorted list. This
ensures the same HTML file is selected every time the script runs, maintaining
reproducibility of the generation pipeline.

In `@scripts/claude-slide/verify_slide_mechanical.py`:
- Around line 72-81: The find_source_html function iterates through
os.listdir(manual_dir) without sorting, which returns files in non-deterministic
order depending on the filesystem and operating system. This causes different
source HTML files to be selected in different execution environments, affecting
verification reliability. Sort the result of os.listdir(manual_dir) before
iterating through it to ensure a consistent, deterministic file selection order
regardless of the execution environment.

In `@scripts/deterministic-slide/convert.py`:
- Around line 107-115: The autolink_phone function currently applies regex
substitution to the entire HTML string, which causes phone numbers already
inside HTML attributes (like href values) or within existing anchor tags to be
re-wrapped, creating invalid nested HTML. Modify the autolink_phone function to
parse the HTML structure and apply the PHONE_RE.sub pattern only to text nodes
(plain text content), not to attribute values or content already within HTML
tags. This ensures phone numbers are only linked when they appear as plain text
in the document, not when they are already part of HTML markup.
- Around line 42-51: The find_source_html function relies on the arbitrary
ordering of os.listdir() which is non-deterministic across different
environments and runs, causing it to potentially return different HTML files on
successive calls. Sort the list of files returned by os.listdir(manual_dir)
before iterating through them to ensure consistent and deterministic file
selection. Apply the sorting operation to the directory listing so that the
function always returns the same file given the same input directory.
- Around line 168-173: The RawInline handler unconditionally appends raw HTML
content to the parts list when fmt equals "html", which creates an XSS
vulnerability as malicious tags and event attributes can execute in the output.
Either implement an allowlist approach that only permits specific safe HTML tags
(filtering the raw content accordingly), or change the default behavior to
escape the HTML content before appending to parts instead of passing it through
unmodified. The fix should be applied in the RawInline case block where the fmt
and raw values are extracted from c.
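
For the autolink_phone item, a rough sketch of text-node-only substitution follows. PHONE_RE here is a hypothetical pattern (the real one in convert.py is not shown), and splitting on a tag regex is a simplification that assumes reasonably well-formed markup; an HTML parser would be more robust.

```python
import re

# Hypothetical pattern for numbers such as 090-1234-5678 or 03-1234-5678.
PHONE_RE = re.compile(r"(?<!\d)0\d{1,4}-\d{1,4}-\d{4}(?!\d)")
TAG_RE = re.compile(r"(<[^>]+>)")


def autolink_phone(html_text: str) -> str:
    """Wrap phone numbers in tel: links, touching only text outside tags and <a> elements."""
    out = []
    anchor_depth = 0
    for piece in TAG_RE.split(html_text):
        if piece.startswith("<"):
            lowered = piece.lower()
            if re.match(r"<a[\s>]", lowered):
                anchor_depth += 1
            elif lowered.startswith("</a"):
                anchor_depth = max(0, anchor_depth - 1)
            out.append(piece)  # tags and their attributes pass through unchanged
        elif anchor_depth:
            out.append(piece)  # text that is already inside a link
        else:
            out.append(PHONE_RE.sub(
                lambda m: f'<a href="tel:{m.group(0).replace("-", "")}">{m.group(0)}</a>',
                piece))
    return "".join(out)
```

Because linked numbers are skipped on a second pass, the function is idempotent, which matters when a conversion step may run more than once over the same HTML.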

---

Minor comments:
In `@docs/proposals/agent-sdk-usage.md`:
- Around line 11-21: The code block starting at the section comparing Anthropic
SDK and Claude Agent SDK is missing a language identifier on the opening code
fence, which violates the markdownlint MD040 rule. Add a language identifier
(such as `text`) to the opening triple backticks of this code block to
explicitly specify the language type and resolve the linting issue.

In `@docs/proposals/manual-proposal-v4-slides/automation-design.md`:
- Line 19: Multiple code fences throughout the file lack language identifiers,
causing MD040 linter violations at lines 19, 72, 94, 149, and 163. Locate each
of these unmarked code fence opening markers (the triple backticks with no
language specified) and add appropriate language identifiers such as `text`,
`bash`, `toml`, or other relevant languages based on the content enclosed within
each fence. This will ensure consistent markdown linting across the document.

In `@docs/proposals/manual-slide-operations.md`:
- Around line 22-42: The code fence containing the flowchart diagram does not
have a language specifier and is flagged by the MD040 linting rule. Add the
language identifier `text` immediately after the opening triple backticks (```)
to explicitly specify the fence language and resolve the linting violation.

In `@scripts/automation/process_one.py`:
- Line 145: Remove the unnecessary f-string prefixes from the print statements
at lines 145, 151, 198, and 200. These strings contain no placeholder variables
(like {variable}), so they should be converted to regular strings by removing
the f prefix. Replace f"..." with "..." for each of these print statements to
resolve the Ruff F541 lint error.

In `@scripts/automation/publish.py`:
- Around line 55-57: The `explicit` variable representing the `--number`
argument is not validating that it contains only numeric characters, which could
violate the operational convention of `manuals/<number>/` directories. Add
validation before using `explicit.zfill(2)` to check if the value contains only
digits, and raise an appropriate error (such as ValueError or exit with an error
message) if the validation fails. This ensures that only valid numeric values
are accepted for the number parameter.
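
A minimal sketch of the digit check described above, assuming the value arrives as a plain string. The helper name `normalize_manual_number` is hypothetical and does not appear in the PR; only the `zfill(2)` padding is taken from the comment.

```python
def normalize_manual_number(explicit: str, width: int = 2) -> str:
    """Return a zero-padded manual number, rejecting non-digit input.

    Hypothetical helper: the name and signature are illustrative only.
    """
    value = explicit.strip()
    # isascii() + isdigit() rejects full-width digits and path-like input
    # such as "../x", so the result is always safe to use in manuals/<number>/.
    if not (value.isascii() and value.isdigit()):
        raise ValueError(f"--number must contain only digits, got {explicit!r}")
    return value.zfill(width)
```

Raising `ValueError` keeps the helper testable; a CLI entry point could catch it and exit with an error message instead.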

In `@scripts/automation/sheets_client.py`:
- Around line 317-320: The code in the method reading from self.backend.csv_path
accesses reader[0] unconditionally to get the header, which will cause an
IndexError if the CSV file is empty. Add a guard clause right after creating the
reader list to check if reader is empty, and if so, return an appropriate empty
result (such as an empty list or similar). Only proceed to access header =
reader[0] and check the status_column after confirming the reader contains data.
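
The guard could look roughly like the sketch below. It reads CSV text from a string rather than from `self.backend.csv_path` so that it stays self-contained, and the function name `rows_with_status` is an assumption, not the method in `sheets_client.py`.

```python
import csv
import io


def rows_with_status(csv_text: str, status_column: str, status: str) -> list[dict[str, str]]:
    """Return data rows whose status_column equals status; [] for empty input."""
    reader = list(csv.reader(io.StringIO(csv_text)))
    if not reader:
        # Empty file: there is no header row, so reader[0] would raise IndexError.
        return []
    header = reader[0]
    if status_column not in header:
        return []
    rows = (dict(zip(header, values)) for values in reader[1:])
    return [row for row in rows if row.get(status_column) == status]
```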

In `@scripts/automation/watcher.py`:
- Around line 52-55: The code in the loop iterating through find_rows_by_status
with status "実行中" lacks a guard check for empty manual names when appending to
pending. Add a condition to verify that row.get("マニュアル名") is not empty or None
before appending the tuple to pending, similar to the guard that already exists
on the "再生成依頼" side. This will prevent meaningless task executions for rows with
missing manual names.

In `@scripts/deterministic-slide/convert.py`:
- Around line 307-310: The slugify() function generates identical IDs for
duplicate heading titles, causing ID collisions in the generated HTML which
breaks table of contents navigation. To fix this, implement duplicate tracking
by maintaining a dictionary or counter of previously generated slugs, and when
slugify() returns a slug that has already been used, append a numeric suffix
(like "-2", "-3", etc.) to make each ID unique. This tracking mechanism should
be implemented at the call site where slugify() is used (referenced in lines
350-356) rather than inside the slugify() function itself, so each invocation
checks against all previously generated IDs before returning the final slug.
- Around line 54-60: The load_ast function will fail with an unhelpful
FileNotFoundError when pandoc is not installed. Add a pre-check before the
subprocess.run call to verify that pandoc is available on the system, and if it
is not found, raise a clear and descriptive error message explaining that pandoc
needs to be installed. This provides users with immediate clarity about what is
missing rather than requiring them to debug a stack trace.
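
For the duplicate-ID item above, call-site tracking might be sketched as follows. The `slugify` shown here is a stand-in (the real implementation is not visible in this review), and `unique_slug` is a hypothetical helper name.

```python
import re


def slugify(title: str) -> str:
    """Stand-in slugifier; an assumption, not the function from convert.py."""
    slug = re.sub(r"[^\w]+", "-", title.strip().lower()).strip("-")
    return slug or "section"


def unique_slug(title: str, used: set[str]) -> str:
    """Return a slug not yet present in `used`, appending -2, -3, ... as needed."""
    base = slugify(title)
    candidate, n = base, 1
    while candidate in used:
        n += 1
        candidate = f"{base}-{n}"
    used.add(candidate)
    return candidate
```

Checking against a set of already-issued IDs, rather than a per-title counter, also covers the case where a heading is literally named like a generated suffix (for example "Setup 2").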

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 74768a80-9af2-4296-920c-cdc07f6caa2b

📥 Commits

Reviewing files that changed from the base of the PR and between 4940875 and 8c1cf2b.

⛔ Files ignored due to path filters (3)
  • scripts/automation/uv.lock is excluded by !**/*.lock
  • scripts/claude-slide/uv.lock is excluded by !**/*.lock
  • scripts/deterministic-slide/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (20)
  • .claude/manual-prompt-card-strict.md
  • .claude/manual-prompt-card.md
  • .claude/manual-prompt.md
  • docs/proposals/agent-sdk-usage.md
  • docs/proposals/manual-proposal-v4-slides/automation-design.md
  • docs/proposals/manual-slide-operations.md
  • docs/proposals/manual-slide-pipeline.md
  • docs/proposals/verify-mechanical-improvements.html
  • scripts/automation/drive_client.py
  • scripts/automation/process_one.py
  • scripts/automation/publish.py
  • scripts/automation/pyproject.toml
  • scripts/automation/sheets_client.py
  • scripts/automation/uploader.py
  • scripts/automation/watcher.py
  • scripts/claude-slide/generate_slide.py
  • scripts/claude-slide/pyproject.toml
  • scripts/claude-slide/verify_slide_mechanical.py
  • scripts/deterministic-slide/convert.py
  • scripts/deterministic-slide/pyproject.toml

Comment on lines +178 to +200
## 6. ステータス CSV スキーマ(19 列)

| # | カラム名 | 型 | 記入主体 | enum 値 / 例 |
| --- | --- | --- | --- | --- |
| 1 | マニュアル名 | str | PM | "配線マニュアル" |
| 2 | 担当局 | str | PM | "総務" "渉外" "財務" "企画" |
| 3 | 担当部門 | str | PM | "会場" "副局長" "広報" 等 |
| 4 | 担当者名 | str | PM | "赤嶺" "黒木康士朗" |
| 5 | Google Doc URL | URL | PM | "https://docs.google.com/document/d/.../edit" |
| 6 | **解説HTML生成** | enum | PM | "生成する" / "生成しない" |
| 7 | 生成HTML URL | URL | automation | "https://manuals.../wiring.html" |
| 8 | パイプライン状態 | enum | automation | "未生成" "生成中" "検証中" "検証済(OK)" "検証済(NG)" "Doc 直配信" "エラー" |
| 9 | 最終生成日時 | datetime | automation | "2026-05-13 13:30" |
| 10 | 検証VERDICT | enum | automation | "OK" / "NG" / "" |
| 11 | 追加件数 | int | automation | 0, 1, 2, ... |
| 12 | 欠落件数 | int | automation | 0, 1, 2, ... |
| 13 | 改変件数 | int | automation | 0, 1, 2, ... |
| 14 | 検証レポート | path | automation | "docs/manuals/.../verify_claude.card.txt" |
| 15 | 確認担当者 | str | PM | "赤嶺" |
| 16 | 確認結果 | enum | 確認担当者 | "未確認" / "訂正OK" / "要修正" / "再生成依頼" |
| 17 | 確認備考 | str | 確認担当者 | フリーテキスト |
| 18 | 個別生成指示 | markdown | PM | "- 配線番号 ①〜㉖ はバッジで強調" |
| 19 | 備考 | str | 全員 | フリーテキスト |

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

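The schema definition (19 columns) does not match the current implementation (25 columns).

This Markdown file is described as the "single source of truth", but its contract has drifted from `COLUMNS` (25 columns) in `scripts/automation/sheets_client.py`. Column references will break when the automation is re-enabled, so either update this table to match the current implementation or clearly mark it as legacy.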

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/proposals/manual-proposal-v4-slides/automation-design.md` around lines
178 - 200, The status CSV schema table in section "6. ステータス CSV スキーマ(19 列)"
currently defines 19 columns, but the actual implementation in
scripts/automation/sheets_client.py uses the COLUMNS constant which defines 25
columns. Since this document is marked as the single source of truth, you must
either update the table to include all 25 columns that match the COLUMNS
definition in sheets_client.py, or explicitly mark this documentation as
legacy/outdated. Ensure column order, names, types, and descriptions align with
the actual implementation to prevent column reference failures during
re-enablement.

Comment on lines +67 to +71
    if os.path.exists(TOKEN_PATH):
        creds = Credentials.from_authorized_user_file(TOKEN_PATH, DRIVE_SCOPES)
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Check if the file exists and read the relevant section
fd -type f -name "drive_client.py" | head -5

Repository: NUTFes/SeeFT

Length of output: 227


🏁 Script executed:

# Get file structure and check if file exists
git ls-files | grep -i drive_client

Repository: NUTFes/SeeFT

Length of output: 91


🏁 Script executed:

# Search for the specific file in scripts/automation
ls -la scripts/automation/ 2>/dev/null || echo "Directory not found"

Repository: NUTFes/SeeFT

Length of output: 676


🏁 Script executed:

cat -n scripts/automation/drive_client.py | head -80

Repository: NUTFes/SeeFT

Length of output: 3358


🏁 Script executed:

# Check imports and DRIVE_SCOPES definition
cat -n scripts/automation/drive_client.py | head -30

Repository: NUTFes/SeeFT

Length of output: 1130


🏁 Script executed:

# Check the rest of the file to see how creds is used
cat -n scripts/automation/drive_client.py | sed -n '67,90p'

Repository: NUTFes/SeeFT

Length of output: 1271


🏁 Script executed:

# Search for any usage of has_scopes in the codebase
rg "has_scopes" scripts/

Repository: NUTFes/SeeFT

Length of output: 38


🌐 Web query:

google.oauth2.credentials Credentials has_scopes method documentation

💡 Result:

The has_scopes method is part of the ReadOnlyScoped interface in the google-auth library, which various credential classes (including those in google.oauth2.credentials) implement [1][2]. Method Definition: has_scopes(scopes) Purpose: Checks whether the provided credentials possess the specified set of OAuth 2.0 scopes [3][2]. Parameters: scopes (Sequence[str]): A list or sequence of scope strings to verify [3][2]. Returns: bool: Returns True if the credentials possess all of the requested scopes, and False otherwise [3][2]. Implementation Detail: The method determines if the requested scopes are a subset of the credentials' current scopes (or default scopes if current scopes are not explicitly set) [2][4]. Warning: The library documentation notes that the result of this method is not guaranteed to be accurate if the credentials are in an invalid state (i.e., when the credentials.invalid attribute is True) [2][4][5]. Source: [3], [1], [2], [4]

Citations:


🏁 Script executed:

# Check if there are any sheets_client references to understand the shared token pattern
cat -n scripts/automation/sheets_client.py | head -40

Repository: NUTFes/SeeFT

Length of output: 1545


🏁 Script executed:

# Check if the token is indeed shared between sheets_client and drive_client
rg "TOKEN_PATH\|from_authorized_user_file" scripts/automation/ -A 2

Repository: NUTFes/SeeFT

Length of output: 38


🏁 Script executed:

# Verify the import of Credentials to confirm it's from google.oauth2.credentials
cat -n scripts/automation/drive_client.py | sed -n '59,65p'

Repository: NUTFes/SeeFT

Length of output: 414


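An existing token with insufficient scopes goes undetected, so Drive calls may fail

The code at lines 67-71 proceeds without re-authenticating whenever the existing token is "valid", so `export_doc` fails with an insufficient-permission error when the shared token was issued with Sheets-only scopes. Use the `has_scopes()` method to verify that the required scopes are present, and fall back to re-authentication when they are missing.

Suggested fix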
     if os.path.exists(TOKEN_PATH):
         creds = Credentials.from_authorized_user_file(TOKEN_PATH, DRIVE_SCOPES)
+        if creds and not creds.has_scopes(DRIVE_SCOPES):
+            creds = None
     if not creds or not creds.valid:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/automation/drive_client.py` around lines 67 - 71, The credential
validation logic in the token loading section does not check whether the
existing credentials have the required DRIVE_SCOPES, only whether they are valid
and not expired. This causes failures when a token with insufficient scopes is
reused. Add a check using the has_scopes() method on the creds object to
validate that it possesses all required DRIVE_SCOPES. Modify the condition `if
not creds or not creds.valid:` to also include a check for missing scopes, so
that if scopes are insufficient, the code falls back to re-authentication
instead of proceeding with insufficient permissions.

Comment on lines +82 to +83
        with open(TOKEN_PATH, "w") as token:
            token.write(creds.to_json())

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

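The OAuth token is saved with default permissions, which risks leaking it

Lines 82-83 create token.json with default permissions, so in some environments other users may be able to read it. Because the file contains a refresh token, restrict it to 0600 immediately after saving.

🔧 Suggested fix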
         with open(TOKEN_PATH, "w") as token:
             token.write(creds.to_json())
+        os.chmod(TOKEN_PATH, 0o600)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-        with open(TOKEN_PATH, "w") as token:
-            token.write(creds.to_json())
+        with open(TOKEN_PATH, "w") as token:
+            token.write(creds.to_json())
+        os.chmod(TOKEN_PATH, 0o600)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/automation/drive_client.py` around lines 82 - 83, The token.json file
is being created with default permissions, which could allow other users to read
the sensitive OAuth credentials and refresh token. After writing the credentials
to the file at TOKEN_PATH using creds.to_json() in the open statement,
immediately change the file permissions to 0600 (read/write for owner only)
using os.chmod on TOKEN_PATH to restrict access.
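
As a variation on the `os.chmod` suggestion, the file can be created with mode 0600 from the start, which avoids the short window in which a new token file exists with default permissions. This is only a sketch: `write_private_file` is a hypothetical helper name, and the permission bits are meaningful on POSIX systems only.

```python
import os


def write_private_file(path: str, data: str) -> None:
    """Write text to `path`, keeping the file readable by its owner only."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    # The mode argument applies only when the file is newly created.
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        # A pre-existing file keeps its old mode, so tighten it explicitly
        # before any secret content is written.
        os.chmod(path, 0o600)
        f.write(data)
```

In the reviewed code this would correspond to something like `write_private_file(TOKEN_PATH, creds.to_json())`.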

Comment on lines +143 to +146
    doc_url = row.get("Google Doc URL").strip()
    if not doc_url and not args.skip_drive:
        print(f"ABORT: Google Doc URL が空です", file=sys.stderr)
        return 4

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

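Rows whose Google Doc URL is empty or None crash immediately

Calling `.strip()` directly at line 143 raises `AttributeError` when the value is `None`. Because this code sits outside the `try` block, the error status is never written either. Normalize the value to a string first, then run the check.

🔧 Suggested fix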
-    doc_url = row.get("Google Doc URL").strip()
+    raw_doc_url = row.get("Google Doc URL")
+    doc_url = raw_doc_url.strip() if isinstance(raw_doc_url, str) else ""
     if not doc_url and not args.skip_drive:
         print(f"ABORT: Google Doc URL が空です", file=sys.stderr)
         return 4
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    doc_url = row.get("Google Doc URL").strip()
-    if not doc_url and not args.skip_drive:
-        print(f"ABORT: Google Doc URL が空です", file=sys.stderr)
-        return 4
+    raw_doc_url = row.get("Google Doc URL")
+    doc_url = raw_doc_url.strip() if isinstance(raw_doc_url, str) else ""
+    if not doc_url and not args.skip_drive:
+        print(f"ABORT: Google Doc URL が空です", file=sys.stderr)
+        return 4
🧰 Tools
🪛 Ruff (0.15.17)

[error] 145-145: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/automation/process_one.py` around lines 143 - 146, The direct call to
.strip() on the result of row.get("Google Doc URL") at line 143 will crash with
an AttributeError if the value is None (when the key doesn't exist or returns
None). Before calling .strip(), ensure the value is normalized to a string by
either providing a default empty string to the get() method or by checking if
the value is None first. Then call .strip() on the guaranteed string value
before performing the validation check against doc_url.
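
If several columns need the same treatment, the normalization could be factored into a small helper, sketched below. `cell_text` is a hypothetical name, and the row is assumed to be a plain `dict` as in the snippet under review.

```python
def cell_text(row: dict, key: str) -> str:
    """Return the stripped string value of a cell, or "" when missing or not a string."""
    value = row.get(key)
    # Covers a missing key, an explicit None, and non-string cells alike.
    return value.strip() if isinstance(value, str) else ""
```

With this in place the check would read `doc_url = cell_text(row, "Google Doc URL")`, and the existing `if not doc_url and not args.skip_drive:` branch handles the empty case.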

Comment on lines +119 to +121
    git(pages_repo, "add", rel_path)
    git(pages_repo, "commit", "-m", message)

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

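Handle the `git commit` failure that occurs when nothing has changed.

Re-publishing identical HTML makes `git commit` fail, so the script exits abnormally even on the happy path. After `add`, commit only when there is a staged diff.

🔧 Suggested fix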
     message = args.message or f"manuals/{number}: {os.path.basename(manual_dir)} を更新"
     git(pages_repo, "add", rel_path)
-    git(pages_repo, "commit", "-m", message)
+    diff = subprocess.run(
+        ["git", "-C", pages_repo, "diff", "--cached", "--quiet", "--", rel_path]
+    )
+    if diff.returncode == 0:
+        print("  変更なしのため commit をスキップ")
+    elif diff.returncode == 1:
+        git(pages_repo, "commit", "-m", message)
+    else:
+        raise RuntimeError("git diff --cached の実行に失敗しました")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/automation/publish.py` around lines 119 - 121, The git commit
operation fails when there are no changes to commit, causing the script to exit
abnormally even in successful scenarios. After the git add call with rel_path,
add a conditional check to verify that there are staged differences before
executing the git commit operation. Use a git command (such as checking git diff
--cached or git status) to determine if there are actual staged changes, and
only proceed with the commit if differences exist, allowing the script to handle
no-change scenarios gracefully.

Comment on lines +217 to +223
            elif isinstance(message, ResultMessage):
                usage = {
                    "duration_ms": getattr(message, "duration_ms", None),
                    "num_turns": getattr(message, "num_turns", None),
                    "total_cost_usd": getattr(message, "total_cost_usd", None),
                    "is_error": getattr(message, "is_error", None),
                }

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

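Output is written even when the SDK reports an error

`is_error` is read from `ResultMessage`, but `main()` moves on to saving without ever checking it. This lets a failed response be distributed as a success, so fail immediately on error.

💡 Suggested fix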
-    response_text, _usage = anyio.run(
+    response_text, usage = anyio.run(
         call_claude_sdk,
         system_prompt,
         user_prompt_template,
         md_content,
         image_files,
         args.model,
         extra_instructions,
     )
+    if usage.get("is_error"):
+        print(f"  ERROR: Claude Agent SDK returned is_error=True: {usage}", file=sys.stderr)
+        return 1
+    if not response_text.strip():
+        print("  ERROR: Empty response from Claude Agent SDK", file=sys.stderr)
+        return 1

Also applies to: 297-305

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/claude-slide/generate_slide.py` around lines 217 - 223, The code
extracts the is_error field from the ResultMessage object in the elif
isinstance(message, ResultMessage) block but does not evaluate this error status
before proceeding with the save operation. Add a check immediately after
extracting the usage dictionary to verify that is_error is False, and if it is
True, raise an exception or return an error to prevent failure responses from
being treated as successes. Apply this same error checking logic to the other
location mentioned (around lines 297-305) where similar ResultMessage handling
occurs.
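
One way to apply the same check at both locations is a shared guard, sketched here against the plain `usage` dict built in the snippet above. `GenerationError` and `ensure_usable_response` are hypothetical names; nothing here calls the Claude Agent SDK itself.

```python
class GenerationError(RuntimeError):
    """Raised when a generation result must not be written to disk (hypothetical)."""


def ensure_usable_response(response_text: str, usage: dict) -> str:
    """Return response_text unchanged, or raise if the run failed or produced nothing."""
    if usage.get("is_error"):
        raise GenerationError(f"SDK reported is_error=True: {usage}")
    if not response_text.strip():
        raise GenerationError("SDK returned an empty response")
    return response_text
```

The caller would wrap the save step, for example `html = ensure_usable_response(response_text, usage)`, and translate the exception into a non-zero exit code.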

Comment on lines +72 to +81
def find_source_html(manual_dir: str) -> str:
    """元 HTML（slide_* / verify_* 以外の .html）を 1 つ返す。"""
    for f in os.listdir(manual_dir):
        if (
            f.endswith(".html")
            and not f.startswith("slide")
            and not f.startswith("verify")
        ):
            return os.path.join(manual_dir, f)
    raise FileNotFoundError(f"No source HTML found in {manual_dir}")

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

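The logic that selects the source HTML for verification is non-deterministic

When there are multiple candidates, the first match is returned, so a different file may be verified depending on the execution environment. This directly affects how far the verification results can be trusted, so make the selection deterministic.

💡 Suggested fix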
 def find_source_html(manual_dir: str) -> str:
     """元 HTML(slide_* / verify_* 以外の .html)を 1 つ返す。"""
-    for f in os.listdir(manual_dir):
-        if (
-            f.endswith(".html")
-            and not f.startswith("slide")
-            and not f.startswith("verify")
-        ):
-            return os.path.join(manual_dir, f)
-    raise FileNotFoundError(f"No source HTML found in {manual_dir}")
+    candidates = sorted(
+        f for f in os.listdir(manual_dir)
+        if f.endswith(".html")
+        and not f.startswith("slide")
+        and not f.startswith("verify")
+    )
+    if not candidates:
+        raise FileNotFoundError(f"No source HTML found in {manual_dir}")
+    if "source.html" in candidates:
+        return os.path.join(manual_dir, "source.html")
+    if len(candidates) == 1:
+        return os.path.join(manual_dir, candidates[0])
+    raise RuntimeError(
+        f"Ambiguous source HTML in {manual_dir}: {candidates}. "
+        "Keep exactly one source HTML (or name it source.html)."
+    )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-def find_source_html(manual_dir: str) -> str:
-    """元 HTML（slide_* / verify_* 以外の .html）を 1 つ返す。"""
-    for f in os.listdir(manual_dir):
-        if (
-            f.endswith(".html")
-            and not f.startswith("slide")
-            and not f.startswith("verify")
-        ):
-            return os.path.join(manual_dir, f)
-    raise FileNotFoundError(f"No source HTML found in {manual_dir}")
+def find_source_html(manual_dir: str) -> str:
+    """元 HTML（slide_* / verify_* 以外の .html）を 1 つ返す。"""
+    candidates = sorted(
+        f for f in os.listdir(manual_dir)
+        if f.endswith(".html")
+        and not f.startswith("slide")
+        and not f.startswith("verify")
+    )
+    if not candidates:
+        raise FileNotFoundError(f"No source HTML found in {manual_dir}")
+    if "source.html" in candidates:
+        return os.path.join(manual_dir, "source.html")
+    if len(candidates) == 1:
+        return os.path.join(manual_dir, candidates[0])
+    raise RuntimeError(
+        f"Ambiguous source HTML in {manual_dir}: {candidates}. "
+        "Keep exactly one source HTML (or name it source.html)."
+    )
🧰 Tools
🪛 Ruff (0.15.17)

[warning] 73-73: Docstring contains ambiguous `（` (FULLWIDTH LEFT PARENTHESIS). Did you mean `(` (LEFT PARENTHESIS)?

(RUF002)


[warning] 73-73: Docstring contains ambiguous `）` (FULLWIDTH RIGHT PARENTHESIS). Did you mean `)` (RIGHT PARENTHESIS)?

(RUF002)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/claude-slide/verify_slide_mechanical.py` around lines 72 - 81, The
find_source_html function iterates through os.listdir(manual_dir) without
sorting, which returns files in non-deterministic order depending on the
filesystem and operating system. This causes different source HTML files to be
selected in different execution environments, affecting verification
reliability. Sort the result of os.listdir(manual_dir) before iterating through
it to ensure a consistent, deterministic file selection order regardless of the
execution environment.

Comment on lines +42 to +51
def find_source_html(manual_dir: str) -> str:
    """元 HTML (slide_* / verify_* 以外の .html) を 1 つ返す。"""
    for f in os.listdir(manual_dir):
        if (
            f.endswith(".html")
            and not f.startswith("slide")
            and not f.startswith("verify")
        ):
            return os.path.join(manual_dir, f)
    raise FileNotFoundError(f"No source HTML found in {manual_dir}")

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

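Input HTML selection is non-deterministic and can cause the wrong file to be converted.

The function returns the first `.html` in whatever order `os.listdir()` yields, so in environments with multiple candidates it may pick a different file on each run. That conflicts with the deterministic-conversion requirement.

💡 Suggested fix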
 def find_source_html(manual_dir: str) -> str:
     """元 HTML (slide_* / verify_* 以外の .html) を 1 つ返す。"""
-    for f in os.listdir(manual_dir):
-        if (
-            f.endswith(".html")
-            and not f.startswith("slide")
-            and not f.startswith("verify")
-        ):
-            return os.path.join(manual_dir, f)
-    raise FileNotFoundError(f"No source HTML found in {manual_dir}")
+    candidates = sorted(
+        f for f in os.listdir(manual_dir)
+        if f.endswith(".html")
+        and not f.startswith("slide")
+        and not f.startswith("verify")
+    )
+    if len(candidates) != 1:
+        raise FileNotFoundError(
+            f"Expected exactly one source HTML in {manual_dir}, found: {candidates}"
+        )
+    return os.path.join(manual_dir, candidates[0])
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/deterministic-slide/convert.py` around lines 42 - 51, The
find_source_html function relies on the arbitrary ordering of os.listdir() which
is non-deterministic across different environments and runs, causing it to
potentially return different HTML files on successive calls. Sort the list of
files returned by os.listdir(manual_dir) before iterating through them to ensure
consistent and deterministic file selection. Apply the sorting operation to the
directory listing so that the function always returns the same file given the
same input directory.

Comment on lines +107 to +115
def autolink_phone(text: str) -> str:
    """電話番号らしき文字列を <a href="tel:..."> でラップする。
    HTML 化済みのテキストを受け取る前提なので、既にタグの中にある番号は触らない (簡易判定)。"""
    # 「" 又は >」の直後にある番号だけ対象にする雑な区切り。電話番号がプレーンに段落中で
    # 出現する場合をターゲットにし、既に href="tel:..." の中にあるものはスキップする。
    def repl(m: re.Match) -> str:
        num = m.group(1)
        return f'<a href="tel:{num}">{num}</a>'
    return PHONE_RE.sub(repl, text)

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

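Phone-number auto-linking corrupts the inside of existing tags.

Line 212 applies `PHONE_RE.sub(...)` to a string that has already been converted to HTML, so numbers inside `href` attribute values and inside existing `<a>` elements get wrapped again, producing broken HTML (nested `<a>` and so on).
As a root fix, restrict linking to text nodes rather than operating on the whole HTML string.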

Also applies to: 208-213

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/deterministic-slide/convert.py` around lines 107 - 115, The
autolink_phone function currently applies regex substitution to the entire HTML
string, which causes phone numbers already inside HTML attributes (like href
values) or within existing anchor tags to be re-wrapped, creating invalid nested
HTML. Modify the autolink_phone function to parse the HTML structure and apply
the PHONE_RE.sub pattern only to text nodes (plain text content), not to
attribute values or content already within HTML tags. This ensures phone numbers
are only linked when they appear as plain text in the document, not when they
are already part of HTML markup.
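
A lightweight approximation of "text nodes only" is to split the string on tags and rewrite only the segments that lie outside any tag and outside an open `<a>`. The sketch below makes two assumptions: `PHONE_RE` is a stand-in pattern (the real one is not shown in this review), and tags contain no literal `>` inside attribute values. A full HTML parser would be more robust.

```python
import re

# Assumed pattern for hyphenated Japanese-style numbers; not the PR's PHONE_RE.
PHONE_RE = re.compile(r"(0\d{1,4}-\d{1,4}-\d{3,4})")
TAG_RE = re.compile(r"(<[^>]+>)")


def autolink_phone(html_text: str) -> str:
    """Wrap phone numbers in tel: links, touching text segments only."""
    out = []
    anchor_depth = 0
    # With a capturing group, split() alternates text (even index) and tags (odd index).
    for i, token in enumerate(TAG_RE.split(html_text)):
        if i % 2 == 1:
            if re.match(r"<a[\s>]", token, re.IGNORECASE):
                anchor_depth += 1
            elif re.match(r"</a\s*>", token, re.IGNORECASE):
                anchor_depth = max(0, anchor_depth - 1)
            out.append(token)  # tags and their attributes pass through untouched
        elif anchor_depth == 0:
            out.append(PHONE_RE.sub(r'<a href="tel:\1">\1</a>', token))
        else:
            out.append(token)  # text already inside an <a> element
    return "".join(out)
```

Because attribute values never reach `PHONE_RE`, an existing `href="tel:..."` is left alone, and text inside an existing link is skipped, so no nested `<a>` is produced.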

Comment on lines +168 to +173
        elif t == "RawInline":
            # ["html", "<raw>"] みたいなやつ。フォーマット指定が html なら通す
            fmt, raw = c
            if fmt == "html":
                parts.append(raw)
        elif t == "Note":

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

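Raw HTML in `RawInline` is passed through unconditionally, which makes XSS possible at the publish destination.

Because `fmt == "html"` leads straight to `parts.append(raw)`, any malicious tag or event attribute that slips into the input HTML will execute in the final artifact. Switch to an allowlist that passes only permitted tags, or escape by default.

🔒 Suggested fix (allowlist example)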
         elif t == "RawInline":
             # ["html", "<raw>"] みたいなやつ。フォーマット指定が html なら通す
             fmt, raw = c
             if fmt == "html":
-                parts.append(raw)
+                safe = raw.strip()
+                if re.fullmatch(r"</?(br|sub|sup)\s*/?>", safe, flags=re.IGNORECASE):
+                    parts.append(safe)
+                else:
+                    parts.append(html.escape(raw))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/deterministic-slide/convert.py` around lines 168 - 173, The RawInline
handler unconditionally appends raw HTML content to the parts list when fmt
equals "html", which creates an XSS vulnerability as malicious tags and event
attributes can execute in the output. Either implement an allowlist approach
that only permits specific safe HTML tags (filtering the raw content
accordingly), or change the default behavior to escape the HTML content before
appending to parts instead of passing it through unmodified. The fix should be
applied in the RawInline case block where the fmt and raw values are extracted
from c.

Development

Successfully merging this pull request may close these issues.

Implement conversion and comparison code for manual generation
