feat: Convert, compare, and deliver manual explainer HTML, plus the production operating procedure (#281) #323
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
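Reads the prompt from a .md file and generates text with tools disabled. Supports card-strict.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Deterministic verification that shows the real state through a character-based retention rate and a breakdown of additions and losses, rather than through item counts.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>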
Processes one manual end to end, driven by the spreadsheet, while a resident watcher polls.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
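When the LLM emitted <img src> with a bare file name or a relative path instead of the {{filename}} placeholder, replace_placeholders could not pick it up and the image was never base64-encoded (this happened often on the card-strict route, leaving images undisplayed in のぼり広告 and 幼稚園WARS).
- Hardened replace_placeholders: in addition to {{}}, any <img> whose src basename matches a local image is now deterministically embedded as base64 (added _resolve_image_src). A src that is already data: passes through idempotently. The result no longer depends on the LLM obeying the placeholder convention.
- Added --embed-only mode: re-embeds only the images of existing HTML without calling the LLM (for deterministic recovery of broken output).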
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01HtcZLCKZk7zMvqiDAntY2y
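The basename fallback described in the commit message above could look roughly like the following sketch. It is not the PR's actual replace_placeholders or _resolve_image_src: the function name embed_local_images, the regular expression, and the images mapping (file name to raw bytes) are assumptions made for illustration only.

```python
import base64
import mimetypes
import posixpath
import re

# Hypothetical pattern: captures the src value of each <img> tag.
IMG_SRC_RE = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]*)(")', re.IGNORECASE)


def embed_local_images(html_text: str, images: dict) -> str:
    """Replace <img src> values whose basename matches a local image with a data: URI.

    `images` maps a file name (e.g. "fig1.png") to its raw bytes. This mapping is an
    assumption of the sketch, not the PR's data structure.
    """

    def _swap(match):
        prefix, src, suffix = match.groups()
        if src.startswith("data:"):
            return match.group(0)  # already embedded: pass through (idempotent)
        # Accept "{{fig1.png}}" placeholders as well as bare names and relative paths.
        name = posixpath.basename(src.strip("{}").split("?", 1)[0])
        data = images.get(name)
        if data is None:
            return match.group(0)  # unknown image: leave the tag untouched
        mime = mimetypes.guess_type(name)[0] or "application/octet-stream"
        encoded = base64.b64encode(data).decode("ascii")
        return f"{prefix}data:{mime};base64,{encoded}{suffix}"

    return IMG_SRC_RE.sub(_swap, html_text)
```

Because a src that already starts with data: is returned unchanged, running the function twice on the same HTML gives the same result, which is the property an embed-only recovery pass would rely on.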
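A mechanism for running the regeneration loop (department head's proposed corrections → regeneration) without rewriting the shared prompt. If the manual directory contains instructions.md, its content is injected at the end of the user prompt as "この回の追加・修正指示 (最優先)" (additional and corrective instructions for this run, top priority). This supports the workflow of copying proposed corrections from Slack into a local file and then regenerating.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01HtcZLCKZk7zMvqiDAntY2y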
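A minimal sketch of the instructions.md injection described above, assuming the file sits directly in the manual directory. The function name build_user_prompt and the Markdown heading layout are hypothetical; only the label text comes from the commit message.

```python
import os

# Label text taken from the commit message; everything else here is an assumption.
EXTRA_HEADER = "この回の追加・修正指示 (最優先)"


def build_user_prompt(base_prompt: str, manual_dir: str) -> str:
    """Append instructions.md (if present and non-empty) to the end of the user prompt."""
    path = os.path.join(manual_dir, "instructions.md")
    if not os.path.isfile(path):
        return base_prompt
    with open(path, encoding="utf-8") as f:
        extra = f.read().strip()
    if not extra:
        return base_prompt  # an empty file adds nothing
    return f"{base_prompt}\n\n## {EXTRA_HEADER}\n\n{extra}\n"
```

Keeping the per-run instructions in a file next to the manual means the shared prompt file is never edited during the review loop.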
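Outputs verify_report.card-strict.md, a Japanese summary that a department head (a non-engineer) can read as-is in a Slack thread. It does not verify design; it shows only omissions, alterations, and additions in the text. The deterministic comparison logic (categorize/compute_fidelity) is unchanged. A classification layer was added that separates structural noise from body-text differences before output:
- Heading numbering (2 手順), bracketed figcaptions ([部品一覧]), and trailing sequence numbers or table-of-contents page numbers are classified as "整形差" (formatting differences).
- Bullet symbols (・↔-) are absorbed by fold_cosmetic as notation variants.
- Collapsible <summary> toggle labels are removed as whole elements in the HTML pre-processing stage.
- Figure numbers (図N) are classified as "画像・図表の差" (image and figure differences); fragments containing no CJK characters (such as paths inherited from the original Doc) are set aside as outside the body text.
- Sentences that were only reordered and still exist on the other side are sorted into "確認不要" (no check needed).
As a result, only the substantive body-text differences that the department head needs to see remain (items needing review for のぼり広告: 13 → 5).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01HtcZLCKZk7zMvqiDAntY2y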
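A helper tool that places slide_claude.card-strict.html at manuals/NN/index.html in a separate (private) repository, commits it, and prints the public URL. Manuals are identified by number (inferred automatically from the leading digits of the directory name; --number sets it explicitly). The publish path and URL are externalized through SEEFT_PAGES_REPO / SEEFT_PAGES_BASE_URL and are not hardcoded in the repository. Runs on the standard library alone.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01HtcZLCKZk7zMvqiDAntY2y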
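Describes, with concrete commands and role assignments, the Slack-thread-centered flow that the SeeFT staff run by hand instead of the spreadsheet automation (generate → verify → publish to Pages → post to Slack → department head review → instructions.md regeneration loop → final check by the executive board). Added a note at the top of automation-design.md stating that current operations follow this procedure and that resident spreadsheet monitoring is on hold for now.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01HtcZLCKZk7zMvqiDAntY2y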
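📝 Walkthrough
Adds a new pipeline that generates self-contained, card-format HTML manuals from Google Docs via Pandoc. It includes three LLM prompt variants, a generation script based on the Claude Agent SDK, a mechanical fidelity verification script that uses no AI, a deterministic conversion script based on the Pandoc AST, a set of automation orchestrators integrated with Google Sheets/Drive, and the related documentation.
Changes
Slide generation core (prompts, generation, verification, deterministic conversion)
Automation pipeline (Sheets/Drive integration, orchestration, distribution)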
Sequence Diagram(s)
sequenceDiagram
rect rgba(100, 149, 237, 0.5)
Note over PM,watcher: Phase 2 automation flow
PM->>Google Sheets: Set HTML生成ステータス=実行中
watcher->>Google Sheets: Poll (find_pending_rows)
Google Sheets-->>watcher: List of target manual names
watcher->>process_one: Launch subprocess
end
rect rgba(144, 238, 144, 0.5)
Note over process_one,generate_slide: Generate one manual
process_one->>Google Drive: export_doc_as_html_zip(doc_id)
Google Drive-->>process_one: source.html + images/ (zip)
process_one->>generate_slide: uv run generate_slide.py --prompt card-strict
generate_slide->>Claude Agent SDK: query(system_prompt, user_prompt, md_content)
Claude Agent SDK-->>generate_slide: TextBlock stream
generate_slide-->>process_one: slide_claude.card-strict.html
end
rect rgba(255, 165, 0, 0.5)
Note over process_one,Google Sheets: Write back results
process_one->>uploader: upload(html_path, key)
uploader-->>process_one: file:// URL
process_one->>Google Sheets: Write back 完了/URL/最終生成日時
end
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 14
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
🟡 Minor comments (9)
scripts/deterministic-slide/convert.py-307-310 (1)
307-310: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Headings with the same title produce colliding section IDs, which breaks TOC navigation.
Because the result of slugify() is used as-is, multiple H2 headings with the same title produce duplicate ids. Links to later cards break, so a suffix must be appended on duplicates.
💡 Suggested fix
 def split_into_sections(blocks: list, images_b64: dict) -> tuple[dict, list]:
@@
     sections: list = []
+    used_ids: dict[str, int] = {}
@@
         if level <= 2:
@@
+            base_id = slugify(title_text, len(sections) + 1)
+            used_ids[base_id] = used_ids.get(base_id, 0) + 1
+            section_id = base_id if used_ids[base_id] == 1 else f"{base_id}-{used_ids[base_id]}"
             current = {
-                "id": slugify(title_text, len(sections) + 1),
+                "id": section_id,
                 "title": title_text,
                 "title_html": title_html,
                 "html_parts": [],
             }
Also applies to: 350-356
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/deterministic-slide/convert.py` around lines 307 - 310, The slugify() function generates identical IDs for duplicate heading titles, causing ID collisions in the generated HTML which breaks table of contents navigation. To fix this, implement duplicate tracking by maintaining a dictionary or counter of previously generated slugs, and when slugify() returns a slug that has already been used, append a numeric suffix (like "-2", "-3", etc.) to make each ID unique. This tracking mechanism should be implemented at the call site where slugify() is used (referenced in lines 350-356) rather than inside the slugify() function itself, so each invocation checks against all previously generated IDs before returning the final slug.
scripts/deterministic-slide/convert.py-54-60 (1)
54-60: ⚠️ Potential issue | 🟡 Minor
Provide an explicit error message when pandoc is not installed.
When pandoc cannot be found, the current code fails with a FileNotFoundError stack trace, so identifying the cause takes time. Adding a pre-check that prints a clear error message is safer operationally.
Suggested fix
 import argparse
 import base64
 import html
 import json
 import mimetypes
 import os
 import re
+import shutil
 import subprocess
 import sys
 from urllib.parse import parse_qs, urlparse

 def load_ast(html_path: str) -> dict:
     """pandoc -t json で HTML を AST 化。"""
+    if shutil.which("pandoc") is None:
+        raise RuntimeError("pandoc が見つかりません。PATH を確認してください。")
     result = subprocess.run(
         ["pandoc", "-f", "html", "-t", "json", html_path],
         capture_output=True,
         text=True,
         check=True,
     )
     return json.loads(result.stdout)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/deterministic-slide/convert.py` around lines 54 - 60, The load_ast function will fail with an unhelpful FileNotFoundError when pandoc is not installed. Add a pre-check before the subprocess.run call to verify that pandoc is available on the system, and if it is not found, raise a clear and descriptive error message explaining that pandoc needs to be installed. This provides users with immediate clarity about what is missing rather than requiring them to debug a stack trace.
Source: Linters/SAST tools
scripts/automation/process_one.py-145-145 (1)
145-145: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Unnecessary f-strings trip Ruff F541.
Lines 145/151/198/200 are f-strings with no placeholders. Replace them with regular strings to resolve the lint error.
🔧 Suggested fix
-    print(f"ABORT: Google Doc URL が空です", file=sys.stderr)
+    print("ABORT: Google Doc URL が空です", file=sys.stderr)
@@
-    print(f" ステータス → 実行中")
+    print(" ステータス → 実行中")
@@
-    print(f" スプシ更新完了")
+    print(" スプシ更新完了")
@@
-    print(f"=== 完了 ===")
+    print("=== 完了 ===")
Also applies to: 151-151, 198-198, 200-200
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/automation/process_one.py` at line 145, Remove the unnecessary f-string prefixes from the print statements at lines 145, 151, 198, and 200. These strings contain no placeholder variables (like {variable}), so they should be converted to regular strings by removing the f prefix. Replace f"..." with "..." for each of these print statements to resolve the Ruff F541 lint error.
Source: Linters/SAST tools
docs/proposals/agent-sdk-usage.md-11-21 (1)
11-21: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Add a language identifier to the code fence.
This block has no language specified, so it trips markdownlint (MD040). Specify one explicitly, such as text.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/proposals/agent-sdk-usage.md` around lines 11 - 21, The code block starting at the section comparing Anthropic SDK and Claude Agent SDK is missing a language identifier on the opening code fence, which violates the markdownlint MD040 rule. Add a language identifier (such as `text`) to the opening triple backticks of this code block to explicitly specify the language type and resolve the linting issue.
Source: Linters/SAST tools
docs/proposals/manual-slide-operations.md-22-42 (1)
22-42: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Add a language identifier to the flowchart's code fence.
This fence has no language specified and is flagged by MD040. Specifying text is sufficient.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/proposals/manual-slide-operations.md` around lines 22 - 42, The code fence containing the flowchart diagram does not have a language specifier and is flagged by the MD040 linting rule. Add the language identifier `text` immediately after the opening triple backticks (```) to explicitly specify the fence language and resolve the linting violation.
Source: Linters/SAST tools
docs/proposals/manual-proposal-v4-slides/automation-design.md-19-19 (1)
19-19: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Fix the code fences that have no language specified, consistently across the file.
MD040 is raised in several places. Adding text/bash/toml or similar to each fence keeps lint stable.
Also applies to: 72-72, 94-94, 149-149, 163-163
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/proposals/manual-proposal-v4-slides/automation-design.md` at line 19, Multiple code fences throughout the file lack language identifiers, causing MD040 linter violations at lines 19, 72, 94, 149, and 163. Locate each of these unmarked code fence opening markers (the triple backticks with no language specified) and add appropriate language identifiers such as `text`, `bash`, `toml`, or other relevant languages based on the content enclosed within each fence. This will ensure consistent markdown linting across the document.
Source: Linters/SAST tools
scripts/automation/publish.py-55-57 (1)
55-57: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Add validation so that --number accepts only numeric values.
Currently any string passes, which can break the manuals/<number>/ operating convention. Allowing only digits for the argument is safer.
🔧 Suggested fix
 def derive_number(manual_dir: str, explicit: str | None) -> str:
     """マニュアル番号を決める。--number 優先、無ければディレクトリ名先頭の数字。"""
     if explicit:
+        if not explicit.isdigit():
+            raise SystemExit("ERROR: --number は数字のみ指定してください。")
         return explicit.zfill(2)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/automation/publish.py` around lines 55 - 57, The `explicit` variable representing the `--number` argument is not validating that it contains only numeric characters, which could violate the operational convention of `manuals/<number>/` directories. Add validation before using `explicit.zfill(2)` to check if the value contains only digits, and raise an appropriate error (such as ValueError or exit with an error message) if the validation fails. This ensures that only valid numeric values are accepted for the number parameter.
scripts/automation/watcher.py-52-55 (1)
52-55: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Exclude empty manual names in the first_gen extraction as well.
The 再生成依頼 side has an empty-value guard, but the 実行中 side does not. A row with an empty name triggers a pointless launch attempt on every scan.
🔧 Suggested fix
     # トリガー 1: 実行中 かつ 最終生成日時 が空
     for row in client.find_rows_by_status("HTML生成ステータス", "実行中"):
-        if not row.get("最終生成日時"):
-            pending.append((row.get("マニュアル名"), "first_gen"))
+        name = row.get("マニュアル名")
+        if not row.get("最終生成日時") and name:
+            pending.append((name, "first_gen"))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/automation/watcher.py` around lines 52 - 55, The code in the loop iterating through find_rows_by_status with status "実行中" lacks a guard check for empty manual names when appending to pending. Add a condition to verify that row.get("マニュアル名") is not empty or None before appending the tuple to pending, similar to the guard that already exists on the "再生成依頼" side. This will prevent meaningless task executions for rows with missing manual names.
scripts/automation/sheets_client.py-317-320 (1)
317-320: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
The CSV fallback crashes on an empty file.
Because reader[0] is referenced unconditionally, an empty CSV raises IndexError. Add a guard that returns an empty result for empty input.
🔧 Suggested fix
     with open(self.backend.csv_path, newline="", encoding="utf-8") as f:  # type: ignore[attr-defined]
         reader = list(csv.reader(f))
+    if not reader:
+        return rows
     header = reader[0]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/automation/sheets_client.py` around lines 317 - 320, The code in the method reading from self.backend.csv_path accesses reader[0] unconditionally to get the header, which will cause an IndexError if the CSV file is empty. Add a guard clause right after creating the reader list to check if reader is empty, and if so, return an appropriate empty result (such as an empty list or similar). Only proceed to access header = reader[0] and check the status_column after confirming the reader contains data.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/proposals/manual-proposal-v4-slides/automation-design.md`:
- Around line 178-200: The status CSV schema table in section "6. ステータス CSV
スキーマ(19 列)" currently defines 19 columns, but the actual implementation in
scripts/automation/sheets_client.py uses the COLUMNS constant which defines 25
columns. Since this document is marked as the single source of truth, you must
either update the table to include all 25 columns that match the COLUMNS
definition in sheets_client.py, or explicitly mark this documentation as
legacy/outdated. Ensure column order, names, types, and descriptions align with
the actual implementation to prevent column reference failures during
re-enablement.
In `@scripts/automation/drive_client.py`:
- Around line 82-83: The token.json file is being created with default
permissions, which could allow other users to read the sensitive OAuth
credentials and refresh token. After writing the credentials to the file at
TOKEN_PATH using creds.to_json() in the open statement, immediately change the
file permissions to 0600 (read/write for owner only) using os.chmod on
TOKEN_PATH to restrict access.
- Around line 67-71: The credential validation logic in the token loading
section does not check whether the existing credentials have the required
DRIVE_SCOPES, only whether they are valid and not expired. This causes failures
when a token with insufficient scopes is reused. Add a check using the
has_scopes() method on the creds object to validate that it possesses all
required DRIVE_SCOPES. Modify the condition `if not creds or not creds.valid:`
to also include a check for missing scopes, so that if scopes are insufficient,
the code falls back to re-authentication instead of proceeding with insufficient
permissions.
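The two drive_client.py items above (restrictive token permissions and a scope check) can be sketched as follows. The helper names has_required_scopes and write_token are hypothetical; creds is assumed to behave like google.oauth2.credentials.Credentials, whose has_scopes() method is the one named in the review. The sketch does not import the Google libraries so that it stays self-contained.

```python
import os


def has_required_scopes(creds, required_scopes) -> bool:
    """True when cached credentials exist and cover every required scope.

    `creds` is duck-typed: anything exposing has_scopes(scopes) works here.
    """
    return creds is not None and creds.has_scopes(required_scopes)


def write_token(path: str, token_json: str) -> None:
    """Write the token so that only the owner can read or write it (0600)."""
    # Create the file with mode 0600 from the start, so it is never briefly world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(token_json)
    # Also tighten a file that already existed with wider permissions.
    os.chmod(path, 0o600)
```

In the loading code, a condition such as `if not creds or not creds.valid or not has_required_scopes(creds, DRIVE_SCOPES):` would then fall back to re-authentication. Note that the mode bits only have this meaning on POSIX systems.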
In `@scripts/automation/process_one.py`:
- Around line 143-146: The direct call to .strip() on the result of
row.get("Google Doc URL") at line 143 will crash with an AttributeError if the
value is None (when the key doesn't exist or returns None). Before calling
.strip(), ensure the value is normalized to a string by either providing a
default empty string to the get() method or by checking if the value is None
first. Then call .strip() on the guaranteed string value before performing the
validation check against doc_url.
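One way to make the lookup above safe against a missing or None cell is shown below. The helper name get_doc_url is hypothetical; the column key "Google Doc URL" is the one quoted in the review.

```python
def get_doc_url(row: dict) -> str:
    """Return the trimmed "Google Doc URL" cell, or "" when the key is missing or None."""
    # `or ""` covers both a missing key and an explicit None before .strip() is called.
    return str(row.get("Google Doc URL") or "").strip()
```

The caller can keep its existing `if not doc_url:` abort check unchanged, because every empty case now collapses to the empty string.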
In `@scripts/automation/publish.py`:
- Around line 119-121: The git commit operation fails when there are no changes
to commit, causing the script to exit abnormally even in successful scenarios.
After the git add call with rel_path, add a conditional check to verify that
there are staged differences before executing the git commit operation. Use a
git command (such as checking git diff --cached or git status) to determine if
there are actual staged changes, and only proceed with the commit if differences
exist, allowing the script to handle no-change scenarios gracefully.
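A sketch of the "commit only when something is staged" check described above. It relies on `git diff --cached --quiet`, which exits with 0 when the index matches HEAD and 1 when there are staged changes. The function name commit_if_staged and the injectable run parameter are assumptions of this sketch, not the PR's code.

```python
import subprocess


def commit_if_staged(repo: str, message: str, run=subprocess.run) -> bool:
    """Commit only when the index differs from HEAD. Returns True if a commit was made.

    `run` defaults to subprocess.run and is injectable so the branching logic can be
    exercised without a real git repository.
    """
    diff = run(["git", "-C", repo, "diff", "--cached", "--quiet"])
    if diff.returncode == 0:
        return False  # nothing staged: treat as success, not as an error
    if diff.returncode != 1:
        raise RuntimeError(f"git diff failed with exit code {diff.returncode}")
    run(["git", "-C", repo, "commit", "-m", message], check=True)
    return True
```

Returning a boolean lets the caller print "no changes, nothing published" and still exit with status 0.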
In `@scripts/automation/sheets_client.py`:
- Around line 213-215: The range_name variable in the Sheets API call is
hardcoded to read only up to row 200 (A1:Y200), which causes any data beyond row
200 to be permanently skipped by the read_row and find_rows_by_status methods.
Replace the hardcoded row limit of 200 with a dynamic approach that determines
the actual number of rows to read. Keep the column range fixed at A:Y but
calculate or query the actual row count dynamically, such as by using the
getMetadata method or determining the last non-empty row, so that all rows in
the sheet are accessible for processing.
In `@scripts/automation/uploader.py`:
- Around line 45-48: The `key` variable is used directly to construct file paths
without validation, creating a path traversal vulnerability where values
containing `../` or absolute paths could write outside the target directory.
Since `key` comes from external input (Sheets values), sanitize it before
constructing the target path. At both locations (around line 46 where
target_filename is created and around line 66-67), apply path normalization to
the `key` by using os.path.basename() or by filtering out dangerous characters
like slashes and dots to ensure only a safe filename is used in the path
construction with os.path.join().
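The key sanitization suggested above might be implemented along these lines. The function name safe_key and the exact character policy (word characters, dot, and hyphen are kept) are assumptions; the real uploader may need a different allowlist.

```python
import os
import re


def safe_key(key: str) -> str:
    """Reduce an externally supplied key to a single safe file-name component."""
    # Normalise Windows separators, then drop any directory part ("../x", "/abs/x").
    name = os.path.basename(key.replace("\\", "/"))
    # Keep word characters (this includes CJK), dot and hyphen; replace everything else.
    name = re.sub(r"[^\w.\-]", "_", name).lstrip(".")
    if not name:
        raise ValueError(f"unusable upload key: {key!r}")
    return name
```

Rejecting keys that sanitize to an empty string (for example "..") avoids silently writing to the target directory itself.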
In `@scripts/automation/watcher.py`:
- Around line 95-100: The exception handling in the loop that calls
run_process_one() catches failures but does not propagate them to the exit code,
causing main() to always return 0 even when processing fails. Track whether any
errors occurred during iteration through the pending list (by setting a flag
when an exception is caught in the except block), then modify scan_once() to
return a status indicating failure rather than just the count, and update main()
to check this return value and exit with a non-zero code when errors have been
detected. This ensures the automation infrastructure can properly detect and
report abnormal execution.
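The exit-code propagation described above can be sketched like this. The signatures are simplified assumptions: the real scan_once() and main() in watcher.py take different arguments, and run_process_one is passed in here only to keep the example self-contained.

```python
import sys
from typing import Callable, Iterable, Tuple


def scan_once(
    pending: Iterable[Tuple[str, str]],
    run_process_one: Callable[[str, str], None],
) -> Tuple[int, int]:
    """Process every pending item and return (processed, failed)."""
    processed = failed = 0
    for name, trigger in pending:
        processed += 1
        try:
            run_process_one(name, trigger)
        except Exception as exc:  # keep going, but remember the failure
            failed += 1
            print(f"ERROR: {name} ({trigger}): {exc}", file=sys.stderr)
    return processed, failed


def main(pending, run_process_one) -> int:
    """Return a process exit code: 0 when everything succeeded, 1 otherwise."""
    _, failed = scan_once(pending, run_process_one)
    return 1 if failed else 0
```

A caller would end with `sys.exit(main(...))` so that cron, systemd, or CI can see the failure.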
In `@scripts/claude-slide/generate_slide.py`:
- Around line 217-223: The code extracts the is_error field from the
ResultMessage object in the elif isinstance(message, ResultMessage) block but
does not evaluate this error status before proceeding with the save operation.
Add a check immediately after extracting the usage dictionary to verify that
is_error is False, and if it is True, raise an exception or return an error to
prevent failure responses from being treated as successes. Apply this same error
checking logic to the other location mentioned (around lines 297-305) where
similar ResultMessage handling occurs.
- Around line 84-90: The current implementation uses os.listdir() which returns
files in non-deterministic order, causing the selection of the first matching
HTML file to be unpredictable across different runs or environments. To fix
this, modify the file selection logic to collect all HTML files that don't start
with "slide" into a list, sort that list to ensure deterministic ordering (such
as alphabetically), and then select the first item from the sorted list. This
ensures the same HTML file is selected every time the script runs, maintaining
reproducibility of the generation pipeline.
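A minimal sketch of the deterministic file selection described above, assuming the selection rule is "HTML files whose names do not start with slide". The body is illustrative and is not copied from the PR's find_source_html.

```python
import os
from typing import Optional


def find_source_html(manual_dir: str) -> Optional[str]:
    """Return the first non-slide HTML file in sorted order, or None if there is none."""
    candidates = sorted(
        name
        for name in os.listdir(manual_dir)
        if name.lower().endswith(".html") and not name.startswith("slide")
    )
    return os.path.join(manual_dir, candidates[0]) if candidates else None
```

Sorting the candidate list makes the choice independent of the order in which the filesystem happens to return directory entries.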
In `@scripts/claude-slide/verify_slide_mechanical.py`:
- Around line 72-81: The find_source_html function iterates through
os.listdir(manual_dir) without sorting, which returns files in non-deterministic
order depending on the filesystem and operating system. This causes different
source HTML files to be selected in different execution environments, affecting
verification reliability. Sort the result of os.listdir(manual_dir) before
iterating through it to ensure a consistent, deterministic file selection order
regardless of the execution environment.
In `@scripts/deterministic-slide/convert.py`:
- Around line 107-115: The autolink_phone function currently applies regex
substitution to the entire HTML string, which causes phone numbers already
inside HTML attributes (like href values) or within existing anchor tags to be
re-wrapped, creating invalid nested HTML. Modify the autolink_phone function to
parse the HTML structure and apply the PHONE_RE.sub pattern only to text nodes
(plain text content), not to attribute values or content already within HTML
tags. This ensures phone numbers are only linked when they appear as plain text
in the document, not when they are already part of HTML markup.
- Around line 42-51: The find_source_html function relies on the arbitrary
ordering of os.listdir() which is non-deterministic across different
environments and runs, causing it to potentially return different HTML files on
successive calls. Sort the list of files returned by os.listdir(manual_dir)
before iterating through them to ensure consistent and deterministic file
selection. Apply the sorting operation to the directory listing so that the
function always returns the same file given the same input directory.
- Around line 168-173: The RawInline handler unconditionally appends raw HTML
content to the parts list when fmt equals "html", which creates an XSS
vulnerability as malicious tags and event attributes can execute in the output.
Either implement an allowlist approach that only permits specific safe HTML tags
(filtering the raw content accordingly), or change the default behavior to
escape the HTML content before appending to parts instead of passing it through
unmodified. The fix should be applied in the RawInline case block where the fmt
and raw values are extracted from c.
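For the autolink_phone item above, one lightweight approach is to split the HTML on tags and substitute only in text segments that are not inside an anchor. This is a sketch under stated assumptions: PHONE_RE below is an invented pattern (the PR's real one is not shown in this review), and regex-based tag splitting is not a full HTML parser, so comments or script contents could still be mishandled.

```python
import re

# Assumed pattern for illustration; the PR's actual PHONE_RE is not shown here.
PHONE_RE = re.compile(r"\b0\d{1,4}-\d{1,4}-\d{3,4}\b")
TAG_RE = re.compile(r"(<[^>]+>)")
OPEN_A_RE = re.compile(r"<a[\s>]", re.IGNORECASE)
CLOSE_A_RE = re.compile(r"</a\s*>", re.IGNORECASE)


def _link(match) -> str:
    number = match.group(0)
    return f'<a href="tel:{number}">{number}</a>'


def autolink_phone(html_text: str) -> str:
    """Wrap phone numbers in tel: links, touching only text outside tags and outside <a>."""
    out = []
    anchor_depth = 0
    for part in TAG_RE.split(html_text):
        if part.startswith("<") and part.endswith(">"):
            if OPEN_A_RE.match(part):
                anchor_depth += 1
            elif CLOSE_A_RE.match(part):
                anchor_depth = max(0, anchor_depth - 1)
            out.append(part)  # tags and their attributes pass through unchanged
        elif anchor_depth:
            out.append(part)  # text that is already inside a link
        else:
            out.append(PHONE_RE.sub(_link, part))
    return "".join(out)
```

Because attribute values live inside the tag segments, an existing href="tel:..." is never rewritten, and running the function a second time leaves already linked numbers alone.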
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 74768a80-9af2-4296-920c-cdc07f6caa2b
⛔ Files ignored due to path filters (3)
scripts/automation/uv.lock is excluded by !**/*.lock
scripts/claude-slide/uv.lock is excluded by !**/*.lock
scripts/deterministic-slide/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (20)
.claude/manual-prompt-card-strict.md
.claude/manual-prompt-card.md
.claude/manual-prompt.md
docs/proposals/agent-sdk-usage.md
docs/proposals/manual-proposal-v4-slides/automation-design.md
docs/proposals/manual-slide-operations.md
docs/proposals/manual-slide-pipeline.md
docs/proposals/verify-mechanical-improvements.html
scripts/automation/drive_client.py
scripts/automation/process_one.py
scripts/automation/publish.py
scripts/automation/pyproject.toml
scripts/automation/sheets_client.py
scripts/automation/uploader.py
scripts/automation/watcher.py
scripts/claude-slide/generate_slide.py
scripts/claude-slide/pyproject.toml
scripts/claude-slide/verify_slide_mechanical.py
scripts/deterministic-slide/convert.py
scripts/deterministic-slide/pyproject.toml
## 6. ステータス CSV スキーマ(19 列) (Status CSV schema, 19 columns)

| # | Column name | Type | Filled in by | enum values / examples |
| --- | --- | --- | --- | --- |
| 1 | マニュアル名 | str | PM | "配線マニュアル" |
| 2 | 担当局 | str | PM | "総務" "渉外" "財務" "企画" |
| 3 | 担当部門 | str | PM | "会場" "副局長" "広報" etc. |
| 4 | 担当者名 | str | PM | "赤嶺" "黒木康士朗" |
| 5 | Google Doc URL | URL | PM | "https://docs.google.com/document/d/.../edit" |
| 6 | **解説HTML生成** | enum | PM | "生成する" / "生成しない" |
| 7 | 生成HTML URL | URL | automation | "https://manuals.../wiring.html" |
| 8 | パイプライン状態 | enum | automation | "未生成" "生成中" "検証中" "検証済(OK)" "検証済(NG)" "Doc 直配信" "エラー" |
| 9 | 最終生成日時 | datetime | automation | "2026-05-13 13:30" |
| 10 | 検証VERDICT | enum | automation | "OK" / "NG" / "" |
| 11 | 追加件数 | int | automation | 0, 1, 2, ... |
| 12 | 欠落件数 | int | automation | 0, 1, 2, ... |
| 13 | 改変件数 | int | automation | 0, 1, 2, ... |
| 14 | 検証レポート | path | automation | "docs/manuals/.../verify_claude.card.txt" |
| 15 | 確認担当者 | str | PM | "赤嶺" |
| 16 | 確認結果 | enum | reviewer (確認担当者) | "未確認" / "訂正OK" / "要修正" / "再生成依頼" |
| 17 | 確認備考 | str | reviewer (確認担当者) | free text |
| 18 | 個別生成指示 | markdown | PM | "- 配線番号 ①〜㉖ はバッジで強調" |
| 19 | 備考 | str | everyone | free text |
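The schema definition (19 columns) does not match the current implementation (25 columns).
This MD is described as the "single source of truth", but its contract has drifted from COLUMNS (25 columns) in scripts/automation/sheets_client.py. Column references will break on re-enablement, so either update this section to match the current implementation or clearly mark it as legacy.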
    if os.path.exists(TOKEN_PATH):
        creds = Credentials.from_authorized_user_file(TOKEN_PATH, DRIVE_SCOPES)
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
🧩 Analysis chain
🏁 Script executed:
# Check if the file exists and read the relevant section
fd -type f -name "drive_client.py" | head -5Repository: NUTFes/SeeFT
Length of output: 227
🏁 Script executed:
# Get file structure and check if file exists
git ls-files | grep -i drive_clientRepository: NUTFes/SeeFT
Length of output: 91
🏁 Script executed:
# Search for the specific file in scripts/automation
ls -la scripts/automation/ 2>/dev/null || echo "Directory not found"
Repository: NUTFes/SeeFT
Length of output: 676
🏁 Script executed:
cat -n scripts/automation/drive_client.py | head -80
Repository: NUTFes/SeeFT
Length of output: 3358
🏁 Script executed:
# Check imports and DRIVE_SCOPES definition
cat -n scripts/automation/drive_client.py | head -30
Repository: NUTFes/SeeFT
Length of output: 1130
🏁 Script executed:
# Check the rest of the file to see how creds is used
cat -n scripts/automation/drive_client.py | sed -n '67,90p'
Repository: NUTFes/SeeFT
Length of output: 1271
🏁 Script executed:
# Search for any usage of has_scopes in the codebase
rg "has_scopes" scripts/
Repository: NUTFes/SeeFT
Length of output: 38
🌐 Web query:
google.oauth2.credentials Credentials has_scopes method documentation
💡 Result:
The has_scopes method is part of the ReadOnlyScoped interface in the google-auth library, which various credential classes (including those in google.oauth2.credentials) implement [1][2].
Method definition: has_scopes(scopes)
Purpose: Checks whether the provided credentials possess the specified set of OAuth 2.0 scopes [3][2].
Parameters: scopes (Sequence[str]): A list or sequence of scope strings to verify [3][2].
Returns: bool: True if the credentials possess all of the requested scopes, and False otherwise [3][2].
Implementation detail: The method determines if the requested scopes are a subset of the credentials' current scopes (or default scopes if current scopes are not explicitly set) [2][4].
Warning: The library documentation notes that the result of this method is not guaranteed to be accurate if the credentials are in an invalid state (i.e., when the credentials.invalid attribute is True) [2][4][5].
Source: [3], [1], [2], [4]
Citations:
- 1: https://google-auth.readthedocs.io/en/latest/reference/google.auth.credentials.html
- 2: https://github.com/googleapis/google-auth-library-python/blob/main/google/auth/credentials.py
- 3: https://googleapis.dev/python/google-auth/latest/reference/google.oauth2.credentials.html
- 4: https://google-auth.readthedocs.io/en/stable/%5Fmodules/google/auth/credentials.html
- 5: https://googleapis.dev/python/google-auth/2.28.0/_modules/google/auth/credentials.html
🏁 Script executed:
# Check if there are any sheets_client references to understand the shared token pattern
cat -n scripts/automation/sheets_client.py | head -40
Repository: NUTFes/SeeFT
Length of output: 1545
🏁 Script executed:
# Check if the token is indeed shared between sheets_client and drive_client
rg "TOKEN_PATH\|from_authorized_user_file" scripts/automation/ -A 2
Repository: NUTFes/SeeFT
Length of output: 38
🏁 Script executed:
# Verify the import of Credentials to confirm it's from google.oauth2.credentials
cat -n scripts/automation/drive_client.py | sed -n '59,65p'
Repository: NUTFes/SeeFT
Length of output: 414
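Missing scopes on an existing token go undetected, so Drive calls may fail
The code at lines 67-71 proceeds without re-authenticating as long as the existing token is "valid", so when a token issued with Sheets-only scopes is shared, export_doc fails with an insufficient-permission error. Use the has_scopes() method to verify that the required scopes are present, and fall back to re-authentication when they are missing.
Suggested fix
 if os.path.exists(TOKEN_PATH):
     creds = Credentials.from_authorized_user_file(TOKEN_PATH, DRIVE_SCOPES)
+    if creds and not creds.has_scopes(DRIVE_SCOPES):
+        creds = None
 if not creds or not creds.valid:
🤖 Prompt for AI Agents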
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/automation/drive_client.py` around lines 67 - 71, The credential
validation logic in the token loading section does not check whether the
existing credentials have the required DRIVE_SCOPES, only whether they are valid
and not expired. This causes failures when a token with insufficient scopes is
reused. Add a check using the has_scopes() method on the creds object to
validate that it possesses all required DRIVE_SCOPES. Modify the condition `if
not creds or not creds.valid:` to also include a check for missing scopes, so
that if scopes are insufficient, the code falls back to re-authentication
instead of proceeding with insufficient permissions.
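One caveat worth verifying before applying the scope check: from_authorized_user_file(TOKEN_PATH, DRIVE_SCOPES) receives the requested scopes explicitly, so the loaded object may report those scopes rather than the ones stored in the token file. In that case has_scopes(DRIVE_SCOPES) would not catch a Sheets-only token. The following is a minimal sketch of an alternative that inspects the file directly. It assumes the token JSON carries a `scopes` field, as the output of creds.to_json() normally does; the helper name token_file_has_scopes is hypothetical and not part of the repository.

```python
import json
from typing import Iterable


def token_file_has_scopes(token_path: str, required: Iterable[str]) -> bool:
    """Return True only if the token file lists every required scope."""
    try:
        with open(token_path, encoding="utf-8") as fh:
            info = json.load(fh)
    except (OSError, ValueError):
        return False  # missing or unreadable token: treat as insufficient
    if not isinstance(info, dict):
        return False
    stored = info.get("scopes") or []
    if isinstance(stored, str):  # tolerate a space-delimited scope string
        stored = stored.split()
    return set(required).issubset(stored)
```

A caller could run this before loading the credentials and skip straight to the re-authentication branch when it returns False.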
| with open(TOKEN_PATH, "w") as token: | ||
| token.write(creds.to_json()) |
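The OAuth token is saved with default permissions, which creates a leak risk
Lines 82-83 create token.json with default permissions, so depending on the environment other users may be able to read it. Because the file contains a refresh token, lock it down to 0600 immediately after saving.
🔧 Suggested fix
 with open(TOKEN_PATH, "w") as token:
     token.write(creds.to_json())
+os.chmod(TOKEN_PATH, 0o600)
📝 Committable suggestion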
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| with open(TOKEN_PATH, "w") as token: | |
| token.write(creds.to_json()) | |
| with open(TOKEN_PATH, "w") as token: | |
| token.write(creds.to_json()) | |
| os.chmod(TOKEN_PATH, 0o600) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/automation/drive_client.py` around lines 82 - 83, The token.json file
is being created with default permissions, which could allow other users to read
the sensitive OAuth credentials and refresh token. After writing the credentials
to the file at TOKEN_PATH using creds.to_json() in the open statement,
immediately change the file permissions to 0600 (read/write for owner only)
using os.chmod on TOKEN_PATH to restrict access.
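Calling os.chmod after the write still leaves a short window in which a newly created file has the default mode. As a sketch of one way to close that window on POSIX systems, the file can be created with mode 0600 from the start; the helper name write_private is hypothetical and not part of the repository.

```python
import os


def write_private(path: str, data: str) -> None:
    """Write data so the file is owner-only (0600) from the moment it exists."""
    # O_CREAT with mode 0o600 means a new file is never group/world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(data)
    # A pre-existing file keeps its old mode, so tighten it explicitly as well.
    os.chmod(path, 0o600)
```

In drive_client.py this would be used as write_private(TOKEN_PATH, creds.to_json()) in place of the open/write pair.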
| doc_url = row.get("Google Doc URL").strip() | ||
| if not doc_url and not args.skip_drive: | ||
| print(f"ABORT: Google Doc URL が空です", file=sys.stderr) | ||
| return 4 |
Rows where Google Doc URL is empty or None crash immediately
The direct .strip() call on line 143 raises AttributeError when the value is None. Because this code sits outside the try block, the error-status update never runs either. Normalize the value to a string before checking it.
🔧 Suggested fix
-doc_url = row.get("Google Doc URL").strip()
+raw_doc_url = row.get("Google Doc URL")
+doc_url = raw_doc_url.strip() if isinstance(raw_doc_url, str) else ""
 if not doc_url and not args.skip_drive:
     print(f"ABORT: Google Doc URL が空です", file=sys.stderr)
     return 4
📝 Committable suggestion
| doc_url = row.get("Google Doc URL").strip() | |
| if not doc_url and not args.skip_drive: | |
| print(f"ABORT: Google Doc URL が空です", file=sys.stderr) | |
| return 4 | |
| raw_doc_url = row.get("Google Doc URL") | |
| doc_url = raw_doc_url.strip() if isinstance(raw_doc_url, str) else "" | |
| if not doc_url and not args.skip_drive: | |
| print(f"ABORT: Google Doc URL が空です", file=sys.stderr) | |
| return 4 |
🧰 Tools
🪛 Ruff (0.15.17)
[error] 145-145: f-string without any placeholders
Remove extraneous f prefix
(F541)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/automation/process_one.py` around lines 143 - 146, The direct call to
.strip() on the result of row.get("Google Doc URL") at line 143 will crash with
an AttributeError if the value is None (when the key doesn't exist or returns
None). Before calling .strip(), ensure the value is normalized to a string by
either providing a default empty string to the get() method or by checking if
the value is None first. Then call .strip() on the guaranteed string value
before performing the validation check against doc_url.
| git(pages_repo, "add", rel_path) | ||
| git(pages_repo, "commit", "-m", message) | ||
|
|
Absorb the git commit failure when nothing has changed.
Re-publishing identical HTML makes git commit fail, so the script exits abnormally even on the normal path. Add a branch that commits only when there is a staged diff after add.
🔧 Suggested fix
 message = args.message or f"manuals/{number}: {os.path.basename(manual_dir)} を更新"
 git(pages_repo, "add", rel_path)
-git(pages_repo, "commit", "-m", message)
+diff = subprocess.run(
+    ["git", "-C", pages_repo, "diff", "--cached", "--quiet", "--", rel_path]
+)
+if diff.returncode == 0:
+    print(" 変更なしのため commit をスキップ")
+elif diff.returncode == 1:
+    git(pages_repo, "commit", "-m", message)
+else:
+    raise RuntimeError("git diff --cached の実行に失敗しました")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/automation/publish.py` around lines 119 - 121, The git commit
operation fails when there are no changes to commit, causing the script to exit
abnormally even in successful scenarios. After the git add call with rel_path,
add a conditional check to verify that there are staged differences before
executing the git commit operation. Use a git command (such as checking git diff
--cached or git status) to determine if there are actual staged changes, and
only proceed with the commit if differences exist, allowing the script to handle
no-change scenarios gracefully.
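The staged-diff check can also be packaged as a small function so it is testable without a real repository. This is a sketch only: commit_if_staged and its injectable run parameter are hypothetical and not part of publish.py. It relies on `git diff --cached --quiet` exiting 0 when nothing is staged and 1 when there is a staged difference.

```python
import subprocess
from typing import Callable


def commit_if_staged(repo: str, rel_path: str, message: str,
                     run: Callable = subprocess.run) -> bool:
    """Commit rel_path only when it has staged changes; return True if committed."""
    # Exit code 0: no staged diff for rel_path. Exit code 1: staged diff present.
    diff = run(["git", "-C", repo, "diff", "--cached", "--quiet", "--", rel_path])
    if diff.returncode == 0:
        return False
    if diff.returncode != 1:
        raise RuntimeError(f"git diff --cached failed with exit code {diff.returncode}")
    run(["git", "-C", repo, "commit", "-m", message], check=True)
    return True
```

Passing a stub as run lets a unit test simulate each exit code, while the default keeps production behaviour on subprocess.run.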
| elif isinstance(message, ResultMessage): | ||
| usage = { | ||
| "duration_ms": getattr(message, "duration_ms", None), | ||
| "num_turns": getattr(message, "num_turns", None), | ||
| "total_cost_usd": getattr(message, "total_cost_usd", None), | ||
| "is_error": getattr(message, "is_error", None), | ||
| } |
Output is written even when the SDK reports an error
is_error is read from ResultMessage, but main() proceeds to the save step without evaluating it. This creates a path where a failed response is published as a success, so fail immediately on error.
💡 Suggested fix
-response_text, _usage = anyio.run(
+response_text, usage = anyio.run(
     call_claude_sdk,
     system_prompt,
     user_prompt_template,
     md_content,
     image_files,
     args.model,
     extra_instructions,
 )
+if usage.get("is_error"):
+    print(f" ERROR: Claude Agent SDK returned is_error=True: {usage}", file=sys.stderr)
+    return 1
+if not response_text.strip():
+    print(" ERROR: Empty response from Claude Agent SDK", file=sys.stderr)
+    return 1
Also applies to: 297-305
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/claude-slide/generate_slide.py` around lines 217 - 223, The code
extracts the is_error field from the ResultMessage object in the elif
isinstance(message, ResultMessage) block but does not evaluate this error status
before proceeding with the save operation. Add a check immediately after
extracting the usage dictionary to verify that is_error is False, and if it is
True, raise an exception or return an error to prevent failure responses from
being treated as successes. Apply this same error checking logic to the other
location mentioned (around lines 297-305) where similar ResultMessage handling
occurs.
| def find_source_html(manual_dir: str) -> str: | ||
| """元 HTML(slide_* / verify_* 以外の .html)を 1 つ返す。""" | ||
| for f in os.listdir(manual_dir): | ||
| if ( | ||
| f.endswith(".html") | ||
| and not f.startswith("slide") | ||
| and not f.startswith("verify") | ||
| ): | ||
| return os.path.join(manual_dir, f) | ||
| raise FileNotFoundError(f"No source HTML found in {manual_dir}") |
The source-HTML selection logic for verification is non-deterministic
When there are multiple candidates, the function returns the first match, so a different file may be verified depending on the execution environment. Since this bears directly on the reliability of the verification results, make the selection deterministic.
💡 Suggested fix
 def find_source_html(manual_dir: str) -> str:
     """元 HTML（slide_* / verify_* 以外の .html）を 1 つ返す。"""
-    for f in os.listdir(manual_dir):
-        if (
-            f.endswith(".html")
-            and not f.startswith("slide")
-            and not f.startswith("verify")
-        ):
-            return os.path.join(manual_dir, f)
-    raise FileNotFoundError(f"No source HTML found in {manual_dir}")
+    candidates = sorted(
+        f for f in os.listdir(manual_dir)
+        if f.endswith(".html")
+        and not f.startswith("slide")
+        and not f.startswith("verify")
+    )
+    if not candidates:
+        raise FileNotFoundError(f"No source HTML found in {manual_dir}")
+    if "source.html" in candidates:
+        return os.path.join(manual_dir, "source.html")
+    if len(candidates) == 1:
+        return os.path.join(manual_dir, candidates[0])
+    raise RuntimeError(
+        f"Ambiguous source HTML in {manual_dir}: {candidates}. "
+        "Keep exactly one source HTML (or name it source.html)."
+    )
📝 Committable suggestion
| def find_source_html(manual_dir: str) -> str: | |
| """元 HTML(slide_* / verify_* 以外の .html)を 1 つ返す。""" | |
| for f in os.listdir(manual_dir): | |
| if ( | |
| f.endswith(".html") | |
| and not f.startswith("slide") | |
| and not f.startswith("verify") | |
| ): | |
| return os.path.join(manual_dir, f) | |
| raise FileNotFoundError(f"No source HTML found in {manual_dir}") | |
| def find_source_html(manual_dir: str) -> str: | |
| """元 HTML(slide_* / verify_* 以外の .html)を 1 つ返す。""" | |
| candidates = sorted( | |
| f for f in os.listdir(manual_dir) | |
| if f.endswith(".html") | |
| and not f.startswith("slide") | |
| and not f.startswith("verify") | |
| ) | |
| if not candidates: | |
| raise FileNotFoundError(f"No source HTML found in {manual_dir}") | |
| if "source.html" in candidates: | |
| return os.path.join(manual_dir, "source.html") | |
| if len(candidates) == 1: | |
| return os.path.join(manual_dir, candidates[0]) | |
| raise RuntimeError( | |
| f"Ambiguous source HTML in {manual_dir}: {candidates}. " | |
| "Keep exactly one source HTML (or name it source.html)." | |
| ) |
🧰 Tools
🪛 Ruff (0.15.17)
[warning] 73-73: Docstring contains ambiguous ( (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?
(RUF002)
[warning] 73-73: Docstring contains ambiguous ) (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?
(RUF002)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/claude-slide/verify_slide_mechanical.py` around lines 72 - 81, The
find_source_html function iterates through os.listdir(manual_dir) without
sorting, which returns files in non-deterministic order depending on the
filesystem and operating system. This causes different source HTML files to be
selected in different execution environments, affecting verification
reliability. Sort the result of os.listdir(manual_dir) before iterating through
it to ensure a consistent, deterministic file selection order regardless of the
execution environment.
| def find_source_html(manual_dir: str) -> str: | ||
| """元 HTML (slide_* / verify_* 以外の .html) を 1 つ返す。""" | ||
| for f in os.listdir(manual_dir): | ||
| if ( | ||
| f.endswith(".html") | ||
| and not f.startswith("slide") | ||
| and not f.startswith("verify") | ||
| ): | ||
| return os.path.join(manual_dir, f) | ||
| raise FileNotFoundError(f"No source HTML found in {manual_dir}") |
Input HTML selection is non-deterministic and can cause the wrong file to be converted.
The function depends on os.listdir() ordering and returns the first .html it finds, so in environments with multiple candidates it may pick a different file on each run. This conflicts with the deterministic-conversion requirement.
💡 Suggested fix
 def find_source_html(manual_dir: str) -> str:
     """元 HTML (slide_* / verify_* 以外の .html) を 1 つ返す。"""
-    for f in os.listdir(manual_dir):
-        if (
-            f.endswith(".html")
-            and not f.startswith("slide")
-            and not f.startswith("verify")
-        ):
-            return os.path.join(manual_dir, f)
-    raise FileNotFoundError(f"No source HTML found in {manual_dir}")
+    candidates = sorted(
+        f for f in os.listdir(manual_dir)
+        if f.endswith(".html")
+        and not f.startswith("slide")
+        and not f.startswith("verify")
+    )
+    if len(candidates) != 1:
+        raise FileNotFoundError(
+            f"Expected exactly one source HTML in {manual_dir}, found: {candidates}"
+        )
+    return os.path.join(manual_dir, candidates[0])
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/deterministic-slide/convert.py` around lines 42 - 51, The
find_source_html function relies on the arbitrary ordering of os.listdir() which
is non-deterministic across different environments and runs, causing it to
potentially return different HTML files on successive calls. Sort the list of
files returned by os.listdir(manual_dir) before iterating through them to ensure
consistent and deterministic file selection. Apply the sorting operation to the
directory listing so that the function always returns the same file given the
same input directory.
| def autolink_phone(text: str) -> str: | ||
| """電話番号らしき文字列を <a href="tel:..."> でラップする。 | ||
| HTML 化済みのテキストを受け取る前提なので、既にタグの中にある番号は触らない (簡易判定)。""" | ||
| # 「" 又は >」の直後にある番号だけ対象にする雑な区切り。電話番号がプレーンに段落中で | ||
| # 出現する場合をターゲットにし、既に href="tel:..." の中にあるものはスキップする。 | ||
| def repl(m: re.Match) -> str: | ||
| num = m.group(1) | ||
| return f'<a href="tel:{num}">{num}</a>' | ||
| return PHONE_RE.sub(repl, text) |
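Phone-number autolinking corrupts the inside of existing tags.
Line 212 applies PHONE_RE.sub(...) to a string that has already been converted to HTML, so numbers inside href attribute values and inside existing <a> elements are wrapped again, producing broken HTML (nested <a> elements, for example).
As a root fix, restrict linking to text nodes rather than operating on the whole HTML string.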
Also applies to: 208-213
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/deterministic-slide/convert.py` around lines 107 - 115, The
autolink_phone function currently applies regex substitution to the entire HTML
string, which causes phone numbers already inside HTML attributes (like href
values) or within existing anchor tags to be re-wrapped, creating invalid nested
HTML. Modify the autolink_phone function to parse the HTML structure and apply
the PHONE_RE.sub pattern only to text nodes (plain text content), not to
attribute values or content already within HTML tags. This ensures phone numbers
are only linked when they appear as plain text in the document, not when they
are already part of HTML markup.
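The text-node restriction described above can be sketched with the standard library alone. This is an illustration, not the repository's implementation: the PHONE_RE pattern below is a hypothetical stand-in (the real one in convert.py is not shown in this review), and the tag splitter assumes attribute values never contain a literal `>`.

```python
import re

# Hypothetical stand-in: the real PHONE_RE in convert.py is not shown in this review.
PHONE_RE = re.compile(r"(?<![\d-])(0\d{1,4}-\d{1,4}-\d{3,4})(?!\d)")
# Splitting on this puts tags at odd indexes and the text between them at even ones.
TAG_RE = re.compile(r"(<[^>]*>)")
OPEN_A_RE = re.compile(r"<a(\s|>)", re.IGNORECASE)
CLOSE_A_RE = re.compile(r"</a\s*>", re.IGNORECASE)


def autolink_phone(html_text: str) -> str:
    """Wrap phone numbers in tel: links, touching only text outside <a>...</a>."""
    out = []
    anchor_depth = 0
    for i, piece in enumerate(TAG_RE.split(html_text)):
        if i % 2 == 1:
            # A tag: copy it verbatim and only track whether we are inside <a>.
            if OPEN_A_RE.match(piece):
                anchor_depth += 1
            elif CLOSE_A_RE.match(piece):
                anchor_depth = max(0, anchor_depth - 1)
            out.append(piece)
        elif anchor_depth == 0:
            # Plain text outside any anchor: safe to link.
            out.append(PHONE_RE.sub(r'<a href="tel:\1">\1</a>', piece))
        else:
            # Text already inside an anchor: leave untouched.
            out.append(piece)
    return "".join(out)
```

Because attribute values and existing anchor text are never rewritten, running the function twice on its own output leaves it unchanged. A full HTML parser would be more robust for arbitrary input.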
| elif t == "RawInline": | ||
| # ["html", "<raw>"] みたいなやつ。フォーマット指定が html なら通す | ||
| fmt, raw = c | ||
| if fmt == "html": | ||
| parts.append(raw) | ||
| elif t == "Note": |
Raw HTML in RawInline passes through unconditionally, which enables XSS at the publish destination.
Because fmt == "html" goes straight to parts.append(raw), any malicious tag or event attribute mixed into the input HTML is executed in the final artifact. Switch to an allowlist that passes only permitted tags, or escape by default.
🔒 Suggested fix (allowlist example)
 elif t == "RawInline":
     # ["html", "<raw>"] みたいなやつ。フォーマット指定が html なら通す
     fmt, raw = c
     if fmt == "html":
-        parts.append(raw)
+        safe = raw.strip()
+        if re.fullmatch(r"</?(br|sub|sup)\s*/?>", safe, flags=re.IGNORECASE):
+            parts.append(safe)
+        else:
+            parts.append(html.escape(raw))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/deterministic-slide/convert.py` around lines 168 - 173, The RawInline
handler unconditionally appends raw HTML content to the parts list when fmt
equals "html", which creates an XSS vulnerability as malicious tags and event
attributes can execute in the output. Either implement an allowlist approach
that only permits specific safe HTML tags (filtering the raw content
accordingly), or change the default behavior to escape the HTML content before
appending to parts instead of passing it through unmodified. The fix should be
applied in the RawInline case block where the fmt and raw values are extracted
from c.
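To make the effect of the allowlist concrete, the same rule can be isolated in a standalone function and exercised on a few inputs. This is a sketch under assumptions: render_raw_inline is a hypothetical name, the three allowed tags mirror the suggested fix, and non-HTML formats are dropped as in the original branch.

```python
import html
import re

# Only bare <br>, <sub>, <sup> tags (opening, closing, or self-closing) pass through.
_SAFE_INLINE = re.compile(r"</?(br|sub|sup)\s*/?>", re.IGNORECASE)


def render_raw_inline(fmt: str, raw: str) -> str:
    """Return output-safe text for a pandoc RawInline node."""
    if fmt != "html":
        return ""  # non-HTML raw content is dropped
    candidate = raw.strip()
    if _SAFE_INLINE.fullmatch(candidate):
        return candidate
    # Anything else, including allowed tags that carry attributes, is escaped.
    return html.escape(raw)
```

Note that an allowed tag name with attributes, such as a sub element with an onclick handler, does not match the pattern and is therefore escaped rather than passed through.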
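Overview
Issue #281, "Implement conversion and comparison code for manual generation". This PR provides the code that converts manuals exported from Google Docs into smartphone-optimized, self-contained HTML (card format) and verifies that the original text is preserved, together with the procedures for running it in production.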
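Main changes in this PR (latest commits)
- replace_placeholders hardening plus --embed-only
- Added verify_report.card-strict.md (folds away structural noise and presents, in Japanese, only the body-text differences that need review)
- Regeneration-loop support that injects prompts from instructions.md
- publish.py
- manual-slide-operations.md
In addition, earlier commits already implemented the generation engine, mechanical verification, deterministic conversion, three prompts, and the automation stubs.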
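Operating model (what goes on develop)
… verify_*) and instructions.md. These are branch-dependent generated artifacts that also contain personal information (staff names and private mobile numbers), so they are now uniformly excluded via .git/info/exclude (local only). Generated HTML is delivered to a separate (private) repository.
… docs/manuals/*/*.html was already excluded), verify_*.txt|md and instructions.md were added, and the extra commits that had tracked verify text files against this convention were dropped from the history.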
Verification
- Recovery with --embed-only: output is back in the MB range, with 0 unembedded <img> tags.
- Placement at manuals/NN/index.html and URL output confirmed end to end.
Closes #281
🤖 Generated with Claude Code
https://claude.ai/code/session_01HtcZLCKZk7zMvqiDAntY2y
Summary by CodeRabbit
Documentation
New features