From eff27affb01af4a0485aa7b471cc9bbbfe299273 Mon Sep 17 00:00:00 2001 From: Marlin Forbes Date: Sat, 2 May 2026 07:24:25 +0200 Subject: [PATCH 1/3] feat(harness): add memoize sub-action for memory hygiene --- skills/harness/SKILL.md | 48 ++- skills/harness/scripts/memoize-prompt.md | 90 ++++++ skills/harness/scripts/memoize.sh | 386 +++++++++++++++++++++++ 3 files changed, 519 insertions(+), 5 deletions(-) create mode 100644 skills/harness/scripts/memoize-prompt.md create mode 100755 skills/harness/scripts/memoize.sh diff --git a/skills/harness/SKILL.md b/skills/harness/SKILL.md index 3a739b1..6c0eab5 100644 --- a/skills/harness/SKILL.md +++ b/skills/harness/SKILL.md @@ -14,11 +14,15 @@ description: > of ~/.claude/ to a private git repo); status (report what's installed, modified, or missing); audit (prepare a monthly remote-audit routine that PRs deltas against the latest Anthropic releases and Claude Code - community patterns). All sub-actions are idempotent. Use when asked to - "set up my Claude Code", "install harness", "uninstall harness", "update - harness", "diagnose my setup", "adopt harness in this project", - "retrofit", "snapshot my setup", "audit my setup", "harden my Claude", - or any request matching the sub-actions. + community patterns); memoize (deterministic memory hygiene pass — + index sync, frontmatter, stale citations, lexical duplicates — emits + a stable report; pairs with a weekly /schedule routine). All + sub-actions are idempotent. Use when asked to "set up my Claude + Code", "install harness", "uninstall harness", "update harness", + "diagnose my setup", "adopt harness in this project", "retrofit", + "snapshot my setup", "audit my setup", "harden my Claude", + "memoize", "consolidate memory", "prune memory", or any request + matching the sub-actions. user-invocable: true --- @@ -42,6 +46,7 @@ The user invokes this skill, optionally with an action word. 
Detect intent and r | "snapshot", "backup", "mirror to git" | `snapshot` | `scripts/snapshot.sh` | | "status", "what's installed", "audit local" | `status` | `scripts/status.sh` | | "audit", "schedule audit", "monthly check" | `audit` | (prep work — see below) | +| "memoize", "consolidate memory", "prune memory" | `memoize` | `scripts/memoize.sh` | Substitute the skill's absolute base directory for `$SKILL_DIR` in every command — it's announced at the top of this invocation. @@ -239,6 +244,37 @@ Steps: The remote agent clones the snapshot repo, researches the last ~30 days of Anthropic releases and canonical Claude Code voices, and PRs `audits/YYYY-MM-DD-setup-audit.md` with prioritised deltas. It never modifies tracked files outside `audits/`. +## memoize + +```bash +bash "$SKILL_DIR/scripts/memoize.sh" +``` + +Proactive memory hygiene for `~/.claude/projects//memory/`. Memory is reactive — entries get written when the agent notices something worth saving, but nothing prunes or consolidates. `memoize` is the deterministic maintenance pass. + +What it checks: + +1. **Index sync** — every `memory/*.md` is listed in `MEMORY.md`; every `MEMORY.md` entry points at a file that exists. +2. **Frontmatter hygiene** — every memory has the required `name`, `description`, `type`. +3. **Stale citations** — path-shaped tokens (anything starting with `~/`, `/Users/`, `./`, etc., or ending in a known source extension) that resolve nowhere across `~/.claude/projects/` and `~/Projects/`. Conservative on purpose — false positives cost more than misses. +4. **Possible duplicates** — pairs of memories of the same `type` whose `name` or `description` are lexically similar (Jaccard ≥ 0.5). Flag, don't merge. + +Output: `/_memoize-report.md` (leading underscore so it's never indexed by `MEMORY.md`). The report is byte-stable on equal runs — running twice in a row produces an identical file. + +Flags: +- `--dry-run` — print the plan + report preview, write nothing. 
+- `--target=PATH` — explicit memory dir. + +Env knobs (mirror `snapshot.sh`): `CLAUDE_DIR`, `USER_PROJECT_KEY`, `MEMOIZE_SEARCH_ROOTS`. + +**Scheduled routine.** For deeper passes (conceptual duplicates, outdated facts, conflicts the lexical script can't see), wire it as a weekly `/schedule` job using `scripts/memoize-prompt.md`. Suggested config: +- cron: `0 6 * * 0` (Sunday 06:00 UTC) +- model: `claude-opus-4-7` +- tools: Bash, Read, Write, Edit, Glob, Grep, Agent +- source: the user's snapshot repo (run `harness snapshot` first) + +The remote agent runs `memoize.sh`, layers in semantic checks, and PRs `audits/memory/YYYY-MM-DD.md` with proposed edits. It never modifies memory directly — the user reviews and applies. + ## Constraints - **Never auto-fill stack signals or user_role.** Templates have placeholders; ask the user to fill them. @@ -267,3 +303,5 @@ The remote agent clones the snapshot repo, researches the last ~30 days of Anthr | `scripts/_detect_stack.py` | Stack-signal detector — auto-fills `## Stack signals` at install time | | `assets/harness-check.sh.tmpl` | Starter project pass/fail gate written by `adopt` | | `scripts/audit-prompt.md` | Prompt template for the monthly remote-audit routine | +| `scripts/memoize.sh` | Memory consolidation pass — deterministic; emits `_memoize-report.md` | +| `scripts/memoize-prompt.md` | Prompt template for the weekly remote-memoize routine | diff --git a/skills/harness/scripts/memoize-prompt.md b/skills/harness/scripts/memoize-prompt.md new file mode 100644 index 0000000..f41fe8f --- /dev/null +++ b/skills/harness/scripts/memoize-prompt.md @@ -0,0 +1,90 @@ +# Weekly memory consolidation — agent prompt + +Use this prompt as the `events[].data.message.content` when creating a remote +routine that consolidates `~/.claude/projects//memory/`. The routine +clones the snapshot repo (e.g. `/claude-setup`) and needs Bash, Read, +Write, Edit, Glob, Grep, Agent. 
+ +Suggested config: +- cron: `0 6 * * 0` (Sunday 06:00 UTC) +- model: `claude-opus-4-7` +- tools: Bash, Read, Write, Edit, Glob, Grep, Agent +- sources: the user's snapshot repo (must exist — run `harness snapshot` first) + +--- + +You are doing the weekly memory hygiene pass against a snapshot of my Claude +Code memory. The repo you are running in is a sanitised mirror of `~/.claude/` +(see README.md). Memory lives in `memory/`, indexed by `memory/MEMORY.md`. + +## Your task + +1. **Run the deterministic pass.** If `scripts/memoize.sh` exists in the + snapshot, run it with `--target=memory`. Otherwise, replicate its checks + in-process: index sync (every `memory/*.md` listed in `MEMORY.md`, every + entry points at a real file), frontmatter hygiene (every memory has + `name`, `description`, `type`), stale citations (path-shaped tokens that + resolve nowhere), possible duplicates (same `type`, lexically similar + name/description). + +2. **Read every memory.** For each file under `memory/`, read it and form a + one-line gist. Group by `type`. + +3. **Look for the things the script can't catch:** + - **Conceptual duplicates** that don't share vocabulary — two `feedback` + memories saying the same thing in different words. + - **Outdated facts** — `project` memories citing deadlines, milestones, + stakeholders that have moved on. Check the snapshot's `CLAUDE.md` and + other context for currency. + - **Conflicting guidance** — two memories that pull in opposite directions. + - **Index drift** — entries in `MEMORY.md` whose one-line hook no longer + reflects the file's body. + +4. **Write the report** to `audits/memory/YYYY-MM-DD.md`. Structure: + + ```markdown + # Memory consolidation — {{date}} + + ## TL;DR + 3-5 bullets — the highest-leverage merges, deletes, or rewrites. + + ## Deterministic findings + Output of memoize.sh (or the equivalent in-process pass). Verbatim. + + ## Conceptual duplicates + Pairs of memories that say the same thing differently. 
Propose a merge + target and the surviving content. + + ## Outdated facts + Memories whose body cites stale state. Quote the line, suggest the edit. + + ## Conflicts + Memories pulling in opposite directions. Surface the contradiction; do + not resolve it unilaterally. + + ## Proposed edits (ordered by leverage) + - **What:** one-line summary + - **Where:** `memory/.md` + - **Diff:** old → new (or "merge X into Y, delete X") + - **Why:** the signal in the body that motivates this + + ## Skip-list + Things that looked relevant but aren't worth doing this week. + ``` + +5. **Open a PR.** Branch `memoize/YYYY-MM-DD`, commit + `memoize: weekly memory consolidation YYYY-MM-DD`, PR title + `Memory consolidation — {{date}}`, PR body = TL;DR. Use `gh pr create`. + +## Constraints + +- DO NOT modify any tracked file outside `audits/memory/`. The user reviews + proposed edits and applies them locally — the routine never edits memory. +- Conservative on stale citations and outdated facts. False positives cost + more than misses; the user reads every line of this report. +- Prefer specific over comprehensive. 3 merges I'll do > 30 I won't. +- If nothing material changed since last week, write a short report saying + so. Don't pad. +- Concise output — the user reads in feedforward / sensors / GC vocabulary. + +Report length target: 400-1000 words. diff --git a/skills/harness/scripts/memoize.sh b/skills/harness/scripts/memoize.sh new file mode 100755 index 0000000..d39b895 --- /dev/null +++ b/skills/harness/scripts/memoize.sh @@ -0,0 +1,386 @@ +#!/usr/bin/env bash +# Memoize — proactive memory hygiene for ~/.claude/projects//memory/. +# Read-only by default in spirit: emits a markdown report; never edits or +# deletes a memory file. The report goes to /_memoize-report.md. +# +# Checks: +# 1. Index sync — every memory/*.md (except _*.md) is in MEMORY.md; +# every MEMORY.md entry points at a real file. +# 2. Frontmatter — every memory has `name`, `description`, `type`. +# 3. 
Stale citations — path-shaped tokens in memory bodies that resolve +# nowhere across the search roots. +# 4. Duplicates — pairs of memories of the same `type` whose names +# or descriptions are lexically close. +# +# Output is sorted and stable: running twice produces an identical report +# (no timestamps in the body). +# +# Usage: +# bash memoize.sh # write the report +# bash memoize.sh --dry-run # print the plan, write nothing +# bash memoize.sh --target=PATH # explicit memory dir +# +# Env knobs (mirror snapshot.sh): +# CLAUDE_DIR=/path/to/.claude +# USER_PROJECT_KEY=-Users-foo +# MEMOIZE_SEARCH_ROOTS="$HOME/.claude/projects $HOME/Projects" + +set -euo pipefail + +DRY_RUN=0 +TARGET="" +for arg in "$@"; do + case "$arg" in + --dry-run) DRY_RUN=1 ;; + --target=*) TARGET="${arg#--target=}" ;; + -h|--help) + sed -n '2,/^$/p' "$0" | sed 's/^# \{0,1\}//' + exit 0 + ;; + *) echo "unknown flag: $arg" >&2; exit 2 ;; + esac +done + +CLAUDE_DIR="${CLAUDE_DIR:-$HOME/.claude}" +USER_PROJECT_KEY="${USER_PROJECT_KEY:-$(printf '%s' "$HOME" | tr '/' '-')}" +if [ -z "$TARGET" ]; then + TARGET="$CLAUDE_DIR/projects/$USER_PROJECT_KEY/memory" +fi +SEARCH_ROOTS="${MEMOIZE_SEARCH_ROOTS:-$HOME/.claude/projects $HOME/Projects}" + +if [ ! -d "$TARGET" ]; then + echo "error: memory dir not found: $TARGET" >&2 + echo " set --target=PATH or USER_PROJECT_KEY" >&2 + exit 2 +fi + +REPORT="$TARGET/_memoize-report.md" + +green() { printf '\033[32m%s\033[0m' "$1"; } +yellow() { printf '\033[33m%s\033[0m' "$1"; } +dim() { printf '\033[90m%s\033[0m' "$1"; } +if ! [ -t 1 ]; then + green() { printf '%s' "$1"; } + yellow() { printf '%s' "$1"; } + dim() { printf '%s' "$1"; } +fi + +echo "memoize — target=$TARGET" +[ "$DRY_RUN" -eq 1 ] && echo "$(yellow '(dry-run)') no file will be written" +echo + +# Resolve PYBIN once. 
+if command -v python3 >/dev/null 2>&1; then + PYBIN=python3 +elif command -v python >/dev/null 2>&1; then + PYBIN=python +else + echo "error: python3 not on PATH (needed for frontmatter parsing)" >&2 + exit 2 +fi + +# Hand off to python for the analysis. Keeps the bash thin and the parsing safe. +TMP="$(mktemp)" +trap 'rm -f "$TMP"' EXIT + +TARGET="$TARGET" SEARCH_ROOTS="$SEARCH_ROOTS" "$PYBIN" - <<'PY' > "$TMP" +import os, re, sys, hashlib +from pathlib import Path + +target = Path(os.environ["TARGET"]) +roots = [Path(p) for p in os.environ["SEARCH_ROOTS"].split() if p] + +REPORT_NAME = "_memoize-report.md" +INDEX = "MEMORY.md" +REQUIRED = ("name", "description", "type") + +def memory_files(): + out = [] + for p in sorted(target.iterdir()): + if not p.is_file(): + continue + if p.suffix != ".md": + continue + if p.name == INDEX or p.name.startswith("_"): + continue + out.append(p) + return out + +def parse_frontmatter(text): + if not text.startswith("---\n"): + return None, text + end = text.find("\n---\n", 4) + if end == -1: + return None, text + block = text[4:end] + body = text[end+5:] + fm = {} + for line in block.splitlines(): + if ":" not in line: + continue + k, _, v = line.partition(":") + fm[k.strip()] = v.strip() + return fm, body + +def index_entries(): + p = target / INDEX + if not p.exists(): + return [], None + entries = [] + pat = re.compile(r"\[([^\]]+)\]\(([^)]+)\)") + raw = p.read_text(encoding="utf-8") + for line in raw.splitlines(): + m = pat.search(line) + if m: + entries.append((m.group(1), m.group(2))) + return entries, raw + +def shorten(s, n=80): + s = s.strip().replace("\n", " ") + return s if len(s) <= n else s[:n-1] + "…" + +# ---- 1. Index sync ---- +files = memory_files() +file_names = {p.name for p in files} +entries, _ = index_entries() +indexed = {name for _, name in entries} + +missing_in_index = sorted(file_names - indexed) +broken_in_index = sorted({name for _, name in entries if name not in file_names}) + +# ---- 2. 
Frontmatter ---- +fm_issues = [] # (filename, problem) +parsed = {} # filename -> (fm, body) +for p in files: + text = p.read_text(encoding="utf-8") + fm, body = parse_frontmatter(text) + parsed[p.name] = (fm, body) + if fm is None: + fm_issues.append((p.name, "no frontmatter block")) + continue + for k in REQUIRED: + if not fm.get(k): + fm_issues.append((p.name, f"missing or empty `{k}`")) +fm_issues.sort() + +# ---- 3. Stale citations ---- +# Conservative — flag a token only if it's *unambiguously* a real filesystem +# path. Two ways to qualify: +# (a) starts with a hard prefix that's almost always a path: ~/, /Users/, +# /home/, /etc/, /opt/, /var/, /tmp/, ./, ../ +# (b) ends in a recognised source-file extension. +# This drops slash-commands (/verify, /grade), brace expansions +# ({a,b,c}), regex-ish strings, and similar false positives. +PATH_PREFIXES = ("~/", "/Users/", "/home/", "/etc/", "/opt/", "/var/", "/tmp/", + "./", "../") +PATH_EXTS = (".md", ".sh", ".py", ".ts", ".tsx", ".js", ".jsx", ".json", + ".yml", ".yaml", ".toml", ".php", ".go", ".rb", ".rs", + ".html", ".css", ".sql", ".lock", ".env") +path_pat = re.compile(r"`(?P<bt>[^`\n]+)`|(?P<bare>(?:~|\.{1,2})?/[^\s`)\]\"',]+)") +known_existing_cache = {} + +def looks_like_path(tok): + if not tok or "/" not in tok: + return False + if tok.startswith(("http://", "https://", "//")): + return False + if tok.startswith("<") or tok.endswith(">"): + return False + if "{" in tok or "}" in tok or "*" in tok: + return False + if tok.startswith(PATH_PREFIXES): + return True + if any(tok.endswith(ext) for ext in PATH_EXTS): + return True + return False + +def candidate_paths(text): + out = [] + for m in path_pat.finditer(text): + tok = (m.group("bt") or m.group("bare") or "").strip() + tok = tok.rstrip(".,;:)") + if looks_like_path(tok): + out.append(tok) + return out + +def resolves(tok): + if tok in known_existing_cache: + return known_existing_cache[tok] + # Expand ~/ + if tok.startswith("~/"): + candidates = 
[Path(os.path.expanduser(tok))] + elif tok.startswith("/"): + candidates = [Path(tok)] + elif tok.startswith("./"): + candidates = [Path(tok)] + else: + # Treat as a relative project-ish path: try under each search root. + candidates = [r / tok for r in roots] + # also try as a basename match under any root (one level only) + # (skipped — too broad; conservatism wins) + found = any(c.exists() for c in candidates) + if not found: + # last-ditch: a basename-only suffix match under search roots, depth 4 + base = tok.rstrip("/").split("/")[-1] + if base and len(base) > 2: + for r in roots: + if not r.exists(): + continue + # single-shot find: walk up to depth 4 + hit = False + for dirpath, dirnames, filenames in os.walk(r): + depth = len(Path(dirpath).relative_to(r).parts) + if depth > 4: + dirnames[:] = [] + continue + if base in filenames or base in dirnames: + hit = True + break + if hit: + found = True + break + known_existing_cache[tok] = found + return found + +stale = [] # (filename, token) +for p in files: + fm, body = parsed[p.name] + seen = set() + for tok in candidate_paths(body or ""): + if tok in seen: + continue + seen.add(tok) + if not resolves(tok): + stale.append((p.name, tok)) +stale.sort() + +# ---- 4. 
Duplicate detection (lexical, by type) ---- +def normalize(s): + s = (s or "").lower() + s = re.sub(r"[^a-z0-9]+", " ", s) + return set(t for t in s.split() if len(t) > 2) + +def jaccard(a, b): + if not a or not b: + return 0.0 + return len(a & b) / len(a | b) + +by_type = {} +for p in files: + fm, _ = parsed[p.name] + t = (fm or {}).get("type", "") + by_type.setdefault(t, []).append((p.name, fm or {})) + +dupes = [] # (type, file_a, file_b, score) +for t, items in by_type.items(): + items.sort() + for i in range(len(items)): + for j in range(i+1, len(items)): + a_name, a_fm = items[i] + b_name, b_fm = items[j] + score = max( + jaccard(normalize(a_fm.get("name")), normalize(b_fm.get("name"))), + jaccard(normalize(a_fm.get("description")), normalize(b_fm.get("description"))), + ) + if score >= 0.5: + dupes.append((t, a_name, b_name, round(score, 2))) +dupes.sort() + +# ---- Render ---- +out = [] +def section(title): + out.append(f"## {title}\n") + +out.append("# Memory consolidation report\n") +out.append("Generated by `harness memoize`. 
Report-only — no memory files were modified.\n") +out.append(f"Memory dir: `{target}`\n") +out.append(f"Files scanned: {len(files)}\n") +out.append("") + +section("Index sync") +if not missing_in_index and not broken_in_index: + out.append("OK — every memory file is indexed and every index entry resolves.\n") +else: + if missing_in_index: + out.append("**Files not listed in MEMORY.md:**\n") + for n in missing_in_index: + out.append(f"- `{n}`") + out.append("") + if broken_in_index: + out.append("**MEMORY.md entries pointing at missing files:**\n") + for n in broken_in_index: + out.append(f"- `{n}`") + out.append("") +out.append("") + +section("Frontmatter hygiene") +if not fm_issues: + out.append("OK — every memory has `name`, `description`, and `type`.\n") +else: + for name, problem in fm_issues: + out.append(f"- `{name}` — {problem}") + out.append("") +out.append("") + +section("Stale citations") +if not stale: + out.append("OK — no path-shaped tokens that fail to resolve under the search roots.\n") +else: + out.append(f"Search roots: `{os.environ['SEARCH_ROOTS']}`\n") + out.append("Conservative — only flags tokens with an explicit `~/`, `/`, or `./` prefix or backtick-wrapped path. False positives possible; verify before acting.\n") + cur = None + for name, tok in stale: + if name != cur: + out.append(f"- `{name}`") + cur = name + out.append(f" - `{tok}`") + out.append("") +out.append("") + +section("Possible duplicates") +if not dupes: + out.append("OK — no near-duplicate name/description pairs within a `type`.\n") +else: + out.append("Lexical Jaccard ≥ 0.5 on `name` or `description`. 
Review and decide whether to merge.\n") + for t, a, b, score in dupes: + out.append(f"- _{t}_ — `{a}` ↔ `{b}` (score {score})") + out.append("") +out.append("") + +# Final summary line +total_findings = len(missing_in_index) + len(broken_in_index) + len(fm_issues) + len(stale) + len(dupes) +out.append("---") +out.append(f"Findings: {total_findings} " + f"(index {len(missing_in_index)+len(broken_in_index)}, " + f"frontmatter {len(fm_issues)}, " + f"stale {len(stale)}, " + f"duplicates {len(dupes)})") + +text = "\n".join(out).rstrip() + "\n" +sys.stdout.write(text) + +# Also emit a one-line summary to stderr for the bash wrapper. +sys.stderr.write(f"SUMMARY findings={total_findings}\n") +PY + +# Read the python summary off stderr — but we already piped to TMP only, so +# re-run is wasteful. Cheaper: tail the report's last line for the summary. +SUMMARY="$(tail -n 1 "$TMP")" + +if [ "$DRY_RUN" -eq 1 ]; then + dim 'plan:'; echo + echo " would write: $REPORT" + echo + dim '── report preview ──'; echo + cat "$TMP" + dim '── end ──'; echo + exit 0 +fi + +# Atomic write so the report is byte-stable on equal runs. +mkdir -p "$TARGET" +mv "$TMP" "$REPORT" +trap - EXIT + +echo "$(green 'wrote:') $REPORT" +echo " $SUMMARY" From de2d1e7dcf4d0ed43b28567f875142beb7969f91 Mon Sep 17 00:00:00 2001 From: Marlin Forbes Date: Sat, 2 May 2026 07:39:32 +0200 Subject: [PATCH 2/3] fix(harness): address memoize PR review - CLAUDE_DIR override now cascades to default search roots - MEMOIZE_SEARCH_ROOTS uses PATH-style colon separator (paths with spaces) - ./ and ../ tokens resolve under search roots, not CWD - prune .git, node_modules, vendor, etc. 
during fallback walk - drop unread SUMMARY stderr emission - prompt drops the local-script branch (snapshot has no scripts) - prompt index-check explicitly skips MEMORY.md and _*.md - prompt clarifies stale-citation is local-only - SKILL.md clarifies report-vs-memory contract --- .mcp.json | 8 ++++ skills/harness/SKILL.md | 11 +++-- skills/harness/scripts/memoize-prompt.md | 57 ++++++++++++++++-------- skills/harness/scripts/memoize.sh | 41 +++++++++-------- 4 files changed, 75 insertions(+), 42 deletions(-) create mode 100644 .mcp.json diff --git a/.mcp.json b/.mcp.json new file mode 100644 index 0000000..b05a925 --- /dev/null +++ b/.mcp.json @@ -0,0 +1,8 @@ +{ + "mcpServers": { + "atlassian": { + "type": "http", + "url": "https://mcp.atlassian.com/v1/mcp" + } + } +} \ No newline at end of file diff --git a/skills/harness/SKILL.md b/skills/harness/SKILL.md index 6c0eab5..5a8f337 100644 --- a/skills/harness/SKILL.md +++ b/skills/harness/SKILL.md @@ -259,21 +259,24 @@ What it checks: 3. **Stale citations** — path-shaped tokens (anything starting with `~/`, `/Users/`, `./`, etc., or ending in a known source extension) that resolve nowhere across `~/.claude/projects/` and `~/Projects/`. Conservative on purpose — false positives cost more than misses. 4. **Possible duplicates** — pairs of memories of the same `type` whose `name` or `description` are lexically similar (Jaccard ≥ 0.5). Flag, don't merge. -Output: `/_memoize-report.md` (leading underscore so it's never indexed by `MEMORY.md`). The report is byte-stable on equal runs — running twice in a row produces an identical file. +Output: a single file at `/_memoize-report.md`. The leading underscore is the contract — `MEMORY.md` indexing rules and the remote routine both ignore `_*.md`, so the report itself never gets treated as a memory entry. The report is byte-stable on equal runs (two consecutive invocations produce an identical file). Flags: - `--dry-run` — print the plan + report preview, write nothing. 
- `--target=PATH` — explicit memory dir. -Env knobs (mirror `snapshot.sh`): `CLAUDE_DIR`, `USER_PROJECT_KEY`, `MEMOIZE_SEARCH_ROOTS`. +Env knobs (mirror `snapshot.sh`): +- `CLAUDE_DIR` — root of the Claude Code config dir. Search-root defaults track this, so a custom `CLAUDE_DIR` cascades correctly. +- `USER_PROJECT_KEY` — the slug under `/projects/`. +- `MEMOIZE_SEARCH_ROOTS` — colon-separated (PATH-style, supports paths with spaces) list of roots to resolve stale citations against. Defaults to `/projects:$HOME/Projects`. -**Scheduled routine.** For deeper passes (conceptual duplicates, outdated facts, conflicts the lexical script can't see), wire it as a weekly `/schedule` job using `scripts/memoize-prompt.md`. Suggested config: +**Scheduled routine.** For the conceptual drift the lexical script can't see (semantic duplicates, outdated facts, conflicting guidance), wire a weekly `/schedule` job using `scripts/memoize-prompt.md`. Suggested config: - cron: `0 6 * * 0` (Sunday 06:00 UTC) - model: `claude-opus-4-7` - tools: Bash, Read, Write, Edit, Glob, Grep, Agent - source: the user's snapshot repo (run `harness snapshot` first) -The remote agent runs `memoize.sh`, layers in semantic checks, and PRs `audits/memory/YYYY-MM-DD.md` with proposed edits. It never modifies memory directly — the user reviews and applies. +Scope: the snapshot repo doesn't mirror harness scripts or local search roots, so the remote agent does its own in-process structural pass (index sync, frontmatter) and adds the semantic checks. Stale-citation analysis stays local-only — the search roots aren't available remotely. The remote agent PRs `audits/memory/YYYY-MM-DD.md` with proposed edits and never modifies any memory entry. 
## Constraints diff --git a/skills/harness/scripts/memoize-prompt.md b/skills/harness/scripts/memoize-prompt.md index f41fe8f..1e1cc6a 100644 --- a/skills/harness/scripts/memoize-prompt.md +++ b/skills/harness/scripts/memoize-prompt.md @@ -11,34 +11,49 @@ Suggested config: - tools: Bash, Read, Write, Edit, Glob, Grep, Agent - sources: the user's snapshot repo (must exist — run `harness snapshot` first) +## Scope note + +The local `memoize.sh` script does four checks: index sync, frontmatter +hygiene, stale citations, and lexical duplicates. Of these, **stale +citations cannot run remotely** — the search roots (`~/.claude/projects`, +`~/Projects`) are local-machine state that the snapshot repo does not +mirror. The remote routine focuses on the **conceptual** drift the local +script can't see (semantic duplicates, outdated facts, conflicts), and +re-runs the cheap structural checks (index, frontmatter) directly against +the snapshot. + --- You are doing the weekly memory hygiene pass against a snapshot of my Claude Code memory. The repo you are running in is a sanitised mirror of `~/.claude/` (see README.md). Memory lives in `memory/`, indexed by `memory/MEMORY.md`. +The snapshot does **not** contain harness scripts or local search roots. ## Your task -1. **Run the deterministic pass.** If `scripts/memoize.sh` exists in the - snapshot, run it with `--target=memory`. Otherwise, replicate its checks - in-process: index sync (every `memory/*.md` listed in `MEMORY.md`, every - entry points at a real file), frontmatter hygiene (every memory has - `name`, `description`, `type`), stale citations (path-shaped tokens that - resolve nowhere), possible duplicates (same `type`, lexically similar - name/description). +1. **Structural checks (in-process).** For files under `memory/`, applying + `memory/MEMORY.md` as the index: + - **Index sync** — every `memory/*.md` whose filename does NOT start with + `_` and is NOT `MEMORY.md` itself must appear as an entry in + `MEMORY.md`. 
Every `MEMORY.md` entry must point at a real file. + - **Frontmatter hygiene** — every memory file must start with a `---` + YAML frontmatter block containing non-empty `name`, `description`, + and `type` fields. -2. **Read every memory.** For each file under `memory/`, read it and form a - one-line gist. Group by `type`. +2. **Read every memory.** For each `memory/*.md` file (skipping `MEMORY.md` + and any `_*.md` such as the consolidation report itself), read it and + form a one-line gist. Group by `type`. -3. **Look for the things the script can't catch:** +3. **Conceptual checks the structural pass can't catch:** - **Conceptual duplicates** that don't share vocabulary — two `feedback` memories saying the same thing in different words. - **Outdated facts** — `project` memories citing deadlines, milestones, - stakeholders that have moved on. Check the snapshot's `CLAUDE.md` and - other context for currency. - - **Conflicting guidance** — two memories that pull in opposite directions. - - **Index drift** — entries in `MEMORY.md` whose one-line hook no longer - reflects the file's body. + stakeholders that have moved on. Cross-check against the snapshot's + `CLAUDE.md` and other context for currency. + - **Conflicting guidance** — two memories that pull in opposite + directions. + - **Index drift** — entries in `MEMORY.md` whose one-line hook no + longer reflects the file's body. 4. **Write the report** to `audits/memory/YYYY-MM-DD.md`. Structure: @@ -48,8 +63,8 @@ Code memory. The repo you are running in is a sanitised mirror of `~/.claude/` ## TL;DR 3-5 bullets — the highest-leverage merges, deletes, or rewrites. - ## Deterministic findings - Output of memoize.sh (or the equivalent in-process pass). Verbatim. + ## Structural findings + Index sync + frontmatter hygiene issues. Per-file bullets. ## Conceptual duplicates Pairs of memories that say the same thing differently. Propose a merge @@ -80,8 +95,12 @@ Code memory. 
The repo you are running in is a sanitised mirror of `~/.claude/` - DO NOT modify any tracked file outside `audits/memory/`. The user reviews proposed edits and applies them locally — the routine never edits memory. -- Conservative on stale citations and outdated facts. False positives cost - more than misses; the user reads every line of this report. +- Skip files whose names start with `_` (e.g. `_memoize-report.md` if it + ever lands in the snapshot) and `MEMORY.md` when iterating "every memory". +- Stale-citation analysis is local-only; do not attempt a snapshot + equivalent — false positives drown the signal. +- Conservative on outdated facts. False positives cost more than misses; + the user reads every line of this report. - Prefer specific over comprehensive. 3 merges I'll do > 30 I won't. - If nothing material changed since last week, write a short report saying so. Don't pad. diff --git a/skills/harness/scripts/memoize.sh b/skills/harness/scripts/memoize.sh index d39b895..7b4d5ee 100755 --- a/skills/harness/scripts/memoize.sh +++ b/skills/harness/scripts/memoize.sh @@ -23,7 +23,9 @@ # Env knobs (mirror snapshot.sh): # CLAUDE_DIR=/path/to/.claude # USER_PROJECT_KEY=-Users-foo -# MEMOIZE_SEARCH_ROOTS="$HOME/.claude/projects $HOME/Projects" +# MEMOIZE_SEARCH_ROOTS="$CLAUDE_DIR/projects:$HOME/Projects" +# Colon-separated (like PATH) so paths-with-spaces work. Defaults track +# $CLAUDE_DIR so an override cascades correctly. set -euo pipefail @@ -46,7 +48,7 @@ USER_PROJECT_KEY="${USER_PROJECT_KEY:-$(printf '%s' "$HOME" | tr '/' '-')}" if [ -z "$TARGET" ]; then TARGET="$CLAUDE_DIR/projects/$USER_PROJECT_KEY/memory" fi -SEARCH_ROOTS="${MEMOIZE_SEARCH_ROOTS:-$HOME/.claude/projects $HOME/Projects}" +SEARCH_ROOTS="${MEMOIZE_SEARCH_ROOTS:-$CLAUDE_DIR/projects:$HOME/Projects}" if [ ! 
-d "$TARGET" ]; then echo "error: memory dir not found: $TARGET" >&2 @@ -88,7 +90,9 @@ import os, re, sys, hashlib from pathlib import Path target = Path(os.environ["TARGET"]) -roots = [Path(p) for p in os.environ["SEARCH_ROOTS"].split() if p] +roots = [Path(p) for p in os.environ["SEARCH_ROOTS"].split(":") if p] +PRUNE_DIRS = {".git", "node_modules", "vendor", "__pycache__", ".venv", + "venv", "target", "dist", "build", ".next", ".cache"} REPORT_NAME = "_memoize-report.md" INDEX = "MEMORY.md" @@ -165,11 +169,13 @@ fm_issues.sort() # ---- 3. Stale citations ---- # Conservative — flag a token only if it's *unambiguously* a real filesystem -# path. Two ways to qualify: -# (a) starts with a hard prefix that's almost always a path: ~/, /Users/, +# path. Token must contain a `/` AND either: +# (a) start with a hard prefix that's almost always a path: ~/, /Users/, # /home/, /etc/, /opt/, /var/, /tmp/, ./, ../ -# (b) ends in a recognised source-file extension. -# This drops slash-commands (/verify, /grade), brace expansions +# (b) end in a recognised source-file extension. +# Bare basenames like `settings.json` are *intentionally* skipped — too many +# false positives (string literals, log lines, prose mentions). +# This also drops slash-commands (/verify, /grade), brace expansions # ({a,b,c}), regex-ish strings, and similar false positives. PATH_PREFIXES = ("~/", "/Users/", "/home/", "/etc/", "/opt/", "/var/", "/tmp/", "./", "../") @@ -211,24 +217,26 @@ def resolves(tok): candidates = [Path(os.path.expanduser(tok))] elif tok.startswith("/"): candidates = [Path(tok)] - elif tok.startswith("./"): - candidates = [Path(tok)] + elif tok.startswith(("./", "../")): + # Don't resolve against CWD — that makes results depend on where + # the script was invoked from. Resolve under each search root. + rel = tok.lstrip(".").lstrip("/") + candidates = [r / rel for r in roots] else: - # Treat as a relative project-ish path: try under each search root. 
+ # Bare relative path (no leading marker): try under each search root. candidates = [r / tok for r in roots] - # also try as a basename match under any root (one level only) - # (skipped — too broad; conservatism wins) found = any(c.exists() for c in candidates) if not found: - # last-ditch: a basename-only suffix match under search roots, depth 4 + # last-ditch: a basename suffix match under search roots, depth 4, + # pruning heavy dirs. base = tok.rstrip("/").split("/")[-1] if base and len(base) > 2: for r in roots: if not r.exists(): continue - # single-shot find: walk up to depth 4 hit = False for dirpath, dirnames, filenames in os.walk(r): + dirnames[:] = [d for d in dirnames if d not in PRUNE_DIRS] depth = len(Path(dirpath).relative_to(r).parts) if depth > 4: dirnames[:] = [] @@ -358,13 +366,8 @@ out.append(f"Findings: {total_findings} " text = "\n".join(out).rstrip() + "\n" sys.stdout.write(text) - -# Also emit a one-line summary to stderr for the bash wrapper. -sys.stderr.write(f"SUMMARY findings={total_findings}\n") PY -# Read the python summary off stderr — but we already piped to TMP only, so -# re-run is wasteful. Cheaper: tail the report's last line for the summary. SUMMARY="$(tail -n 1 "$TMP")" if [ "$DRY_RUN" -eq 1 ]; then From 151e722001f3e31130c15930a790aa39f1c3ec11 Mon Sep 17 00:00:00 2001 From: Marlin Forbes Date: Sat, 2 May 2026 07:39:45 +0200 Subject: [PATCH 3/3] chore: drop stray .mcp.json from PR --- .mcp.json | 8 -------- 1 file changed, 8 deletions(-) delete mode 100644 .mcp.json diff --git a/.mcp.json b/.mcp.json deleted file mode 100644 index b05a925..0000000 --- a/.mcp.json +++ /dev/null @@ -1,8 +0,0 @@ -{ - "mcpServers": { - "atlassian": { - "type": "http", - "url": "https://mcp.atlassian.com/v1/mcp" - } - } -} \ No newline at end of file
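
Reviewer aside on the duplicate check added in PATCH 1/3: the sketch below reproduces the `normalize` and `jaccard` helpers from the `memoize.sh` heredoc and applies them to two made-up memory names, to show what the 0.5 threshold means in practice. The two sample strings and the variable names `a`, `b`, and `score` are assumptions for illustration only; they do not come from the patch or from any real memory file.

```python
import re

def normalize(s):
    """Lowercase, split on non-alphanumerics, keep tokens longer than 2 chars."""
    s = (s or "").lower()
    s = re.sub(r"[^a-z0-9]+", " ", s)
    return {t for t in s.split() if len(t) > 2}

def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical `name` fields of two memories that share the same `type`.
a = normalize("Prefer small focused commits")
b = normalize("Prefer small commits per change")

# Shared tokens: prefer, small, commits (3). Union: 6 distinct tokens.
score = jaccard(a, b)
print(score, "flag" if score >= 0.5 else "skip")
```

Tokens of two characters or fewer are dropped before comparison, so short names can end up with very small token sets, and a single shared word may be enough to cross the threshold. That is one reason the report only flags pairs and leaves the merge decision to the user.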