From eff27affb01af4a0485aa7b471cc9bbbfe299273 Mon Sep 17 00:00:00 2001
From: Marlin Forbes <marlinf@datashaman.com>
Date: Sat, 2 May 2026 07:24:25 +0200
Subject: [PATCH 1/3] feat(harness): add memoize sub-action for memory hygiene

---
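Check 4 in `memoize.sh` (lexical duplicates) can be read on its own,
as below. This is a minimal sketch that mirrors the script's
`normalize` and `jaccard` helpers. The `near_duplicates` wrapper and
the sample memories are hypothetical and exist only to show the
scoring. Only memories of the same `type` are compared. A pair is
flagged when the larger of its two Jaccard scores (over `name` words,
or over `description` words) reaches 0.5.

```python
import re
from itertools import combinations


def normalize(s):
    # Lowercase, split on non-alphanumerics, drop tokens of 2 chars or fewer.
    s = re.sub(r"[^a-z0-9]+", " ", (s or "").lower())
    return {t for t in s.split() if len(t) > 2}


def jaccard(a, b):
    # |A & B| / |A | B|; an empty side scores 0 so blank fields never match.
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def near_duplicates(memories, threshold=0.5):
    """Hypothetical helper. memories: list of (filename, frontmatter dict)."""
    by_type = {}
    for fname, fm in memories:
        by_type.setdefault(fm.get("type", "<unknown>"), []).append((fname, fm))
    hits = []
    for t, items in sorted(by_type.items()):
        ordered = sorted(items, key=lambda item: item[0])
        for (a, fa), (b, fb) in combinations(ordered, 2):
            score = max(
                jaccard(normalize(fa.get("name")), normalize(fb.get("name"))),
                jaccard(normalize(fa.get("description")),
                        normalize(fb.get("description"))),
            )
            if score >= threshold:
                hits.append((t, a, b, round(score, 2)))
    return hits


mems = [
    ("a.md", {"type": "feedback", "name": "prefer small pull requests",
              "description": "keep PRs small"}),
    ("b.md", {"type": "feedback", "name": "prefer small focused pull requests",
              "description": "scope"}),
    ("c.md", {"type": "project", "name": "prefer small pull requests",
              "description": "x"}),
]
print(near_duplicates(mems))  # -> [('feedback', 'a.md', 'b.md', 0.8)]
```

The two `feedback` entries share 4 of 5 distinct name words (0.8). The
`project` entry is never compared with them, even though its name is
identical. Comparison is quadratic within each `type`, which is cheap
at memory-directory scale.
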
 skills/harness/SKILL.md                  |  48 ++-
 skills/harness/scripts/memoize-prompt.md |  90 ++++++
 skills/harness/scripts/memoize.sh        | 386 +++++++++++++++++++++++
 3 files changed, 519 insertions(+), 5 deletions(-)
 create mode 100644 skills/harness/scripts/memoize-prompt.md
 create mode 100755 skills/harness/scripts/memoize.sh
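
Checks 1 and 2 (index sync and frontmatter hygiene) reduce to a flat
`key: value` parse and two set differences. The sketch below restates
the script's logic under that reading. It is not a drop-in
replacement, and `frontmatter_problems` and `index_sync` are
hypothetical names. As in the script, link targets are compared
verbatim, so an index entry written as `./a.md` does not match the
file `a.md`.

```python
import re

REQUIRED = ("name", "description", "type")
# First markdown link on a line, e.g. "- [Title](file.md): hook".
LINK = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def parse_frontmatter(text):
    """Flat `key: value` parse of a leading fenced block (no nested YAML)."""
    if not text.startswith("---\n"):
        return None
    end = text.find("\n---\n", 4)
    if end == -1:
        return None
    fm = {}
    for line in text[4:end].splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fm[key.strip()] = value.strip()
    return fm


def frontmatter_problems(text):
    """Hypothetical helper: list the frontmatter issues for one memory file."""
    fm = parse_frontmatter(text)
    if fm is None:
        return ["no frontmatter block"]
    return [f"missing or empty `{k}`" for k in REQUIRED if not fm.get(k)]


def index_sync(index_text, filenames):
    """Hypothetical helper: (files missing from index, entries with no file)."""
    linked = set()
    for line in index_text.splitlines():
        m = LINK.search(line)
        if m:
            linked.add(m.group(2))
    files = set(filenames)
    return sorted(files - linked), sorted(linked - files)


good = "---\nname: x\ndescription: y\ntype: feedback\n---\nbody\n"
bad = "---\nname: x\ntype:\n---\n"
index = "- [A](a.md): hook\n- [Gone](gone.md): stale\n"

print(frontmatter_problems(good))
print(frontmatter_problems(bad))
print(index_sync(index, ["a.md", "b.md"]))
```
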
diff --git a/skills/harness/SKILL.md b/skills/harness/SKILL.md
index 3a739b1..6c0eab5 100644
--- a/skills/harness/SKILL.md
+++ b/skills/harness/SKILL.md
@@ -14,11 +14,15 @@ description: >
   of ~/.claude/ to a private git repo); status (report what's installed,
   modified, or missing); audit (prepare a monthly remote-audit routine
   that PRs deltas against the latest Anthropic releases and Claude Code
-  community patterns). All sub-actions are idempotent. Use when asked to
-  "set up my Claude Code", "install harness", "uninstall harness", "update
-  harness", "diagnose my setup", "adopt harness in this project",
-  "retrofit", "snapshot my setup", "audit my setup", "harden my Claude",
-  or any request matching the sub-actions.
+  community patterns); memoize (deterministic memory hygiene pass —
+  index sync, frontmatter, stale citations, lexical duplicates — emits
+  a stable report; pairs with a weekly /schedule routine). All
+  sub-actions are idempotent. Use when asked to "set up my Claude
+  Code", "install harness", "uninstall harness", "update harness",
+  "diagnose my setup", "adopt harness in this project", "retrofit",
+  "snapshot my setup", "audit my setup", "harden my Claude",
+  "memoize", "consolidate memory", "prune memory", or any request
+  matching the sub-actions.
 user-invocable: true
 ---
 
@@ -42,6 +46,7 @@ The user invokes this skill, optionally with an action word. Detect intent and r
 | "snapshot", "backup", "mirror to git"         | `snapshot`  | `scripts/snapshot.sh`           |
 | "status", "what's installed", "audit local"   | `status`    | `scripts/status.sh`             |
 | "audit", "schedule audit", "monthly check"    | `audit`     | (prep work — see below)         |
+| "memoize", "consolidate memory", "prune memory" | `memoize`   | `scripts/memoize.sh`            |
 
 Substitute the skill's absolute base directory for `$SKILL_DIR` in every command — it's announced at the top of this invocation.
 
@@ -239,6 +244,37 @@ Steps:
 
 The remote agent clones the snapshot repo, researches the last ~30 days of Anthropic releases and canonical Claude Code voices, and PRs `audits/YYYY-MM-DD-setup-audit.md` with prioritised deltas. It never modifies tracked files outside `audits/`.
 
+## memoize
+
+```bash
+bash "$SKILL_DIR/scripts/memoize.sh"
+```
+
+Proactive memory hygiene for `~/.claude/projects/<slug>/memory/`. Memory is reactive — entries get written when the agent notices something worth saving, but nothing prunes or consolidates. `memoize` is the deterministic maintenance pass.
+
+What it checks:
+
+1. **Index sync** — every `memory/*.md` is listed in `MEMORY.md`; every `MEMORY.md` entry points at a file that exists.
+2. **Frontmatter hygiene** — every memory has the required `name`, `description`, `type`.
+3. **Stale citations** — path-shaped tokens (anything starting with `~/`, `/Users/`, `./`, etc., or ending in a known source extension) that resolve nowhere across `~/.claude/projects/` and `~/Projects/`. Conservative on purpose — false positives cost more than misses.
+4. **Possible duplicates** — pairs of memories of the same `type` whose `name` or `description` are lexically similar (Jaccard ≥ 0.5). Flag, don't merge.
+
+Output: `<memory>/_memoize-report.md` (leading underscore so it's never indexed by `MEMORY.md`). The report is byte-stable on equal runs — running twice in a row produces an identical file.
+
+Flags:
+- `--dry-run` — print the plan + report preview, write nothing.
+- `--target=PATH` — explicit memory dir.
+
+Env knobs (mirror `snapshot.sh`): `CLAUDE_DIR`, `USER_PROJECT_KEY`, `MEMOIZE_SEARCH_ROOTS`.
+
+**Scheduled routine.** For deeper passes (conceptual duplicates, outdated facts, conflicts the lexical script can't see), wire it as a weekly `/schedule` job using `scripts/memoize-prompt.md`. Suggested config:
+- cron: `0 6 * * 0` (Sunday 06:00 UTC)
+- model: `claude-opus-4-7`
+- tools: Bash, Read, Write, Edit, Glob, Grep, Agent
+- source: the user's snapshot repo (run `harness snapshot` first)
+
+The remote agent runs `memoize.sh`, layers in semantic checks, and PRs `audits/memory/YYYY-MM-DD.md` with proposed edits. It never modifies memory directly — the user reviews and applies.
+
 ## Constraints
 
 - **Never auto-fill stack signals or user_role.** Templates have placeholders; ask the user to fill them.
@@ -267,3 +303,5 @@ The remote agent clones the snapshot repo, researches the last ~30 days of Anthr
 | `scripts/_detect_stack.py`        | Stack-signal detector — auto-fills `## Stack signals` at install time |
 | `assets/harness-check.sh.tmpl`    | Starter project pass/fail gate written by `adopt`                     |
 | `scripts/audit-prompt.md`         | Prompt template for the monthly remote-audit routine                  |
+| `scripts/memoize.sh`              | Memory consolidation pass — deterministic; emits `_memoize-report.md` |
+| `scripts/memoize-prompt.md`       | Prompt template for the weekly remote-memoize routine                 |
diff --git a/skills/harness/scripts/memoize-prompt.md b/skills/harness/scripts/memoize-prompt.md
new file mode 100644
index 0000000..f41fe8f
--- /dev/null
+++ b/skills/harness/scripts/memoize-prompt.md
@@ -0,0 +1,90 @@
+# Weekly memory consolidation — agent prompt
+
+Use this prompt as the `events[].data.message.content` when creating a remote
+routine that consolidates `~/.claude/projects/<slug>/memory/`. The routine
+clones the snapshot repo (e.g. `<you>/claude-setup`) and needs Bash, Read,
+Write, Edit, Glob, Grep, Agent.
+
+Suggested config:
+- cron: `0 6 * * 0` (Sunday 06:00 UTC)
+- model: `claude-opus-4-7`
+- tools: Bash, Read, Write, Edit, Glob, Grep, Agent
+- sources: the user's snapshot repo (must exist — run `harness snapshot` first)
+
+---
+
+You are doing the weekly memory hygiene pass against a snapshot of my Claude
+Code memory. The repo you are running in is a sanitised mirror of `~/.claude/`
+(see README.md). Memory lives in `memory/`, indexed by `memory/MEMORY.md`.
+
+## Your task
+
+1. **Run the deterministic pass.** If `scripts/memoize.sh` exists in the
+   snapshot, run it with `--target=memory`. Otherwise, replicate its checks
+   in-process: index sync (every `memory/*.md` listed in `MEMORY.md`, every
+   entry points at a real file), frontmatter hygiene (every memory has
+   `name`, `description`, `type`), stale citations (path-shaped tokens that
+   resolve nowhere), possible duplicates (same `type`, lexically similar
+   name/description).
+
+2. **Read every memory.** For each file under `memory/`, read it and form a
+   one-line gist. Group by `type`.
+
+3. **Look for the things the script can't catch:**
+   - **Conceptual duplicates** that don't share vocabulary — two `feedback`
+     memories saying the same thing in different words.
+   - **Outdated facts** — `project` memories citing deadlines, milestones,
+     stakeholders that have moved on. Check the snapshot's `CLAUDE.md` and
+     other context for currency.
+   - **Conflicting guidance** — two memories that pull in opposite directions.
+   - **Index drift** — entries in `MEMORY.md` whose one-line hook no longer
+     reflects the file's body.
+
+4. **Write the report** to `audits/memory/YYYY-MM-DD.md`. Structure:
+
+   ```markdown
+   # Memory consolidation — {{date}}
+
+   ## TL;DR
+   3-5 bullets — the highest-leverage merges, deletes, or rewrites.
+
+   ## Deterministic findings
+   Output of memoize.sh (or the equivalent in-process pass). Verbatim.
+
+   ## Conceptual duplicates
+   Pairs of memories that say the same thing differently. Propose a merge
+   target and the surviving content.
+
+   ## Outdated facts
+   Memories whose body cites stale state. Quote the line, suggest the edit.
+
+   ## Conflicts
+   Memories pulling in opposite directions. Surface the contradiction; do
+   not resolve it unilaterally.
+
+   ## Proposed edits (ordered by leverage)
+   - **What:** one-line summary
+   - **Where:** `memory/<file>.md`
+   - **Diff:** old → new (or "merge X into Y, delete X")
+   - **Why:** the signal in the body that motivates this
+
+   ## Skip-list
+   Things that looked relevant but aren't worth doing this week.
+   ```
+
+5. **Open a PR.** Branch `memoize/YYYY-MM-DD`, commit
+   `memoize: weekly memory consolidation YYYY-MM-DD`, PR title
+   `Memory consolidation — {{date}}`, PR body = TL;DR. Use `gh pr create`.
+
+## Constraints
+
+- DO NOT modify any tracked file outside `audits/memory/`. The user reviews
+  proposed edits and applies them locally — the routine never edits memory.
+- Conservative on stale citations and outdated facts. False positives cost
+  more than misses; the user reads every line of this report.
+- Prefer specific over comprehensive. 3 merges I'll do > 30 I won't.
+- If nothing material changed since last week, write a short report saying
+  so. Don't pad.
+- Keep output concise and use the feedforward / sensors / GC vocabulary the user already works in.
+
+Report length target: 400-1000 words.
diff --git a/skills/harness/scripts/memoize.sh b/skills/harness/scripts/memoize.sh
new file mode 100755
index 0000000..d39b895
--- /dev/null
+++ b/skills/harness/scripts/memoize.sh
@@ -0,0 +1,386 @@
+#!/usr/bin/env bash
+# Memoize — proactive memory hygiene for ~/.claude/projects/<slug>/memory/.
+# Report-only: writes a single markdown report and never edits or
+# deletes a memory file. The report goes to <memory>/_memoize-report.md.
+#
+# Checks:
+#   1. Index sync     — every memory/*.md (except _*.md) is in MEMORY.md;
+#                       every MEMORY.md entry points at a real file.
+#   2. Frontmatter    — every memory has `name`, `description`, `type`.
+#   3. Stale citations — path-shaped tokens in memory bodies that resolve
+#                        nowhere across the search roots.
+#   4. Duplicates     — pairs of memories of the same `type` whose names
+#                        or descriptions are lexically close.
+#
+# Output is sorted and stable: running twice produces an identical report
+# (no timestamps in the body).
+#
+# Usage:
+#   bash memoize.sh                 # write the report
+#   bash memoize.sh --dry-run       # print the plan, write nothing
+#   bash memoize.sh --target=PATH   # explicit memory dir
+#
+# Env knobs (mirror snapshot.sh):
+#   CLAUDE_DIR=/path/to/.claude
+#   USER_PROJECT_KEY=-Users-foo
+#   MEMOIZE_SEARCH_ROOTS="$HOME/.claude/projects $HOME/Projects"
+
+set -euo pipefail
+
+DRY_RUN=0
+TARGET=""
+for arg in "$@"; do
+  case "$arg" in
+    --dry-run) DRY_RUN=1 ;;
+    --target=*) TARGET="${arg#--target=}" ;;
+    -h|--help)
+      sed -n '2,/^$/p' "$0" | sed 's/^# \{0,1\}//'
+      exit 0
+      ;;
+    *) echo "unknown flag: $arg" >&2; exit 2 ;;
+  esac
+done
+
+CLAUDE_DIR="${CLAUDE_DIR:-$HOME/.claude}"
+USER_PROJECT_KEY="${USER_PROJECT_KEY:-$(printf '%s' "$HOME" | tr '/' '-')}"
+if [ -z "$TARGET" ]; then
+  TARGET="$CLAUDE_DIR/projects/$USER_PROJECT_KEY/memory"
+fi
+SEARCH_ROOTS="${MEMOIZE_SEARCH_ROOTS:-$HOME/.claude/projects $HOME/Projects}"
+
+if [ ! -d "$TARGET" ]; then
+  echo "error: memory dir not found: $TARGET" >&2
+  echo "       set --target=PATH or USER_PROJECT_KEY" >&2
+  exit 2
+fi
+
+REPORT="$TARGET/_memoize-report.md"
+
+green() { printf '\033[32m%s\033[0m' "$1"; }
+yellow() { printf '\033[33m%s\033[0m' "$1"; }
+dim() { printf '\033[90m%s\033[0m' "$1"; }
+if ! [ -t 1 ]; then
+  green() { printf '%s' "$1"; }
+  yellow() { printf '%s' "$1"; }
+  dim() { printf '%s' "$1"; }
+fi
+
+echo "memoize — target=$TARGET"
+[ "$DRY_RUN" -eq 1 ] && echo "$(yellow '(dry-run)') no file will be written"
+echo
+
+# Resolve PYBIN once.
+if command -v python3 >/dev/null 2>&1; then
+  PYBIN=python3
+elif command -v python >/dev/null 2>&1; then
+  PYBIN=python
+else
+  echo "error: python3 not on PATH (needed for frontmatter parsing)" >&2
+  exit 2
+fi
+
+# Hand off to python for the analysis. Keeps the bash thin and the parsing safe.
+TMP="$(mktemp)"
+trap 'rm -f "$TMP"' EXIT
+
+TARGET="$TARGET" SEARCH_ROOTS="$SEARCH_ROOTS" "$PYBIN" - <<'PY' > "$TMP"
+import os, re, sys
+from pathlib import Path
+
+target = Path(os.environ["TARGET"])
+roots = [Path(p) for p in os.environ["SEARCH_ROOTS"].split() if p]
+
+REPORT_NAME = "_memoize-report.md"
+INDEX = "MEMORY.md"
+REQUIRED = ("name", "description", "type")
+
+def memory_files():
+    out = []
+    for p in sorted(target.iterdir()):
+        if not p.is_file():
+            continue
+        if p.suffix != ".md":
+            continue
+        if p.name == INDEX or p.name.startswith("_"):
+            continue
+        out.append(p)
+    return out
+
+def parse_frontmatter(text):
+    if not text.startswith("---\n"):
+        return None, text
+    end = text.find("\n---\n", 4)
+    if end == -1:
+        return None, text
+    block = text[4:end]
+    body = text[end+5:]
+    fm = {}
+    for line in block.splitlines():
+        if ":" not in line:
+            continue
+        k, _, v = line.partition(":")
+        fm[k.strip()] = v.strip()
+    return fm, body
+
+def index_entries():
+    p = target / INDEX
+    if not p.exists():
+        return [], None
+    entries = []
+    pat = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")
+    raw = p.read_text(encoding="utf-8")
+    for line in raw.splitlines():
+        m = pat.search(line)
+        if m:
+            entries.append((m.group(1), m.group(2)))
+    return entries, raw
+
+def shorten(s, n=80):
+    s = s.strip().replace("\n", " ")
+    return s if len(s) <= n else s[:n-1] + "…"
+
+# ---- 1. Index sync ----
+files = memory_files()
+file_names = {p.name for p in files}
+entries, _ = index_entries()
+indexed = {name for _, name in entries}
+
+missing_in_index = sorted(file_names - indexed)
+broken_in_index = sorted({name for _, name in entries if name not in file_names})
+
+# ---- 2. Frontmatter ----
+fm_issues = []  # (filename, problem)
+parsed = {}     # filename -> (fm, body)
+for p in files:
+    text = p.read_text(encoding="utf-8")
+    fm, body = parse_frontmatter(text)
+    parsed[p.name] = (fm, body)
+    if fm is None:
+        fm_issues.append((p.name, "no frontmatter block"))
+        continue
+    for k in REQUIRED:
+        if not fm.get(k):
+            fm_issues.append((p.name, f"missing or empty `{k}`"))
+fm_issues.sort()
+
+# ---- 3. Stale citations ----
+# Conservative — flag a token only if it's *unambiguously* a real filesystem
+# path. Two ways to qualify:
+#   (a) starts with a hard prefix that's almost always a path: ~/, /Users/,
+#       /home/, /etc/, /opt/, /var/, /tmp/, ./, ../
+#   (b) ends in a recognised source-file extension.
+# This drops slash-commands (/verify, /grade), brace expansions
+# ({a,b,c}), regex-ish strings, and similar false positives.
+PATH_PREFIXES = ("~/", "/Users/", "/home/", "/etc/", "/opt/", "/var/", "/tmp/",
+                 "./", "../")
+PATH_EXTS = (".md", ".sh", ".py", ".ts", ".tsx", ".js", ".jsx", ".json",
+             ".yml", ".yaml", ".toml", ".php", ".go", ".rb", ".rs",
+             ".html", ".css", ".sql", ".lock", ".env")
+path_pat = re.compile(r"`(?P<bt>[^`\n]+)`|(?P<bare>(?:~|\.{1,2})?/[^\s`)\]\"',]+)")
+known_existing_cache = {}
+
+def looks_like_path(tok):
+    if not tok or "/" not in tok:
+        return False
+    if tok.startswith(("http://", "https://", "//")):
+        return False
+    if tok.startswith("<") or tok.endswith(">"):
+        return False
+    if "{" in tok or "}" in tok or "*" in tok:
+        return False
+    if tok.startswith(PATH_PREFIXES):
+        return True
+    if any(tok.endswith(ext) for ext in PATH_EXTS):
+        return True
+    return False
+
+def candidate_paths(text):
+    out = []
+    for m in path_pat.finditer(text):
+        tok = (m.group("bt") or m.group("bare") or "").strip()
+        tok = tok.rstrip(".,;:)")
+        if looks_like_path(tok):
+            out.append(tok)
+    return out
+
+def resolves(tok):
+    if tok in known_existing_cache:
+        return known_existing_cache[tok]
+    # Expand ~/
+    if tok.startswith("~/"):
+        candidates = [Path(os.path.expanduser(tok))]
+    elif tok.startswith("/"):
+        candidates = [Path(tok)]
+    elif tok.startswith("./"):
+        candidates = [Path(tok)]
+    else:
+        # Treat as a relative project-ish path: try under each search root.
+        candidates = [r / tok for r in roots]
+        # also try as a basename match under any root (one level only)
+        # (skipped — too broad; conservatism wins)
+    found = any(c.exists() for c in candidates)
+    if not found:
+        # last-ditch: a basename-only suffix match under search roots, depth 4
+        base = tok.rstrip("/").split("/")[-1]
+        if base and len(base) > 2:
+            for r in roots:
+                if not r.exists():
+                    continue
+                # single-shot find: walk up to depth 4
+                hit = False
+                for dirpath, dirnames, filenames in os.walk(r):
+                    depth = len(Path(dirpath).relative_to(r).parts)
+                    if depth > 4:
+                        dirnames[:] = []
+                        continue
+                    if base in filenames or base in dirnames:
+                        hit = True
+                        break
+                if hit:
+                    found = True
+                    break
+    known_existing_cache[tok] = found
+    return found
+
+stale = []  # (filename, token)
+for p in files:
+    fm, body = parsed[p.name]
+    seen = set()
+    for tok in candidate_paths(body or ""):
+        if tok in seen:
+            continue
+        seen.add(tok)
+        if not resolves(tok):
+            stale.append((p.name, tok))
+stale.sort()
+
+# ---- 4. Duplicate detection (lexical, by type) ----
+def normalize(s):
+    s = (s or "").lower()
+    s = re.sub(r"[^a-z0-9]+", " ", s)
+    return set(t for t in s.split() if len(t) > 2)
+
+def jaccard(a, b):
+    if not a or not b:
+        return 0.0
+    return len(a & b) / len(a | b)
+
+by_type = {}
+for p in files:
+    fm, _ = parsed[p.name]
+    t = (fm or {}).get("type", "<unknown>")
+    by_type.setdefault(t, []).append((p.name, fm or {}))
+
+dupes = []  # (type, file_a, file_b, score)
+for t, items in by_type.items():
+    items.sort()
+    for i in range(len(items)):
+        for j in range(i+1, len(items)):
+            a_name, a_fm = items[i]
+            b_name, b_fm = items[j]
+            score = max(
+                jaccard(normalize(a_fm.get("name")), normalize(b_fm.get("name"))),
+                jaccard(normalize(a_fm.get("description")), normalize(b_fm.get("description"))),
+            )
+            if score >= 0.5:
+                dupes.append((t, a_name, b_name, round(score, 2)))
+dupes.sort()
+
+# ---- Render ----
+out = []
+def section(title):
+    out.append(f"## {title}\n")
+
+out.append("# Memory consolidation report\n")
+out.append("Generated by `harness memoize`. Report-only — no memory files were modified.\n")
+out.append(f"Memory dir: `{target}`\n")
+out.append(f"Files scanned: {len(files)}\n")
+out.append("")
+
+section("Index sync")
+if not missing_in_index and not broken_in_index:
+    out.append("OK — every memory file is indexed and every index entry resolves.\n")
+else:
+    if missing_in_index:
+        out.append("**Files not listed in MEMORY.md:**\n")
+        for n in missing_in_index:
+            out.append(f"- `{n}`")
+        out.append("")
+    if broken_in_index:
+        out.append("**MEMORY.md entries pointing at missing files:**\n")
+        for n in broken_in_index:
+            out.append(f"- `{n}`")
+        out.append("")
+out.append("")
+
+section("Frontmatter hygiene")
+if not fm_issues:
+    out.append("OK — every memory has `name`, `description`, and `type`.\n")
+else:
+    for name, problem in fm_issues:
+        out.append(f"- `{name}` — {problem}")
+    out.append("")
+out.append("")
+
+section("Stale citations")
+if not stale:
+    out.append("OK — no path-shaped tokens that fail to resolve under the search roots.\n")
+else:
+    out.append(f"Search roots: `{os.environ['SEARCH_ROOTS']}`\n")
+    out.append("Conservative: only flags tokens that contain `/` and either start with a hard path prefix (`~/`, `/Users/`, `./`, ...) or end in a known source extension. False positives possible; verify before acting.\n")
+    cur = None
+    for name, tok in stale:
+        if name != cur:
+            out.append(f"- `{name}`")
+            cur = name
+        out.append(f"  - `{tok}`")
+    out.append("")
+out.append("")
+
+section("Possible duplicates")
+if not dupes:
+    out.append("OK — no near-duplicate name/description pairs within a `type`.\n")
+else:
+    out.append("Lexical Jaccard ≥ 0.5 on `name` or `description`. Review and decide whether to merge.\n")
+    for t, a, b, score in dupes:
+        out.append(f"- _{t}_ — `{a}` ↔ `{b}` (score {score})")
+    out.append("")
+out.append("")
+
+# Final summary line
+total_findings = len(missing_in_index) + len(broken_in_index) + len(fm_issues) + len(stale) + len(dupes)
+out.append("---")
+out.append(f"Findings: {total_findings} "
+           f"(index {len(missing_in_index)+len(broken_in_index)}, "
+           f"frontmatter {len(fm_issues)}, "
+           f"stale {len(stale)}, "
+           f"duplicates {len(dupes)})")
+
+text = "\n".join(out).rstrip() + "\n"
+sys.stdout.write(text)
+
+# Also emit a one-line summary to stderr for the bash wrapper.
+sys.stderr.write(f"SUMMARY findings={total_findings}\n")
+PY
+
+# Read the python summary off stderr — but we already piped to TMP only, so
+# re-run is wasteful. Cheaper: tail the report's last line for the summary.
+SUMMARY="$(tail -n 1 "$TMP")"
+
+if [ "$DRY_RUN" -eq 1 ]; then
+  dim 'plan:'; echo
+  echo "  would write: $REPORT"
+  echo
+  dim '── report preview ──'; echo
+  cat "$TMP"
+  dim '── end ──'; echo
+  exit 0
+fi
+
+# Move into place (byte-stability comes from the sorted, timestamp-free render).
+mkdir -p "$TARGET"
+mv "$TMP" "$REPORT"
+trap - EXIT
+
+echo "$(green 'wrote:') $REPORT"
+echo "  $SUMMARY"

From de2d1e7dcf4d0ed43b28567f875142beb7969f91 Mon Sep 17 00:00:00 2001
From: Marlin Forbes <marlinf@datashaman.com>
Date: Sat, 2 May 2026 07:39:32 +0200
Subject: [PATCH 2/3] fix(harness): address memoize PR review

- CLAUDE_DIR override now cascades to default search roots
- MEMOIZE_SEARCH_ROOTS uses PATH-style colon separator (paths with spaces)
- ./ and ../ tokens resolve under search roots, not CWD
- prune .git, node_modules, vendor, etc. during fallback walk
- drop unread SUMMARY stderr emission
- prompt drops the local-script branch (snapshot has no scripts)
- prompt index-check explicitly skips MEMORY.md and _*.md
- prompt clarifies stale-citation is local-only
- SKILL.md clarifies report-vs-memory contract
---
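The search-root change is easiest to see side by side. This is a
minimal sketch that assumes the same defaults as the shell. Bash's
`${VAR:-default}` falls back when the variable is unset or empty, and
`env.get(...) or default` mirrors that. `search_roots` is a
hypothetical name. The old whitespace split breaks a root such as
`/home/me/My Projects` into two bogus paths, while the colon split
keeps it whole. As with `PATH`, a root that itself contains `:` cannot
be expressed.

```python
def search_roots(env):
    """Hypothetical helper mirroring memoize.sh's SEARCH_ROOTS resolution."""
    home = env["HOME"]
    # A CLAUDE_DIR override cascades into the default roots.
    claude_dir = env.get("CLAUDE_DIR") or f"{home}/.claude"
    raw = env.get("MEMOIZE_SEARCH_ROOTS") or f"{claude_dir}/projects:{home}/Projects"
    # PATH-style: split on ":" and drop empty segments (e.g. a trailing ":").
    return [p for p in raw.split(":") if p]


spaced = "/home/me/.claude/projects /home/me/My Projects"
print(spaced.split())  # old behaviour: "My Projects" becomes two roots
print(search_roots({"HOME": "/home/me",
                    "MEMOIZE_SEARCH_ROOTS": "/home/me/.claude/projects:/home/me/My Projects"}))
print(search_roots({"HOME": "/home/me", "CLAUDE_DIR": "/cfg"}))
```
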
 .mcp.json                                |  8 ++++
 skills/harness/SKILL.md                  | 11 +++--
 skills/harness/scripts/memoize-prompt.md | 57 ++++++++++++++++--------
 skills/harness/scripts/memoize.sh        | 41 +++++++++--------
 4 files changed, 75 insertions(+), 42 deletions(-)
 create mode 100644 .mcp.json

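The pruned fallback walk can be sketched as below. This is an
illustration under assumptions, not necessarily how this patch
implements it. `basename_exists` is a hypothetical name. It reuses the
`PRUNE_DIRS` names this patch adds to `memoize.sh` and the depth limit
of 4 from patch 1. The key detail is that `os.walk` is top-down by
default, so assigning to `dirnames[:]` in place stops it from
descending into pruned or too-deep directories at all. The script
itself walks depth 5 and then skips it.

```python
import os
import tempfile
from pathlib import Path

# Same directory names this patch adds as PRUNE_DIRS in memoize.sh.
PRUNE_DIRS = {".git", "node_modules", "vendor", "__pycache__", ".venv",
              "venv", "target", "dist", "build", ".next", ".cache"}


def basename_exists(root, base, max_depth=4):
    """Hypothetical helper: is `base` a file/dir name within max_depth of root?"""
    root = Path(root)
    if not root.is_dir():
        return False
    for dirpath, dirnames, filenames in os.walk(root):  # top-down by default
        if base in filenames or base in dirnames:
            return True
        depth = len(Path(dirpath).relative_to(root).parts)
        if depth >= max_depth:
            dirnames[:] = []  # too deep: do not descend further
        else:
            # In-place edit so os.walk skips these subtrees entirely.
            dirnames[:] = [d for d in dirnames if d not in PRUNE_DIRS]
    return False


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "a", "b"))
    Path(tmp, "a", "b", "target.txt").touch()
    os.makedirs(os.path.join(tmp, "node_modules", "pkg"))
    Path(tmp, "node_modules", "pkg", "hidden.txt").touch()
    found = basename_exists(tmp, "target.txt")
    pruned = basename_exists(tmp, "hidden.txt")
    shallow = basename_exists(tmp, "target.txt", max_depth=1)

print(found, pruned, shallow)  # -> True False False
```
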
diff --git a/.mcp.json b/.mcp.json
new file mode 100644
index 0000000..b05a925
--- /dev/null
+++ b/.mcp.json
@@ -0,0 +1,8 @@
+{
+  "mcpServers": {
+    "atlassian": {
+      "type": "http",
+      "url": "https://mcp.atlassian.com/v1/mcp"
+    }
+  }
+}
\ No newline at end of file
diff --git a/skills/harness/SKILL.md b/skills/harness/SKILL.md
index 6c0eab5..5a8f337 100644
--- a/skills/harness/SKILL.md
+++ b/skills/harness/SKILL.md
@@ -259,21 +259,24 @@ What it checks:
 3. **Stale citations** — path-shaped tokens (anything starting with `~/`, `/Users/`, `./`, etc., or ending in a known source extension) that resolve nowhere across `~/.claude/projects/` and `~/Projects/`. Conservative on purpose — false positives cost more than misses.
 4. **Possible duplicates** — pairs of memories of the same `type` whose `name` or `description` are lexically similar (Jaccard ≥ 0.5). Flag, don't merge.
 
-Output: `<memory>/_memoize-report.md` (leading underscore so it's never indexed by `MEMORY.md`). The report is byte-stable on equal runs — running twice in a row produces an identical file.
+Output: a single file at `<memory>/_memoize-report.md`. The leading underscore is the contract — `MEMORY.md` indexing rules and the remote routine both ignore `_*.md`, so the report itself never gets treated as a memory entry. The report is byte-stable on equal runs (two consecutive invocations produce an identical file).
 
 Flags:
 - `--dry-run` — print the plan + report preview, write nothing.
 - `--target=PATH` — explicit memory dir.
 
-Env knobs (mirror `snapshot.sh`): `CLAUDE_DIR`, `USER_PROJECT_KEY`, `MEMOIZE_SEARCH_ROOTS`.
+Env knobs (mirror `snapshot.sh`):
+- `CLAUDE_DIR` — root of the Claude Code config dir. Search-root defaults track this, so a custom `CLAUDE_DIR` cascades correctly.
+- `USER_PROJECT_KEY` — the slug under `<CLAUDE_DIR>/projects/`.
+- `MEMOIZE_SEARCH_ROOTS` — colon-separated (PATH-style, supports paths with spaces) list of roots to resolve stale citations against. Defaults to `<CLAUDE_DIR>/projects:$HOME/Projects`.
 
-**Scheduled routine.** For deeper passes (conceptual duplicates, outdated facts, conflicts the lexical script can't see), wire it as a weekly `/schedule` job using `scripts/memoize-prompt.md`. Suggested config:
+**Scheduled routine.** For the conceptual drift the lexical script can't see (semantic duplicates, outdated facts, conflicting guidance), wire a weekly `/schedule` job using `scripts/memoize-prompt.md`. Suggested config:
 - cron: `0 6 * * 0` (Sunday 06:00 UTC)
 - model: `claude-opus-4-7`
 - tools: Bash, Read, Write, Edit, Glob, Grep, Agent
 - source: the user's snapshot repo (run `harness snapshot` first)
 
-The remote agent runs `memoize.sh`, layers in semantic checks, and PRs `audits/memory/YYYY-MM-DD.md` with proposed edits. It never modifies memory directly — the user reviews and applies.
+Scope: the snapshot repo doesn't mirror harness scripts or local search roots, so the remote agent does its own in-process structural pass (index sync, frontmatter) and adds the semantic checks. Stale-citation analysis stays local-only — the search roots aren't available remotely. The remote agent PRs `audits/memory/YYYY-MM-DD.md` with proposed edits and never modifies any memory entry.
 
 ## Constraints
 
diff --git a/skills/harness/scripts/memoize-prompt.md b/skills/harness/scripts/memoize-prompt.md
index f41fe8f..1e1cc6a 100644
--- a/skills/harness/scripts/memoize-prompt.md
+++ b/skills/harness/scripts/memoize-prompt.md
@@ -11,34 +11,49 @@ Suggested config:
 - tools: Bash, Read, Write, Edit, Glob, Grep, Agent
 - sources: the user's snapshot repo (must exist — run `harness snapshot` first)
 
+## Scope note
+
+The local `memoize.sh` script does four checks: index sync, frontmatter
+hygiene, stale citations, and lexical duplicates. Of these, **stale
+citations cannot run remotely** — the search roots (`~/.claude/projects`,
+`~/Projects`) are local-machine state that the snapshot repo does not
+mirror. The remote routine focuses on the **conceptual** drift the local
+script can't see (semantic duplicates, outdated facts, conflicts), and
+re-runs the cheap structural checks (index, frontmatter) directly against
+the snapshot.
+
 ---
 
 You are doing the weekly memory hygiene pass against a snapshot of my Claude
 Code memory. The repo you are running in is a sanitised mirror of `~/.claude/`
 (see README.md). Memory lives in `memory/`, indexed by `memory/MEMORY.md`.
+The snapshot does **not** contain harness scripts or local search roots.
 
 ## Your task
 
-1. **Run the deterministic pass.** If `scripts/memoize.sh` exists in the
-   snapshot, run it with `--target=memory`. Otherwise, replicate its checks
-   in-process: index sync (every `memory/*.md` listed in `MEMORY.md`, every
-   entry points at a real file), frontmatter hygiene (every memory has
-   `name`, `description`, `type`), stale citations (path-shaped tokens that
-   resolve nowhere), possible duplicates (same `type`, lexically similar
-   name/description).
+1. **Structural checks (in-process).** For files under `memory/`, applying
+   `memory/MEMORY.md` as the index:
+   - **Index sync** — every `memory/*.md` whose filename does NOT start with
+     `_` and is NOT `MEMORY.md` itself must appear as an entry in
+     `MEMORY.md`. Every `MEMORY.md` entry must point at a real file.
+   - **Frontmatter hygiene** — every memory file must start with a `---`
+     YAML frontmatter block containing non-empty `name`, `description`,
+     and `type` fields.
 
-2. **Read every memory.** For each file under `memory/`, read it and form a
-   one-line gist. Group by `type`.
+2. **Read every memory.** For each `memory/*.md` file (skipping `MEMORY.md`
+   and any `_*.md` such as the consolidation report itself), read it and
+   form a one-line gist. Group by `type`.
 
-3. **Look for the things the script can't catch:**
+3. **Conceptual checks the structural pass can't catch:**
    - **Conceptual duplicates** that don't share vocabulary — two `feedback`
      memories saying the same thing in different words.
    - **Outdated facts** — `project` memories citing deadlines, milestones,
-     stakeholders that have moved on. Check the snapshot's `CLAUDE.md` and
-     other context for currency.
-   - **Conflicting guidance** — two memories that pull in opposite directions.
-   - **Index drift** — entries in `MEMORY.md` whose one-line hook no longer
-     reflects the file's body.
+     stakeholders that have moved on. Cross-check against the snapshot's
+     `CLAUDE.md` and other context for currency.
+   - **Conflicting guidance** — two memories that pull in opposite
+     directions.
+   - **Index drift** — entries in `MEMORY.md` whose one-line hook no
+     longer reflects the file's body.
 
 4. **Write the report** to `audits/memory/YYYY-MM-DD.md`. Structure:
 
@@ -48,8 +63,8 @@ Code memory. The repo you are running in is a sanitised mirror of `~/.claude/`
    ## TL;DR
    3-5 bullets — the highest-leverage merges, deletes, or rewrites.
 
-   ## Deterministic findings
-   Output of memoize.sh (or the equivalent in-process pass). Verbatim.
+   ## Structural findings
+   Index sync + frontmatter hygiene issues. Per-file bullets.
 
    ## Conceptual duplicates
    Pairs of memories that say the same thing differently. Propose a merge
@@ -80,8 +95,12 @@ Code memory. The repo you are running in is a sanitised mirror of `~/.claude/`
 
 - DO NOT modify any tracked file outside `audits/memory/`. The user reviews
   proposed edits and applies them locally — the routine never edits memory.
-- Conservative on stale citations and outdated facts. False positives cost
-  more than misses; the user reads every line of this report.
+- Skip files whose names start with `_` (e.g. `_memoize-report.md` if it
+  ever lands in the snapshot) and `MEMORY.md` when iterating "every memory".
+- Stale-citation analysis is local-only; do not attempt a snapshot
+  equivalent — false positives drown the signal.
+- Conservative on outdated facts. False positives cost more than misses;
+  the user reads every line of this report.
 - Prefer specific over comprehensive. 3 merges I'll do > 30 I won't.
 - If nothing material changed since last week, write a short report saying
   so. Don't pad.
diff --git a/skills/harness/scripts/memoize.sh b/skills/harness/scripts/memoize.sh
index d39b895..7b4d5ee 100755
--- a/skills/harness/scripts/memoize.sh
+++ b/skills/harness/scripts/memoize.sh
@@ -23,7 +23,9 @@
 # Env knobs (mirror snapshot.sh):
 #   CLAUDE_DIR=/path/to/.claude
 #   USER_PROJECT_KEY=-Users-foo
-#   MEMOIZE_SEARCH_ROOTS="$HOME/.claude/projects $HOME/Projects"
+#   MEMOIZE_SEARCH_ROOTS="$CLAUDE_DIR/projects:$HOME/Projects"
+#     Colon-separated (like $PATH) so paths containing spaces work. The
+#     default derives from $CLAUDE_DIR, so overriding CLAUDE_DIR moves it too.
 
 set -euo pipefail
 
@@ -46,7 +48,7 @@ USER_PROJECT_KEY="${USER_PROJECT_KEY:-$(printf '%s' "$HOME" | tr '/' '-')}"
 if [ -z "$TARGET" ]; then
   TARGET="$CLAUDE_DIR/projects/$USER_PROJECT_KEY/memory"
 fi
-SEARCH_ROOTS="${MEMOIZE_SEARCH_ROOTS:-$HOME/.claude/projects $HOME/Projects}"
+SEARCH_ROOTS="${MEMOIZE_SEARCH_ROOTS:-$CLAUDE_DIR/projects:$HOME/Projects}"
 
 if [ ! -d "$TARGET" ]; then
   echo "error: memory dir not found: $TARGET" >&2
@@ -88,7 +90,9 @@ import os, re, sys, hashlib
 from pathlib import Path
 
 target = Path(os.environ["TARGET"])
-roots = [Path(p) for p in os.environ["SEARCH_ROOTS"].split() if p]
+roots = [Path(p) for p in os.environ["SEARCH_ROOTS"].split(":") if p]
+PRUNE_DIRS = {".git", "node_modules", "vendor", "__pycache__", ".venv",
+              "venv", "target", "dist", "build", ".next", ".cache"}
 
 REPORT_NAME = "_memoize-report.md"
 INDEX = "MEMORY.md"
@@ -165,11 +169,13 @@ fm_issues.sort()
 
 # ---- 3. Stale citations ----
 # Conservative — flag a token only if it's *unambiguously* a real filesystem
-# path. Two ways to qualify:
-#   (a) starts with a hard prefix that's almost always a path: ~/, /Users/,
+# path. Token must contain a `/` AND either:
+#   (a) start with a hard prefix that's almost always a path: ~/, /Users/,
 #       /home/, /etc/, /opt/, /var/, /tmp/, ./, ../
-#   (b) ends in a recognised source-file extension.
-# This drops slash-commands (/verify, /grade), brace expansions
+#   (b) end in a recognised source-file extension.
+# Bare basenames like `settings.json` are *intentionally* skipped — too many
+# false positives (string literals, log lines, prose mentions).
+# This also drops slash-commands (/verify, /grade), brace expansions
 # ({a,b,c}), regex-ish strings, and similar false positives.
 PATH_PREFIXES = ("~/", "/Users/", "/home/", "/etc/", "/opt/", "/var/", "/tmp/",
                  "./", "../")
@@ -211,24 +217,26 @@ def resolves(tok):
         candidates = [Path(os.path.expanduser(tok))]
     elif tok.startswith("/"):
         candidates = [Path(tok)]
-    elif tok.startswith("./"):
-        candidates = [Path(tok)]
+    elif tok.startswith(("./", "../")):
+        # Don't resolve against CWD — that makes results depend on where
+        # the script was invoked from. Resolve under each search root.
+        rel = re.sub(r"^(?:\.{1,2}/)+", "", tok)
+        candidates = [r / rel for r in roots]
     else:
-        # Treat as a relative project-ish path: try under each search root.
+        # Bare relative path (no leading marker): try under each search root.
         candidates = [r / tok for r in roots]
-        # also try as a basename match under any root (one level only)
-        # (skipped — too broad; conservatism wins)
     found = any(c.exists() for c in candidates)
     if not found:
-        # last-ditch: a basename-only suffix match under search roots, depth 4
+        # last-ditch: a basename suffix match under search roots, depth 4,
+        # pruning heavy dirs.
         base = tok.rstrip("/").split("/")[-1]
         if base and len(base) > 2:
             for r in roots:
                 if not r.exists():
                     continue
-                # single-shot find: walk up to depth 4
                 hit = False
                 for dirpath, dirnames, filenames in os.walk(r):
+                    dirnames[:] = [d for d in dirnames if d not in PRUNE_DIRS]
                     depth = len(Path(dirpath).relative_to(r).parts)
                     if depth > 4:
                         dirnames[:] = []
@@ -358,13 +366,8 @@ out.append(f"Findings: {total_findings} "
 
 text = "\n".join(out).rstrip() + "\n"
 sys.stdout.write(text)
-
-# Also emit a one-line summary to stderr for the bash wrapper.
-sys.stderr.write(f"SUMMARY findings={total_findings}\n")
 PY
 
-# Read the python summary off stderr — but we already piped to TMP only, so
-# re-run is wasteful. Cheaper: tail the report's last line for the summary.
 SUMMARY="$(tail -n 1 "$TMP")"
 
 if [ "$DRY_RUN" -eq 1 ]; then
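For reference, the `SEARCH_ROOTS` handling in patch 2 can be modelled in plain Python. This is a sketch, not the script itself. `search_roots` is a hypothetical helper that takes an env mapping so it can be tested. The sketch also assumes `CLAUDE_DIR` falls back to `$HOME/.claude`, because that default is not visible in these hunks.

```python
import os
from pathlib import Path
from typing import Mapping


def search_roots(env: Mapping[str, str]) -> list[Path]:
    """Model of ${MEMOIZE_SEARCH_ROOTS:-$CLAUDE_DIR/projects:$HOME/Projects}."""
    home = env.get("HOME") or os.path.expanduser("~")
    # Assumption: CLAUDE_DIR falls back to $HOME/.claude when unset or empty.
    claude_dir = env.get("CLAUDE_DIR") or f"{home}/.claude"
    # ${VAR:-default} uses the default when VAR is unset *or* empty, hence `or`.
    raw = env.get("MEMOIZE_SEARCH_ROOTS") or f"{claude_dir}/projects:{home}/Projects"
    # Split on ':' like $PATH, so "My Projects" stays a single root; empty
    # segments (e.g. from a trailing ':') are dropped, matching `if p`.
    return [Path(p) for p in raw.split(":") if p]


# Whitespace splitting (the old behaviour) cuts a path with a space in two.
print("/Users/me/My Projects".split())  # ['/Users/me/My', 'Projects']
print(search_roots({"HOME": "/Users/me",
                    "MEMOIZE_SEARCH_ROOTS": "/Users/me/My Projects:/srv/code"}))
```

The sketch uses `or` rather than `env.get(key, default)` deliberately. The shell form `${VAR:-default}` treats an empty variable the same as an unset one.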

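The stale-citation gate described in the comment above `PATH_PREFIXES` reduces to a small predicate. A token must contain a `/` and must also either start with a hard prefix or end in a source-file extension. The sketch below is minimal and rests on assumptions. `looks_like_path`, `SOURCE_EXTS` and the metacharacter filter are all illustrative, because the script's extension list and its brace/regex rejection logic fall outside these hunks.

```python
PATH_PREFIXES = ("~/", "/Users/", "/home/", "/etc/", "/opt/", "/var/", "/tmp/",
                 "./", "../")
# Hypothetical extension list; the script's real list is not shown here.
SOURCE_EXTS = (".py", ".sh", ".md", ".json", ".ts", ".js", ".go", ".rs",
               ".toml", ".yaml", ".yml")
# Assumed filter for brace expansions, globs and regex-ish strings.
META = set("{}*?[]()|^$\\")


def looks_like_path(tok: str) -> bool:
    # Bare basenames such as settings.json are skipped on purpose.
    if "/" not in tok:
        return False
    if any(ch in META for ch in tok):
        return False
    # str.startswith / str.endswith accept a tuple of alternatives.
    return tok.startswith(PATH_PREFIXES) or tok.endswith(SOURCE_EXTS)


for tok in ["~/.claude/settings.json", "settings.json", "/verify"]:
    print(tok, looks_like_path(tok))  # True, False, False respectively
```

Under this rule a slash-command like `/verify` fails both arms. It has no hard prefix and no extension.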
From 151e722001f3e31130c15930a790aa39f1c3ec11 Mon Sep 17 00:00:00 2001
From: Marlin Forbes <marlinf@datashaman.com>
Date: Sat, 2 May 2026 07:39:45 +0200
Subject: [PATCH 3/3] chore: drop stray .mcp.json from PR

---
 .mcp.json | 8 --------
 1 file changed, 8 deletions(-)
 delete mode 100644 .mcp.json

diff --git a/.mcp.json b/.mcp.json
deleted file mode 100644
index b05a925..0000000
--- a/.mcp.json
+++ /dev/null
@@ -1,8 +0,0 @@
-{
-  "mcpServers": {
-    "atlassian": {
-      "type": "http",
-      "url": "https://mcp.atlassian.com/v1/mcp"
-    }
-  }
-}
\ No newline at end of file
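The last-ditch lookup in memoize.sh's `resolves()` (patch 2) walks each search root to depth 4 and prunes `PRUNE_DIRS` along the way. The sketch below shows that bounded walk under stated assumptions. `basename_hit` is a hypothetical name. Matching on exact basename equality is also an assumption, because the hunk cuts off before the script's actual match test.

```python
import os
from pathlib import Path

PRUNE_DIRS = {".git", "node_modules", "vendor", "__pycache__", ".venv",
              "venv", "target", "dist", "build", ".next", ".cache"}


def basename_hit(roots: list[Path], base: str, max_depth: int = 4) -> bool:
    """Return True if `base` names a file or dir within max_depth of any root."""
    for r in roots:
        if not r.exists():
            continue
        for dirpath, dirnames, filenames in os.walk(r):
            # Slice assignment mutates the list os.walk descends into,
            # so pruned directories are never entered at all.
            dirnames[:] = [d for d in dirnames if d not in PRUNE_DIRS]
            depth = len(Path(dirpath).relative_to(r).parts)
            if depth > max_depth:
                dirnames[:] = []  # stop descending below this level
                continue
            if base in filenames or base in dirnames:
                return True
    return False
```

The pruning depends on the slice assignment `dirnames[:] = ...`. Rebinding `dirnames` to a new list would leave `os.walk` descending into `node_modules` anyway.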