mcp(refactor[instructions]): collapse base instructions to a 117-word card

tony · tony · commit 72abdae721ed · 2026-04-26T18:06:11.000-05:00
Phase 3 of the BASE_INSTRUCTIONS slim-down: delete the six
gap-explainer / positive-guidance segments now that their content
lives in tool descriptions (Phase 1). The card shrinks from a
305-word, 6-paragraph monolith to a 117-word "three handles"
overview that answers (1) what is this server and (2) where do I
look for the rest.

server.py changes
-----------------
* Delete ``_INSTR_HIERARCHY``, ``_INSTR_METADATA_VS_CONTENT``,
  ``_INSTR_READ_TOOLS``, ``_INSTR_WAIT_NOT_POLL``, ``_INSTR_HOOKS_GAP``,
  ``_INSTR_BUFFERS_GAP``.
* Add ``_INSTR_CARD`` (server identity, hierarchy, socket_name
  exception) and ``_INSTR_HANDLES`` (Tools / Resources / Prompts).
* Rewrite the module-level comment to name the cross-cutting vs.
  tool-specific boundary explicitly, pointing future contributors at
  tool descriptions for any rule that names a specific tool.

Live ``_build_instructions(mutating)`` output drops from 357 words to
215 — comfortably inside MCP's 150-300 word recommendation for global
instructions.
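
The word counts above can be reproduced with a plain whitespace split, which is also how ``test_card_length_budget`` measures the card. A minimal sketch, assuming a stand-in string rather than the real ``_BASE_INSTRUCTIONS``; ``word_count`` and ``within_budget`` are hypothetical helper names, not part of ``server.py``:

```python
def word_count(text: str) -> int:
    """Count whitespace-separated tokens, as ``len(text.split())`` does."""
    return len(text.split())


def within_budget(text: str, ceiling: int = 200) -> bool:
    """Return True when ``text`` fits the word ceiling (200 mirrors the test)."""
    return word_count(text) <= ceiling


# Stand-in text for illustration only; not the real server card.
sample = "tmux hierarchy is Server > Session > Window > Pane"
print(word_count(sample))     # each ">" counts as its own token
print(within_budget(sample))
```

Under this measure, punctuation that stands alone (``>``, ``/``, a free-standing dash) counts as a word, so the figures run slightly higher than a prose word count would.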

Tests
-----
* Delete the six standalone ``test_base_instructions_*`` functions —
  every substring they pinned has either moved into a tool description
  (covered by Phase 1's ``test_tool_description_includes``) or been
  deliberately removed from the card.
* Add ``CardContract`` ``NamedTuple`` + parametrized
  ``test_card_contracts`` with three rows: ``server_identity``,
  ``socket_name_exception`` (with ``must_exclude=("All tools accept",)``
  guarding against drift back to the pre-refactor lie), and
  ``three_handles``.
* Add ``test_card_length_budget`` enforcing a soft 200-word ceiling so
  the card cannot regress into the monolith it just shrank from.
* The ``BuildInstructionsFixture`` matrix is unchanged — its
  assertions cover the dynamic safety / agent-context blocks of
  ``_build_instructions``, not ``_BASE_INSTRUCTIONS``, and those still
  build identically.
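
To illustrate what the ``must_exclude`` guard buys, the sketch below evaluates the ``socket_name_exception`` row against two stand-in strings: one with the pre-refactor "All tools accept" wording and one with the qualified "Targeted tools" wording. It is a pytest-free approximation; ``violations`` is a hypothetical helper, and both sample strings are abbreviated for illustration rather than copied in full from the card:

```python
import typing as t


class CardContract(t.NamedTuple):
    """Substrings a card must contain and must not contain."""

    test_id: str
    must_include: tuple[str, ...]
    must_exclude: tuple[str, ...] = ()


def violations(contract: CardContract, text: str) -> list[str]:
    """Return one message per broken rule; an empty list means the text passes."""
    problems = [
        f"[{contract.test_id}] missing {needle!r}"
        for needle in contract.must_include
        if needle not in text
    ]
    problems += [
        f"[{contract.test_id}] forbidden {needle!r}"
        for needle in contract.must_exclude
        if needle in text
    ]
    return problems


socket_rule = CardContract(
    test_id="socket_name_exception",
    must_include=("Targeted tools", "list_servers", "extra_socket_paths"),
    must_exclude=("All tools accept",),
)

# Abbreviated stand-ins for the old and new wording (illustration only).
old_wording = "All tools accept an optional socket_name parameter."
new_wording = (
    "Targeted tools accept an optional socket_name; list_servers "
    "discovers sockets via TMUX_TMPDIR / extra_socket_paths."
)

print(violations(socket_rule, old_wording))  # three missing, one forbidden
print(violations(socket_rule, new_wording))  # empty list
```

Collecting every violation instead of stopping at the first is a choice made for this sketch; the actual test uses plain ``assert`` statements and fails on the first broken rule.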

Docs
----
``docs/topics/prompting.md`` previously excerpted only the first two
old segments; it now mirrors the new slim card so readers see what
agents actually receive.
diff --git a/docs/topics/prompting.md b/docs/topics/prompting.md
@@ -11,19 +11,22 @@ Every MCP client receives these instructions when connecting to the libtmux-mcp
 ```{code-block} text
 :class: server-prompt
 
-libtmux MCP server for programmatic tmux control. tmux hierarchy:
-Server > Session > Window > Pane. Use pane_id (e.g. '%1') as the
-preferred targeting method - it is globally unique within a tmux server.
-Use send_keys to execute commands and capture_pane to read output. All
-tools accept an optional socket_name parameter for multi-server support
-(defaults to LIBTMUX_SOCKET env var).
-
-IMPORTANT — metadata vs content: list_windows, list_panes, and
-list_sessions only search metadata (names, IDs, current command). To
-find text that is actually visible in terminals — when users ask what
-panes 'contain', 'mention', 'show', or 'have' — use search_panes to
-search across all pane contents, or list_panes + capture_pane on each
-pane for manual inspection.
+libtmux MCP server: programmatic tmux control. tmux hierarchy is
+Server > Session > Window > Pane; every pane has a globally unique
+pane_id like %1 — prefer it over name/index for targeting. Targeted
+tools accept an optional socket_name (defaults to LIBTMUX_SOCKET);
+list_servers discovers sockets via TMUX_TMPDIR / extra_socket_paths
+and is the documented socket_name exception.
+
+Three handles cover everything the agent needs:
+- Tools — call list_tools; per-tool descriptions tell you which to
+  prefer (e.g. snapshot_pane over capture_pane + get_pane_info,
+  wait_for_text over capture_pane in a retry loop, search_panes over
+  list_panes when the user says "panes that contain X").
+- Resources (tmux://) — browseable hierarchy plus reference cards
+  (format strings).
+- Prompts — packaged workflows: run_and_wait, diagnose_failing_pane,
+  build_dev_workspace, interrupt_gracefully.
 ```
 
 The server also dynamically adds:
diff --git a/src/libtmux_mcp/server.py b/src/libtmux_mcp/server.py
@@ -43,93 +43,45 @@
 _ServerCacheKey: t.TypeAlias = tuple[str | None, str | None, str | None]
 
 # ---------------------------------------------------------------------------
-# _BASE_INSTRUCTIONS — composed from named segments.
+# _BASE_INSTRUCTIONS — slim "three handles" card.
 #
-# The string handed to FastMCP grew organically from "what does this server
-# do?" toward a hybrid of positive guidance (HIERARCHY, READ_TOOLS,
-# WAIT_NOT_POLL) and *gap-explainers* (HOOKS_GAP, BUFFERS_GAP) that document
-# why a tool the agent might expect is absent. Splitting into named
-# constants keeps additions deliberate: when a new ``_GAP`` segment feels
-# tempting, prefer first to push the explanation into the relevant tool's
-# docstring/description (where the agent encounters it at call time) and
-# only fall back to a server-level segment when the gap is *server-shaped*
-# (e.g. an entire tool family is intentionally missing).
+# The card answers two questions the agent has at session start:
+# (1) what is this server, and (2) where do I look for the rest? It points
+# at tools, resources, and prompts — and that's it. Tool-specific rules
+# (which tool to prefer, what's intentionally not exposed and why) live in
+# the relevant tool's docstring or ``description=`` override, where the
+# agent reads them on every ``list_tools`` call instead of parsing them
+# out of a one-shot prompt that has long since rolled out of context.
 #
-# Output text is byte-identical to the previous monolith; tests assert on
-# substrings of ``_BASE_INSTRUCTIONS``, so keeping the join shape stable
-# matters.
+# When in doubt about adding text here, ask: is this rule cross-cutting
+# (about the server as a whole) or tool-specific (about when to call X
+# vs Y)? Cross-cutting belongs in the card; tool-specific belongs in the
+# tool description. ``test_card_length_budget`` enforces a soft 200-word
+# ceiling against creeping re-bloat.
 # ---------------------------------------------------------------------------
 
-_INSTR_HIERARCHY = (
-    "libtmux MCP server for programmatic tmux control. "
-    "tmux hierarchy: Server > Session > Window > Pane. "
-    "Use pane_id (e.g. '%1') as the preferred targeting method - "
-    "it is globally unique within a tmux server. "
-    "Use send_keys to execute commands and capture_pane to read output. "
-    "Targeted tmux tools accept an optional socket_name parameter "
-    "(defaults to LIBTMUX_SOCKET env var); list_servers discovers "
-    "sockets via TMUX_TMPDIR plus optional extra_socket_paths instead."
+_INSTR_CARD = (
+    "libtmux MCP server: programmatic tmux control. tmux hierarchy is "
+    "Server > Session > Window > Pane; every pane has a globally unique "
+    "pane_id like %1 — prefer it over name/index for targeting. Targeted "
+    "tools accept an optional socket_name (defaults to LIBTMUX_SOCKET); "
+    "list_servers discovers sockets via TMUX_TMPDIR / extra_socket_paths "
+    "and is the documented socket_name exception."
 )
 
-_INSTR_METADATA_VS_CONTENT = (
-    "IMPORTANT — metadata vs content: list_windows, list_panes, and "
-    "list_sessions only search metadata (names, IDs, current command). "
-    "To find text that is actually visible in terminals — when users ask "
-    "what panes 'contain', 'mention', 'show', or 'have' — use "
-    "search_panes to search across all pane contents, or list_panes + "
-    "capture_pane on each pane for manual inspection."
+_INSTR_HANDLES = (
+    "Three handles cover everything the agent needs:\n"
+    "- Tools — call list_tools; per-tool descriptions tell you which to "
+    "prefer (e.g. snapshot_pane over capture_pane + get_pane_info, "
+    "wait_for_text over capture_pane in a retry loop, search_panes over "
+    'list_panes when the user says "panes that contain X").\n'
+    "- Resources (tmux://) — browseable hierarchy plus reference cards "
+    "(format strings).\n"
+    "- Prompts — packaged workflows: run_and_wait, diagnose_failing_pane, "
+    "build_dev_workspace, interrupt_gracefully."
 )
 
-_INSTR_READ_TOOLS = (
-    "READ TOOLS TO PREFER: snapshot_pane returns pane content plus "
-    "cursor position, mode, and scroll state in one call — use it "
-    "instead of capture_pane + get_pane_info when you need context. "
-    "display_message evaluates a tmux format string (e.g. "
-    "'#{pane_current_command}', '#{session_name}') against a target "
-    "and returns the expanded value — cheaper than parsing captured "
-    "output. (The tool is named after the tmux 'display-message -p' "
-    "verb it wraps; its MCP title is 'Evaluate tmux Format String'.)"
-)
-
-_INSTR_WAIT_NOT_POLL = (
-    "WAIT, DON'T POLL: for 'run command, wait for output' workflows "
-    "use wait_for_text (matches text/regex on a pane) or "
-    "wait_for_content_change (waits for any change). These block "
-    "server-side until the condition is met or the timeout expires, "
-    "which is dramatically cheaper in agent turns than capture_pane "
-    "in a retry loop."
-)
-
-#: Gap-explainer: write-hook tools are intentionally absent. See module
-#: comment above for when to add another ``_GAP`` segment vs. push the
-#: explanation into a tool description.
-_INSTR_HOOKS_GAP = (
-    "HOOKS ARE READ-ONLY: inspect via show_hooks / show_hook. Write-hook "
-    "tools are intentionally not exposed — tmux hooks survive process "
-    "death, so they belong in your tmux config file, not a transient "
-    "MCP session."
-)
-
-#: Gap-explainer: ``list_buffers`` is intentionally absent because tmux
-#: buffers can include OS clipboard history. See module comment above.
-_INSTR_BUFFERS_GAP = (
-    "BUFFERS: load_buffer stages content, paste_buffer delivers it into "
-    "a pane, delete_buffer removes the staged buffer. Track owned "
-    "buffers via the BufferRef returned from load_buffer — there is no "
-    "list_buffers tool because tmux buffers may include OS clipboard "
-    "history (passwords, private snippets)."
-)
-
-_BASE_INSTRUCTIONS = "\n\n".join(
-    (
-        _INSTR_HIERARCHY,
-        _INSTR_METADATA_VS_CONTENT,
-        _INSTR_READ_TOOLS,
-        _INSTR_WAIT_NOT_POLL,
-        _INSTR_HOOKS_GAP,
-        _INSTR_BUFFERS_GAP,
-    )
-)
+_BASE_INSTRUCTIONS = "\n\n".join((_INSTR_CARD, _INSTR_HANDLES))
 
 
 def _build_instructions(safety_level: str = TAG_MUTATING) -> str:
diff --git a/tests/test_server.py b/tests/test_server.py
@@ -129,65 +129,93 @@ def test_build_instructions(
         assert f"Safety level: {expect_safety_in_text}" in result
 
 
-def test_base_instructions_content() -> None:
-    """_BASE_INSTRUCTIONS contains key guidance for the LLM."""
-    assert "tmux hierarchy" in _BASE_INSTRUCTIONS
-    assert "pane_id" in _BASE_INSTRUCTIONS
-    assert "search_panes" in _BASE_INSTRUCTIONS
-    assert "metadata vs content" in _BASE_INSTRUCTIONS
-
-
-def test_base_instructions_surface_flagship_read_tools() -> None:
-    """_BASE_INSTRUCTIONS mentions the richer read tools by name.
-
-    ``display_message`` (tmux format queries) and ``snapshot_pane``
-    (content + metadata in one call) are strictly more expressive than
-    ``capture_pane`` for most contexts, but agents that never see them
-    named in the instructions default to ``capture_pane`` + a follow-up
-    ``get_pane_info``. Naming both explicitly changes that default.
+class CardContract(t.NamedTuple):
+    """Contract about what ``_BASE_INSTRUCTIONS`` must / must not contain.
+
+    The slim card is the public-facing server prompt — every MCP client
+    that connects gets it. ``must_include`` pins the substrings agents
+    rely on to orient (server identity, socket_name exception, three
+    handles); ``must_exclude`` pins the deleted-pre-refactor phrasing so
+    a future drift back to the lie ("All tools accept socket_name") fails
+    loudly here instead of silently shipping.
     """
-    assert "display_message" in _BASE_INSTRUCTIONS
-    assert "snapshot_pane" in _BASE_INSTRUCTIONS
 
+    test_id: str
+    must_include: tuple[str, ...]
+    must_exclude: tuple[str, ...] = ()
+
+
+CARD_CONTRACTS: list[CardContract] = [
+    CardContract(
+        test_id="server_identity",
+        must_include=(
+            "tmux hierarchy",
+            "Server > Session > Window > Pane",
+            "pane_id",
+        ),
+    ),
+    CardContract(
+        test_id="socket_name_exception",
+        # ``list_servers`` does NOT accept socket_name (it's the discovery
+        # tool — see ``server_tools.py`` SOCKET_NAME_EXEMPT). The pre-refactor
+        # wording "All tools accept socket_name" was a lie; the new card
+        # qualifies "Targeted tools" and names list_servers explicitly.
+        must_include=("Targeted tools", "list_servers", "extra_socket_paths"),
+        must_exclude=("All tools accept",),
+    ),
+    CardContract(
+        test_id="three_handles",
+        # The card's job is to point at where the rest of the answer lives
+        # (tools / resources / prompts), not to inline tool-specific rules.
+        must_include=("Tools", "Resources (tmux://)", "Prompts"),
+    ),
+]
 
-def test_base_instructions_prefer_wait_over_poll() -> None:
-    """_BASE_INSTRUCTIONS names wait_for_text and wait_for_content_change.
 
-    The wait tools block server-side, which is dramatically cheaper in
-    agent turns than ``capture_pane`` in a retry loop. Making them
-    discoverable from the instructions is a no-cost UX win.
+@pytest.mark.parametrize(
+    CardContract._fields,
+    CARD_CONTRACTS,
+    ids=[c.test_id for c in CARD_CONTRACTS],
+)
+def test_card_contracts(
+    test_id: str,
+    must_include: tuple[str, ...],
+    must_exclude: tuple[str, ...],
+) -> None:
+    """``_BASE_INSTRUCTIONS`` is the slim "three handles" server card.
+
+    Tool-specific rules live in tool descriptions — Phase 1 of the
+    instructions slim-down moved them there. The card carries only
+    cross-cutting orientation: server identity, the socket_name
+    exception, and pointers to the Tools / Resources / Prompts handles.
+    Anything naming a specific tool's preference rule belongs at the
+    call site, not here.
     """
-    assert "wait_for_text" in _BASE_INSTRUCTIONS
-    assert "wait_for_content_change" in _BASE_INSTRUCTIONS
+    for needle in must_include:
+        assert needle in _BASE_INSTRUCTIONS, (
+            f"[{test_id}] missing required substring {needle!r}"
+        )
+    for needle in must_exclude:
+        assert needle not in _BASE_INSTRUCTIONS, (
+            f"[{test_id}] forbidden substring {needle!r} crept back in"
+        )
 
 
-def test_base_instructions_document_hook_boundary() -> None:
-    """_BASE_INSTRUCTIONS explains hooks are read-only by design.
+def test_card_length_budget() -> None:
+    """``_BASE_INSTRUCTIONS`` stays under the ~200-word budget.
 
-    Without this sentence agents waste a turn asking for ``set_hook`` or
-    trying to write hooks through a nonexistent tool. Naming the
-    boundary heads off the exploratory call.
+    Per-tool rules belong in tool descriptions (visible at every
+    ``list_tools`` call), not in this card. This guard fails loudly if a
+    future contributor reaches for the card to add a tool-specific rule,
+    pointing them at the right home before the card grows back into the
+    305-word monolith it just shrank from.
     """
-    assert "HOOKS ARE READ-ONLY" in _BASE_INSTRUCTIONS
-    assert "show_hooks" in _BASE_INSTRUCTIONS
-    assert "tmux config file" in _BASE_INSTRUCTIONS
-
-
-def test_base_instructions_document_socket_name_contract() -> None:
-    """_BASE_INSTRUCTIONS frames the socket_name promise precisely.
-
-    list_servers does NOT accept socket_name (it's the discovery tool —
-    see server_tools.py:263-264 where the signature is
-    ``list_servers(extra_socket_paths=...)``), so the previous "All
-    tools accept socket_name" wording was a lie. The instruction now
-    qualifies "Targeted tmux tools" and explicitly names list_servers
-    as the documented exception, matching what
-    test_registered_tools_accept_socket_name asserts at the schema
-    level.
-    """
-    assert "Targeted tmux tools accept" in _BASE_INSTRUCTIONS
-    assert "list_servers" in _BASE_INSTRUCTIONS
-    assert "extra_socket_paths" in _BASE_INSTRUCTIONS
+    word_count = len(_BASE_INSTRUCTIONS.split())
+    assert word_count <= 200, (
+        f"_BASE_INSTRUCTIONS grew to {word_count} words; per-tool rules "
+        "belong in tool descriptions, not the card. See the module-level "
+        "comment in server.py for the boundary."
+    )
 
 
 def test_registered_tools_accept_socket_name() -> None:
@@ -230,23 +258,6 @@ def test_registered_tools_accept_socket_name() -> None:
         )
 
 
-def test_base_instructions_document_buffer_lifecycle() -> None:
-    """_BASE_INSTRUCTIONS explains the buffer lifecycle + no list_buffers.
-
-    The load/paste/delete triple is non-obvious, and agents otherwise
-    expect a ``list_buffers`` affordance. The instruction prevents both
-    confusions and surfaces the clipboard-privacy reason so the
-    omission reads as deliberate, not missing.
-    """
-    assert "BUFFERS" in _BASE_INSTRUCTIONS
-    assert "load_buffer" in _BASE_INSTRUCTIONS
-    assert "paste_buffer" in _BASE_INSTRUCTIONS
-    assert "delete_buffer" in _BASE_INSTRUCTIONS
-    assert "BufferRef" in _BASE_INSTRUCTIONS
-    assert "list_buffers" in _BASE_INSTRUCTIONS
-    assert "clipboard history" in _BASE_INSTRUCTIONS
-
-
 @pytest.mark.parametrize(
     ("tool_name", "must_include"),
     [