Commit 72abdae

mcp(refactor[instructions]): collapse base instructions to a 117-word card
Phase 3 of the BASE_INSTRUCTIONS slim-down: delete the six gap-explainer / positive-guidance segments now that their content lives in tool descriptions (Phase 1). The card shrinks from a 305-word, 6-paragraph monolith to a 117-word "three handles" overview that answers (1) what is this server and (2) where do I look for the rest.

Server.py changes
-----------------

* Delete ``_INSTR_HIERARCHY``, ``_INSTR_METADATA_VS_CONTENT``, ``_INSTR_READ_TOOLS``, ``_INSTR_WAIT_NOT_POLL``, ``_INSTR_HOOKS_GAP``, ``_INSTR_BUFFERS_GAP``.
* Add ``_INSTR_CARD`` (server identity, hierarchy, socket_name exception) and ``_INSTR_HANDLES`` (Tools / Resources / Prompts).
* Rewrite the module-level comment to name the cross-cutting vs. tool-specific boundary explicitly, pointing future contributors at tool descriptions for any rule that names a specific tool.

Live ``_build_instructions(mutating)`` output drops from 357 words to 215 — comfortably inside MCP's 150-300 word recommendation for global instructions.

Tests
-----

* Delete the six standalone ``test_base_instructions_*`` functions — every substring they pinned has either moved into a tool description (covered by Phase 1's ``test_tool_description_includes``) or been deliberately removed from the card.
* Add ``CardContract`` ``NamedTuple`` + parametrized ``test_card_contracts`` with three rows: ``server_identity``, ``socket_name_exception`` (with ``must_exclude=("All tools accept",)`` guarding against drift back to the pre-refactor lie), and ``three_handles``.
* Add ``test_card_length_budget`` enforcing a soft 200-word ceiling so the card cannot regress into the monolith it just shrank from.
* The ``BuildInstructionsFixture`` matrix is unchanged — its assertions cover the dynamic safety / agent-context blocks of ``_build_instructions``, not ``_BASE_INSTRUCTIONS``, and those still build identically.

Docs
----

``docs/topics/prompting.md`` previously excerpted only the first two old segments; updated to mirror the new slim card so readers see what agents actually receive.
1 parent 5ef52c9 commit 72abdae

3 files changed

Lines changed: 125 additions & 159 deletions

docs/topics/prompting.md

Lines changed: 16 additions & 13 deletions
@@ -11,19 +11,22 @@ Every MCP client receives these instructions when connecting to the libtmux-mcp
 ```{code-block} text
 :class: server-prompt
 
-libtmux MCP server for programmatic tmux control. tmux hierarchy:
-Server > Session > Window > Pane. Use pane_id (e.g. '%1') as the
-preferred targeting method - it is globally unique within a tmux server.
-Use send_keys to execute commands and capture_pane to read output. All
-tools accept an optional socket_name parameter for multi-server support
-(defaults to LIBTMUX_SOCKET env var).
-
-IMPORTANT — metadata vs content: list_windows, list_panes, and
-list_sessions only search metadata (names, IDs, current command). To
-find text that is actually visible in terminals — when users ask what
-panes 'contain', 'mention', 'show', or 'have' — use search_panes to
-search across all pane contents, or list_panes + capture_pane on each
-pane for manual inspection.
+libtmux MCP server: programmatic tmux control. tmux hierarchy is
+Server > Session > Window > Pane; every pane has a globally unique
+pane_id like %1 — prefer it over name/index for targeting. Targeted
+tools accept an optional socket_name (defaults to LIBTMUX_SOCKET);
+list_servers discovers sockets via TMUX_TMPDIR / extra_socket_paths
+and is the documented socket_name exception.
+
+Three handles cover everything the agent needs:
+- Tools — call list_tools; per-tool descriptions tell you which to
+  prefer (e.g. snapshot_pane over capture_pane + get_pane_info,
+  wait_for_text over capture_pane in a retry loop, search_panes over
+  list_panes when the user says "panes that contain X").
+- Resources (tmux://) — browseable hierarchy plus reference cards
+  (format strings).
+- Prompts — packaged workflows: run_and_wait, diagnose_failing_pane,
+  build_dev_workspace, interrupt_gracefully.
 ```
 
 The server also dynamically adds:

src/libtmux_mcp/server.py

Lines changed: 31 additions & 79 deletions
@@ -43,93 +43,45 @@
 _ServerCacheKey: t.TypeAlias = tuple[str | None, str | None, str | None]
 
 # ---------------------------------------------------------------------------
-# _BASE_INSTRUCTIONS — composed from named segments.
+# _BASE_INSTRUCTIONS — slim "three handles" card.
 #
-# The string handed to FastMCP grew organically from "what does this server
-# do?" toward a hybrid of positive guidance (HIERARCHY, READ_TOOLS,
-# WAIT_NOT_POLL) and *gap-explainers* (HOOKS_GAP, BUFFERS_GAP) that document
-# why a tool the agent might expect is absent. Splitting into named
-# constants keeps additions deliberate: when a new ``_GAP`` segment feels
-# tempting, prefer first to push the explanation into the relevant tool's
-# docstring/description (where the agent encounters it at call time) and
-# only fall back to a server-level segment when the gap is *server-shaped*
-# (e.g. an entire tool family is intentionally missing).
+# The card answers two questions the agent has at session start:
+# (1) what is this server, and (2) where do I look for the rest? It points
+# at tools, resources, and prompts — and that's it. Tool-specific rules
+# (which tool to prefer, what's intentionally not exposed and why) live in
+# the relevant tool's docstring or ``description=`` override, where the
+# agent reads them on every ``list_tools`` call instead of parsing them
+# out of a one-shot prompt that has long since rolled out of context.
 #
-# Output text is byte-identical to the previous monolith; tests assert on
-# substrings of ``_BASE_INSTRUCTIONS``, so keeping the join shape stable
-# matters.
+# When in doubt about adding text here, ask: is this rule cross-cutting
+# (about the server as a whole) or tool-specific (about when to call X
+# vs Y)? Cross-cutting belongs in the card; tool-specific belongs in the
+# tool description. ``test_card_length_budget`` enforces a soft 200-word
+# ceiling against creeping re-bloat.
 # ---------------------------------------------------------------------------
 
-_INSTR_HIERARCHY = (
-    "libtmux MCP server for programmatic tmux control. "
-    "tmux hierarchy: Server > Session > Window > Pane. "
-    "Use pane_id (e.g. '%1') as the preferred targeting method - "
-    "it is globally unique within a tmux server. "
-    "Use send_keys to execute commands and capture_pane to read output. "
-    "Targeted tmux tools accept an optional socket_name parameter "
-    "(defaults to LIBTMUX_SOCKET env var); list_servers discovers "
-    "sockets via TMUX_TMPDIR plus optional extra_socket_paths instead."
+_INSTR_CARD = (
+    "libtmux MCP server: programmatic tmux control. tmux hierarchy is "
+    "Server > Session > Window > Pane; every pane has a globally unique "
+    "pane_id like %1 — prefer it over name/index for targeting. Targeted "
+    "tools accept an optional socket_name (defaults to LIBTMUX_SOCKET); "
+    "list_servers discovers sockets via TMUX_TMPDIR / extra_socket_paths "
+    "and is the documented socket_name exception."
 )
 
-_INSTR_METADATA_VS_CONTENT = (
-    "IMPORTANT — metadata vs content: list_windows, list_panes, and "
-    "list_sessions only search metadata (names, IDs, current command). "
-    "To find text that is actually visible in terminals — when users ask "
-    "what panes 'contain', 'mention', 'show', or 'have' — use "
-    "search_panes to search across all pane contents, or list_panes + "
-    "capture_pane on each pane for manual inspection."
+_INSTR_HANDLES = (
+    "Three handles cover everything the agent needs:\n"
+    "- Tools — call list_tools; per-tool descriptions tell you which to "
+    "prefer (e.g. snapshot_pane over capture_pane + get_pane_info, "
+    "wait_for_text over capture_pane in a retry loop, search_panes over "
+    'list_panes when the user says "panes that contain X").\n'
+    "- Resources (tmux://) — browseable hierarchy plus reference cards "
+    "(format strings).\n"
+    "- Prompts — packaged workflows: run_and_wait, diagnose_failing_pane, "
+    "build_dev_workspace, interrupt_gracefully."
 )
 
-_INSTR_READ_TOOLS = (
-    "READ TOOLS TO PREFER: snapshot_pane returns pane content plus "
-    "cursor position, mode, and scroll state in one call — use it "
-    "instead of capture_pane + get_pane_info when you need context. "
-    "display_message evaluates a tmux format string (e.g. "
-    "'#{pane_current_command}', '#{session_name}') against a target "
-    "and returns the expanded value — cheaper than parsing captured "
-    "output. (The tool is named after the tmux 'display-message -p' "
-    "verb it wraps; its MCP title is 'Evaluate tmux Format String'.)"
-)
-
-_INSTR_WAIT_NOT_POLL = (
-    "WAIT, DON'T POLL: for 'run command, wait for output' workflows "
-    "use wait_for_text (matches text/regex on a pane) or "
-    "wait_for_content_change (waits for any change). These block "
-    "server-side until the condition is met or the timeout expires, "
-    "which is dramatically cheaper in agent turns than capture_pane "
-    "in a retry loop."
-)
-
-#: Gap-explainer: write-hook tools are intentionally absent. See module
-#: comment above for when to add another ``_GAP`` segment vs. push the
-#: explanation into a tool description.
-_INSTR_HOOKS_GAP = (
-    "HOOKS ARE READ-ONLY: inspect via show_hooks / show_hook. Write-hook "
-    "tools are intentionally not exposed — tmux hooks survive process "
-    "death, so they belong in your tmux config file, not a transient "
-    "MCP session."
-)
-
-#: Gap-explainer: ``list_buffers`` is intentionally absent because tmux
-#: buffers can include OS clipboard history. See module comment above.
-_INSTR_BUFFERS_GAP = (
-    "BUFFERS: load_buffer stages content, paste_buffer delivers it into "
-    "a pane, delete_buffer removes the staged buffer. Track owned "
-    "buffers via the BufferRef returned from load_buffer — there is no "
-    "list_buffers tool because tmux buffers may include OS clipboard "
-    "history (passwords, private snippets)."
-)
-
-_BASE_INSTRUCTIONS = "\n\n".join(
-    (
-        _INSTR_HIERARCHY,
-        _INSTR_METADATA_VS_CONTENT,
-        _INSTR_READ_TOOLS,
-        _INSTR_WAIT_NOT_POLL,
-        _INSTR_HOOKS_GAP,
-        _INSTR_BUFFERS_GAP,
-    )
-)
+_BASE_INSTRUCTIONS = "\n\n".join((_INSTR_CARD, _INSTR_HANDLES))
 
 
 def _build_instructions(safety_level: str = TAG_MUTATING) -> str:

tests/test_server.py

Lines changed: 78 additions & 67 deletions
@@ -129,65 +129,93 @@ def test_build_instructions(
     assert f"Safety level: {expect_safety_in_text}" in result
 
 
-def test_base_instructions_content() -> None:
-    """_BASE_INSTRUCTIONS contains key guidance for the LLM."""
-    assert "tmux hierarchy" in _BASE_INSTRUCTIONS
-    assert "pane_id" in _BASE_INSTRUCTIONS
-    assert "search_panes" in _BASE_INSTRUCTIONS
-    assert "metadata vs content" in _BASE_INSTRUCTIONS
-
-
-def test_base_instructions_surface_flagship_read_tools() -> None:
-    """_BASE_INSTRUCTIONS mentions the richer read tools by name.
-
-    ``display_message`` (tmux format queries) and ``snapshot_pane``
-    (content + metadata in one call) are strictly more expressive than
-    ``capture_pane`` for most contexts, but agents that never see them
-    named in the instructions default to ``capture_pane`` + a follow-up
-    ``get_pane_info``. Naming both explicitly changes that default.
+class CardContract(t.NamedTuple):
+    """Contract about what ``_BASE_INSTRUCTIONS`` must / must not contain.
+
+    The slim card is the public-facing server prompt — every MCP client
+    that connects gets it. ``must_include`` pins the substrings agents
+    rely on to orient (server identity, socket_name exception, three
+    handles); ``must_exclude`` pins the deleted-pre-refactor phrasing so
+    a future drift back to the lie ("All tools accept socket_name") fails
+    loudly here instead of silently shipping.
     """
-    assert "display_message" in _BASE_INSTRUCTIONS
-    assert "snapshot_pane" in _BASE_INSTRUCTIONS
 
+    test_id: str
+    must_include: tuple[str, ...]
+    must_exclude: tuple[str, ...] = ()
+
+
+CARD_CONTRACTS: list[CardContract] = [
+    CardContract(
+        test_id="server_identity",
+        must_include=(
+            "tmux hierarchy",
+            "Server > Session > Window > Pane",
+            "pane_id",
+        ),
+    ),
+    CardContract(
+        test_id="socket_name_exception",
+        # ``list_servers`` does NOT accept socket_name (it's the discovery
+        # tool — see ``server_tools.py`` SOCKET_NAME_EXEMPT). The pre-refactor
+        # wording "All tools accept socket_name" was a lie; the new card
+        # qualifies "Targeted tools" and names list_servers explicitly.
+        must_include=("Targeted tools", "list_servers", "extra_socket_paths"),
+        must_exclude=("All tools accept",),
+    ),
+    CardContract(
+        test_id="three_handles",
+        # The card's job is to point at where the rest of the answer lives
+        # (tools / resources / prompts), not to inline tool-specific rules.
+        must_include=("Tools", "Resources (tmux://)", "Prompts"),
+    ),
+]
 
-def test_base_instructions_prefer_wait_over_poll() -> None:
-    """_BASE_INSTRUCTIONS names wait_for_text and wait_for_content_change.
 
-    The wait tools block server-side, which is dramatically cheaper in
-    agent turns than ``capture_pane`` in a retry loop. Making them
-    discoverable from the instructions is a no-cost UX win.
+@pytest.mark.parametrize(
+    CardContract._fields,
+    CARD_CONTRACTS,
+    ids=[c.test_id for c in CARD_CONTRACTS],
+)
+def test_card_contracts(
+    test_id: str,
+    must_include: tuple[str, ...],
+    must_exclude: tuple[str, ...],
+) -> None:
+    """``_BASE_INSTRUCTIONS`` is the slim "three handles" server card.
+
+    Tool-specific rules live in tool descriptions — Phase 1 of the
+    instructions slim-down moved them there. The card carries only
+    cross-cutting orientation: server identity, the socket_name
+    exception, and pointers to the Tools / Resources / Prompts handles.
+    Anything naming a specific tool's preference rule belongs at the
+    call site, not here.
     """
-    assert "wait_for_text" in _BASE_INSTRUCTIONS
-    assert "wait_for_content_change" in _BASE_INSTRUCTIONS
+    for needle in must_include:
+        assert needle in _BASE_INSTRUCTIONS, (
+            f"[{test_id}] missing required substring {needle!r}"
+        )
+    for needle in must_exclude:
+        assert needle not in _BASE_INSTRUCTIONS, (
+            f"[{test_id}] forbidden substring {needle!r} crept back in"
+        )
 
 
-def test_base_instructions_document_hook_boundary() -> None:
-    """_BASE_INSTRUCTIONS explains hooks are read-only by design.
+def test_card_length_budget() -> None:
+    """``_BASE_INSTRUCTIONS`` stays under the ~200-word budget.
 
-    Without this sentence agents waste a turn asking for ``set_hook`` or
-    trying to write hooks through a nonexistent tool. Naming the
-    boundary heads off the exploratory call.
+    Per-tool rules belong in tool descriptions (visible at every
+    ``list_tools`` call), not in this card. This guard fails loudly if a
+    future contributor reaches for the card to add a tool-specific rule,
+    pointing them at the right home before the card grows back into the
+    305-word monolith it just shrank from.
     """
-    assert "HOOKS ARE READ-ONLY" in _BASE_INSTRUCTIONS
-    assert "show_hooks" in _BASE_INSTRUCTIONS
-    assert "tmux config file" in _BASE_INSTRUCTIONS
-
-
-def test_base_instructions_document_socket_name_contract() -> None:
-    """_BASE_INSTRUCTIONS frames the socket_name promise precisely.
-
-    list_servers does NOT accept socket_name (it's the discovery tool —
-    see server_tools.py:263-264 where the signature is
-    ``list_servers(extra_socket_paths=...)``), so the previous "All
-    tools accept socket_name" wording was a lie. The instruction now
-    qualifies "Targeted tmux tools" and explicitly names list_servers
-    as the documented exception, matching what
-    test_registered_tools_accept_socket_name asserts at the schema
-    level.
-    """
-    assert "Targeted tmux tools accept" in _BASE_INSTRUCTIONS
-    assert "list_servers" in _BASE_INSTRUCTIONS
-    assert "extra_socket_paths" in _BASE_INSTRUCTIONS
+    word_count = len(_BASE_INSTRUCTIONS.split())
+    assert word_count <= 200, (
+        f"_BASE_INSTRUCTIONS grew to {word_count} words; per-tool rules "
+        f"belong in tool descriptions, not the card. See the module-level "
+        f"comment in server.py for the boundary."
+    )
 
 
 def test_registered_tools_accept_socket_name() -> None:
@@ -230,23 +258,6 @@ def test_registered_tools_accept_socket_name() -> None:
     )
 
 
-def test_base_instructions_document_buffer_lifecycle() -> None:
-    """_BASE_INSTRUCTIONS explains the buffer lifecycle + no list_buffers.
-
-    The load/paste/delete triple is non-obvious, and agents otherwise
-    expect a ``list_buffers`` affordance. The instruction prevents both
-    confusions and surfaces the clipboard-privacy reason so the
-    omission reads as deliberate, not missing.
-    """
-    assert "BUFFERS" in _BASE_INSTRUCTIONS
-    assert "load_buffer" in _BASE_INSTRUCTIONS
-    assert "paste_buffer" in _BASE_INSTRUCTIONS
-    assert "delete_buffer" in _BASE_INSTRUCTIONS
-    assert "BufferRef" in _BASE_INSTRUCTIONS
-    assert "list_buffers" in _BASE_INSTRUCTIONS
-    assert "clipboard history" in _BASE_INSTRUCTIONS
-
-
 @pytest.mark.parametrize(
     ("tool_name", "must_include"),
     [
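
To see why the ``must_exclude`` row matters, the contract check can be sketched outside pytest. This is a minimal illustration under stated assumptions: ``CardContract`` copies the field layout shown in the diff, but ``violations`` is a hypothetical helper (the real test uses bare ``assert`` statements), and ``OLD_WORDING`` / ``NEW_WORDING`` are single sentences lifted from the removed docs excerpt and the added card text, not the full ``_BASE_INSTRUCTIONS``.

```python
import typing as t


class CardContract(t.NamedTuple):
    """Substrings a card must and must not contain (field layout from the diff)."""

    test_id: str
    must_include: tuple[str, ...]
    must_exclude: tuple[str, ...] = ()


def violations(card: str, contract: CardContract) -> list[str]:
    """Return one message per broken rule; an empty list means the card passes."""
    problems = [
        f"[{contract.test_id}] missing required substring {needle!r}"
        for needle in contract.must_include
        if needle not in card
    ]
    problems += [
        f"[{contract.test_id}] forbidden substring {needle!r} crept back in"
        for needle in contract.must_exclude
        if needle in card
    ]
    return problems


SOCKET_CONTRACT = CardContract(
    test_id="socket_name_exception",
    must_include=("Targeted tools", "list_servers", "extra_socket_paths"),
    must_exclude=("All tools accept",),
)

# One sentence from the pre-refactor docs excerpt and one from the new card.
OLD_WORDING = (
    "All tools accept an optional socket_name parameter for multi-server support"
)
NEW_WORDING = (
    "Targeted tools accept an optional socket_name (defaults to LIBTMUX_SOCKET); "
    "list_servers discovers sockets via TMUX_TMPDIR / extra_socket_paths "
    "and is the documented socket_name exception."
)

for label, text in (("old", OLD_WORDING), ("new", NEW_WORDING)):
    print(label, violations(text, SOCKET_CONTRACT))
```

With these inputs the old sentence yields four messages (three missing substrings plus the forbidden phrase) and the new sentence yields none, which is the drift the ``must_exclude`` guard is meant to catch.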
