feat(prompts): support inline string sources in EngineConfig.instructions#2311
feat(prompts): support inline string sources in EngineConfig.instructions#2311h3c-hexin wants to merge 1 commit into
Conversation
…ions
EngineConfig.instructions was Vec<PathBuf>, which forces embedders that
compute instructions at runtime to stage content to a disk file just to
satisfy the path API. That has two awkward side-effects:
1. The disk file looks like editable config but gets overwritten on
every launch, confusing for users browsing the install dir.
2. Multi-engine setups need per-engine paths to avoid `rehydrate`
re-reading the wrong session's content across concurrent engines.
Add an `InstructionSource` enum (`File(PathBuf)` / `Inline { name,
content }`) covering both shapes. `render_instructions_block` dispatches:
File sources read disk (original behavior preserved), Inline sources use
the content directly with `name` becoming the `<instructions source=...>`
attribute.
`From<PathBuf> for InstructionSource` is provided so the existing CLI /
runtime call sites upgrade with a single `.into_iter().map(Into::into).
collect()` chain — no behavior change for callers passing paths.
New test `render_instructions_block_handles_inline_source` covers Inline
empty / oversize-truncate / File+Inline mixed-ordering. The 5 existing
`render_instructions_block_*` tests are updated to use `.into()` on
`PathBuf` and continue to assert File-variant behavior.
There was a problem hiding this comment.
Code Review
This pull request generalizes the system prompt instructions from a list of file paths (Vec<PathBuf>) to a list of instruction sources (Vec<InstructionSource>), allowing embedders to inject inline instruction strings directly without staging them to disk. Feedback was provided regarding an optimization opportunity in render_instructions_block to use std::borrow::Cow to avoid cloning large inline instruction strings on every prompt render.
| let (raw_source_name, raw_content): (String, String) = match source { | ||
| InstructionSource::File(path) => match std::fs::read_to_string(path) { | ||
| Ok(raw) => (path.display().to_string(), raw), | ||
| Err(err) => { | ||
| tracing::warn!( | ||
| target: "instructions", | ||
| ?err, | ||
| ?path, | ||
| "skipping unreadable instructions file" | ||
| ); | ||
| continue; | ||
| } | ||
| let body = if trimmed.len() > INSTRUCTIONS_FILE_MAX_BYTES { | ||
| let head_end = (0..=INSTRUCTIONS_FILE_MAX_BYTES) | ||
| .rev() | ||
| .find(|&i| trimmed.is_char_boundary(i)) | ||
| .unwrap_or(0); | ||
| format!("{}\n[…elided]", &trimmed[..head_end]) | ||
| } else { | ||
| trimmed.to_string() | ||
| }; | ||
| sections.push(format!( | ||
| "<instructions source=\"{}\">\n{}\n</instructions>", | ||
| path.display(), | ||
| body | ||
| )); | ||
| } | ||
| Err(err) => { | ||
| tracing::warn!( | ||
| target: "instructions", | ||
| ?err, | ||
| ?path, | ||
| "skipping unreadable instructions file" | ||
| ); | ||
| } | ||
| }, | ||
| InstructionSource::Inline { name, content } => (name.clone(), content.clone()), | ||
| }; |
There was a problem hiding this comment.
In render_instructions_block, matching on source currently clones both name and content for InstructionSource::Inline variants. Since content can be up to 100 KiB (or even larger before truncation), cloning it on every prompt render introduces unnecessary memory allocations and CPU overhead.\n\nWe can avoid these allocations by using std::borrow::Cow to lazily borrow the inline name and content strings, and only allocate when reading from a file.
let (raw_source_name, raw_content): (std::borrow::Cow<'_, str>, std::borrow::Cow<'_, str>) = match source {
InstructionSource::File(path) => match std::fs::read_to_string(path) {
Ok(raw) => (std::borrow::Cow::Owned(path.display().to_string()), std::borrow::Cow::Owned(raw)),
Err(err) => {
tracing::warn!(
target: "instructions",
?err,
?path,
"skipping unreadable instructions file"
);
continue;
}
},
InstructionSource::Inline { name, content } => {
(std::borrow::Cow::Borrowed(name), std::borrow::Cow::Borrowed(content))
}
};| sections.push(format!( | ||
| "<instructions source=\"{raw_source_name}\">\n{body}\n</instructions>" | ||
| )); |
There was a problem hiding this comment.
Unescaped
name injected into XML attribute
The name field from InstructionSource::Inline is used verbatim as an XML attribute value without escaping. A name containing " (double-quote) breaks the attribute boundary — e.g., name = r#"foo" injected="bar" produces <instructions source="foo" injected="bar"> — and a name containing ">< can inject arbitrary sibling XML tags directly into the system prompt. Since the Inline variant is explicitly designed for embedder-provided strings, callers have no indication that name must be free of XML metacharacters. The content side is subject to the same risk (</instructions> in the content closes the tag early), but that pre-existed in the File path; the name injection is new surface area introduced by this PR. At minimum, " in name should be escaped to " before interpolation.
| // File + Inline 混用,顺序保持。 | ||
| let tmp = tempdir().expect("tempdir"); | ||
| let file_path = tmp.path().join("file-first.md"); | ||
| std::fs::write(&file_path, "FILE_MARKER").unwrap(); | ||
| let mixed = super::render_instructions_block(&[ | ||
| file_path.into(), | ||
| super::InstructionSource::Inline { | ||
| name: "inline-second".to_string(), | ||
| content: "INLINE_MARKER".to_string(), | ||
| }, | ||
| ]) | ||
| .expect("non-empty"); | ||
| let file_pos = mixed.find("FILE_MARKER").expect("file rendered"); | ||
| let inline_pos = mixed.find("INLINE_MARKER").expect("inline rendered"); | ||
| assert!(file_pos < inline_pos, "声明顺序必须保留(File then Inline)"); |
There was a problem hiding this comment.
The mixed-language comment and assertion message (Chinese inside an otherwise English test suite) will confuse contributors who don't read Chinese. The assertion message is especially important to keep in English since it surfaces in test failure output.
| // File + Inline 混用,顺序保持。 | |
| let tmp = tempdir().expect("tempdir"); | |
| let file_path = tmp.path().join("file-first.md"); | |
| std::fs::write(&file_path, "FILE_MARKER").unwrap(); | |
| let mixed = super::render_instructions_block(&[ | |
| file_path.into(), | |
| super::InstructionSource::Inline { | |
| name: "inline-second".to_string(), | |
| content: "INLINE_MARKER".to_string(), | |
| }, | |
| ]) | |
| .expect("non-empty"); | |
| let file_pos = mixed.find("FILE_MARKER").expect("file rendered"); | |
| let inline_pos = mixed.find("INLINE_MARKER").expect("inline rendered"); | |
| assert!(file_pos < inline_pos, "声明顺序必须保留(File then Inline)"); | |
| // File + Inline mixed: declaration order must be preserved. | |
| let tmp = tempdir().expect("tempdir"); | |
| let file_path = tmp.path().join("file-first.md"); | |
| std::fs::write(&file_path, "FILE_MARKER").unwrap(); | |
| let mixed = super::render_instructions_block(&[ | |
| file_path.into(), | |
| super::InstructionSource::Inline { | |
| name: "inline-second".to_string(), | |
| content: "INLINE_MARKER".to_string(), | |
| }, | |
| ]) | |
| .expect("non-empty"); | |
| let file_pos = mixed.find("FILE_MARKER").expect("file rendered"); | |
| let inline_pos = mixed.find("INLINE_MARKER").expect("inline rendered"); | |
| assert!(file_pos < inline_pos, "declaration order must be preserved (File then Inline)"); |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Summary
Adds a
pub enum InstructionSource { File(PathBuf), Inline { name, content } }toprompts.rsand generalizesEngineConfig.instructionsfromVec<PathBuf>toVec<InstructionSource>, so embedders can inject instructions either by path (current behavior) or as inline strings.Why
EngineConfig.instructions: Vec<PathBuf>forces embedders that compute instructions at runtime (e.g. rendering a template with workspace-specific substitutions) to stage content to a disk file just to satisfy the path API. That has two awkward side-effects:rehydrate_latest_canonical_statere-reads the disk path, so concurrent engines need per-engine paths to avoid leaking instructions across sessions. Inline sources live in the per-engineEngineConfigand the race surface disappears.Discovered while building a GUI embedder that injects per-session computed instructions and didn't want disk roundtrip.
What changes
prompts.rs: newpub enum InstructionSource+From<PathBuf>impl +render_instructions_blockdispatches on variant (File →fs::read_to_stringoriginal semantics, Inline → use content directly,namebecomes the<instructions source="…">attribute). Shared truncate / skip-empty logic.core/engine.rs:EngineConfig.instructions: Vec<PathBuf>→Vec<InstructionSource>.main.rsexec_agent +runtime_threads.rs+tui/ui.rs::build_engine_config) upgrade with.into_iter().map(Into::into).collect()— one-line change each, zero behavior change for path callers.Testing
cargo fmt --all -- --checkcargo clippy --workspace --all-targets --all-featurescargo test --workspace --all-featuresTest results: 3449 passed / 1 failed / 4 ignored.
render_instructions_block_*tests updated to use.into()onPathBuf— still assert File-variant behavior (missing-file skip + warn, oversize truncate with[…elided], declaration order, empty-content skip).render_instructions_block_handles_inline_sourcecovers Inline empty / oversize / File+Inline mixed-order.runtime_log.rs+main.rsare pre-existing inmain, not introduced by this PR.system_prompt_skips_locale_preamble_for_english) is local environment noise (sensitive to whether~/.agents/skills/has non-English skill content during the test) — also fails onmainin the same environment, should pass in CI.Checklist
InstructionSourceenum variants + the field onEngineConfig)Greptile Summary
This PR generalizes
EngineConfig.instructionsfromVec<PathBuf>toVec<InstructionSource>, adding a newInline { name, content }variant so embedders can inject runtime-computed instructions without staging content to disk. The three existing call sites are migrated with a single.map(Into::into)each, preserving all prior file-path semantics.prompts.rs: NewInstructionSourceenum withFile(PathBuf)andInline { name, content }variants,From<PathBuf>conversions, and a refactoredrender_instructions_blockthat dispatches on the variant — shared truncation and empty-skip logic applies to both.engine.rs: Field type updated and doc comment revised; no logic change.main.rs,runtime_threads.rs,tui/ui.rs): Each does a one-linerinto_iter().map(Into::into).collect()— zero behavior change for path-based callers.Confidence Score: 3/5
The change is backward-compatible and the migration at existing call sites is mechanical and correct, but the new Inline variant has an unguarded XML attribute injection point that could silently corrupt system prompts for embedders who supply a name containing double-quotes or angle brackets.
The core logic refactor is clean and well-tested. The unguarded XML attribute injection in the new Inline variant is the key concern — an embedder supplying a name with a double-quote or angle brackets will silently corrupt the system prompt structure with no error or warning.
crates/tui/src/prompts.rs — specifically the render_instructions_block format string where raw_source_name is used as an XML attribute value without escaping.
Security Review
prompts.rs,render_instructions_block): Thenamefield ofInstructionSource::Inlineis interpolated directly intosource="..."without escaping. A name containing"breaks the attribute boundary; a name containing"><can inject sibling XML tags into the system prompt. TheInlinevariant is the new attack surface — theFilepath had the same structural issue forpath.display()but file paths rarely contain XML metacharacters, whereas the inline API is explicitly designed to accept arbitrary embedder-supplied strings.Important Files Changed
InstructionSourceenum and refactorsrender_instructions_blockto handle File/Inline variants; thenamefield ofInlineis interpolated into an XML attribute without escaping, enabling attribute injection into the system prompt.EngineConfig.instructionstype fromVec<PathBuf>toVec<InstructionSource>with updated doc comment; mechanical type change, no logic issues.Vec<PathBuf>toVec<InstructionSource>viaInto::into; zero behavior change for the existing file-path flow.main.rs; behavior unchanged for existing path-based callers.build_engine_config; no behavioral change.Flowchart
%%{init: {'theme': 'neutral'}}%% flowchart TD A[EngineConfig.instructions Vec<InstructionSource>] --> B{InstructionSource variant} B -- File(PathBuf) --> C[fs::read_to_string] C -- Ok --> D[raw_content] C -- Err --> E[tracing::warn + skip] B -- "Inline { name, content }" --> D D --> F{trimmed.is_empty?} F -- yes --> G[skip] F -- no --> H{len > MAX_BYTES?} H -- yes --> I[truncate + append elided marker] H -- no --> J[use as-is] I --> K["format XML block"] J --> K K --> L[sections.join] L --> M[system prompt]Reviews (1): Last reviewed commit: "feat(prompts): support inline string sou..." | Re-trigger Greptile