Commit f309ff8

Stop repo lanes from executing the wrong task payload
The next repo-local sweep target was ROADMAP ultraworkers#71: a claw-code lane accepted an unrelated KakaoTalk/image-analysis prompt even though the lane itself was supposed to be repo-scoped work. This extends the existing prompt-misdelivery guardrail with an optional structured task receipt so worker boot can reject visible wrong-task context before the lane continues executing.

Constraint: Keep the fix inside the existing worker_boot / WorkerSendPrompt control surface instead of inventing a new external OMX-only protocol
Rejected: Treat wrong-task receipts as generic shell misdelivery | loses the expected-vs-observed task context needed to debug contaminated lanes
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: If task-receipt fields change later, update the WorkerSendPrompt schema, worker payload serialization, and wrong-task regression together
Tested: cargo fmt --all --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace; architect review APPROVE
Not-tested: External orchestrators that have not yet started populating the optional task_receipt field
1 parent 3b80670 commit f309ff8
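
The guardrail described in the commit message can be illustrated with a small standalone sketch. This is not the code from `worker_boot.rs`: the names `Receipt`, `Verdict`, `observed_prompt`, `receipt_visible`, and `classify` are hypothetical, the receipt is cut down to three fields, and the rule is simplified to "a prompt-like line is on screen while at least one receipt field is not".

```rust
/// Hypothetical, simplified stand-in for the task receipt in this commit.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Receipt {
    pub repo: String,
    pub task_kind: String,
    pub objective_preview: String,
}

/// Hypothetical outcome of inspecting one screen capture.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Verdict {
    /// Nothing suspicious is visible.
    Ok,
    /// A prompt is visible, but the expected task context is not.
    WrongTask { observed_prompt: String },
}

/// Returns the first non-empty line that starts with the `›` prompt marker.
pub fn observed_prompt(screen: &str) -> Option<String> {
    screen.lines().find_map(|line| {
        line.trim_start()
            .strip_prefix('›')
            .map(str::trim)
            .filter(|rest| !rest.is_empty())
            .map(str::to_string)
    })
}

/// True when every receipt field appears in the screen text,
/// compared ASCII case-insensitively.
pub fn receipt_visible(screen: &str, receipt: &Receipt) -> bool {
    let lowered = screen.to_ascii_lowercase();
    [&receipt.repo, &receipt.task_kind, &receipt.objective_preview]
        .iter()
        .all(|field| lowered.contains(&field.to_ascii_lowercase()))
}

/// Flags a wrong task only when a prompt echo is visible and the
/// receipt fields are not all present on the same screen.
pub fn classify(screen: &str, receipt: &Receipt) -> Verdict {
    match observed_prompt(screen) {
        Some(prompt) if !receipt_visible(screen, receipt) => Verdict::WrongTask {
            observed_prompt: prompt,
        },
        _ => Verdict::Ok,
    }
}
```

One consequence of substring matching worth keeping in mind: an empty receipt field always counts as visible, because every string contains the empty string.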

4 files changed

Lines changed: 195 additions & 12 deletions

ROADMAP.md

Lines changed: 2 additions & 0 deletions
@@ -513,3 +513,5 @@ Model name prefix now wins unconditionally over env-var presence. Regression tes
 69. **Lane stop summaries have no minimum quality floor****done (verified 2026-04-12):** completed lane persistence in `rust/crates/tools/src/lib.rs` now normalizes vague/control-only stop summaries into a contextual fallback that includes the lane target and status, while preserving structured metadata about whether the quality floor fired (`qualityFloorApplied`, `rawSummary`, `reasons`, `wordCount`). Regression coverage locks both the pass-through path for good summaries and the fallback path for mushy summaries like `commit push everyting, keep sweeping $ralph`. **Original filing below.**

 70. **Install-source ambiguity misleads real users****done (verified 2026-04-12):** repo-local Rust guidance now makes the source of truth explicit in `claw doctor` and `claw --help`, naming `ultraworkers/claw-code` as the canonical repo and warning that `cargo install claw-code` installs a deprecated stub rather than the `claw` binary. Regression coverage locks both the new doctor JSON check and the help-text warning. **Original filing below.**
+
+71. **Wrong-task prompt receipt is not detected before execution****done (verified 2026-04-12):** worker boot prompt dispatch now accepts an optional structured `task_receipt` (`repo`, `task_kind`, `source_surface`, `expected_artifacts`, `objective_preview`) and treats mismatched visible prompt context as a `WrongTask` prompt-delivery failure before execution continues. The prompt-delivery payload now records `observed_prompt_preview` plus the expected receipt, and regression coverage locks both the existing shell/wrong-target paths and the new KakaoTalk-style wrong-task mismatch case. **Original filing below.**
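
For orchestrators that want to start populating the optional receipt, the sketch below shows the JSON shape that the five fields listed above would plausibly take. It is an assumption-laden illustration: the real struct derives serde's `Serialize`, whereas this version hand-builds the string to stay free of external crates, and `TaskReceipt`, `quote`, and `receipt_json` are hypothetical names. How the receipt is nested inside the `WorkerSendPrompt` request is not shown on this page and is not covered here.

```rust
/// Illustrative copy of the receipt fields named in the ROADMAP entry.
pub struct TaskReceipt {
    pub repo: String,
    pub task_kind: String,
    pub source_surface: String,
    pub expected_artifacts: Vec<String>,
    pub objective_preview: String,
}

/// Wraps a value in double quotes, escaping backslashes and quotes.
/// Control characters are not handled; this is a sketch, not a JSON library.
fn quote(value: &str) -> String {
    format!("\"{}\"", value.replace('\\', "\\\\").replace('"', "\\\""))
}

/// Builds a compact JSON object with fields in declaration order.
pub fn receipt_json(receipt: &TaskReceipt) -> String {
    let mut fields = vec![
        format!("\"repo\":{}", quote(&receipt.repo)),
        format!("\"task_kind\":{}", quote(&receipt.task_kind)),
        format!("\"source_surface\":{}", quote(&receipt.source_surface)),
    ];
    // An empty artifact list is omitted, mirroring the
    // `skip_serializing_if = "Vec::is_empty"` attribute in the diff.
    if !receipt.expected_artifacts.is_empty() {
        let items: Vec<String> = receipt
            .expected_artifacts
            .iter()
            .map(|artifact| quote(artifact))
            .collect();
        fields.push(format!("\"expected_artifacts\":[{}]", items.join(",")));
    }
    fields.push(format!(
        "\"objective_preview\":{}",
        quote(&receipt.objective_preview)
    ));
    format!("{{{}}}", fields.join(","))
}
```

Because `expected_artifacts` also carries `#[serde(default)]` in the diff, a consumer can omit the key entirely and still deserialize to an empty list.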

rust/crates/runtime/src/worker_boot.rs

Lines changed: 168 additions & 8 deletions
@@ -92,6 +92,7 @@ pub enum WorkerTrustResolution {
 pub enum WorkerPromptTarget {
     Shell,
     WrongTarget,
+    WrongTask,
     Unknown,
 }

@@ -108,10 +109,24 @@ pub enum WorkerEventPayload {
         observed_target: WorkerPromptTarget,
         #[serde(skip_serializing_if = "Option::is_none")]
         observed_cwd: Option<String>,
+        #[serde(skip_serializing_if = "Option::is_none")]
+        observed_prompt_preview: Option<String>,
+        #[serde(skip_serializing_if = "Option::is_none")]
+        task_receipt: Option<WorkerTaskReceipt>,
         recovery_armed: bool,
     },
 }

+#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
+pub struct WorkerTaskReceipt {
+    pub repo: String,
+    pub task_kind: String,
+    pub source_surface: String,
+    #[serde(default, skip_serializing_if = "Vec::is_empty")]
+    pub expected_artifacts: Vec<String>,
+    pub objective_preview: String,
+}
+
 #[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
 pub struct WorkerEvent {
     pub seq: u64,
@@ -134,6 +149,7 @@ pub struct Worker {
     pub prompt_delivery_attempts: u32,
     pub prompt_in_flight: bool,
     pub last_prompt: Option<String>,
+    pub expected_receipt: Option<WorkerTaskReceipt>,
     pub replay_prompt: Option<String>,
     pub last_error: Option<WorkerFailure>,
     pub created_at: u64,
@@ -182,6 +198,7 @@ impl WorkerRegistry {
             prompt_delivery_attempts: 0,
             prompt_in_flight: false,
             last_prompt: None,
+            expected_receipt: None,
             replay_prompt: None,
             last_error: None,
             created_at: ts,
@@ -257,6 +274,7 @@ impl WorkerRegistry {
                         &lowered,
                         worker.last_prompt.as_deref(),
                         &worker.cwd,
+                        worker.expected_receipt.as_ref(),
                     )
                 })
                 .flatten()
@@ -272,6 +290,10 @@ impl WorkerRegistry {
                     "worker prompt landed in the wrong target instead of {}: {}",
                     worker.cwd, prompt_preview
                 ),
+                WorkerPromptTarget::WrongTask => format!(
+                    "worker prompt receipt mismatched the expected task context for {}: {}",
+                    worker.cwd, prompt_preview
+                ),
                 WorkerPromptTarget::Unknown => format!(
                     "worker prompt delivery failed before reaching coding agent: {prompt_preview}"
                 ),
@@ -291,6 +313,8 @@ impl WorkerRegistry {
                     prompt_preview: prompt_preview.clone(),
                     observed_target: observation.target,
                     observed_cwd: observation.observed_cwd.clone(),
+                    observed_prompt_preview: observation.observed_prompt_preview.clone(),
+                    task_receipt: worker.expected_receipt.clone(),
                     recovery_armed: false,
                 }),
             );
@@ -306,6 +330,8 @@ impl WorkerRegistry {
                     prompt_preview,
                     observed_target: observation.target,
                     observed_cwd: observation.observed_cwd,
+                    observed_prompt_preview: observation.observed_prompt_preview,
+                    task_receipt: worker.expected_receipt.clone(),
                     recovery_armed: true,
                 }),
             );
@@ -374,7 +400,12 @@ impl WorkerRegistry {
         Ok(worker.clone())
     }

-    pub fn send_prompt(&self, worker_id: &str, prompt: Option<&str>) -> Result<Worker, String> {
+    pub fn send_prompt(
+        &self,
+        worker_id: &str,
+        prompt: Option<&str>,
+        task_receipt: Option<WorkerTaskReceipt>,
+    ) -> Result<Worker, String> {
         let mut inner = self.inner.lock().expect("worker registry lock poisoned");
         let worker = inner
             .workers
@@ -398,6 +429,7 @@ impl WorkerRegistry {
         worker.prompt_delivery_attempts += 1;
         worker.prompt_in_flight = true;
         worker.last_prompt = Some(next_prompt.clone());
+        worker.expected_receipt = task_receipt;
         worker.replay_prompt = None;
         worker.last_error = None;
         worker.status = WorkerStatus::Running;
@@ -548,6 +580,7 @@ fn prompt_misdelivery_is_relevant(worker: &Worker) -> bool {
 struct PromptDeliveryObservation {
     target: WorkerPromptTarget,
     observed_cwd: Option<String>,
+    observed_prompt_preview: Option<String>,
 }

 fn push_event(
@@ -699,6 +732,7 @@ fn detect_prompt_misdelivery(
     lowered: &str,
     prompt: Option<&str>,
     expected_cwd: &str,
+    expected_receipt: Option<&WorkerTaskReceipt>,
 ) -> Option<PromptDeliveryObservation> {
     let Some(prompt) = prompt else {
         return None;
@@ -713,12 +747,30 @@ fn detect_prompt_misdelivery(
         return None;
     }
     let prompt_visible = lowered.contains(&prompt_snippet);
+    let observed_prompt_preview = detect_prompt_echo(screen_text);
+
+    if let Some(receipt) = expected_receipt {
+        let receipt_visible = task_receipt_visible(lowered, receipt);
+        let mismatched_prompt_visible = observed_prompt_preview
+            .as_deref()
+            .map(str::to_ascii_lowercase)
+            .is_some_and(|preview| !preview.contains(&prompt_snippet));
+
+        if (prompt_visible || mismatched_prompt_visible) && !receipt_visible {
+            return Some(PromptDeliveryObservation {
+                target: WorkerPromptTarget::WrongTask,
+                observed_cwd: detect_observed_shell_cwd(screen_text),
+                observed_prompt_preview,
+            });
+        }
+    }

     if let Some(observed_cwd) = detect_observed_shell_cwd(screen_text) {
         if prompt_visible && !cwd_matches_observed_target(expected_cwd, &observed_cwd) {
             return Some(PromptDeliveryObservation {
                 target: WorkerPromptTarget::WrongTarget,
                 observed_cwd: Some(observed_cwd),
+                observed_prompt_preview,
             });
         }
     }
@@ -736,6 +788,7 @@ fn detect_prompt_misdelivery(
     (shell_error && prompt_visible).then_some(PromptDeliveryObservation {
         target: WorkerPromptTarget::Shell,
         observed_cwd: None,
+        observed_prompt_preview,
     })
 }

@@ -748,10 +801,38 @@ fn prompt_preview(prompt: &str) -> String {
     format!("{}…", preview.trim_end())
 }

+fn detect_prompt_echo(screen_text: &str) -> Option<String> {
+    screen_text.lines().find_map(|line| {
+        line.trim_start()
+            .strip_prefix('›')
+            .map(str::trim)
+            .filter(|value| !value.is_empty())
+            .map(str::to_string)
+    })
+}
+
+fn task_receipt_visible(lowered_screen_text: &str, receipt: &WorkerTaskReceipt) -> bool {
+    let expected_tokens = [
+        receipt.repo.to_ascii_lowercase(),
+        receipt.task_kind.to_ascii_lowercase(),
+        receipt.source_surface.to_ascii_lowercase(),
+        receipt.objective_preview.to_ascii_lowercase(),
+    ];
+
+    expected_tokens
+        .iter()
+        .all(|token| lowered_screen_text.contains(token))
+        && receipt
+            .expected_artifacts
+            .iter()
+            .all(|artifact| lowered_screen_text.contains(&artifact.to_ascii_lowercase()))
+}
+
 fn prompt_misdelivery_detail(observation: &PromptDeliveryObservation) -> &'static str {
     match observation.target {
         WorkerPromptTarget::Shell => "shell misdelivery detected",
         WorkerPromptTarget::WrongTarget => "prompt landed in wrong target",
+        WorkerPromptTarget::WrongTask => "prompt receipt mismatched expected task context",
         WorkerPromptTarget::Unknown => "prompt delivery failure detected",
     }
 }
@@ -865,7 +946,7 @@ mod tests {
             WorkerFailureKind::TrustGate
         );

-        let send_before_resolve = registry.send_prompt(&worker.worker_id, Some("ship it"));
+        let send_before_resolve = registry.send_prompt(&worker.worker_id, Some("ship it"), None);
         assert!(send_before_resolve
             .expect_err("prompt delivery should be gated")
             .contains("not ready for prompt delivery"));
@@ -905,7 +986,7 @@ mod tests {
             .expect("ready observe should succeed");

         let running = registry
-            .send_prompt(&worker.worker_id, Some("Implement worker handshake"))
+            .send_prompt(&worker.worker_id, Some("Implement worker handshake"), None)
             .expect("prompt send should succeed");
         assert_eq!(running.status, WorkerStatus::Running);
         assert_eq!(running.prompt_delivery_attempts, 1);
@@ -941,6 +1022,8 @@ mod tests {
                 prompt_preview: "Implement worker handshake".to_string(),
                 observed_target: WorkerPromptTarget::Shell,
                 observed_cwd: None,
+                observed_prompt_preview: None,
+                task_receipt: None,
                 recovery_armed: false,
             })
         );
@@ -956,12 +1039,14 @@ mod tests {
                 prompt_preview: "Implement worker handshake".to_string(),
                 observed_target: WorkerPromptTarget::Shell,
                 observed_cwd: None,
+                observed_prompt_preview: None,
+                task_receipt: None,
                 recovery_armed: true,
             })
         );

         let replayed = registry
-            .send_prompt(&worker.worker_id, None)
+            .send_prompt(&worker.worker_id, None, None)
             .expect("replay send should succeed");
         assert_eq!(replayed.status, WorkerStatus::Running);
         assert!(replayed.replay_prompt.is_none());
@@ -976,7 +1061,11 @@ mod tests {
             .observe(&worker.worker_id, "Ready for input\n>")
             .expect("ready observe should succeed");
         registry
-            .send_prompt(&worker.worker_id, Some("Run the worker bootstrap tests"))
+            .send_prompt(
+                &worker.worker_id,
+                Some("Run the worker bootstrap tests"),
+                None,
+            )
             .expect("prompt send should succeed");

         let recovered = registry
@@ -1007,6 +1096,8 @@ mod tests {
                 prompt_preview: "Run the worker bootstrap tests".to_string(),
                 observed_target: WorkerPromptTarget::WrongTarget,
                 observed_cwd: Some("/tmp/repo-target-b".to_string()),
+                observed_prompt_preview: None,
+                task_receipt: None,
                 recovery_armed: false,
             })
         );
@@ -1049,6 +1140,75 @@ mod tests {
         assert!(ready.last_error.is_none());
     }

+    #[test]
+    fn wrong_task_receipt_mismatch_is_detected_before_execution_continues() {
+        let registry = WorkerRegistry::new();
+        let worker = registry.create("/tmp/repo-task", &[], true);
+        registry
+            .observe(&worker.worker_id, "Ready for input\n>")
+            .expect("ready observe should succeed");
+        registry
+            .send_prompt(
+                &worker.worker_id,
+                Some("Implement worker handshake"),
+                Some(WorkerTaskReceipt {
+                    repo: "claw-code".to_string(),
+                    task_kind: "repo_code".to_string(),
+                    source_surface: "omx_team".to_string(),
+                    expected_artifacts: vec!["patch".to_string(), "tests".to_string()],
+                    objective_preview: "Implement worker handshake".to_string(),
+                }),
+            )
+            .expect("prompt send should succeed");
+
+        let recovered = registry
+            .observe(
+                &worker.worker_id,
+                "› Explain this KakaoTalk screenshot for a friend\nI can help analyze the screenshot…",
+            )
+            .expect("mismatch observe should succeed");
+
+        assert_eq!(recovered.status, WorkerStatus::ReadyForPrompt);
+        assert_eq!(
+            recovered
+                .last_error
+                .expect("mismatch error should exist")
+                .kind,
+            WorkerFailureKind::PromptDelivery
+        );
+        let mismatch = recovered
+            .events
+            .iter()
+            .find(|event| event.kind == WorkerEventKind::PromptMisdelivery)
+            .expect("wrong-task event should exist");
+        assert_eq!(mismatch.status, WorkerStatus::Failed);
+        assert_eq!(
+            mismatch.payload,
+            Some(WorkerEventPayload::PromptDelivery {
+                prompt_preview: "Implement worker handshake".to_string(),
+                observed_target: WorkerPromptTarget::WrongTask,
+                observed_cwd: None,
+                observed_prompt_preview: Some(
+                    "Explain this KakaoTalk screenshot for a friend".to_string()
+                ),
+                task_receipt: Some(WorkerTaskReceipt {
+                    repo: "claw-code".to_string(),
+                    task_kind: "repo_code".to_string(),
+                    source_surface: "omx_team".to_string(),
+                    expected_artifacts: vec!["patch".to_string(), "tests".to_string()],
+                    objective_preview: "Implement worker handshake".to_string(),
+                }),
+                recovery_armed: false,
+            })
+        );
+        let replay = recovered
+            .events
+            .iter()
+            .find(|event| event.kind == WorkerEventKind::PromptReplayArmed)
+            .expect("replay event should exist");
+        assert_eq!(replay.status, WorkerStatus::ReadyForPrompt);
+    }
+
     #[test]
     fn restart_and_terminate_reset_or_finish_worker() {
         let registry = WorkerRegistry::new();
@@ -1057,7 +1217,7 @@ mod tests {
             .observe(&worker.worker_id, "Ready for input\n>")
             .expect("ready observe should succeed");
         registry
-            .send_prompt(&worker.worker_id, Some("Run tests"))
+            .send_prompt(&worker.worker_id, Some("Run tests"), None)
             .expect("prompt send should succeed");

         let restarted = registry
@@ -1086,7 +1246,7 @@ mod tests {
             .observe(&worker.worker_id, "Ready for input\n>")
             .expect("ready observe should succeed");
         registry
-            .send_prompt(&worker.worker_id, Some("Run tests"))
+            .send_prompt(&worker.worker_id, Some("Run tests"), None)
             .expect("prompt send should succeed");

         let failed = registry
@@ -1163,7 +1323,7 @@ mod tests {
             .observe(&worker.worker_id, "Ready for input\n>")
             .expect("ready observe should succeed");
         registry
-            .send_prompt(&worker.worker_id, Some("Run tests"))
+            .send_prompt(&worker.worker_id, Some("Run tests"), None)
             .expect("prompt send should succeed");

         let finished = registry

rust/crates/runtime/tests/integration_tests.rs

Lines changed: 1 addition & 1 deletion
@@ -304,7 +304,7 @@ fn worker_provider_failure_flows_through_recovery_to_policy() {
         .observe(&worker.worker_id, "Ready for your input\n>")
         .expect("ready observe should succeed");
     registry
-        .send_prompt(&worker.worker_id, Some("Run analysis"))
+        .send_prompt(&worker.worker_id, Some("Run analysis"), None)
         .expect("prompt send should succeed");

     // Session completes with provider failure (finish="unknown", tokens=0)
