fix(validate): error when verifier S3 solutions path doesn't match task key by mikesklar · Pull Request #75 · fleet-ai/fleet-sdk

mikesklar · 2026-03-03T21:27:43Z

Summary

Verifiers that use Image.s3() embed the task key as a path segment in S3 URLs (e.g. .../<TASK_KEY>/solutions/gold_plot.png).
When uploading under a new key, forgetting to update these paths causes the verifier to silently load the wrong gold-reference images.
The validator now checks that the expected TASK_KEY appears as a path segment in every S3 URL containing /solutions/.
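A minimal sketch of that check follows. It assumes the verifier source is available as a string and that S3 references appear as quoted s3:// or https:// URLs. The names `find_stale_solution_urls` and `S3_URL_RE` are hypothetical; this is not the SDK's actual validator code.

```python
import re
from typing import List
from urllib.parse import urlparse

# Hypothetical pattern: any s3:// or http(s):// URL up to the next quote or whitespace.
S3_URL_RE = re.compile(r"""(?:s3|https?)://[^\s'"]+""")


def find_stale_solution_urls(verifier_src: str, expected_key: str) -> List[str]:
    """Return /solutions/ URLs whose path lacks expected_key as a whole segment."""
    stale = []
    for url in S3_URL_RE.findall(verifier_src):
        if "/solutions/" not in url:
            continue  # only gold-reference paths are checked
        # Compare whole path segments so "task_1" does not match "task_10".
        segments = urlparse(url).path.split("/")
        if expected_key not in segments:
            stale.append(url)
    return stale


src = 'gold = Image.s3("s3://example-bucket/task_10/solutions/gold_plot.png")'
print(find_stale_solution_urls(src, "task_1"))   # ['s3://example-bucket/task_10/solutions/gold_plot.png']
print(find_stale_solution_urls(src, "task_10"))  # []
```

Matching on whole segments rather than substrings is what "as a path segment" buys: a stale `task_10` URL is still flagged when the expected key is `task_1`.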

Test plan

Mismatched key → errors with clear message pointing at the stale URL
Matching key → passes cleanly

🤖 Generated with Claude Code

When verifier code contains multiple functions (e.g., a main verifier function and helper functions), the helper functions were not accessible from the main function due to namespace isolation. The exec() call created the functions in local_namespace, but the main function's __globals__ pointed to exec_globals, which didn't contain the helper functions. This caused a NameError when the main function tried to call a helper; the error was silently caught and the verifier returned 0.0. Fix: merge local_namespace into exec_globals after exec() so all defined functions are accessible when the verifier is called.
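The namespace bug can be reproduced in a few lines. This is a sketch assuming the loader calls exec() with separate globals and locals dicts, as the commit message describes; `load_verifier_old`, `load_verifier_fixed`, and `run_safely` are hypothetical names, not the SDK's.

```python
VERIFIER_SRC = '''
def _scale(score):
    return score * 0.5


def verify(score):
    return _scale(score)
'''


def load_verifier_old(src: str, name: str):
    exec_globals: dict = {}
    local_namespace: dict = {}
    exec(src, exec_globals, local_namespace)
    # verify.__globals__ is exec_globals, but _scale was stored in local_namespace.
    return local_namespace[name]


def load_verifier_fixed(src: str, name: str):
    exec_globals: dict = {}
    local_namespace: dict = {}
    exec(src, exec_globals, local_namespace)
    # Merge so every defined function is visible through verify.__globals__.
    exec_globals.update(local_namespace)
    return exec_globals[name]


def run_safely(fn, *args) -> float:
    # Mirrors the silent catch that turned the NameError into a 0.0 score.
    try:
        return fn(*args)
    except Exception:
        return 0.0


print(run_safely(load_verifier_old(VERIFIER_SRC, "verify"), 4))    # 0.0
print(run_safely(load_verifier_fixed(VERIFIER_SRC, "verify"), 4))  # 2.0
```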

…mespace fix: allow verifier helper functions to be called from main verifier

InstanceRequest changes:
- Add: profile_id, async_provision, instance_mode, ssh_public_keys, snapshot_interval_minutes, version (deprecated)
- Fix: region default changed from 'us-west-1' to None (server decides)
- Fix: created_from default changed from None to 'api'

TaskRequest changes:
- Add: verifier_func, project_key, data_id, data_version, writer_metadata
- Add: model_config with extra='ignore' and populate_by_name=True
- Add: alias='env_id' for environment_id field
- Remove: metadata (doesn't exist in orchestrator TaskRequest, only in TaskResponse)

…API" This reverts commit 9a0af14.

add metadata to tasks in SDK

bump version

consolidate

…odels
Add factual_answer field to support research/factual tasks:
- Task model: stores expected answer for verification
- TaskRequest: accept factual_answer when creating tasks
- TaskResponse: return factual_answer from API

Part of: https://linear.app/fleet-ai/issue/ENG-843/import-script-needs-to-support-output-json-schemas
Co-authored-by: Cursor <cursoragent@cursor.com>

Co-authored-by: Cursor <cursoragent@cursor.com>

feat: add factual_answer field to Task and API models

Add task_modality field to Task and TaskResponse models to support copying task modality (computer_use, tool_use, browser) when importing tasks via the SDK. Changes:
- Add task_modality to TaskResponse model (API response)
- Add task_modality to Task model (SDK model)
- Pass task_modality from TaskResponse to Task in load_tasks

Co-authored-by: Cursor <cursoragent@cursor.com>

Addresses Bugbot comment: load_task_from_json wasn't extracting task_modality from JSON data, causing tasks loaded from JSON files to have task_modality=None even when the JSON contains this field. Co-authored-by: Cursor <cursoragent@cursor.com>

Co-authored-by: Cursor <cursoragent@cursor.com>

Add task_modality field to async Task model, TaskResponse model, and update load_task_from_json and load_tasks to preserve task_modality. Co-authored-by: Cursor <cursoragent@cursor.com>

The task_lifecycle_status field was added to the Task model but was missing from:
- TaskResponse model (sync and async), needed to parse the API response
- load_tasks method, needed to pass the field to the Task constructor

This completes the task_lifecycle_status support in the SDK.
Co-authored-by: Cursor <cursoragent@cursor.com>

The field was renamed to env_key but there was already a property with the same name, causing infinite recursion. Renamed the property to a get_env_key() method. Also restored fallback for env_key in load_task_from_json to support JSON files that use the env_key field. Co-authored-by: Cursor <cursoragent@cursor.com>

The field was renamed to env_key but there was already a property with the same name, causing infinite recursion. Renamed the property to a get_env_key() method. Also restored env_id fallback in load_task_from_json for backward compatibility with existing JSON files. Co-authored-by: Cursor <cursoragent@cursor.com>
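A plain-Python analogue of the name collision, for illustration only: the SDK's Task is a Pydantic model, so this is not its code. `BrokenTask` and `Task` here are hypothetical, and appending the version suffix only when a version is set is an assumption.

```python
from typing import Optional


class BrokenTask:
    """A property still named env_key after the field took that name."""

    def __init__(self, version: str):
        self.version = version

    @property
    def env_key(self) -> str:
        # self.env_key resolves to this same property, so every access recurses.
        return f"{self.env_key}:{self.version}"


class Task:
    def __init__(self, env_key: str, version: Optional[str] = None):
        self.env_key = env_key  # plain data attribute, no name clash
        self.version = version

    def get_env_key(self) -> str:
        # Assumption: add ":version" only when a version is present.
        return f"{self.env_key}:{self.version}" if self.version else self.env_key


print(Task("browser", "v2").get_env_key())  # browser:v2
```

Accessing `BrokenTask("v1").env_key` raises RecursionError; renaming the computed accessor to a method removes the clash.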

The make() method was using self.env_key (raw field) instead of self.get_env_key() (computed method with version). This would cause environments to be created without the version suffix. Co-authored-by: Cursor <cursoragent@cursor.com>

The API returns env_id but TaskInfo was renamed to use env_key. Added alias="env_id" so Pydantic accepts both field names during deserialization of API responses. Co-authored-by: Cursor <cursoragent@cursor.com>

When export_tasks serializes tasks, it outputs env_key. The loading function needs to check for env_key first (canonical name), then fallback to environment_id (API) and env_id (legacy). Co-authored-by: Cursor <cursoragent@cursor.com>
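The lookup order can be sketched as a small helper. `resolve_env_key` is a hypothetical name (the real logic lives in load_task_from_json), and treating an explicit null as missing is an assumption.

```python
from typing import Any, Dict, Optional

# Canonical name first, then the API name, then the legacy name.
ENV_KEY_FIELDS = ("env_key", "environment_id", "env_id")


def resolve_env_key(data: Dict[str, Any]) -> Optional[str]:
    for field in ENV_KEY_FIELDS:
        value = data.get(field)
        if value is not None:  # assumption: an explicit null counts as missing
            return value
    return None


print(resolve_env_key({"env_key": "a", "env_id": "b"}))  # a
print(resolve_env_key({"env_id": "legacy"}))             # legacy
```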

- TaskResponse: rename environment_id -> env_key (alias="environment_id")
- TaskRequest: rename environment_id -> env_key (alias="environment_id")
- Add ConfigDict(populate_by_name=True) for alias support
- Add Task.env_spec property for env_key:version string
- Use task.env_spec in Task.make() and make_for_task()
- Clean up load_tasks to use task_response.env_key directly
- Remove scattered inline env_key:version string building

Co-authored-by: Cursor <cursoragent@cursor.com>

- data_spec: renamed from data_key (data_key kept as alias)
- has_verifier: whether task has verifier_func or verifier
- is_research_based: whether task has a factual_answer
- is_action_based: inverse of is_research_based

Co-authored-by: Cursor <cursoragent@cursor.com>
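A dataclass stand-in for these helpers, as a sketch. It assumes "has" means "is not None"; `TaskSketch` is a hypothetical name, and in the real Pydantic Task, data_key is a field alias rather than the read-only property used here.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TaskSketch:
    data_spec: Optional[str] = None
    verifier_func: Optional[str] = None
    verifier: Optional[Any] = None
    factual_answer: Optional[str] = None

    @property
    def data_key(self) -> Optional[str]:
        # Read-only stand-in for the old name.
        return self.data_spec

    @property
    def has_verifier(self) -> bool:
        return self.verifier_func is not None or self.verifier is not None

    @property
    def is_research_based(self) -> bool:
        return self.factual_answer is not None

    @property
    def is_action_based(self) -> bool:
        return not self.is_research_based


research = TaskSketch(factual_answer="42")
print(research.is_research_based, research.is_action_based)  # True False
```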

TaskInfo has alias="env_id" on env_key field but was missing model_config = ConfigDict(populate_by_name=True). Without this, creating TaskInfo(env_key="...") would fail since only the alias name was accepted. Co-authored-by: Cursor <cursoragent@cursor.com>

Co-authored-by: Cursor <cursoragent@cursor.com>

feat: Add task_lifecycle_status field to Task model

The PUT /v1/tasks/{task_key} endpoint can return environment_id: null, which caused a Pydantic validation error since env_key was required. This made update_task crash instead of returning a TaskResponse.
- TaskResponse.env_key: str -> Optional[str]
- Task.env_key: str -> Optional[str]
- Task.env_spec now returns None when env_key is absent

Co-authored-by: Cursor <cursoragent@cursor.com>

When a task has env_key=None, make_for_task would pass None to make() causing a TypeError at ":" in env_key. Now raises a clear ValueError matching the guard in Task.make(). Co-authored-by: Cursor <cursoragent@cursor.com>
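A sketch of the guard, with hypothetical free functions standing in for Task.env_spec and make_for_task. Note that this change was later reverted (PR #57 restores env_key as required), so it illustrates the intermediate behavior only.

```python
from typing import Optional


def env_spec(env_key: Optional[str], version: Optional[str] = None) -> Optional[str]:
    """Hypothetical stand-in for Task.env_spec: "env_key:version", or None without env_key."""
    if env_key is None:
        return None
    return f"{env_key}:{version}" if version else env_key


def make_for_task(env_key: Optional[str], version: Optional[str] = None) -> str:
    spec = env_spec(env_key, version)
    if spec is None:
        # Fail early with a clear message instead of a TypeError from `":" in None`.
        raise ValueError("Task has no env_key; cannot create an environment")
    return spec  # the real code would pass this on to make()


print(make_for_task("browser", "v3"))  # browser:v3
```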

fix: make TaskResponse.env_key optional to handle null API responses

…al-env-key" This reverts commit 3a4f711, reversing changes made to 7ec526b.

…v-key revert: restore env_key as required in TaskResponse and Task

The SDK now correctly imports output_json_schema automatically via the API, so the manual-copy warning is no longer accurate. Co-authored-by: Cursor <cursoragent@cursor.com>

fix: remove stale output_json_schema warning from import_tasks

Simple scripts for task authors to download existing tasks, edit them locally, and upload as new tasks. Uses raw requests (no SDK dependency) with auto-resolved team ID from API key. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat: add task bundle editing scripts

* fix(download): don't pass auto-resolved team_id to task GET
  The team_id query param on GET /v1/tasks/{key} requires admin privileges. Previously the script always passed it (from auto-resolve), causing 403 errors for non-admin API keys. Now only passes team_id when explicitly provided via --team-id flag.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: skip resolve_team_id when --team-id is explicitly provided
  Avoids an unnecessary API call that could fail and block the download.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
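The team_id handling can be sketched as follows. `parse_args` and `plan_task_get` are hypothetical names, and passing resolve_team_id in as a callable is an assumption made so the example runs without network access.

```python
import argparse
from typing import Callable, Dict, List, Optional, Tuple


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Download a task bundle (sketch)")
    parser.add_argument("task_key")
    parser.add_argument("--team-id", default=None, help="Only needed with admin keys")
    return parser.parse_args(argv)


def plan_task_get(
    explicit_team_id: Optional[str], resolve_team_id: Callable[[], str]
) -> Tuple[str, Dict[str, str]]:
    """Return (team_id, query params for GET /v1/tasks/{key})."""
    if explicit_team_id is not None:
        # Explicit flag: skip the resolve call and forward team_id.
        return explicit_team_id, {"team_id": explicit_team_id}
    # Auto-resolved IDs are kept for other uses but never sent on the GET,
    # since that query param requires admin privileges.
    return resolve_team_id(), {}


args = parse_args(["demo-task", "--team-id", "t_123"])
print(plan_task_get(args.team_id, lambda: "auto"))  # ('t_123', {'team_id': 't_123'})
```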

* feat(upload): add job launching and auto-generated unique task keys
  - Make --key optional; auto-generates {original_key}_{uuid[:8]} when omitted
  - Replace local key comparison with server-side existence check (GET /v1/tasks/{key})
  - Launch job by default after upload (POST /v1/jobs) with --no-launch-job to skip
  - Add --models, --pass-k flags for job configuration
  - Default models: gemini-3.1-pro-preview, claude-opus-4.6, gpt-5.2
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(upload): raise on unexpected status in task existence check
  Previously any non-200 (including 500, 403, 429) was treated as "key available", silently skipping the guard. Now only 404 means available; other errors are surfaced.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
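The key generation and the stricter existence check can be sketched as below. `make_task_key` and `key_is_available` are hypothetical names; the sketch takes an HTTP status code rather than performing the GET, and it leaves out job launching.

```python
import uuid
from typing import Optional


def make_task_key(original_key: str, explicit_key: Optional[str] = None) -> str:
    if explicit_key:
        return explicit_key
    # {original_key}_{uuid[:8]}: first 8 characters of a random UUID string.
    return f"{original_key}_{str(uuid.uuid4())[:8]}"


def key_is_available(status_code: int) -> bool:
    """Interpret the status of GET /v1/tasks/{key}."""
    if status_code == 404:
        return True   # only "not found" means the key is free
    if status_code == 200:
        return False  # task already exists
    # 403, 429, 500, ... are surfaced instead of being read as "available".
    raise RuntimeError(f"Unexpected status {status_code} from task existence check")


print(make_task_key("demo", "custom"))  # custom
```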

The file-set key may differ from the task key (e.g., without a version suffix). Pull the key from the task's env_variables.TASK_KEY when available, falling back to the CLI --task-key argument. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

…validation (#73) The jobs API returns `job_id` not `id` — fix extraction so the job ID and dashboard URL are printed after launch. Also add validation that data files are under files/notebooks/ (the path unpacked into the agent workspace) and that the prompt's list_workspace_files() pattern matches actual files. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(upload): extract job_id from API response and add workspace file validation
  The jobs API returns `job_id`, not `id`; fix extraction so the job ID and dashboard URL are printed after launch. Also add validation that data files are under files/notebooks/ (the path unpacked into the agent workspace) and that the prompt's list_workspace_files() pattern matches actual files.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add standalone launch_job script for existing tasks
  Provides a simple way to launch jobs for tasks that already exist on the server, without the upload/create flow.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
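A sketch of the two checks, assuming the list_workspace_files() pattern is a glob relative to files/notebooks/ (the actual pattern syntax isn't shown here). `extract_job_id`, `files_outside_workspace`, and `pattern_matches_any` are hypothetical names.

```python
import fnmatch
from typing import Any, Dict, List

WORKSPACE_PREFIX = "files/notebooks/"


def extract_job_id(response_json: Dict[str, Any]) -> str:
    # The jobs API returns "job_id", not "id".
    return response_json["job_id"]


def files_outside_workspace(paths: List[str]) -> List[str]:
    """Data files that would not be unpacked into the agent workspace."""
    return [p for p in paths if not p.startswith(WORKSPACE_PREFIX)]


def pattern_matches_any(pattern: str, paths: List[str]) -> bool:
    """Assumption: the prompt's pattern is a glob relative to files/notebooks/."""
    names = [p[len(WORKSPACE_PREFIX):] for p in paths if p.startswith(WORKSPACE_PREFIX)]
    return any(fnmatch.fnmatchcase(name, pattern) for name in names)


paths = ["files/notebooks/sales.csv", "data/extra.csv"]
print(files_outside_workspace(paths))       # ['data/extra.csv']
print(pattern_matches_any("*.csv", paths))  # True
```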

…sk key Verifiers that use Image.s3() to load gold-reference images embed the task key as a path segment. When uploading under a new key, forgetting to update these paths causes the verifier to silently load the wrong solutions. The validator now checks that the expected TASK_KEY appears as a path segment in every S3 URL containing /solutions/. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Drop fallback chain — only check S3 solutions paths against the TASK_KEY env variable, which is the actual file-set key used in S3. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

cursor · 2026-03-03T21:32:36Z

+                            f"Verifier S3 solutions path does not contain "
+                            f"expected key '{expected_key}' as a path segment: "
+                            f"{url}"
+                        )


S3 path check ignores new_key argument

High Severity

The new S3 solutions-path validation uses task_key_var (the TASK_KEY from env_variables) as expected_key, but ignores new_key when it is supplied via --new-key. This means that when a user uploads under a new key and forgets to update the verifier's S3 URLs, both task_key_var and the stale URLs still contain the old key — the check finds them "consistent" and emits no error. The exact bug the PR is designed to catch is silently missed. expected_key should be new_key or task_key_var so that the check validates URLs against the intended upload key when one is provided.
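The suggested precedence can be sketched as follows. `choose_expected_key` and `has_key_segment` are hypothetical helpers; `new_key` and `task_key_var` are the names used in the comment above.

```python
from typing import Optional
from urllib.parse import urlparse


def choose_expected_key(new_key: Optional[str], task_key_var: Optional[str]) -> Optional[str]:
    # Prefer the key being uploaded to; fall back to TASK_KEY from env_variables.
    return new_key or task_key_var


def has_key_segment(url: str, key: str) -> bool:
    return key in urlparse(url).path.split("/")


stale_url = "s3://example-bucket/old_task/solutions/gold_plot.png"
# TASK_KEY still holds the old key, so checking against it alone passes.
print(has_key_segment(stale_url, choose_expected_key(None, "old_task")))        # True
# With --new-key supplied, the stale URL is caught.
print(has_key_segment(stale_url, choose_expected_key("new_task", "old_task")))  # False
```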

mikesklar and others added 30 commits January 13, 2026 12:21

Update agent.py

b301d67

update gemini cua agent with latest updates

32fa85f

update name

8852db9

Merge pull request #38 from fleet-ai/fix/verifier-helper-functions-na…

5f89234

…mespace fix: allow verifier helper functions to be called from main verifier

Bump version to 0.2.104

54feffd

add metadata to tasks

f895cb6

fixes

fef862d

Revert "fix: align InstanceRequest and TaskRequest with orchestrator …

9cdd7e6

…API" This reverts commit 9a0af14.

Merge pull request #40 from fleet-ai/zz/add-metadata-0121

acc58ed

add metadata to tasks in SDK

bump version

58f8faf

Merge pull request #41 from fleet-ai/zz/2.105

5dbb76e

bump version

consolidate

ee6a8a5

Consolidate all metadata into "metadata" in TaskResponse

05112f7

consolidate

README.md

afb516c

README.md

c08b76b

Update README.md

e303638

Update README.md

0c4d149

Delete export_tasks_filtered.py

f0737e5

Update README.md

53609ae

Update README.md

8b92120

chore: bump version to 0.2.107

33459b2

Co-authored-by: Cursor <cursoragent@cursor.com>

chore: update lockfile for 0.2.107

7dbbe39

Co-authored-by: Cursor <cursoragent@cursor.com>

Merge pull request #47 from fleet-ai/add-factual-answer-support

1bcfb21

feat: add factual_answer field to Task and API models

chore: bump version to 0.2.108

875a297

Co-authored-by: Cursor <cursoragent@cursor.com>

feat: add task_modality support to async SDK

aa03cd0

Add task_modality field to async Task model, TaskResponse model, and update load_task_from_json and load_tasks to preserve task_modality. Co-authored-by: Cursor <cursoragent@cursor.com>

andrew-stelmach-fleet and others added 28 commits February 5, 2026 11:52

fix: Add alias for TaskInfo env_key field to support env_id

747d945

The API returns env_id but TaskInfo was renamed to use env_key. Added alias="env_id" so Pydantic accepts both field names during deserialization of API responses. Co-authored-by: Cursor <cursoragent@cursor.com>

fix: Update example files to use task.env_key instead of task.env_id

2d438b3

Co-authored-by: Cursor <cursoragent@cursor.com>

Merge pull request #55 from fleet-ai/feat/add-task-lifecycle-status

7ec526b

feat: Add task_lifecycle_status field to Task model

Merge pull request #56 from fleet-ai/fix/task-response-optional-env-key

3a4f711

fix: make TaskResponse.env_key optional to handle null API responses

Revert "Merge pull request #56 from fleet-ai/fix/task-response-option…

5715f73

…al-env-key" This reverts commit 3a4f711, reversing changes made to 7ec526b.

Merge pull request #57 from fleet-ai/revert/task-response-optional-en…

bb30e38

…v-key revert: restore env_key as required in TaskResponse and Task

export_tasks

942a9af

fix: remove stale output_json_schema warning from import_tasks

9065029

The SDK now correctly imports output_json_schema automatically via the API, so the manual-copy warning is no longer accurate. Co-authored-by: Cursor <cursoragent@cursor.com>

Merge pull request #65 from fleet-ai/fix/remove-output-schema-warning

1e6a928

fix: remove stale output_json_schema warning from import_tasks

Merge pull request #68 from fleet-ai/feat/task-bundle-editing

62d2251

feat: add task bundle editing scripts

fix: use env_variables.TASK_KEY exclusively for S3 path validation

6cd1311

Drop fallback chain — only check S3 solutions paths against the TASK_KEY env variable, which is the actual file-set key used in S3. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

cursor Bot reviewed Mar 3, 2026


gg2001 force-pushed the main branch from 51131ab to e3c5571 on April 14, 2026 22:16

mikesklar wants to merge 75 commits into main from fix/validate-s3-solutions-path


6 participants
