fix(validate): error when verifier S3 solutions path doesn't match task key #75
mikesklar wants to merge 75 commits into
When verifier code contains multiple functions (e.g., a main verifier function and helper functions), the helper functions were not accessible from the main function due to namespace isolation. The exec() call created functions in local_namespace, but the main function's __globals__ pointed to exec_globals which didn't contain the helper functions. This caused NameError when the main function tried to call helpers, which was silently caught and returned 0.0. Fix: Merge local_namespace into exec_globals after exec() so all defined functions are accessible when the verifier is called.
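The namespace issue described above can be reproduced with a small sketch. The names `exec_globals` and `local_namespace` come from the commit message; `load_verifier`, `VERIFIER_SRC`, and the `merge` flag are hypothetical and exist only for illustration, so this is not the SDK's actual loader.

```python
# Hypothetical verifier source: a main function that calls a helper.
VERIFIER_SRC = '''
def helper(x):
    return x * 2

def verify(x):
    return helper(x) + 1
'''


def load_verifier(source, merge):
    """Exec verifier source and return its `verify` function.

    With separate globals and locals dicts, `def` statements bind names in
    local_namespace, while each function's __globals__ is exec_globals.
    """
    exec_globals = {}
    local_namespace = {}
    exec(source, exec_globals, local_namespace)
    if merge:
        # The fix from the commit: make helpers visible through __globals__.
        exec_globals.update(local_namespace)
    return local_namespace["verify"]


broken = load_verifier(VERIFIER_SRC, merge=False)
try:
    broken(3)
except NameError as exc:
    print("without merge:", exc)

fixed = load_verifier(VERIFIER_SRC, merge=True)
print("with merge:", fixed(3))  # prints "with merge: 7"
```

Without the merge, `verify` looks up `helper` in `exec_globals`, where it was never defined, which is the `NameError` that was previously swallowed and turned into a score of 0.0.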
…mespace fix: allow verifier helper functions to be called from main verifier
InstanceRequest changes:
- Add: profile_id, async_provision, instance_mode, ssh_public_keys, snapshot_interval_minutes, version (deprecated)
- Fix: region default changed from 'us-west-1' to None (server decides)
- Fix: created_from default changed from None to 'api'

TaskRequest changes:
- Add: verifier_func, project_key, data_id, data_version, writer_metadata
- Add: model_config with extra='ignore' and populate_by_name=True
- Add: alias='env_id' for environment_id field
- Remove: metadata (doesn't exist in orchestrator TaskRequest, only in TaskResponse)
…API" This reverts commit 9a0af14.
add metadata to tasks in SDK
bump version
…odels
Add factual_answer field to support research/factual tasks:
- Task model: stores expected answer for verification
- TaskRequest: accept factual_answer when creating tasks
- TaskResponse: return factual_answer from API

Part of: https://linear.app/fleet-ai/issue/ENG-843/import-script-needs-to-support-output-json-schemas
Co-authored-by: Cursor <cursoragent@cursor.com>
feat: add factual_answer field to Task and API models
Add task_modality field to Task and TaskResponse models to support copying task modality (computer_use, tool_use, browser) when importing tasks via the SDK.

Changes:
- Add task_modality to TaskResponse model (API response)
- Add task_modality to Task model (SDK model)
- Pass task_modality from TaskResponse to Task in load_tasks

Co-authored-by: Cursor <cursoragent@cursor.com>
Addresses Bugbot comment: load_task_from_json wasn't extracting task_modality from JSON data, causing tasks loaded from JSON files to have task_modality=None even when the JSON contains this field. Co-authored-by: Cursor <cursoragent@cursor.com>
Add task_modality field to async Task model, TaskResponse model, and update load_task_from_json and load_tasks to preserve task_modality. Co-authored-by: Cursor <cursoragent@cursor.com>
The task_lifecycle_status field was added to the Task model but was missing from:
- TaskResponse model (sync and async): needed to parse the API response
- load_tasks method: needed to pass the field to the Task constructor

This completes the task_lifecycle_status support in the SDK.
Co-authored-by: Cursor <cursoragent@cursor.com>
The field was renamed to env_key but there was already a property with the same name, causing infinite recursion. Renamed the property to get_env_key() method. Also restored fallback for env_key in load_task_from_json to support JSON files that use env_key field. Co-authored-by: Cursor <cursoragent@cursor.com>
The field was renamed to env_key but there was already a property with the same name, causing infinite recursion. Renamed the property to get_env_key() method. Also restored env_id fallback in load_task_from_json for backward compatibility with existing JSON files. Co-authored-by: Cursor <cursoragent@cursor.com>
The make() method was using self.env_key (raw field) instead of self.get_env_key() (computed method with version). This would cause environments to be created without the version suffix. Co-authored-by: Cursor <cursoragent@cursor.com>
The API returns env_id but TaskInfo was renamed to use env_key. Added alias="env_id" so Pydantic accepts both field names during deserialization of API responses. Co-authored-by: Cursor <cursoragent@cursor.com>
When export_tasks serializes tasks, it outputs env_key. The loading function needs to check for env_key first (canonical name), then fallback to environment_id (API) and env_id (legacy). Co-authored-by: Cursor <cursoragent@cursor.com>
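The lookup order described above can be sketched as a small helper. `resolve_env_key` is a hypothetical name, and the real `load_task_from_json` may structure this differently; only the three field names and their precedence come from the commit message.

```python
from typing import Any, Mapping, Optional

# Precedence from the commit message: canonical name, then API name, then legacy name.
ENV_KEY_FIELDS = ("env_key", "environment_id", "env_id")


def resolve_env_key(task_json: Mapping[str, Any]) -> Optional[str]:
    """Return the first non-null environment key found, or None if absent."""
    for name in ENV_KEY_FIELDS:
        value = task_json.get(name)
        if value is not None:
            return value
    return None


print(resolve_env_key({"env_key": "shop", "env_id": "legacy"}))  # prints "shop"
print(resolve_env_key({"environment_id": "shop"}))               # prints "shop"
print(resolve_env_key({"prompt": "no environment here"}))        # prints "None"
```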
- TaskResponse: rename environment_id -> env_key (alias="environment_id")
- TaskRequest: rename environment_id -> env_key (alias="environment_id")
- Add ConfigDict(populate_by_name=True) for alias support
- Add Task.env_spec property for env_key:version string
- Use task.env_spec in Task.make() and make_for_task()
- Clean up load_tasks to use task_response.env_key directly
- Remove scattered inline env_key:version string building

Co-authored-by: Cursor <cursoragent@cursor.com>
- data_spec: renamed from data_key (data_key kept as alias)
- has_verifier: whether task has verifier_func or verifier
- is_research_based: whether task has a factual_answer
- is_action_based: inverse of is_research_based

Co-authored-by: Cursor <cursoragent@cursor.com>
TaskInfo has alias="env_id" on env_key field but was missing model_config = ConfigDict(populate_by_name=True). Without this, creating TaskInfo(env_key="...") would fail since only the alias name was accepted. Co-authored-by: Cursor <cursoragent@cursor.com>
feat: Add task_lifecycle_status field to Task model
The PUT /v1/tasks/{task_key} endpoint can return environment_id: null,
which caused a Pydantic validation error since env_key was required.
This made update_task crash instead of returning a TaskResponse.
- TaskResponse.env_key: str -> Optional[str]
- Task.env_key: str -> Optional[str]
- Task.env_spec now returns None when env_key is absent
Co-authored-by: Cursor <cursoragent@cursor.com>
When a task has env_key=None, make_for_task would pass None to make() causing a TypeError at ":" in env_key. Now raises a clear ValueError matching the guard in Task.make(). Co-authored-by: Cursor <cursoragent@cursor.com>
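A minimal sketch of the guard described in this commit, using a plain dataclass instead of the SDK's Pydantic models. `TaskSketch`, its `version` field, and the error wording are assumptions made for illustration; only `env_key`, `env_spec`, and the `":" in env_key` failure come from the commit messages.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskSketch:
    """Hypothetical stand-in for the SDK Task model."""
    key: str
    env_key: Optional[str] = None
    version: Optional[str] = None  # assumed field, for illustration only

    @property
    def env_spec(self) -> Optional[str]:
        """Return 'env_key:version', bare env_key, or None when env_key is absent."""
        if self.env_key is None:
            return None
        if self.version and ":" not in self.env_key:
            return f"{self.env_key}:{self.version}"
        return self.env_key


def make_for_task(task: TaskSketch) -> str:
    """Fail early with a clear error instead of a TypeError further down."""
    if task.env_key is None:
        raise ValueError(f"Task {task.key!r} has no env_key; cannot create an environment")
    return task.env_spec


print(make_for_task(TaskSketch(key="t1", env_key="shop", version="v2")))  # prints "shop:v2"
```

The check matters because evaluating `":" in None` raises `TypeError: argument of type 'NoneType' is not iterable`, which says nothing about which task was at fault.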
fix: make TaskResponse.env_key optional to handle null API responses
…v-key revert: restore env_key as required in TaskResponse and Task
The SDK now correctly imports output_json_schema automatically via the API, so the manual-copy warning is no longer accurate. Co-authored-by: Cursor <cursoragent@cursor.com>
fix: remove stale output_json_schema warning from import_tasks
Simple scripts for task authors to download existing tasks, edit them locally, and upload as new tasks. Uses raw requests (no SDK dependency) with auto-resolved team ID from API key. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
feat: add task bundle editing scripts
* fix(download): don't pass auto-resolved team_id to task GET
The team_id query param on GET /v1/tasks/{key} requires admin
privileges. Previously the script always passed it (from auto-resolve),
causing 403 errors for non-admin API keys. Now only passes team_id
when explicitly provided via --team-id flag.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: skip resolve_team_id when --team-id is explicitly provided
Avoids an unnecessary API call that could fail and block the download.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
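The download fix above amounts to sending `team_id` only when the user asked for it. A minimal sketch, assuming a hypothetical helper name `task_get_params` and that the query parameters are built as a plain dict:

```python
from typing import Dict, Optional


def task_get_params(explicit_team_id: Optional[str]) -> Dict[str, str]:
    """Build query params for GET /v1/tasks/{key}.

    Only an explicitly supplied --team-id is forwarded; an auto-resolved
    team ID is never sent, since the parameter requires admin privileges.
    """
    if explicit_team_id:
        return {"team_id": explicit_team_id}
    return {}


print(task_get_params(None))        # prints "{}"
print(task_get_params("team_123"))  # prints "{'team_id': 'team_123'}"
```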
)
* feat(upload): add job launching and auto-generated unique task keys
- Make --key optional; auto-generates {original_key}_{uuid[:8]} when omitted
- Replace local key comparison with server-side existence check (GET /v1/tasks/{key})
- Launch job by default after upload (POST /v1/jobs) with --no-launch-job to skip
- Add --models, --pass-k flags for job configuration
- Default models: gemini-3.1-pro-preview, claude-opus-4.6, gpt-5.2
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(upload): raise on unexpected status in task existence check
Previously any non-200 (including 500, 403, 429) was treated as "key available", silently skipping the guard. Now only 404 means available; other errors are surfaced.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
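The key generation and the stricter existence check can be sketched as follows. The function names are hypothetical, the HTTP call itself is left out, and the sketch assumes the upload script decides purely on the status code of GET /v1/tasks/{key}.

```python
import uuid


def generate_task_key(original_key: str) -> str:
    """Derive a unique key as {original_key}_{uuid[:8]}."""
    return f"{original_key}_{str(uuid.uuid4())[:8]}"


def key_is_available(status_code: int) -> bool:
    """Interpret the status code of the existence check.

    Only 404 means the key is free. 200 means it is taken. Anything else
    (403, 429, 500, ...) is surfaced instead of being read as "available".
    """
    if status_code == 404:
        return True
    if status_code == 200:
        return False
    raise RuntimeError(f"Unexpected status {status_code} while checking task key")


print(generate_task_key("plot_task"))
print(key_is_available(404))  # prints "True"
print(key_is_available(200))  # prints "False"
```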
The file-set key may differ from the task key (e.g., without a version suffix). Pull the key from the task's env_variables.TASK_KEY when available, falling back to the CLI --task-key argument. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…validation (#73) The jobs API returns `job_id` not `id` — fix extraction so the job ID and dashboard URL are printed after launch. Also add validation that data files are under files/notebooks/ (the path unpacked into the agent workspace) and that the prompt's list_workspace_files() pattern matches actual files. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix(upload): extract job_id from API response and add workspace file validation
The jobs API returns `job_id`, not `id`; fix extraction so the job ID and dashboard URL are printed after launch. Also add validation that data files are under files/notebooks/ (the path unpacked into the agent workspace) and that the prompt's list_workspace_files() pattern matches actual files.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add standalone launch_job script for existing tasks
Provides a simple way to launch jobs for tasks that already exist on the server, without the upload/create flow.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…sk key Verifiers that use Image.s3() to load gold-reference images embed the task key as a path segment. When uploading under a new key, forgetting to update these paths causes the verifier to silently load the wrong solutions. The validator now checks that the expected TASK_KEY appears as a path segment in every S3 URL containing /solutions/. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Drop fallback chain — only check S3 solutions paths against the TASK_KEY env variable, which is the actual file-set key used in S3. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
    f"Verifier S3 solutions path does not contain "
    f"expected key '{expected_key}' as a path segment: "
    f"{url}"
)
S3 path check ignores new_key argument
High Severity
The new S3 solutions-path validation uses task_key_var (the TASK_KEY from env_variables) as expected_key, but ignores new_key when it is supplied via --new-key. This means that when a user uploads under a new key and forgets to update the verifier's S3 URLs, both task_key_var and the stale URLs still contain the old key — the check finds them "consistent" and emits no error. The exact bug the PR is designed to catch is silently missed. expected_key should be new_key or task_key_var so that the check validates URLs against the intended upload key when one is provided.
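The failure mode Bugbot describes can be illustrated with a small sketch. Everything here is an assumption for illustration: the `s3://` URL form, the regex, and the function names are hypothetical, and the PR's validator may extract URLs differently. Only the error wording and the `new_key or task_key_var` suggestion come from the review.

```python
import re
from typing import List, Optional

# Assumed URL shape: s3://bucket/.../<TASK_KEY>/solutions/<file>
S3_URL_RE = re.compile(r"s3://[^\s'\")]+")


def find_solution_path_errors(verifier_code: str, expected_key: str) -> List[str]:
    """Report S3 URLs under /solutions/ that lack expected_key as a path segment."""
    errors = []
    for url in S3_URL_RE.findall(verifier_code):
        if "/solutions/" not in url:
            continue
        segments = url[len("s3://"):].split("/")
        if expected_key not in segments:  # whole-segment match, not substring
            errors.append(
                "Verifier S3 solutions path does not contain "
                f"expected key '{expected_key}' as a path segment: {url}"
            )
    return errors


def pick_expected_key(new_key: Optional[str], task_key_var: str) -> str:
    """Bugbot's suggestion: prefer the intended upload key when one is given."""
    return new_key or task_key_var


code = 'gold = Image.s3("s3://bucket/old_task/solutions/gold_plot.png")'

# Checking against the stale TASK_KEY finds nothing, so the bug slips through.
print(len(find_solution_path_errors(code, "old_task")))  # prints "0"

# Checking against the intended upload key flags the stale URL.
print(len(find_solution_path_errors(code, pick_expected_key("new_task", "old_task"))))  # prints "1"
```

Matching on whole path segments rather than substrings keeps a key such as `old_task` from being satisfied by a URL that contains `old_task_v2`.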


Summary
- Verifiers that use `Image.s3()` embed the task key as a path segment in S3 URLs (e.g. `.../<TASK_KEY>/solutions/gold_plot.png`)
- The validator now checks that the expected `TASK_KEY` appears as a path segment in every S3 URL containing `/solutions/`

Test plan
🤖 Generated with Claude Code