feat(task-bundle-editing): add skills, generic template, fix Makefile #80
Open
mnarayan wants to merge 78 commits into
When verifier code contains multiple functions (e.g., a main verifier function and helper functions), the helper functions were not accessible from the main function due to namespace isolation. The exec() call created functions in local_namespace, but the main function's __globals__ pointed to exec_globals which didn't contain the helper functions. This caused NameError when the main function tried to call helpers, which was silently caught and returned 0.0. Fix: Merge local_namespace into exec_globals after exec() so all defined functions are accessible when the verifier is called.
…tions-namespace fix: allow verifier helper functions to be called from main verifier
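The namespace issue described above can be reproduced in a few lines. This is a minimal sketch, assuming the verifier source is executed with separate globals and locals dicts; `load_verifier`, `VERIFIER_SRC`, and the `merge` flag are hypothetical names for illustration, not the SDK's actual code.

```python
# Hypothetical verifier source: a main function plus one helper.
VERIFIER_SRC = '''
def helper(x):
    return x * 2

def verify(x):
    return float(helper(x))
'''


def load_verifier(src, name, merge):
    """Exec verifier source and return the function called `name`."""
    exec_globals = {}
    local_namespace = {}
    # With separate dicts, the new functions are stored in local_namespace,
    # but each function's __globals__ is exec_globals.
    exec(src, exec_globals, local_namespace)
    if merge:
        # The fix: make helpers visible through the functions' __globals__.
        exec_globals.update(local_namespace)
    return local_namespace[name]


if __name__ == "__main__":
    broken = load_verifier(VERIFIER_SRC, "verify", merge=False)
    try:
        broken(3)
    except NameError as exc:
        print("without merge:", exc)

    fixed = load_verifier(VERIFIER_SRC, "verify", merge=True)
    print("with merge:", fixed(3))
```

Without the merge, `verify` looks up `helper` in `exec_globals`, does not find it, and raises `NameError`; a caller that swallows that exception would report a score of 0.0 with no visible error.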
InstanceRequest changes:
- Add: profile_id, async_provision, instance_mode, ssh_public_keys, snapshot_interval_minutes, version (deprecated)
- Fix: region default changed from 'us-west-1' to None (server decides)
- Fix: created_from default changed from None to 'api'

TaskRequest changes:
- Add: verifier_func, project_key, data_id, data_version, writer_metadata
- Add: model_config with extra='ignore' and populate_by_name=True
- Add: alias='env_id' for environment_id field
- Remove: metadata (doesn't exist in orchestrator TaskRequest, only in TaskResponse)
…API" This reverts commit 9a0af14.
add metadata to tasks in SDK
bump version
…odels
Add factual_answer field to support research/factual tasks:
- Task model: stores expected answer for verification
- TaskRequest: accept factual_answer when creating tasks
- TaskResponse: return factual_answer from API

Part of: https://linear.app/fleet-ai/issue/ENG-843/import-script-needs-to-support-output-json-schemas
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
feat: add factual_answer field to Task and API models
Add task_modality field to Task and TaskResponse models to support copying task modality (computer_use, tool_use, browser) when importing tasks via the SDK.

Changes:
- Add task_modality to TaskResponse model (API response)
- Add task_modality to Task model (SDK model)
- Pass task_modality from TaskResponse to Task in load_tasks

Co-authored-by: Cursor <cursoragent@cursor.com>
Addresses Bugbot comment: load_task_from_json wasn't extracting task_modality from JSON data, causing tasks loaded from JSON files to have task_modality=None even when the JSON contains this field. Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Add task_modality field to async Task model, TaskResponse model, and update load_task_from_json and load_tasks to preserve task_modality. Co-authored-by: Cursor <cursoragent@cursor.com>
The field was renamed to env_key but there was already a property with the same name, causing infinite recursion. Renamed the property to get_env_key() method. Also restored env_id fallback in load_task_from_json for backward compatibility with existing JSON files. Co-authored-by: Cursor <cursoragent@cursor.com>
The make() method was using self.env_key (raw field) instead of self.get_env_key() (computed method with version). This would cause environments to be created without the version suffix. Co-authored-by: Cursor <cursoragent@cursor.com>
The API returns env_id but TaskInfo was renamed to use env_key. Added alias="env_id" so Pydantic accepts both field names during deserialization of API responses. Co-authored-by: Cursor <cursoragent@cursor.com>
When export_tasks serializes tasks, it outputs env_key. The loading function needs to check for env_key first (canonical name), then fallback to environment_id (API) and env_id (legacy). Co-authored-by: Cursor <cursoragent@cursor.com>
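The lookup order described above can be sketched as a small helper. This is an illustration only: `resolve_env_key` is a hypothetical name, and treating empty or null values as missing is an assumption rather than confirmed behavior of `load_task_from_json`.

```python
from typing import Any, Mapping, Optional


def resolve_env_key(data: Mapping[str, Any]) -> Optional[str]:
    """Return the environment key from task JSON, trying each known field name.

    Order: env_key (canonical, written by export_tasks), then
    environment_id (API), then env_id (legacy JSON files).
    """
    for name in ("env_key", "environment_id", "env_id"):
        value = data.get(name)
        if value:  # assumption: empty strings and nulls count as missing
            return value
    return None
```

Checking the canonical name first means a file that was exported and later hand-edited with a stale legacy field still resolves to the value `export_tasks` wrote.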
- TaskResponse: rename environment_id -> env_key (alias="environment_id")
- TaskRequest: rename environment_id -> env_key (alias="environment_id")
- Add ConfigDict(populate_by_name=True) for alias support
- Add Task.env_spec property for env_key:version string
- Use task.env_spec in Task.make() and make_for_task()
- Clean up load_tasks to use task_response.env_key directly
- Remove scattered inline env_key:version string building

Co-authored-by: Cursor <cursoragent@cursor.com>
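The idea behind centralizing the `env_key:version` string can be illustrated with a plain dataclass. This is a sketch under assumptions, not the SDK's model: `TaskSketch` and its `version` field are hypothetical, and leaving the key unchanged when no version is set (or when it already contains a `:`) is a guess at reasonable behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskSketch:
    """Hypothetical stand-in for the SDK Task model."""

    key: str
    env_key: str
    version: Optional[str] = None  # assumed field name

    @property
    def env_spec(self) -> str:
        """Single place that builds the env_key:version string."""
        if self.version and ":" not in self.env_key:
            return f"{self.env_key}:{self.version}"
        # Assumption: an unversioned task, or a key that already
        # carries a version, is passed through unchanged.
        return self.env_key


if __name__ == "__main__":
    task = TaskSketch(key="t1", env_key="my-env", version="v2")
    print(task.env_spec)
```

Callers such as `make()` would then read `task.env_spec` instead of formatting the string inline, which is what the last bullet above removes.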
- data_spec: renamed from data_key (data_key kept as alias)
- has_verifier: whether task has verifier_func or verifier
- is_research_based: whether task has a factual_answer
- is_action_based: inverse of is_research_based

Co-authored-by: Cursor <cursoragent@cursor.com>
TaskInfo has alias="env_id" on env_key field but was missing model_config = ConfigDict(populate_by_name=True). Without this, creating TaskInfo(env_key="...") would fail since only the alias name was accepted. Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
…status feat: Add task_lifecycle_status field to Task model
The PUT /v1/tasks/{task_key} endpoint can return environment_id: null,
which caused a Pydantic validation error since env_key was required.
This made update_task crash instead of returning a TaskResponse.
- TaskResponse.env_key: str -> Optional[str]
- Task.env_key: str -> Optional[str]
- Task.env_spec now returns None when env_key is absent
Co-authored-by: Cursor <cursoragent@cursor.com>
When a task has env_key=None, make_for_task would pass None to make() causing a TypeError at ":" in env_key. Now raises a clear ValueError matching the guard in Task.make(). Co-authored-by: Cursor <cursoragent@cursor.com>
…al-env-key fix: make TaskResponse.env_key optional to handle null API responses
…ional-env-key revert: restore env_key as required in TaskResponse and Task
The SDK now correctly imports output_json_schema automatically via the API, so the manual-copy warning is no longer accurate. Co-authored-by: Cursor <cursoragent@cursor.com>
…-warning fix: remove stale output_json_schema warning from import_tasks
Simple scripts for task authors to download existing tasks, edit them locally, and upload as new tasks. Uses raw requests (no SDK dependency) with auto-resolved team ID from API key. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
feat: add task bundle editing scripts
)
* fix(download): don't pass auto-resolved team_id to task GET

The team_id query param on GET /v1/tasks/{key} requires admin privileges. Previously the script always passed it (from auto-resolve), causing 403 errors for non-admin API keys. Now only passes team_id when explicitly provided via --team-id flag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: skip resolve_team_id when --team-id is explicitly provided

Avoids an unnecessary API call that could fail and block the download.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…leet-ai#71)
* feat(upload): add job launching and auto-generated unique task keys

- Make --key optional; auto-generates {original_key}_{uuid[:8]} when omitted
- Replace local key comparison with server-side existence check (GET /v1/tasks/{key})
- Launch job by default after upload (POST /v1/jobs) with --no-launch-job to skip
- Add --models, --pass-k flags for job configuration
- Default models: gemini-3.1-pro-preview, claude-opus-4.6, gpt-5.2

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(upload): raise on unexpected status in task existence check

Previously any non-200 (including 500, 403, 429) was treated as "key available", silently skipping the guard. Now only 404 means available; other errors are surfaced.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
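The key generation and the stricter existence check above can be sketched without any HTTP calls. Both function names are hypothetical, and the exact exception type the upload script raises is not stated in the commit, so `RuntimeError` here is an assumption.

```python
import uuid


def generate_task_key(original_key: str) -> str:
    """Build {original_key}_{uuid[:8]} from a random UUID."""
    return f"{original_key}_{uuid.uuid4().hex[:8]}"


def key_is_available(status_code: int) -> bool:
    """Interpret the status of GET /v1/tasks/{key}.

    Only 404 means the key is free and 200 means it is taken. Any other
    status (403, 429, 500, ...) is surfaced instead of being read as
    "available".
    """
    if status_code == 404:
        return True
    if status_code == 200:
        return False
    # Assumption: the real script may raise a different exception type.
    raise RuntimeError(f"Unexpected status {status_code} while checking task key")
```

Raising on anything other than 200 or 404 keeps a transient server error from silently disabling the duplicate-key guard.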
The file-set key may differ from the task key (e.g., without a version suffix). Pull the key from the task's env_variables.TASK_KEY when available, falling back to the CLI --task-key argument. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
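A minimal sketch of that fallback, assuming the task is available as a plain dict; `resolve_file_set_key` is a hypothetical helper name, and treating a missing or empty `TASK_KEY` as absent is an assumption.

```python
def resolve_file_set_key(task: dict, cli_task_key: str) -> str:
    """Prefer env_variables.TASK_KEY from the task, else the --task-key value."""
    env_vars = task.get("env_variables") or {}
    return env_vars.get("TASK_KEY") or cli_task_key
```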
…validation (fleet-ai#73) The jobs API returns `job_id` not `id` — fix extraction so the job ID and dashboard URL are printed after launch. Also add validation that data files are under files/notebooks/ (the path unpacked into the agent workspace) and that the prompt's list_workspace_files() pattern matches actual files. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix(upload): extract job_id from API response and add workspace file validation

The jobs API returns `job_id` not `id`; fix extraction so the job ID and dashboard URL are printed after launch. Also add validation that data files are under files/notebooks/ (the path unpacked into the agent workspace) and that the prompt's list_workspace_files() pattern matches actual files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add standalone launch_job script for existing tasks

Provides a simple way to launch jobs for tasks that already exist on the server, without the upload/create flow.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
The web page now redirects the browser to http://127.0.0.1:PORT/callback with tokens as query params instead of POSTing JSON. Replace do_POST + do_OPTIONS (CORS preflight) with a do_GET handler that reads query params, validates the state nonce, and returns a plain HTML confirmation page. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
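The GET-based callback described above might look roughly like the following. This is a sketch built on the standard library only: `parse_callback`, `make_handler`, and the `received` dict are hypothetical names, and the commit does not name the token query parameters, so the handler simply collects whatever parameters arrive alongside a valid `state`.

```python
import hmac
from http.server import BaseHTTPRequestHandler
from urllib.parse import parse_qs, urlparse


def parse_callback(path, expected_state):
    """Return the query params of a valid /callback request, else None."""
    parsed = urlparse(path)
    if parsed.path != "/callback":
        return None
    # Keep the first value of each query parameter.
    params = {k: v[0] for k, v in parse_qs(parsed.query).items()}
    state = params.get("state", "")
    # Constant-time comparison of the state nonce.
    if not hmac.compare_digest(state.encode(), expected_state.encode()):
        return None
    return params


def make_handler(expected_state, received):
    """Build a handler class that stores valid callback params in `received`."""

    class CallbackHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            params = parse_callback(self.path, expected_state)
            if params is None:
                self._reply(400, "Login failed: invalid callback.")
                return
            received.update(params)
            self._reply(200, "Login complete. You can close this tab.")

        def _reply(self, status, message):
            body = f"<html><body><p>{message}</p></body></html>".encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep the CLI output quiet

    return CallbackHandler
```

Because the browser navigates to the callback URL directly, no CORS preflight is involved, which is why the `do_OPTIONS` handler is no longer needed.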
… URL Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…o-our-sdk feat: add flt login browser auth flow (ENG-1192)
…fleet-ai#79)
* feat(task-bundle-editing): add Makefile, templates, CLAUDE.md, and workflow docs

Add task authoring toolkit for creating, editing, and deploying Fleet evaluation tasks. Includes Makefile wrapping existing Python scripts, verifier templates for analysis and plot tasks, CLAUDE.md for Claude Code integration, and extended README with end-to-end workflow documentation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: clarify placeholder notation in template schema notes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
… guards
- Add Claude Code skills: task-authoring (generic), fleet-status
- Add generic verifier_template.json; remove DS-specific templates (analysis, plot) for a future domain-specific PR
- Fix Makefile: validate/upload accept DIR= without requiring TASK
- README: expand edit workflow, clarify templates as design scaffolds
- CLAUDE.md: update skills section

Addresses review comments on PR fleet-ai#79:
- Templates clarified as design scaffolds, not direct task.json
- Makefile DIR-only workflow unblocked

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- … task-authoring) and job monitoring (fleet-status)
- … verifier_template.json; replace DS-specific templates (analysis, plot) with a single task-agnostic scaffold
- … validate and upload now accept DIR= without requiring an active task

Addresses review feedback from #79
- … task.json files
- DIR=-only workflow unblocked for validate and upload targets

Test plan
- make validate DIR=<bundle> works without .task set
- … .claude/skills/
- verifier_template.json placeholders are clear and task-agnostic

🤖 Generated with Claude Code