feat(task-bundle-editing): add skills, generic template, fix Makefile by mnarayan · Pull Request #80 · fleet-ai/fleet-sdk

mnarayan · 2026-03-12T18:09:25Z

Summary

- Add Claude Code skills for task authoring (task-authoring) and job monitoring (fleet-status)
- Add generic verifier_template.json; replace DS-specific templates (analysis, plot) with a single task-agnostic scaffold
- Fix Makefile: validate and upload now accept DIR= without requiring an active task
- Improve README: expand edit workflow docs, clarify templates as design scaffolds, add from-scratch orientation guide
- Update CLAUDE.md skills section

Addresses review feedback from #79

- Templates clarified as design scaffolds, not directly valid task.json files
- Makefile DIR=-only workflow unblocked for validate and upload targets

Test plan

- make validate DIR=<bundle> works without .task set
- Skills load correctly in Claude Code when copied to .claude/skills/
- verifier_template.json placeholders are clear and task-agnostic

🤖 Generated with Claude Code

When verifier code contains multiple functions (e.g., a main verifier function and helper functions), the helper functions were not accessible from the main function due to namespace isolation. The exec() call created functions in local_namespace, but the main function's __globals__ pointed to exec_globals which didn't contain the helper functions. This caused NameError when the main function tried to call helpers, which was silently caught and returned 0.0. Fix: Merge local_namespace into exec_globals after exec() so all defined functions are accessible when the verifier is called.
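The failure mode and the fix described above can be reproduced with a minimal sketch. The names `exec_globals` and `local_namespace` come from the commit message; the verifier source and the function names `helper` and `verify` are made up for illustration, and this is not the SDK's actual code.

```python
# Hypothetical verifier source with one helper and one main function.
VERIFIER_SRC = """
def helper(x):
    return x * 2

def verify(x):
    return helper(x)
"""

exec_globals = {}
local_namespace = {}

# With separate dicts, both functions are stored in local_namespace,
# but each function's __globals__ is exec_globals.
exec(VERIFIER_SRC, exec_globals, local_namespace)
verify = local_namespace["verify"]

try:
    verify(21)
    failed_before_fix = False
except NameError:
    # helper is looked up in exec_globals, where it is not defined.
    failed_before_fix = True

# The fix from the commit message: merge the local namespace into the globals.
exec_globals.update(local_namespace)
result_after_fix = verify(21)
```

Because `verify.__globals__` is the same dict object as `exec_globals`, updating that dict after `exec()` is enough; the function does not need to be recreated.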

InstanceRequest changes:
- Add: profile_id, async_provision, instance_mode, ssh_public_keys, snapshot_interval_minutes, version (deprecated)
- Fix: region default changed from 'us-west-1' to None (server decides)
- Fix: created_from default changed from None to 'api'

TaskRequest changes:
- Add: verifier_func, project_key, data_id, data_version, writer_metadata
- Add: model_config with extra='ignore' and populate_by_name=True
- Add: alias='env_id' for environment_id field
- Remove: metadata (doesn't exist in orchestrator TaskRequest, only in TaskResponse)

…odels

Add factual_answer field to support research/factual tasks:
- Task model: stores expected answer for verification
- TaskRequest: accept factual_answer when creating tasks
- TaskResponse: return factual_answer from API

Part of: https://linear.app/fleet-ai/issue/ENG-843/import-script-needs-to-support-output-json-schemas
Co-authored-by: Cursor <cursoragent@cursor.com>

Add task_modality field to Task and TaskResponse models to support copying task modality (computer_use, tool_use, browser) when importing tasks via the SDK.

Changes:
- Add task_modality to TaskResponse model (API response)
- Add task_modality to Task model (SDK model)
- Pass task_modality from TaskResponse to Task in load_tasks

Co-authored-by: Cursor <cursoragent@cursor.com>

Addresses Bugbot comment: load_task_from_json wasn't extracting task_modality from JSON data, causing tasks loaded from JSON files to have task_modality=None even when the JSON contains this field. Co-authored-by: Cursor <cursoragent@cursor.com>

Add task_modality field to async Task model, TaskResponse model, and update load_task_from_json and load_tasks to preserve task_modality. Co-authored-by: Cursor <cursoragent@cursor.com>

The field was renamed to env_key but there was already a property with the same name, causing infinite recursion. Renamed the property to get_env_key() method. Also restored env_id fallback in load_task_from_json for backward compatibility with existing JSON files. Co-authored-by: Cursor <cursoragent@cursor.com>

The make() method was using self.env_key (raw field) instead of self.get_env_key() (computed method with version). This would cause environments to be created without the version suffix. Co-authored-by: Cursor <cursoragent@cursor.com>

The API returns env_id but TaskInfo was renamed to use env_key. Added alias="env_id" so Pydantic accepts both field names during deserialization of API responses. Co-authored-by: Cursor <cursoragent@cursor.com>

When export_tasks serializes tasks, it outputs env_key. The loading function needs to check for env_key first (canonical name), then fallback to environment_id (API) and env_id (legacy). Co-authored-by: Cursor <cursoragent@cursor.com>
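The lookup order described above can be sketched as a small helper. The function name `resolve_env_key` is hypothetical, and treating a `None` value the same as a missing field is an assumption rather than something the commit message states.

```python
from typing import Any, Mapping, Optional


def resolve_env_key(data: Mapping[str, Any]) -> Optional[str]:
    """Return the environment key, preferring the canonical field name."""
    # Order taken from the commit message: canonical name first,
    # then the API name, then the legacy name.
    for field in ("env_key", "environment_id", "env_id"):
        value = data.get(field)
        if value is not None:  # assumption: None counts as absent
            return value
    return None
```

With this order, a file written by export_tasks (which emits env_key) round-trips cleanly, while older JSON files that only carry env_id still load.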

- TaskResponse: rename environment_id -> env_key (alias="environment_id")
- TaskRequest: rename environment_id -> env_key (alias="environment_id")
- Add ConfigDict(populate_by_name=True) for alias support
- Add Task.env_spec property for env_key:version string
- Use task.env_spec in Task.make() and make_for_task()
- Clean up load_tasks to use task_response.env_key directly
- Remove scattered inline env_key:version string building

Co-authored-by: Cursor <cursoragent@cursor.com>

- data_spec: renamed from data_key (data_key kept as alias)
- has_verifier: whether task has verifier_func or verifier
- is_research_based: whether task has a factual_answer
- is_action_based: inverse of is_research_based

Co-authored-by: Cursor <cursoragent@cursor.com>
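The three boolean properties listed above can be sketched on a stand-in class. `TaskSketch` is hypothetical and not the SDK's Task model, and interpreting "has" as "is not None" is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TaskSketch:
    """Stand-in for the SDK's Task model (hypothetical, not the real class)."""

    verifier_func: Optional[str] = None
    verifier: Optional[Any] = None
    factual_answer: Optional[str] = None

    @property
    def has_verifier(self) -> bool:
        # True when either form of verifier is present.
        return self.verifier_func is not None or self.verifier is not None

    @property
    def is_research_based(self) -> bool:
        # Research tasks carry an expected factual answer.
        return self.factual_answer is not None

    @property
    def is_action_based(self) -> bool:
        # Defined as the inverse of is_research_based.
        return not self.is_research_based
```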

TaskInfo has alias="env_id" on env_key field but was missing model_config = ConfigDict(populate_by_name=True). Without this, creating TaskInfo(env_key="...") would fail since only the alias name was accepted. Co-authored-by: Cursor <cursoragent@cursor.com>

The PUT /v1/tasks/{task_key} endpoint can return environment_id: null, which caused a Pydantic validation error since env_key was required. This made update_task crash instead of returning a TaskResponse.

- TaskResponse.env_key: str -> Optional[str]
- Task.env_key: str -> Optional[str]
- Task.env_spec now returns None when env_key is absent

Co-authored-by: Cursor <cursoragent@cursor.com>

When a task has env_key=None, make_for_task would pass None to make() causing a TypeError at ":" in env_key. Now raises a clear ValueError matching the guard in Task.make(). Co-authored-by: Cursor <cursoragent@cursor.com>
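A minimal sketch of the guard described above, assuming a hypothetical helper `parse_env_key` that splits an optional version suffix; the real `make_for_task` and `Task.make()` are not shown here and may differ.

```python
from typing import Optional, Tuple

# Without a guard, a membership test on None fails with an unhelpful TypeError.
try:
    ":" in None
    membership_raised_type_error = False
except TypeError:
    membership_raised_type_error = True


def parse_env_key(env_key: Optional[str]) -> Tuple[str, Optional[str]]:
    """Split 'name:version' into its parts, rejecting a missing key up front."""
    if env_key is None:
        # Clear error instead of the TypeError demonstrated above.
        raise ValueError("task has no env_key; cannot create an environment")
    if ":" in env_key:
        name, version = env_key.split(":", 1)
        return name, version
    return env_key, None
```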

The SDK now correctly imports output_json_schema automatically via the API, so the manual-copy warning is no longer accurate. Co-authored-by: Cursor <cursoragent@cursor.com>

Simple scripts for task authors to download existing tasks, edit them locally, and upload as new tasks. Uses raw requests (no SDK dependency) with auto-resolved team ID from API key. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

)

* fix(download): don't pass auto-resolved team_id to task GET

The team_id query param on GET /v1/tasks/{key} requires admin privileges. Previously the script always passed it (from auto-resolve), causing 403 errors for non-admin API keys. Now only passes team_id when explicitly provided via --team-id flag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: skip resolve_team_id when --team-id is explicitly provided

Avoids an unnecessary API call that could fail and block the download.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
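Taken together, the two fixes above amount to the following decision. This is a sketch only: `plan_task_get` is a hypothetical name, the resolver is passed in as a callable so the example stays self-contained, and the idea that an auto-resolved team ID is still wanted for other calls is an assumption.

```python
from typing import Callable, Dict, Optional, Tuple


def plan_task_get(
    cli_team_id: Optional[str],
    resolve_team_id: Callable[[], str],
) -> Tuple[str, Dict[str, str]]:
    """Return (team_id, query params for GET /v1/tasks/{key})."""
    if cli_team_id is not None:
        # Explicit --team-id: skip the resolver call and pass the value through.
        return cli_team_id, {"team_id": cli_team_id}
    # No flag: resolve the team, but keep it out of the task GET,
    # since that query param requires admin privileges.
    return resolve_team_id(), {}
```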

…leet-ai#71)

* feat(upload): add job launching and auto-generated unique task keys

- Make --key optional; auto-generates {original_key}_{uuid[:8]} when omitted
- Replace local key comparison with server-side existence check (GET /v1/tasks/{key})
- Launch job by default after upload (POST /v1/jobs) with --no-launch-job to skip
- Add --models, --pass-k flags for job configuration
- Default models: gemini-3.1-pro-preview, claude-opus-4.6, gpt-5.2

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(upload): raise on unexpected status in task existence check

Previously any non-200 (including 500, 403, 429) was treated as "key available", silently skipping the guard. Now only 404 means available; other errors are surfaced.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
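The key generation and the stricter existence check can be sketched as two small functions. Both names are hypothetical, the HTTP request itself is left out, and the choice of `RuntimeError` for unexpected statuses is an assumption.

```python
import uuid


def generate_task_key(original_key: str) -> str:
    """Build {original_key}_{uuid[:8]} as described in the commit message."""
    return f"{original_key}_{uuid.uuid4().hex[:8]}"


def key_is_available(status_code: int) -> bool:
    """Interpret the status of GET /v1/tasks/{key} as an availability answer."""
    if status_code == 404:
        return True  # no such task: the key is free
    if status_code == 200:
        return False  # task exists: the key is taken
    # 403, 429, 500, and anything else must not be read as "available".
    raise RuntimeError(f"unexpected status {status_code} while checking task key")
```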

The file-set key may differ from the task key (e.g., without a version suffix). Pull the key from the task's env_variables.TASK_KEY when available, falling back to the CLI --task-key argument. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
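As a sketch of that fallback, assuming the task is available as a plain dict with an optional `env_variables` mapping; the function name `resolve_file_set_key` is hypothetical.

```python
from typing import Any, Mapping


def resolve_file_set_key(task: Mapping[str, Any], cli_task_key: str) -> str:
    """Prefer env_variables.TASK_KEY, falling back to the --task-key value."""
    env_vars = task.get("env_variables") or {}
    return env_vars.get("TASK_KEY") or cli_task_key
```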

…validation (fleet-ai#73)

The jobs API returns `job_id`, not `id`; fix extraction so the job ID and dashboard URL are printed after launch. Also add validation that data files are under files/notebooks/ (the path unpacked into the agent workspace) and that the prompt's list_workspace_files() pattern matches actual files.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(upload): extract job_id from API response and add workspace file validation

The jobs API returns `job_id`, not `id`; fix extraction so the job ID and dashboard URL are printed after launch. Also add validation that data files are under files/notebooks/ (the path unpacked into the agent workspace) and that the prompt's list_workspace_files() pattern matches actual files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add standalone launch_job script for existing tasks

Provides a simple way to launch jobs for tasks that already exist on the server, without the upload/create flow.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
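The two checks from the first commit above might look roughly like this. Both function names are hypothetical, the response is modeled as a plain dict, and bundle paths are assumed to be relative POSIX-style strings.

```python
from pathlib import PurePosixPath
from typing import Any, Iterable, List, Mapping, Optional

# Directory that is unpacked into the agent workspace, per the commit message.
WORKSPACE_PREFIX = PurePosixPath("files/notebooks")


def extract_job_id(response: Mapping[str, Any]) -> Optional[str]:
    """Read the job identifier from the field the jobs API actually returns."""
    return response.get("job_id")


def files_outside_workspace(paths: Iterable[str]) -> List[str]:
    """Return the data files that do not live under files/notebooks/."""
    bad = []
    for path in paths:
        if PurePosixPath(path).parts[:2] != WORKSPACE_PREFIX.parts:
            bad.append(path)
    return bad
```

Comparing path components rather than string prefixes avoids accepting a sibling directory such as `files/notebooks_old/`.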

The web page now redirects the browser to http://127.0.0.1:PORT/callback with tokens as query params instead of POSTing JSON. Replace do_POST + do_OPTIONS (CORS preflight) with a do_GET handler that reads query params, validates the state nonce, and returns a plain HTML confirmation page. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
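The query-parsing half of such a do_GET handler can be sketched as a pure function. The function name `parse_callback` and the query parameter name `state` are assumptions, as are any token parameter names a caller would read from the result; only the /callback path and the state-nonce check come from the commit message.

```python
import hmac
from typing import Dict, Optional
from urllib.parse import parse_qs, urlparse


def parse_callback(path: str, expected_state: str) -> Optional[Dict[str, str]]:
    """Return the callback's query params, or None if the request is rejected."""
    parsed = urlparse(path)
    if parsed.path != "/callback":
        return None
    # parse_qs returns lists; keep the first value for each parameter.
    query = {key: values[0] for key, values in parse_qs(parsed.query).items()}
    state = query.pop("state", "")
    # Constant-time comparison of the nonce (bytes avoid the ASCII-only str limit).
    if not hmac.compare_digest(state.encode(), expected_state.encode()):
        return None
    return query
```

In an `http.server.BaseHTTPRequestHandler` subclass, `do_GET` could pass `self.path` to this function and then write either the HTML confirmation page or an error response depending on the result.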

…fleet-ai#79)

* feat(task-bundle-editing): add Makefile, templates, CLAUDE.md, and workflow docs

Add task authoring toolkit for creating, editing, and deploying Fleet evaluation tasks. Includes Makefile wrapping existing Python scripts, verifier templates for analysis and plot tasks, CLAUDE.md for Claude Code integration, and extended README with end-to-end workflow documentation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: clarify placeholder notation in template schema notes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

… guards

- Add Claude Code skills: task-authoring (generic), fleet-status
- Add generic verifier_template.json; remove DS-specific templates (analysis, plot) for a future domain-specific PR
- Fix Makefile: validate/upload accept DIR= without requiring TASK
- README: expand edit workflow, clarify templates as design scaffolds
- CLAUDE.md: update skills section

Addresses review comments on PR fleet-ai#79:
- Templates clarified as design scaffolds, not direct task.json
- Makefile DIR-only workflow unblocked

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mikesklar and others added 30 commits January 13, 2026 12:21

Update agent.py

b301d67

update gemini cua agent with latest updates

32fa85f

update name

8852db9

Merge pull request fleet-ai#38 from fleet-ai/fix/verifier-helper-func…

5f89234

…tions-namespace fix: allow verifier helper functions to be called from main verifier

Bump version to 0.2.104

54feffd

add metadata to tasks

f895cb6

fixes

fef862d

Revert "fix: align InstanceRequest and TaskRequest with orchestrator …

9cdd7e6

…API" This reverts commit 9a0af14.

Merge pull request fleet-ai#40 from fleet-ai/zz/add-metadata-0121

acc58ed

add metadata to tasks in SDK

bump version

58f8faf

Merge pull request fleet-ai#41 from fleet-ai/zz/2.105

5dbb76e

bump version

consolidate

ee6a8a5

Consolidate all metadata into "metadata" in TaskResponse

05112f7

consolidate

README.md

afb516c

README.md

c08b76b

Update README.md

e303638

Update README.md

0c4d149

Delete export_tasks_filtered.py

f0737e5

Update README.md

53609ae

Update README.md

8b92120

chore: bump version to 0.2.107

33459b2

Co-authored-by: Cursor <cursoragent@cursor.com>

chore: update lockfile for 0.2.107

7dbbe39

Co-authored-by: Cursor <cursoragent@cursor.com>

Merge pull request fleet-ai#47 from fleet-ai/add-factual-answer-support

1bcfb21

feat: add factual_answer field to Task and API models

chore: bump version to 0.2.108

875a297

Co-authored-by: Cursor <cursoragent@cursor.com>

feat: add task_modality support to async SDK

aa03cd0

Add task_modality field to async Task model, TaskResponse model, and update load_task_from_json and load_tasks to preserve task_modality. Co-authored-by: Cursor <cursoragent@cursor.com>

andrew-stelmach-fleet and others added 29 commits February 5, 2026 12:01

fix: Add alias for TaskInfo env_key field to support env_id

747d945

The API returns env_id but TaskInfo was renamed to use env_key. Added alias="env_id" so Pydantic accepts both field names during deserialization of API responses. Co-authored-by: Cursor <cursoragent@cursor.com>

fix: Update example files to use task.env_key instead of task.env_id

2d438b3

Co-authored-by: Cursor <cursoragent@cursor.com>

Merge pull request fleet-ai#55 from fleet-ai/feat/add-task-lifecycle-…

7ec526b

…status feat: Add task_lifecycle_status field to Task model

Merge pull request fleet-ai#56 from fleet-ai/fix/task-response-option…

3a4f711

…al-env-key fix: make TaskResponse.env_key optional to handle null API responses

Revert "Merge pull request fleet-ai#56 from fleet-ai/fix/task-respons…

5715f73

…e-optional-env-key" This reverts commit 3a4f711, reversing changes made to 7ec526b.

Merge pull request fleet-ai#57 from fleet-ai/revert/task-response-opt…

bb30e38

…ional-env-key revert: restore env_key as required in TaskResponse and Task

export_tasks

942a9af

fix: remove stale output_json_schema warning from import_tasks

9065029

The SDK now correctly imports output_json_schema automatically via the API, so the manual-copy warning is no longer accurate. Co-authored-by: Cursor <cursoragent@cursor.com>

Merge pull request fleet-ai#65 from fleet-ai/fix/remove-output-schema…

1e6a928

…-warning fix: remove stale output_json_schema warning from import_tasks

Merge pull request fleet-ai#68 from fleet-ai/feat/task-bundle-editing

62d2251

feat: add task bundle editing scripts

fix: use fleetai.com instead of invented app. subdomain for cli-login…

4a496c8

… URL Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Merge pull request fleet-ai#76 from fleet-ai/nico/eng-1192-add-auth-t…

5ae2ddf

…o-our-sdk feat: add flt login browser auth flow (ENG-1192)

gg2001 force-pushed the main branch from 51131ab to e3c5571 on April 14, 2026 22:16


mnarayan wants to merge 78 commits into fleet-ai:main from mnarayan:feat/task-authoring-toolkit


8 participants
