
examples: add local ART-E email search task #679

Open

Vinzz2303 wants to merge 2 commits into OpenPipe:main from Vinzz2303:add-local-art-e-email-example

Conversation

@Vinzz2303

Summary

  • Adds a lightweight local ART-E email search example under examples/art_e
  • Includes deterministic email scenarios, search/read helpers, answer scoring, a rollout loop, and a training entrypoint (command parsing sketched after this list)
  • Adds unit tests for the local scenario helpers and links the example from the root README
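
To make the command protocol concrete, here is a minimal sketch of how a model reply carrying a <search>, <read>, or <answer> tag with a JSON payload might be parsed. The names (parse_command, COMMAND_RE) are illustrative assumptions, not the exact identifiers used in examples/art_e:

```python
# Hypothetical sketch of tag-plus-JSON command parsing; the real helper
# in examples/art_e/scenarios.py may differ in names and details.
import json
import re
from typing import Optional

COMMAND_RE = re.compile(r"<(search|read|answer)>(.*?)</\1>", re.DOTALL)

def parse_command(reply: str) -> Optional[tuple[str, dict]]:
    """Extract the first <search>/<read>/<answer> tag and its JSON payload."""
    match = COMMAND_RE.search(reply)
    if match is None:
        return None  # no recognized command in this turn
    name, body = match.group(1), match.group(2).strip()
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None  # payload must be valid JSON
    return name, payload
```

For example, parse_command('<search>{"query": "invoice"}</search>') returns ("search", {"query": "invoice"}), while a reply with no tag or invalid JSON yields None.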

Motivation

The README points users to the ART-E email-search agent, but the main repo did not include a small local example that can be inspected without external email infrastructure. This adds a portable starter task for experimenting with ART-style email search rollouts before scaling to the full ART-E setup.

What changed

  • examples/art_e/scenarios.py: local inbox fixtures, search/read helpers, JSON command parsing, and reward scoring (helper shapes sketched after this list)
  • examples/art_e/rollout.py: multi-turn ART trajectory using <search>, <read>, and <answer> commands
  • examples/art_e/train.py: LocalBackend training loop for the ART-E style task
  • examples/art_e/README.md: quickstart and file overview
  • tests/test_art_e_example.py: helper-level tests for search, read, scoring, and command parsing
  • README.md: adds the local ART-E example to the notebooks/examples table
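
As a rough sketch of the helper surface described above, the following shows shapes these pieces might take; the dataclasses, function names, and exact-match reward are assumptions for illustration rather than the PR's actual code:

```python
# Hypothetical fixtures and helpers mirroring the described scenarios.py
# surface; names and the binary reward are illustrative assumptions.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Email:
    id: str
    subject: str
    body: str

@dataclass
class Scenario:
    question: str
    answer: str
    inbox: list[Email] = field(default_factory=list)

def search_inbox(scenario: Scenario, query: str, limit: int = 5) -> list[str]:
    """Return the ids of emails whose subject or body contains the query."""
    q = query.lower()
    return [e.id for e in scenario.inbox
            if q in e.subject.lower() or q in e.body.lower()][:limit]

def read_email(scenario: Scenario, email_id: str) -> Optional[Email]:
    """Look up a single email by id, or return None if it does not exist."""
    return next((e for e in scenario.inbox if e.id == email_id), None)

def score_answer(scenario: Scenario, answer: str) -> float:
    """Binary reward: 1.0 on an exact case-insensitive match, else 0.0."""
    return 1.0 if answer.strip().lower() == scenario.answer.lower() else 0.0
```

Deterministic fixtures plus a binary reward keep the scoring fully checkable in unit tests.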

Testing

The tests were not run locally because this Windows environment does not currently have python or uv on PATH. The added tests are intentionally small and should run with the repo test suite once dependencies are installed.

@Vinzz2303 (Author)

I pushed an update to strengthen this ART-E example PR.

New additions:

  • Offline evaluation harness for the local ART-E task
  • Deterministic baseline over all scenarios
  • Tests confirming the offline evaluator scores the fixtures correctly
  • README instructions for running the no-model evaluation

This should make the example easier to review because the task contract can be checked without API keys or training infrastructure.
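
As a sketch of such a no-model pass, here is a deterministic baseline built on the hypothetical Scenario helpers sketched earlier; evaluate_baseline and its fixed search-then-answer policy are illustrative assumptions, not the evaluator's actual code:

```python
# Illustrative offline pass: a fixed, model-free policy exercises the
# search/read/score contract and reports mean reward over the fixtures.
def evaluate_baseline(scenarios: list[Scenario]) -> float:
    """Run a deterministic search-then-answer policy over every scenario."""
    if not scenarios:
        return 0.0
    total = 0.0
    for scenario in scenarios:
        # Naive policy: search on the question's first word, read the
        # top hit, and answer with that email's body.
        first_word = (scenario.question.split() or [""])[0]
        hits = search_inbox(scenario, first_word)
        email = read_email(scenario, hits[0]) if hits else None
        guess = email.body if email else ""
        total += score_answer(scenario, guess)
    return total / len(scenarios)  # mean reward over the fixtures
```

Because the policy is fixed, the resulting mean reward is stable across runs and can be asserted in tests without API keys or training infrastructure.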

Note: I still could not run Python locally because this Windows environment does not have python/uv in PATH, but git diff --check passes.

