Skip to content

feat: pluggable LLM provider interface for judge#78

Open
andrew-stelmach-fleet wants to merge 5 commits into
final-103from
feat/llm-provider-interface
Open

feat: pluggable LLM provider interface for judge#78
andrew-stelmach-fleet wants to merge 5 commits into
final-103from
feat/llm-provider-interface

Conversation

@andrew-stelmach-fleet

Copy link
Copy Markdown
Contributor

Summary

  • Adds an LLMProvider abstraction layer (fleet/llm_provider.py) that decouples judge LLM calls from the Fleet orchestrator, enabling on-prem deployments
  • Ships three provider implementations: FleetProvider (default/existing behavior), ExternalProvider (OpenRouter, Anthropic, local models via OpenAI-compatible API), and the abstract LLMProvider base for custom implementations
  • SyncJudge, AsyncJudge, SyncEnv, and AsyncEnv accept an optional llm_provider kwarg — fully backward compatible, no changes needed for existing callers

Usage

# Default (no change needed) — routes through Fleet orchestrator
result = env.judge.grade(rubric, submission)

# External provider — direct LLM API calls (on-prem / no orchestrator)
from fleet.llm_provider import ExternalProvider

provider = ExternalProvider(
    api_key="sk-or-...",
    base_url="https://openrouter.ai/api/v1",
    model="anthropic/claude-sonnet-4",
)
judge = SyncJudge(client=None, instance_id="local", llm_provider=provider)
result = judge.grade(rubric, submission)

# Custom provider — implement your own
class MyProvider(LLMProvider):
    def grade(self, request: GradeRequest) -> GradeResponse:
        ...  # your logic here

Architecture

                        ┌─────────────────────┐
                        │   SyncJudge.grade()  │
                        └────────┬────────────┘
                                 │
                   ┌─────────────┴─────────────┐
                   │  llm_provider is None?     │
                   └──────┬──────────────┬──────┘
                     Yes  │              │  No
                          ▼              ▼
              ┌───────────────┐  ┌──────────────────┐
              │ FleetProvider │  │  LLMProvider      │
              │ (orchestrator │  │  .grade(request)  │
              │  POST /v1/    │  │                   │
              │  judge/grade) │  │  ExternalProvider │
              └───────────────┘  │  CustomProvider   │
                                 └──────────────────┘

Test plan

  • 21 new unit tests in tests/test_llm_provider.py covering:
    • GradeResponse serialization
    • User message construction from rubrics (string + structured)
    • LLM JSON response parsing (valid, markdown-fenced, invalid)
    • FleetProvider delegation to orchestrator client
    • ExternalProvider request building + mocked HTTP calls
    • Custom LLMProvider implementation
    • SyncJudge routing (with provider / without provider)
    • End-to-end: ExternalProvider → SyncJudge → JudgeResult
  • All 9 existing test_judge_criteria_markers.py tests still pass (backward compatibility)
# Run tests
cd fleet-sdk
uv run pytest tests/test_llm_provider.py tests/test_judge_criteria_markers.py -v

🤖 Generated with Claude Code

Introduces an LLMProvider abstraction that allows judge LLM calls to be
routed to either the Fleet orchestrator (default, existing behavior) or
any OpenAI-compatible external endpoint (OpenRouter, Anthropic, local
models, etc.). This enables on-prem deployments that don't depend on
Fleet's internal orchestrator endpoints.

New module: fleet/llm_provider.py
- LLMProvider: abstract base class defining the grade()/agrade() interface
- FleetProvider: routes through orchestrator POST /v1/judge/grade (default)
- ExternalProvider: direct calls to OpenAI-compatible chat completions API
- GradeRequest/GradeResponse: provider-agnostic data types

Changes to existing code:
- SyncJudge/AsyncJudge: accept optional llm_provider kwarg
- SyncEnv/AsyncEnv: accept optional llm_provider kwarg, passed to judge
- Fully backward compatible — no changes needed for existing callers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment thread fleet/client.py
Three additions to the LLM provider abstraction:

1. **Env var auto-configuration**: SyncJudge/AsyncJudge now auto-detect
   FLEET_LLM_API_KEY at init time and route to an ExternalProvider when
   set. No code changes needed — just set env vars:
     FLEET_LLM_API_KEY, FLEET_LLM_BASE_URL, FLEET_LLM_MODEL,
     FLEET_LLM_TEMPERATURE, FLEET_LLM_MAX_TOKENS, FLEET_LLM_TIMEOUT
   When unset, defaults to Fleet orchestrator (existing behavior).
   Explicit llm_provider kwarg always takes priority.

2. **Image.from_local(path)**: Read an image from a local file path
   instead of requiring S3 URLs. File is lazily base64-encoded at
   serialization time. Works with both Fleet orchestrator and external
   providers.

3. **File.from_local(path)**: Same for arbitrary files (PDF, CSV, etc.).

These three changes together make the judge fully portable for on-prem:
set env vars, use local file paths, zero orchestrator dependency.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Comment thread fleet/client.py
Doofus Bot and others added 3 commits March 11, 2026 13:56
…der resolve

Image/File now support a source-agnostic constructor that stores a raw
path/URI. The LLM provider resolves the path at grade-time via new
resolve_image()/resolve_file() methods, keeping verifier code independent
of storage backends (S3, local filesystem, HTTP).

- Image.from_path("screenshots/gold.png") — just a path, no source assumption
- File.from_path("s3://bucket/data.csv") — provider handles scheme detection
- LLMProvider.resolve_image/resolve_file — default auto-detects URI scheme
- Custom providers can override resolve to prepend S3 prefix, fetch from GCS, etc.
- serialize() fallback for when no provider resolves (backward compat)
- 70 tests passing (26 new)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…or consistency

- Fix LLMProvider.agrade() to use run_in_executor instead of blocking the
  event loop when custom providers don't override it
- Add file serialization to ExternalProvider messages (text files inlined,
  binary files get metadata)
- Wrap ExternalProvider HTTP calls so errors return 0.0 GradeResponse
  instead of raising (consistent with parse-error behavior)
- Validate api_key is non-empty in ExternalProvider.__init__
- Tighten GradeRequest type hints from Any to proper Union types
- Add 13 new tests covering async, files, errors, and validation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants