feat: pluggable LLM provider interface for judge by andrew-stelmach-fleet · Pull Request #78 · fleet-ai/fleet-sdk

andrew-stelmach-fleet · 2026-03-11T13:29:30Z

Summary

Adds an LLMProvider abstraction layer (fleet/llm_provider.py) that decouples judge LLM calls from the Fleet orchestrator, enabling on-prem deployments
Ships three provider implementations: FleetProvider (default/existing behavior), ExternalProvider (OpenRouter, Anthropic, local models via OpenAI-compatible API), and the abstract LLMProvider base for custom implementations
SyncJudge, AsyncJudge, SyncEnv, and AsyncEnv accept an optional llm_provider kwarg — fully backward compatible, no changes needed for existing callers

Usage

# Default (no change needed) — routes through Fleet orchestrator
result = env.judge.grade(rubric, submission)

# External provider — direct LLM API calls (on-prem / no orchestrator)
from fleet.llm_provider import ExternalProvider

provider = ExternalProvider(
    api_key="sk-or-...",
    base_url="https://openrouter.ai/api/v1",
    model="anthropic/claude-sonnet-4",
)
judge = SyncJudge(client=None, instance_id="local", llm_provider=provider)
result = judge.grade(rubric, submission)

# Custom provider — implement your own
class MyProvider(LLMProvider):
    def grade(self, request: GradeRequest) -> GradeResponse:
        ...  # your logic here

Architecture

                        ┌─────────────────────┐
                        │   SyncJudge.grade()  │
                        └────────┬────────────┘
                                 │
                   ┌─────────────┴─────────────┐
                   │  llm_provider is None?     │
                   └──────┬──────────────┬──────┘
                     Yes  │              │  No
                          ▼              ▼
              ┌───────────────┐  ┌──────────────────┐
              │ FleetProvider │  │  LLMProvider      │
              │ (orchestrator │  │  .grade(request)  │
              │  POST /v1/    │  │                   │
              │  judge/grade) │  │  ExternalProvider │
              └───────────────┘  │  CustomProvider   │
                                 └──────────────────┘

Test plan

21 new unit tests in tests/test_llm_provider.py covering:
- GradeResponse serialization
- User message construction from rubrics (string + structured)
- LLM JSON response parsing (valid, markdown-fenced, invalid)
- FleetProvider delegation to orchestrator client
- ExternalProvider request building + mocked HTTP calls
- Custom LLMProvider implementation
- SyncJudge routing (with provider / without provider)
- End-to-end: ExternalProvider → SyncJudge → JudgeResult
All 9 existing test_judge_criteria_markers.py tests still pass (backward compatibility)

# Run tests
cd fleet-sdk
uv run pytest tests/test_llm_provider.py tests/test_judge_criteria_markers.py -v

🤖 Generated with Claude Code

Introduces an LLMProvider abstraction that allows judge LLM calls to be routed to either the Fleet orchestrator (default, existing behavior) or any OpenAI-compatible external endpoint (OpenRouter, Anthropic, local models, etc.). This enables on-prem deployments that don't depend on Fleet's internal orchestrator endpoints. New module: fleet/llm_provider.py - LLMProvider: abstract base class defining the grade()/agrade() interface - FleetProvider: routes through orchestrator POST /v1/judge/grade (default) - ExternalProvider: direct calls to OpenAI-compatible chat completions API - GradeRequest/GradeResponse: provider-agnostic data types Changes to existing code: - SyncJudge/AsyncJudge: accept optional llm_provider kwarg - SyncEnv/AsyncEnv: accept optional llm_provider kwarg, passed to judge - Fully backward compatible — no changes needed for existing callers Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Three additions to the LLM provider abstraction: 1. **Env var auto-configuration**: SyncJudge/AsyncJudge now auto-detect FLEET_LLM_API_KEY at init time and route to an ExternalProvider when set. No code changes needed — just set env vars: FLEET_LLM_API_KEY, FLEET_LLM_BASE_URL, FLEET_LLM_MODEL, FLEET_LLM_TEMPERATURE, FLEET_LLM_MAX_TOKENS, FLEET_LLM_TIMEOUT When unset, defaults to Fleet orchestrator (existing behavior). Explicit llm_provider kwarg always takes priority. 2. **Image.from_local(path)**: Read an image from a local file path instead of requiring S3 URLs. File is lazily base64-encoded at serialization time. Works with both Fleet orchestrator and external providers. 3. **File.from_local(path)**: Same for arbitrary files (PDF, CSV, etc.). These three changes together make the judge fully portable for on-prem: set env vars, use local file paths, zero orchestrator dependency. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

…der resolve Image/File now support a source-agnostic constructor that stores a raw path/URI. The LLM provider resolves the path at grade-time via new resolve_image()/resolve_file() methods, keeping verifier code independent of storage backends (S3, local filesystem, HTTP). - Image.from_path("screenshots/gold.png") — just a path, no source assumption - File.from_path("s3://bucket/data.csv") — provider handles scheme detection - LLMProvider.resolve_image/resolve_file — default auto-detects URI scheme - Custom providers can override resolve to prepend S3 prefix, fetch from GCS, etc. - serialize() fallback for when no provider resolves (backward compat) - 70 tests passing (26 new) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…r-interface

…or consistency - Fix LLMProvider.agrade() to use run_in_executor instead of blocking the event loop when custom providers don't override it - Add file serialization to ExternalProvider messages (text files inlined, binary files get metadata) - Wrap ExternalProvider HTTP calls so errors return 0.0 GradeResponse instead of raising (consistent with parse-error behavior) - Validate api_key is non-empty in ExternalProvider.__init__ - Tighten GradeRequest type hints from Any to proper Union types - Add 13 new tests covering async, files, errors, and validation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

cursor Bot reviewed Mar 11, 2026

View reviewed changes

Comment thread fleet/client.py

cursor Bot reviewed Mar 11, 2026

View reviewed changes

Comment thread fleet/client.py

Doofus Bot and others added 3 commits March 11, 2026 13:56

Merge remote-tracking branch 'origin/final-103' into feat/llm-provide…

5f2320b

…r-interface

jarredFleetSo force-pushed the feat/llm-provider-interface branch from a466ec9 to 3c39462 Compare April 1, 2026 17:58

jarredFleetSo mentioned this pull request Apr 10, 2026

feat: add fleet.judge module for LLM-as-a-judge grading (ENG-1527) #92

Closed

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: pluggable LLM provider interface for judge#78

feat: pluggable LLM provider interface for judge#78
andrew-stelmach-fleet wants to merge 5 commits into
final-103from
feat/llm-provider-interface

andrew-stelmach-fleet commented Mar 11, 2026

Uh oh!

Uh oh!

cursor Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

andrew-stelmach-fleet commented Mar 11, 2026

Summary

Usage

Architecture

Test plan

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants