feat: pluggable LLM provider interface for judge #78
Open
andrew-stelmach-fleet wants to merge 5 commits into
Introduces an LLMProvider abstraction that allows judge LLM calls to be routed to either the Fleet orchestrator (default, existing behavior) or any OpenAI-compatible external endpoint (OpenRouter, Anthropic, local models, etc.). This enables on-prem deployments that don't depend on Fleet's internal orchestrator endpoints.
New module: fleet/llm_provider.py
- LLMProvider: abstract base class defining the grade()/agrade() interface
- FleetProvider: routes through orchestrator POST /v1/judge/grade (default)
- ExternalProvider: direct calls to OpenAI-compatible chat completions API
- GradeRequest/GradeResponse: provider-agnostic data types
Changes to existing code:
- SyncJudge/AsyncJudge: accept optional llm_provider kwarg
- SyncEnv/AsyncEnv: accept optional llm_provider kwarg, passed to judge
- Fully backward compatible: no changes needed for existing callers
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
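
To make the shape of the abstraction concrete, here is a minimal sketch of how a pluggable provider could sit behind a judge. It is not the code from fleet/llm_provider.py: the fields on GradeRequest and GradeResponse, the KeywordProvider class, and the Judge wrapper are all hypothetical stand-ins, and KeywordProvider only plays the role of a default backend so the example runs without any network access.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class GradeRequest:
    # Hypothetical fields; the PR does not list the real ones.
    rubric: str
    submission: str


@dataclass
class GradeResponse:
    score: float
    rationale: str = ""


class LLMProvider(ABC):
    """Abstract base: subclasses decide where the judge call is sent."""

    @abstractmethod
    def grade(self, request: GradeRequest) -> GradeResponse:
        ...


class KeywordProvider(LLMProvider):
    """Toy default backend: scores 1.0 if the rubric text occurs in the submission."""

    def grade(self, request: GradeRequest) -> GradeResponse:
        hit = request.rubric.lower() in request.submission.lower()
        return GradeResponse(1.0 if hit else 0.0, "match" if hit else "no match")


class Judge:
    """Accepts an optional provider, mirroring the llm_provider kwarg."""

    def __init__(self, llm_provider: Optional[LLMProvider] = None):
        # Callers that pass nothing keep the default behavior.
        self.provider = llm_provider or KeywordProvider()

    def grade(self, rubric: str, submission: str) -> GradeResponse:
        return self.provider.grade(GradeRequest(rubric, submission))
```

Under these assumptions, `Judge()` keeps working unchanged for existing callers, while `Judge(llm_provider=MyProvider())` swaps the backend without touching the judge itself.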
Three additions to the LLM provider abstraction:
1. **Env var auto-configuration**: SyncJudge/AsyncJudge now auto-detect
FLEET_LLM_API_KEY at init time and route to an ExternalProvider when
set. No code changes needed — just set env vars:
FLEET_LLM_API_KEY, FLEET_LLM_BASE_URL, FLEET_LLM_MODEL,
FLEET_LLM_TEMPERATURE, FLEET_LLM_MAX_TOKENS, FLEET_LLM_TIMEOUT
When unset, defaults to Fleet orchestrator (existing behavior).
Explicit llm_provider kwarg always takes priority.
2. **Image.from_local(path)**: Read an image from a local file path
instead of requiring S3 URLs. File is lazily base64-encoded at
serialization time. Works with both Fleet orchestrator and external
providers.
3. **File.from_local(path)**: Same for arbitrary files (PDF, CSV, etc.).
These three changes together make the judge fully portable for on-prem:
set env vars, use local file paths, zero orchestrator dependency.
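
The priority order described in item 1 (explicit kwarg, then env vars, then the Fleet default) can be sketched as follows. The environment variable names come from the commit message; ExternalConfig, config_from_env, choose_provider, and the "fleet-default" placeholder are hypothetical, and unset optional variables are left as None here because the commit message does not state their defaults.

```python
import os
from dataclasses import dataclass
from typing import Any, Callable, Mapping, Optional


@dataclass
class ExternalConfig:
    """Hypothetical container for the FLEET_LLM_* settings."""
    api_key: str
    base_url: Optional[str] = None
    model: Optional[str] = None
    temperature: Optional[float] = None
    max_tokens: Optional[int] = None
    timeout: Optional[float] = None


def config_from_env(env: Mapping[str, str] = os.environ) -> Optional[ExternalConfig]:
    """Return a config when FLEET_LLM_API_KEY is set, otherwise None."""
    api_key = env.get("FLEET_LLM_API_KEY")
    if not api_key:
        return None  # caller falls back to the default provider

    def _num(name: str, cast: Callable[[str], Any]) -> Any:
        raw = env.get(name)
        return cast(raw) if raw else None

    return ExternalConfig(
        api_key=api_key,
        base_url=env.get("FLEET_LLM_BASE_URL"),
        model=env.get("FLEET_LLM_MODEL"),
        temperature=_num("FLEET_LLM_TEMPERATURE", float),
        max_tokens=_num("FLEET_LLM_MAX_TOKENS", int),
        timeout=_num("FLEET_LLM_TIMEOUT", float),
    )


def choose_provider(explicit: Any = None, env: Mapping[str, str] = os.environ) -> Any:
    """Explicit kwarg wins, then env vars, then the default."""
    if explicit is not None:
        return explicit
    cfg = config_from_env(env)
    # "fleet-default" is a placeholder for the orchestrator-backed provider.
    return cfg if cfg is not None else "fleet-default"
```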
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
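
As an illustration of the lazy encoding mentioned for Image.from_local, the sketch below stores only the path at construction time and reads and base64-encodes the file when serialize() is called. LocalImage and the keys of the returned dict are hypothetical; the real Image class and its wire format are not shown in this PR description.

```python
import base64
from pathlib import Path


class LocalImage:
    """Hypothetical stand-in for Image; holds only the path until serialized."""

    def __init__(self, path: str):
        self.path = Path(path)

    @classmethod
    def from_local(cls, path: str) -> "LocalImage":
        # No file I/O happens here, so construction is cheap.
        return cls(path)

    def serialize(self) -> dict:
        # The file is read and encoded only when the payload is built.
        data = self.path.read_bytes()
        return {
            "filename": self.path.name,
            "base64": base64.b64encode(data).decode("ascii"),
        }
```

One consequence of deferring the read is that a missing file surfaces at serialization time rather than at construction, which is worth keeping in mind when debugging.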
Cursor Bugbot has reviewed your changes and found 1 potential issue.
…der resolve
Image/File now support a source-agnostic constructor that stores a raw
path/URI. The LLM provider resolves the path at grade-time via new
resolve_image()/resolve_file() methods, keeping verifier code independent
of storage backends (S3, local filesystem, HTTP).
- Image.from_path("screenshots/gold.png") — just a path, no source assumption
- File.from_path("s3://bucket/data.csv") — provider handles scheme detection
- LLMProvider.resolve_image/resolve_file — default auto-detects URI scheme
- Custom providers can override resolve to prepend S3 prefix, fetch from GCS, etc.
- serialize() fallback for when no provider resolves (backward compat)
- 70 tests passing (26 new)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
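
The scheme auto-detection and the override hook described above might look roughly like this. It is a sketch under assumptions: detect_source, BaseResolver, S3PrefixResolver, and the returned dict shapes are hypothetical, and anything that is not an s3 or http(s) URI is simply treated as a local path.

```python
import base64
from pathlib import Path
from urllib.parse import urlparse


def detect_source(path_or_uri: str) -> str:
    """Classify a path by URI scheme; unknown or missing schemes count as local."""
    scheme = urlparse(path_or_uri).scheme.lower()
    if scheme == "s3":
        return "s3"
    if scheme in ("http", "https"):
        return "http"
    return "local"


class BaseResolver:
    """Default behavior: inline local files, pass remote URIs through."""

    def resolve_image(self, path: str) -> dict:
        if detect_source(path) == "local":
            data = Path(path).read_bytes()
            return {"type": "base64", "data": base64.b64encode(data).decode("ascii")}
        return {"type": "url", "url": path}


class S3PrefixResolver(BaseResolver):
    """Custom override: treat bare paths as keys under an S3 prefix."""

    def __init__(self, prefix: str):
        self.prefix = prefix.rstrip("/")

    def resolve_image(self, path: str) -> dict:
        if detect_source(path) == "local":
            path = f"{self.prefix}/{path.lstrip('/')}"
        return super().resolve_image(path)
```

With this split, verifier code can keep writing a bare path such as "screenshots/gold.png" and leave the choice of storage backend to whichever resolver is configured.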
…or consistency
- Fix LLMProvider.agrade() to use run_in_executor instead of blocking the event loop when custom providers don't override it
- Add file serialization to ExternalProvider messages (text files inlined, binary files get metadata)
- Wrap ExternalProvider HTTP calls so errors return a GradeResponse with score 0.0 instead of raising (consistent with parse-error behavior)
- Validate api_key is non-empty in ExternalProvider.__init__
- Tighten GradeRequest type hints from Any to proper Union types
- Add 13 new tests covering async, files, errors, and validation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
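
Two of the fixes above, the non-blocking agrade() default and the "return 0.0 instead of raising" behavior, can be sketched together. The Provider and FlakyProvider classes, the string request type, and the error field on GradeResponse are hypothetical simplifications; only the use of run_in_executor and the 0.0 fallback score are taken from the commit message.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class GradeResponse:
    score: float
    error: str = ""  # hypothetical field for carrying the failure reason


class Provider:
    def grade(self, request: str) -> GradeResponse:
        raise NotImplementedError

    async def agrade(self, request: str) -> GradeResponse:
        # Default async path: run the blocking grade() in the default thread
        # pool so the event loop stays free for other tasks.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, self.grade, request)


class FlakyProvider(Provider):
    """Simulates a backend whose HTTP call always fails."""

    def _call_backend(self, request: str) -> GradeResponse:
        raise ConnectionError("backend unreachable")

    def grade(self, request: str) -> GradeResponse:
        try:
            return self._call_backend(request)
        except Exception as exc:
            # Report failures as a zero score rather than propagating.
            return GradeResponse(score=0.0, error=str(exc))
```

A custom provider that only implements grade() then still gets a usable agrade(), for example `asyncio.run(FlakyProvider().agrade("req"))`.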
a466ec9 to 3c39462
Summary
- LLMProvider abstraction layer (fleet/llm_provider.py) that decouples judge LLM calls from the Fleet orchestrator, enabling on-prem deployments
- FleetProvider (default/existing behavior), ExternalProvider (OpenRouter, Anthropic, local models via OpenAI-compatible API), and the abstract LLMProvider base for custom implementations
- SyncJudge, AsyncJudge, SyncEnv, and AsyncEnv accept an optional llm_provider kwarg; fully backward compatible, no changes needed for existing callers
Usage
Architecture
Test plan
- tests/test_llm_provider.py covering:
- test_judge_criteria_markers.py tests still pass (backward compatibility)
🤖 Generated with Claude Code