Moving things into LMI #446
Open
sidnarayanan wants to merge 41 commits into
Pull request overview
This PR moves LLM configuration and execution concerns “up” into LMI by introducing typed, opinionated config models (ModelSpec, LLMConfig), shifting retry/fallback behavior into LMI (with clearer error semantics), and updating LDP agents/modules/tests to consume the new config shape.
Changes:
- Replace dict-shaped `llm_model` configuration with typed `llm_config` (`LLMConfig`/`ModelSpec`) across agents, graph modules, and tests.
- Introduce LMI-owned retry/fallback policy and new error surfaces (e.g., `AllModelsExhaustedError`, `ModelRefusalError`) with targeted unit tests.
- Update VCR cassettes to reflect new default request parameters (notably `max_tokens=4096`) and updated client/library versions.
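The move from a dict-shaped config to a typed one can be pictured with a minimal sketch. The class names `SketchModelSpec` and `SketchLLMConfig`, their fields, the assumed legacy dict shape, and the stop sequence are all hypothetical and are not LMI's actual definitions. Only the method names `coerce` and `with_extra_params` echo the ones mentioned in this PR, and their real signatures may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace
from typing import Any


@dataclass(frozen=True)
class SketchModelSpec:
    """Hypothetical stand-in for one model entry (not LMI's ModelSpec)."""

    name: str
    params: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class SketchLLMConfig:
    """Hypothetical stand-in for an ordered list of models to try."""

    models: tuple[SketchModelSpec, ...]

    @classmethod
    def coerce(cls, value: SketchLLMConfig | dict[str, Any] | str) -> SketchLLMConfig:
        """Accept a typed config, a bare model name, or a legacy-style dict."""
        if isinstance(value, cls):
            return value
        if isinstance(value, str):
            return cls(models=(SketchModelSpec(name=value),))
        if isinstance(value, dict):
            # Assumed legacy shape: {"name": ..., <other request params>}.
            params = {k: v for k, v in value.items() if k != "name"}
            return cls(models=(SketchModelSpec(name=value["name"], params=params),))
        raise TypeError(f"Cannot coerce {type(value).__name__} into a config")

    def with_extra_params(self, **extra: Any) -> SketchLLMConfig:
        """Return a copy with extra params merged in; the original is untouched."""
        return replace(
            self,
            models=tuple(
                replace(m, params={**m.params, **extra}) for m in self.models
            ),
        )


# A legacy-style dict is coerced once, then extended without mutation.
legacy = {"name": "gpt-4o-mini-2024-07-18", "temperature": 0.1}
config = SketchLLMConfig.coerce(legacy)
stopped = config.with_extra_params(stop=["Observation:"])
```

Returning a new object from `with_extra_params` is what lets a caller add stop sequences for one call site without affecting any other holder of the same config.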
Reviewed changes
Copilot reviewed 52 out of 54 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/test_rollouts.py | Updates fallback behavior assertions to expect AllModelsExhaustedError and uses llm_config legacy-coerce shape. |
| tests/test_optimizer.py | Migrates optimizer tests to LLMConfig.coerce() and llm_config= construction. |
| tests/test_ops.py | Switches op tests to LLMConfig inputs and updates gradient expectations when config becomes a scalar leaf. |
| tests/test_modules.py | Updates module configs/instantiation to use llm_config defaults from ReActAgent. |
| tests/test_envs.py | Updates agent construction to llm_config and error expectations to AllModelsExhaustedError. |
| tests/test_agents.py | Migrates agents to llm_config, updates serialization assertions and gradient assertions. |
| tests/cassettes/TestSimpleAgent.test_dummyenv[gpt-4o-mini-2024-07-18].yaml | VCR update for new request shape (adds max_tokens) and updated client headers. |
| tests/cassettes/TestSimpleAgent.test_dummyenv[claude-haiku-4-5-20251001].yaml | VCR update for new request shape (max_tokens, tool schema changes). |
| tests/cassettes/TestSimpleAgent.test_agent_grad[gpt-4o-mini-2024-07-18].yaml | VCR update for new request shape (adds max_tokens) and updated client headers. |
| tests/cassettes/TestSimpleAgent.test_agent_grad[claude-haiku-4-5-20251001].yaml | VCR update for new request shape (max_tokens, tool schema changes). |
| tests/cassettes/TestReActAgent.test_react_dummyenv[True-gpt-4-turbo].yaml | VCR update for new request shape (adds max_tokens). |
| tests/cassettes/TestReActAgent.test_react_dummyenv[True-claude-haiku-4-5-20251001].yaml | VCR update for new request shape (max_tokens) and payload adjustments. |
| tests/cassettes/TestReActAgent.test_react_dummyenv[False-claude-haiku-4-5-20251001].yaml | VCR update for new request shape (max_tokens) and payload adjustments. |
| tests/cassettes/TestReActAgent.test_agent_grad[True-gpt-4-turbo].yaml | VCR update for new request shape (adds max_tokens). |
| tests/cassettes/TestReActAgent.test_agent_grad[True-claude-haiku-4-5-20251001].yaml | VCR update for new request shape (max_tokens) and payload adjustments. |
| tests/cassettes/TestReActAgent.test_agent_grad[False-claude-haiku-4-5-20251001].yaml | VCR update for new request shape (max_tokens) and payload adjustments. |
| tests/cassettes/TestNoToolsSimpleAgent.test_dummyenv[claude-haiku-4-5-20251001].yaml | VCR update for new request shape (max_tokens). |
| tests/cassettes/TestMemoryAgent.test_agent_grad.yaml | VCR update for new request shape (adds max_tokens). |
| tests/cassettes/TestAgentState.test_no_state_mutation[agent1].yaml | VCR update for new request shape (adds max_tokens). |
| tests/cassettes/TestAgentState.test_no_state_mutation[agent0].yaml | VCR update for new request shape (adds max_tokens). |
| src/ldp/graph/modules/thought.py | Switches module to accept LLMConfig instead of dict config. |
| src/ldp/graph/modules/reflect.py | Replaces dict llm_model field with LLMConfigField and updates wiring. |
| src/ldp/graph/modules/react.py | Uses LLMConfig.with_extra_params() to set stop sequences without mutating dicts. |
| src/ldp/graph/modules/llm_call.py | Updates parsed-call module to use LLMConfig and typed ConfigOp. |
| src/ldp/graph/common_ops.py | Changes LLMCallOp.forward() signature to accept LLMConfig and constructs LiteLLMModel via llm_config. |
| src/ldp/agent/tree_of_thoughts_agent.py | Migrates agent config field to LLMConfigField and updates call sites. |
| src/ldp/agent/simple_agent.py | Migrates agent config field to LLMConfigField and updates internal ConfigOp. |
| src/ldp/agent/react_agent.py | Migrates agent config field to LLMConfigField and updates module construction. |
| packages/lmi/tests/test_retry.py | Adds unit tests for retry/fallback classification and backoff bounds. |
| packages/lmi/tests/test_llms.py | Updates tests for new fallback error semantics, dispatch behavior, and Responses integration toggled per-model. |
| packages/lmi/tests/test_litellm_patches.py | Removes tests for the removed provider-400 retry patch. |
| packages/lmi/tests/test_dispatch.py | Adds end-to-end tests for LMI dispatch + retry/fallback loop (mocking litellm.acompletion). |
| packages/lmi/tests/test_cost_tracking.py | Simplifies tests by removing Router bypass paths and aligning to new config behavior. |
| packages/lmi/tests/test_config.py | Adds comprehensive tests for ModelSpec, legacy config translation, and LLMConfig.coerce / LLMConfigField. |
| packages/lmi/tests/cassettes/TestResponsesAPIIntegration.test_basic_call.yaml | Updates Responses API VCR cassette. |
| packages/lmi/tests/cassettes/TestResponsesAPIIntegration.test_multi_turn_stateful.yaml | Updates Responses API multi-turn VCR cassette. |
| packages/lmi/tests/cassettes/TestResponsesAPIIntegration.test_responses_api_off_ignores_response_id.yaml | Updates VCR cassette for non-Responses model behavior. |
| packages/lmi/tests/cassettes/TestLiteLLMModel.test_max_token_truncation.yaml | Adds/updates VCR cassette for truncation behavior with max_tokens. |
| packages/lmi/tests/cassettes/TestLiteLLMModel.test_cost_call_single.yaml | Adds/updates VCR cassette for streaming cost tracking request shape. |
| packages/lmi/src/lmi/retry.py | Introduces centralized retry/fallback policy and jittered exponential backoff. |
| packages/lmi/src/lmi/litellm_patches.py | Removes the provider-400 retry patch and renumbers patch docs accordingly. |
| packages/lmi/src/lmi/exceptions.py | Adds ModelRefusalError and AllModelsExhaustedError. |
| packages/lmi/src/lmi/constants.py | Removes env-flag toggle for Responses API and keeps core constants. |
| packages/lmi/src/lmi/config.py | Adds typed ModelSpec/LLMConfig and legacy/dict coercion utilities. |
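As a rough illustration of the jittered exponential backoff that the table attributes to `retry.py`, here is a generic "full jitter" sketch. The function name `backoff_delay`, its defaults, and the choice of full jitter are assumptions for illustration; the policy in this PR may use different bounds or a different jitter scheme.

```python
from __future__ import annotations

import random


def backoff_delay(
    attempt: int,
    base: float = 0.5,
    cap: float = 30.0,
    rng: random.Random | None = None,
) -> float:
    """Return a delay drawn uniformly from [0, min(cap, base * 2**attempt)].

    `attempt` is zero-based, so the first retry waits at most `base` seconds.
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    if rng is None:
        rng = random.Random()
    # Exponential growth, clamped so the delay never exceeds `cap`.
    upper = min(cap, base * 2**attempt)
    return rng.uniform(0.0, upper)
```

Passing a seeded `random.Random` makes the delays reproducible, which is one way a unit test could check the bounds without sleeping.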
jamesbraza
reviewed
Apr 23, 2026
The merge from main dropped the litellm upper pin
jamesbraza
reviewed
Jun 4, 2026
jamesbraza
approved these changes
Jun 5, 2026
jamesbraza
left a comment
Huge refactor, wow. Nice work! Looking forward to using LLMConfig
Bringing some functionality up from LiteLLM into LMI:
- … (`ModelSpec`, `LLMConfig`), which get translated into litellm config dicts just in time
- `ModelSpec.responses_api` lets us toggle on the Responses backend selectively. This means we can swap between the backends in a fallback cascade

Notably, this drops all dependence on `litellm.Router` except for embeddings, which we can likely handle in a similar way. It also gets rid of a few things that annoyed me:
- … `model_list` or top-level kwargs
- … `SimpleAgent`/`LLMCallOp`/`LiteLLMModel`/`litellm` levels. Now, the first 3 all operate on `LLMConfig`.

Some proof that this is backwards compatible: no cassette churn aside from new/dropped tests.
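The fallback cascade described above can be sketched as a small loop. Everything here is a hypothetical stand-in: `SketchSpec`, `TransientError`, `SketchAllModelsExhaustedError`, and `call_with_fallback` are illustrative names, and the real retry classification and error types in LMI may behave differently. Sleeping between attempts is left out to keep the sketch short.

```python
from __future__ import annotations

from collections.abc import Callable, Sequence
from dataclasses import dataclass


class TransientError(Exception):
    """Hypothetical: a failure worth retrying before falling back."""


class SketchAllModelsExhaustedError(Exception):
    """Hypothetical stand-in raised once every model has failed."""


@dataclass(frozen=True)
class SketchSpec:
    name: str
    responses_api: bool = False  # per-model backend toggle


def call_with_fallback(
    specs: Sequence[SketchSpec],
    call: Callable[[SketchSpec], str],
    max_attempts: int = 2,
) -> str:
    """Try each spec in order, retrying transient failures per spec."""
    errors: list[Exception] = []
    for spec in specs:
        for _ in range(max_attempts):
            try:
                return call(spec)
            except TransientError as exc:
                # Retry the same spec until its attempts run out,
                # then fall through to the next spec in the cascade.
                errors.append(exc)
    raise SketchAllModelsExhaustedError(
        f"all {len(specs)} model(s) failed"
    ) from (errors[-1] if errors else None)


def fake_call(spec: SketchSpec) -> str:
    """Pretend the Responses backend is down and the other one works."""
    if spec.responses_api:
        raise TransientError("responses backend unavailable")
    return f"ok from {spec.name}"


specs = [SketchSpec("primary", responses_api=True), SketchSpec("secondary")]
result = call_with_fallback(specs, fake_call)
```

Because the backend toggle lives on each spec rather than in a global flag, the same cascade can mix both backends, which is the property the description points to.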