feat: adds ability to optimize for cost #172
Conversation
**Requirements**

- [x] I have added test coverage for new or changed functionality
- [x] I have followed the repository's [pull request submission guidelines](../blob/main/CONTRIBUTING.md#submitting-pull-requests)
- [x] I have validated my changes against all supported platform versions

**Describe the solution you've provided**

This is intended to demystify some of the results we're receiving from the optimization package, namely:

- Total token counts are now accrued and reported with each result, so we can see whether a user crosses the total allowed tokens threshold.
- Score results are reported for cost or latency, when they're being optimized against, as an item in the `score` result so they can be shown in the UI.
- Finally, if quality has already met the required threshold, the prompt now contains instructions to optimize only against cost (if cost is being optimized against).

**Describe alternatives you've considered**

This is in some ways a bug fix, since it wasn't clear to the user what was causing the failure. It is technically additional functionality, but likely required to surface the information the user needs to act on a result.

**Additional context**

Cost and latency are only optimized for (and only contribute scores) when the acceptance statement contains the keywords that trigger them. "Base" implementations that don't use these features are unaffected.

---

> [!NOTE]
> **Medium Risk**
> Changes optimization pass/fail logic and persisted result payloads (new gate scores, baseline handling, token-budget semantics), which could affect when runs succeed/fail and what the UI/API receives.
>
> **Overview**
> Improves optimization run reporting by tracking and persisting a single `accumulated_token_usage` total across agent, judge, and variation calls, and including it in result PATCH payloads (extending `generationTokens` to allow `accumulated_total`).
>
> Refactors latency/cost optimization to use explicit baseline values (not `history[0]`), caps history growth (`_trim_history`) for both standard and ground-truth flows, and adds synthetic `_latency_gate`/`_cost_gate` score entries so gate failures are visible in results.
>
> Adjusts run control flow so pass/fail is evaluated before token-limit checks (including GT batches and validation), and updates variation prompting to focus purely on cost reduction when quality is already passing; also relaxes the cost gate tolerance from requiring 20% improvement to 10%, and expands tests accordingly.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit 365fa94. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup>
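The accumulated-token tracking described in the note can be sketched roughly as follows. This is a hypothetical illustration, not the PR's actual implementation; the class and method names other than `accumulated_token_usage` and `generationTokens.accumulated_total` are invented for the example:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenUsage:
    # Either count may be None when a provider omits it.
    input: Optional[int] = None
    output: Optional[int] = None


class RunTelemetry:
    """Hypothetical sketch: one running total across agent, judge,
    and variation calls, surfaced in the result PATCH payload."""

    def __init__(self) -> None:
        self.accumulated_token_usage = 0

    def record(self, usage: TokenUsage) -> None:
        # Treat missing counts as zero so a partial usage report
        # still contributes what it can to the running total.
        self.accumulated_token_usage += (usage.input or 0) + (usage.output or 0)

    def patch_payload(self) -> dict:
        return {"generationTokens": {"accumulated_total": self.accumulated_token_usage}}
```

Keeping a single accumulator (rather than per-call records) matches the note's framing: the UI only needs to know whether the run crossed the total token budget.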
```python
f"The agent's response used {agent_usage.input} input tokens "
f"and {agent_usage.output} output tokens "
f"(estimated cost: ${current_cost:.6f}). "
)
```
**Token count f-string may print None values**

Low Severity

When `agent_usage.input` or `agent_usage.output` is `None` (which `TokenUsage` allows, and which `estimate_cost` explicitly guards against), the f-string at this location produces text like "used None input tokens and 40 output tokens" in the judge instructions. This happens because `estimate_cost` can return a non-`None` cost using only the non-`None` token count, while the f-string unconditionally formats both fields.
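A fix along the lines the comment suggests could guard each field before formatting. This is a hypothetical helper (`describe_usage` is not from the PR), shown only to illustrate the shape of the guard:

```python
from typing import Optional


def describe_usage(input_tokens: Optional[int],
                   output_tokens: Optional[int],
                   current_cost: Optional[float]) -> str:
    """Hypothetical helper: format token usage for the judge
    instructions while guarding against None token counts."""
    parts = []
    if input_tokens is not None:
        parts.append(f"{input_tokens} input tokens")
    if output_tokens is not None:
        parts.append(f"{output_tokens} output tokens")
    usage = " and ".join(parts) if parts else "an unknown number of tokens"
    cost = f" (estimated cost: ${current_cost:.6f})" if current_cost is not None else ""
    return f"The agent's response used {usage}{cost}. "
```

With this shape, a `None` input count simply drops the "input tokens" clause instead of interpolating the literal string "None".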
Reviewed by Cursor Bugbot for commit d267832. Configure here.
Cursor Bugbot has reviewed your changes using default mode and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Reviewed by Cursor Bugbot for commit f2f0894. Configure here.
```python
self._last_succeeded_context = None
self._last_optimization_result_id = None
self._total_token_usage = 0
self._last_batch_size = 1
```
**`_model_configs` not set in `optimize_from_ground_truth_options` path**

Medium Severity

The `_model_configs` attribute is only populated in `optimize_from_config` (line 1626), never in the `optimize_from_ground_truth_options` or `optimize_from_options` entry points. Both `_run_gt_optimization` and `_run_optimization` reset state but don't set `_model_configs`, so it stays as the empty list from `__init__`. As a result, `_find_model_config` always returns `None`, `estimate_cost` always returns `None`, and the cost gate silently passes without any actual cost comparison, making cost optimization ineffective for these paths.
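The failure mode can be sketched in miniature. The class shape, config keys, and pricing units below are assumptions for illustration; only the attribute and method names `_model_configs`, `_find_model_config`, and `estimate_cost` come from the review comment:

```python
class OptimizerSketch:
    """Minimal sketch of the reported bug: if an entry point never
    populates _model_configs, cost estimation degrades to None and
    the cost gate has nothing to compare."""

    def __init__(self) -> None:
        self._model_configs = []  # stays empty unless an entry point fills it

    def _find_model_config(self, model_id: str):
        for cfg in self._model_configs:
            if cfg["id"] == model_id:
                return cfg
        return None  # always hit when _model_configs was never set

    def estimate_cost(self, model_id: str, input_tokens: int, output_tokens: int):
        cfg = self._find_model_config(model_id)
        if cfg is None:
            return None  # caller's cost gate then passes with no real comparison
        # Assumed pricing units: USD per million tokens.
        return (input_tokens * cfg["input_price_usd_per_m"]
                + output_tokens * cfg["output_price_usd_per_m"]) / 1_000_000
```

The fix implied by the comment is to populate `_model_configs` in every entry point's state reset, not just `optimize_from_config`.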
Additional Locations (1)
Reviewed by Cursor Bugbot for commit f2f0894. Configure here.


**Requirements**
**Describe the solution you've provided**

Implements cost optimization in the same manner as latency optimization: it searches the acceptance statement for keywords pertaining to token usage or cost (e.g. "costs", "pricing", "bill") and adds instructions to the variation generation to try to optimize for cost. Additionally, the acceptance-statement prompt now returns instructions for the variation generation (e.g. use a cheaper model).
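The keyword detection described above might look roughly like this. The keyword list and function name are assumptions for illustration, not the PR's actual code:

```python
# Hypothetical sketch: scan the acceptance statement for cost-related
# terms to decide whether the variation-generation prompt should
# include cost-optimization instructions.
COST_KEYWORDS = ("cost", "pricing", "price", "bill", "token usage")


def wants_cost_optimization(acceptance_statement: str) -> bool:
    text = acceptance_statement.lower()
    return any(keyword in text for keyword in COST_KEYWORDS)
```

A simple substring match keeps the trigger permissive, which fits the PR's stated intent: any mention of cost means the user is trying to optimize for it.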
**Describe alternatives you've considered**
This is a feature addition.
**Additional context**

We'll be adding UI options for both latency and cost with adjustable thresholds, but these changes remain valid once those arrive, since a mention of cost/latency means the user is trying to optimize for it.
> [!NOTE]
> **Medium Risk**
> Changes core optimization loop behavior (history management, pass/fail gating, and token-limit handling) and modifies result payloads sent to the LD API, which could affect run outcomes and reporting. Scoped to optimization tooling with broad test coverage, but touches multiple execution paths (standard, validation, and ground-truth).
>
> **Overview**
> Adds cost-aware optimization driven by acceptance-statement keyword detection, including a new cost section in variation prompts and extra judge guidance that references token usage and estimated USD cost.
>
> Introduces cost estimation (`estimate_cost`) using model pricing from pre-fetched model configs, stores per-iteration `estimated_cost_usd`, and enforces a new `_cost_gate` (10% improvement by default) similar to the existing latency gate; both gates are now recorded as synthetic score entries for visibility.
>
> Improves run telemetry and correctness: tracks `accumulated_token_usage` across agent/judge/variation calls, treats `token_limit=0` as "no limit", reorders token-limit checks to occur after scoring (including GT batch edge cases), caps history growth via `_trim_history`, and refines provider-prefix stripping to avoid breaking Bedrock region-style model IDs. API PATCH payloads now include `generationTokens.accumulated_total` when available.
>
> <sup>Reviewed by Cursor Bugbot for commit f2f0894. Bugbot is set up for automated code reviews on this repo. Configure here.</sup>
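The cost gate's 10%-improvement requirement can be sketched as a small predicate. The function name and None-handling below are assumptions for illustration; only the 10% default and the comparison against a baseline come from the summary above:

```python
from typing import Optional


def cost_gate_passes(baseline_cost: Optional[float],
                     current_cost: Optional[float],
                     required_improvement: float = 0.10) -> bool:
    """Hypothetical sketch of a cost gate: the candidate's estimated
    cost must beat the baseline by at least 10% (the PR's default).
    When either cost is unknown (None), this sketch passes by default."""
    if baseline_cost is None or current_cost is None:
        return True  # no pricing data: nothing to compare
    return current_cost <= baseline_cost * (1.0 - required_improvement)
```

Note the None branch: passing silently when pricing is unavailable is exactly why an unpopulated `_model_configs` makes cost optimization a no-op, as flagged in the review comment above.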