cuopt-agent: multi-objective supply-vs-cost what-if + cost-cap eval #157
Open
cafzal wants to merge 2 commits into
…al case Signed-off-by: cafzal <cameron.afzal@gmail.com>
This was referenced Jun 18, 2026
@ramakrishnap-nv cuopt-agent what-if + eval – activates the multi-objective skill in the agent (supply-vs-cost, no agreed weighting). GPU-validated; before/after in the description. Ready when you have a cycle.
rapids-bot Bot pushed a commit to NVIDIA/cuopt that referenced this pull request Jun 24, 2026
Adds a fifth eval to the `cuopt-multi-objective-exploration` skill, `multiobj-explore-eval-005-latent-objective`, covering the boundary the existing four don't: a problem stated with a **single** objective while a **second objective sits latent in the data**, unstated. The current evals all hand the agent both objectives (001 interpret, 002 explore, 004 dual-as-slope) or are explicitly single-objective (003 decoy). None test recognizing a *latent* objective.

This one grades whether the skill makes the agent surface the latent cost objective and trace the supply-vs-cost frontier, rather than optimizing the stated objective alone or silently folding cost into a self-chosen weighted blend (`maximize supply − λ·cost`). It brackets the skill's activation boundary opposite the 003 decoy.

Behavioral eval (`expected_script: null`, LLM-graded on the behavior list), same house style as 001/002/004; `validate_skills.sh` picks up the new array entry and the signature / `BENCHMARK.md` / skill-card regenerate via NVSkills-Eval.

The latent-objective shape is the max-supply supply-vs-cost case validated on cuOpt (Tesla T4) in NVIDIA/cuopt-examples#157.

Authors:
- Cameron Afzal (https://github.com/cafzal)

Approvers:
- Ramakrishna Prabhu (https://github.com/ramakrishnap-nv)

URL: #1442
…olve

Independent re-solve (CBC, gap 0) shows the cost-capped optimum is 2,670,000 (267 FG1 units at total cost exactly 9,149.80; LP bound 2,676,052), not 2,660,000; the recorded value sits inside `model.py`'s default 1% MIP gap. Update the expected answer and have the question demand a zero-gap solve so the eval is deterministic. Also note the budget sweep samples a step function rather than a full curve.
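The arithmetic behind this correction can be checked directly. The following is a minimal sketch, assuming the common relative-gap definition |bound − incumbent| / |bound|. Solvers differ in the exact denominator and in which bound they use at termination, so this is illustrative and not a statement of how `model.py` or cuOpt decides to stop. `relative_gap` is a hypothetical helper; the numbers are the ones quoted in the commit message.

```python
def relative_gap(incumbent: float, bound: float) -> float:
    """Relative gap between an incumbent objective and a bound on the optimum."""
    return abs(bound - incumbent) / abs(bound)


LP_BOUND = 2_676_052      # LP bound quoted in the commit message
RECORDED = 2_660_000      # value originally recorded as ground truth
ZERO_GAP = 2_670_000      # value reported by the zero-gap re-solve
MIP_GAP_TOLERANCE = 0.01  # the 1% default MIP gap mentioned above

gap_recorded = relative_gap(RECORDED, LP_BOUND)
gap_zero = relative_gap(ZERO_GAP, LP_BOUND)

print(f"recorded value gap vs. LP bound:   {gap_recorded:.4%}")
print(f"zero-gap optimum gap vs. LP bound: {gap_zero:.4%}")

# Under this gap definition both values fall inside a 1% tolerance,
# so a run with that tolerance could legitimately stop at either one.
print("recorded within tolerance:", gap_recorded < MIP_GAP_TOLERANCE)
print("zero-gap within tolerance:", gap_zero < MIP_GAP_TOLERANCE)
```

Because both values pass a 1% test under this definition, only a zero-gap solve pins the answer to a single number, which is what the updated question asks for.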
a4484fa to 9be7302
What
A new what-if scenario (`scenario_4.md`) and eval case (`max_supply_4`) for the cuopt-agent's max-supply model, bringing multi-objective tradeoff exploration to the agent.

Why
The agent ships a multi-period MILP whose cost data (`item_costs.csv`, `resource_costs.csv`) is unused, and it only ever runs single-objective; the `cuopt-multi-objective-exploration` skill (NVIDIA/cuopt#1355) is available but no scenario exercises it. `scenario_4` activates it as a supply-vs-cost tradeoff with no agreed weighting, framed to test judgment rather than prescribe the method (it surfaces the MILP-has-no-duals and 10000:1-weight traps without naming ε-constraint). `max_supply_4` adds a numeric check graded by the existing `cuopt_objective` evaluator: cap total cost at 9,149.8 and maximize supply, ground truth 2,660,000, computed on cuOpt (Tesla T4).

User testing
Run on cuOpt (Tesla T4):
… `maximize supply − λ·cost`); with `scenario_4` + the skill it traces the frontier by ε-constraint, differences adjacent points for the rate (correctly noting a MILP has no duals), reports interpretable units, flags the knee, and leaves the pick to finance; its solve also hit the eval ground truth.

Reference frontier: max weighted supply vs. cost cap (cuOpt, Tesla T4); unconstrained max 3,450,061 at cost 15,249.6

Toy sample data: exercises the multi-objective method on the agent, not a planning study.
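The ε-constraint procedure described under user testing (cap cost, maximize supply, then difference adjacent frontier points because a MILP has no duals) can be sketched in a few lines. This is a minimal illustration on a hypothetical three-product instance solved by brute-force enumeration. It is not the agent's multi-period MILP and does not call cuOpt. `PRODUCTS`, `max_supply_under_cap`, `trace_frontier`, `marginal_rates`, and every number in the block are invented for illustration.

```python
from itertools import product

# Hypothetical toy data (NOT the agent's model):
# name -> (supply weight per unit, cost per unit, max units)
PRODUCTS = {
    "A": (10.0, 4.0, 3),
    "B": (6.0, 2.0, 3),
    "C": (3.0, 0.5, 4),
}


def max_supply_under_cap(cost_cap):
    """Maximize weighted supply subject to total cost <= cost_cap.

    Brute-force enumeration over integer unit counts; ties on supply
    are broken in favor of the cheaper plan. Returns (supply, cost).
    """
    best = (0.0, 0.0)
    specs = list(PRODUCTS.values())
    ranges = [range(limit + 1) for _, _, limit in specs]
    for units in product(*ranges):
        supply = sum(u * w for u, (w, _, _) in zip(units, specs))
        cost = sum(u * c for u, (_, c, _) in zip(units, specs))
        if cost <= cost_cap and (supply, -cost) > (best[0], -best[1]):
            best = (supply, cost)
    return best


def trace_frontier(caps):
    """Solve once per cost cap (the epsilon-constraint sweep)."""
    return [(cap, *max_supply_under_cap(cap)) for cap in sorted(caps)]


def marginal_rates(frontier):
    """Finite-difference rate (extra supply per extra unit of cost)
    between adjacent frontier points; flat steps are skipped."""
    rates = []
    for (_, s0, c0), (_, s1, c1) in zip(frontier, frontier[1:]):
        if c1 > c0:
            rates.append((c0, c1, (s1 - s0) / (c1 - c0)))
    return rates


frontier = trace_frontier([0, 2, 4, 8, 12, 20])
rates = marginal_rates(frontier)

for cap, supply, cost in frontier:
    print(f"cap={cap:>5}  supply={supply:>6.1f}  cost={cost:>5.1f}")
for c0, c1, rate in rates:
    print(f"cost {c0:.1f} -> {c1:.1f}: {rate:.2f} supply per unit cost")
```

The sweep samples the frontier only at the chosen caps, so with integer decisions the result is a step function between samples. The finite-difference rates stand in for the shadow price that an LP dual would provide.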