cuopt-agent: multi-objective supply-vs-cost what-if + cost-cap eval #157

Open
cafzal wants to merge 2 commits into NVIDIA:main from cafzal:agent-cost-tradeoff

cafzal commented Jun 17, 2026

What

A new what-if scenario (scenario_4.md) and eval case (max_supply_4) for the cuopt-agent's max-supply model — bringing multi-objective tradeoff exploration to the agent.

Why

The agent ships a multi-period MILP whose cost data (item_costs.csv, resource_costs.csv) is unused, and it only ever runs single-objective; the cuopt-multi-objective-exploration skill (NVIDIA/cuopt#1355) is available but no scenario exercises it. scenario_4 activates it as a supply-vs-cost tradeoff with no agreed weighting, framed to test judgment rather than prescribe the method (it surfaces the MILP-has-no-duals and 10000:1-weight traps without naming ε-constraint). max_supply_4 adds a numeric check graded by the existing cuopt_objective evaluator: cap total cost at 9,149.8 and maximize supply, ground truth 2,660,000, computed on cuOpt (Tesla T4).

User testing

Run on cuOpt (Tesla T4):

  • The reference solve traces a well-posed supply-vs-cost frontier: budget buys supply at ~288 weighted units/$ at first, then the rate collapses to ~7/$ past the FG1-saturation knee (full sweep below).
  • Before/after, same LLM: without the skill the agent collapses to a self-weighted blend (maximize supply − λ·cost); with scenario_4 + the skill it traces the frontier by ε-constraint, differences adjacent points for the rate (correctly noting a MILP has no duals), reports interpretable units, flags the knee, and leaves the pick to finance — and its solve hit the eval ground truth.
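
The ε-constraint approach described in the second bullet can be sketched roughly as follows: fix a cost cap, maximize weighted supply under it, repeat for several caps, and difference adjacent points to get a rate (a MILP gives no duals to read the rate from). This is a minimal brute-force illustration, not the agent's model: the unit costs, unit limits, and function names are hypothetical, and only the 10000:1 weighting mirrors the description.

```python
from itertools import product

# Hypothetical toy data; only the 10000:1 weighting echoes the PR description.
WEIGHT = {"FG1": 10_000, "FG2": 1}
UNIT_COST = {"FG1": 35.0, "FG2": 20.0}   # made-up costs per unit
MAX_UNITS = {"FG1": 6, "FG2": 6}         # made-up capacity limits


def solve_capped(cost_cap):
    """Maximize weighted supply subject to total cost <= cost_cap.

    Brute-force enumeration stands in for the MILP solve.
    Returns (objective, cost, fg1, fg2), or None if nothing is feasible.
    """
    best = None
    ranges = (range(MAX_UNITS["FG1"] + 1), range(MAX_UNITS["FG2"] + 1))
    for fg1, fg2 in product(*ranges):
        cost = fg1 * UNIT_COST["FG1"] + fg2 * UNIT_COST["FG2"]
        if cost > cost_cap:
            continue  # violates the epsilon (cost-cap) constraint
        obj = fg1 * WEIGHT["FG1"] + fg2 * WEIGHT["FG2"]
        if best is None or obj > best[0]:
            best = (obj, cost, fg1, fg2)
    return best


def trace_frontier(caps):
    """Solve once per cap, then difference adjacent points for the rate."""
    points = [(cap, *solve_capped(cap)) for cap in caps]
    rates = [None]  # the first point has no predecessor
    for prev, cur in zip(points, points[1:]):
        rates.append((cur[1] - prev[1]) / (cur[0] - prev[0]))
    return points, rates


points, rates = trace_frontier([35, 70, 105, 140])
for (cap, obj, cost, fg1, fg2), rate in zip(points, rates):
    print(cap, obj, cost, fg1, fg2, rate)
```

Each point is reported in its own units (supply, cost, and a supply-per-dollar rate), so the choice of operating point can be left to whoever owns the budget rather than being baked into a weight λ.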
Reference frontier: max weighted supply vs. cost cap (cuOpt, Tesla T4). Unconstrained max: 3,450,061 at cost 15,249.6.

| Cost cap C (total cost ≤ C) | Max weighted obj | FG1 | FG2 | Δobj/Δ$ |
|---|---|---|---|---|
| 3,812 | 1,140,000 | 114 | 0 | |
| 5,168 | 1,530,000 | 153 | 0 | 288 |
| 6,523 | 1,920,000 | 192 | 0 | 288 |
| 7,879 | 2,300,000 | 230 | 0 | 280 |
| 9,235 | 2,690,000 | 269 | 0 | 288 |
| 10,590 | 3,020,000 | 302 | 0 | 243 |
| 11,946 | 3,300,001 | 330 | 1 | 207 |
| 13,301 | 3,440,006 | 344 | 6 | 103 |
| 14,657 | 3,450,063 | 345 | 63 | 7 |
| 16,012 | 3,460,062 | 346 | 62 | 7 |
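
The Δobj/Δ$ column can be re-derived from adjacent rows of the sweep. The sketch below uses the caps exactly as printed (which appear to be rounded), so a recomputed rate may differ from the published one by a unit. The check that each objective equals 10000·FG1 + FG2 is an observation from the printed numbers, assumed here rather than confirmed against model.py.

```python
# Rows as printed in the sweep: (cost cap, max weighted obj, FG1, FG2, published rate)
ROWS = [
    (3_812, 1_140_000, 114, 0, None),
    (5_168, 1_530_000, 153, 0, 288),
    (6_523, 1_920_000, 192, 0, 288),
    (7_879, 2_300_000, 230, 0, 280),
    (9_235, 2_690_000, 269, 0, 288),
    (10_590, 3_020_000, 302, 0, 243),
    (11_946, 3_300_001, 330, 1, 207),
    (13_301, 3_440_006, 344, 6, 103),
    (14_657, 3_450_063, 345, 63, 7),
    (16_012, 3_460_062, 346, 62, 7),
]

# Assumed decomposition of the weighted objective (10000:1 weighting).
decomposes = all(obj == 10_000 * fg1 + fg2 for _, obj, fg1, fg2, _ in ROWS)

# Finite-difference rate between adjacent frontier points.
slopes = [
    (cur[1] - prev[1]) / (cur[0] - prev[0])
    for prev, cur in zip(ROWS, ROWS[1:])
]

# Largest drop in rate between consecutive segments, as a crude knee marker.
drops = [a - b for a, b in zip(slopes, slopes[1:])]
knee_segment = drops.index(max(drops)) + 1  # index into slopes

print("objective decomposes as 10000*FG1 + FG2:", decomposes)
for row, slope in zip(ROWS[1:], slopes):
    print(f"cap {row[0]:>6}: recomputed {slope:6.1f}, published {row[4]}")
print("steepest rate drop ends at cap", ROWS[knee_segment + 1][0])
```

Differencing like this is only as fine as the cap spacing, so where the knee is located depends on how densely the caps were sampled.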

Toy sample data — exercises the multi-objective method on the agent, not a planning study.

…al case

Signed-off-by: cafzal <cameron.afzal@gmail.com>

cafzal commented Jun 18, 2026

@ramakrishnap-nv cuopt-agent what-if + eval – activates the multi-objective skill in the agent (supply-vs-cost, no agreed weighting). GPU-validated; before/after in the description. Ready when you have a cycle.

rapids-bot (bot) pushed a commit to NVIDIA/cuopt that referenced this pull request on Jun 24, 2026:

Adds a fifth eval to the `cuopt-multi-objective-exploration` skill — `multiobj-explore-eval-005-latent-objective` — covering the boundary the existing four don't: a problem stated with a **single** objective while a **second objective sits latent in the data**, unstated.

The current evals all hand the agent both objectives (001 interpret, 002 explore, 004 dual-as-slope) or are explicitly single-objective (003 decoy). None test recognizing a *latent* objective. This one grades whether the skill makes the agent surface the latent cost objective and trace the supply-vs-cost frontier — rather than optimizing the stated objective alone or silently folding cost into a self-chosen weighted blend (`maximize supply − λ·cost`). It brackets the skill's activation boundary opposite the 003 decoy.

Behavioral eval (`expected_script: null`, LLM-graded on the behavior list), same house style as 001/002/004; `validate_skills.sh` picks up the new array entry and the signature / `BENCHMARK.md` / skill-card regenerate via NVSkills-Eval. The latent-objective shape is the max-supply supply-vs-cost case validated on cuOpt (Tesla T4) in NVIDIA/cuopt-examples#157.

Authors:
  - Cameron Afzal (https://github.com/cafzal)

Approvers:
  - Ramakrishna Prabhu (https://github.com/ramakrishnap-nv)

URL: #1442
…olve

Independent re-solve (CBC, gap 0) shows the cost-capped optimum is
2,670,000 (267 FG1 units at total cost exactly 9,149.80; LP bound
2,676,052), not 2,660,000 — the recorded value sits inside model.py's
default 1% MIP gap. Update the expected answer and have the question
demand a zero-gap solve so the eval is deterministic. Also note the
budget sweep samples a step function rather than a full curve.
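
The arithmetic behind the commit message above can be checked directly. The sketch assumes the common relative-gap definition |bound − incumbent| / |incumbent|; the solver's exact gap formula, and the bound it actually held when it stopped, are not stated in this thread, so treat the result as a plausibility check only.

```python
# Figures quoted in the commit message.
recorded = 2_660_000     # previously recorded ground truth
optimum = 2_670_000      # zero-gap re-solve (267 FG1 units)
lp_bound = 2_676_052     # LP relaxation bound
mip_gap_tol = 0.01       # model.py's default 1% MIP gap, per the commit message


def relative_gap(bound, incumbent):
    """Common MIP gap definition (an assumption about the solver's formula)."""
    return abs(bound - incumbent) / abs(incumbent)


gap_recorded = relative_gap(lp_bound, recorded)
gap_optimum = relative_gap(lp_bound, optimum)
shortfall = (optimum - recorded) / optimum

print(f"gap of recorded value vs LP bound: {gap_recorded:.4%}")
print(f"gap of true optimum vs LP bound:   {gap_optimum:.4%}")
print(f"recorded value below optimum by:   {shortfall:.4%}")
print("recorded value acceptable at 1% gap:", gap_recorded < mip_gap_tol)
```

Under that definition both values sit inside a 1% tolerance, so a solver stopping at the default gap could legitimately return either one. That is why the question needs to demand a zero-gap solve for the eval to be deterministic.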

cafzal force-pushed the agent-cost-tradeoff branch from a4484fa to 9be7302 on July 2, 2026 03:42