24 changes: 8 additions & 16 deletions skills/cuopt-numerical-optimization-api-c/BENCHMARK.md
@@ -7,7 +7,7 @@ This benchmark summarizes 3-Tier Evaluation from NVSkills-Eval results for the s
## Evaluation Summary

- Skill: `cuopt-numerical-optimization-api-c`
- Evaluation date: 2026-06-10
- Evaluation date: 2026-06-26
- NVSkills-Eval profile: `external`
- Environment: `astra-sandbox`
- Dataset: 4 evaluation tasks
@@ -55,33 +55,25 @@ Task composition is derived from the evaluation dataset when possible. Entries w
| Dimension | Num | `claude-code` | `codex` |
|---|---:|---:|---:|
| Security | 4 | 100% (+0%) | 100% (+0%) |
| Correctness | 4 | 88% (+16%) | 72% (+16%) |
| Discoverability | 4 | 68% (+46%) | 55% (+36%) |
| Effectiveness | 4 | 92% (+7%) | 70% (+17%) |
| Efficiency | 4 | 66% (+48%) | 62% (+35%) |
| Correctness | 4 | 88% (+31%) | 82% (+23%) |
| Discoverability | 4 | 70% (+52%) | 63% (+45%) |
| Effectiveness | 4 | 94% (+35%) | 94% (+33%) |
| Efficiency | 4 | 69% (+44%) | 69% (+42%) |

Score values show skill-assisted performance. Values in parentheses show uplift versus the no-skill baseline when baseline data is available.
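
As a reading aid for the table above, here is a minimal sketch of how a cell such as `88% (+31%)` could be turned back into the no-skill baseline. It assumes the parenthesised uplift is expressed in percentage points, which the report does not state explicitly. The helper name `implied_baseline` is hypothetical and is not part of NVSkills-Eval.

```python
import re


def implied_baseline(cell: str) -> int:
    """Return the no-skill baseline implied by a cell like '88% (+31%)'.

    Assumption (not confirmed by the report): the uplift in parentheses is
    in percentage points, so baseline = score - uplift.
    """
    match = re.fullmatch(r"\s*(\d+)%\s*\(([+-]\d+)%\)\s*", cell)
    if match is None:
        raise ValueError(f"unrecognised cell format: {cell!r}")
    score = int(match.group(1))
    uplift = int(match.group(2))
    return score - uplift


if __name__ == "__main__":
    # Cell value copied from the Correctness row of the table.
    print(implied_baseline("88% (+31%)"))
```

If the uplift were instead a relative change, the baseline would be `score / (1 + uplift / 100)`, so the two readings give different numbers and are worth confirming against the raw evaluation data.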

## Tier 1: Static Validation Summary

Tier 1 validation passed with observations. NVSkills-Eval ran 9 checks and found 7 total findings.
Tier 1 validation passed with observations. NVSkills-Eval ran 1 checks and found 2 total findings.

Top findings:

- MEDIUM QUALITY/quality_efficiency: Deeply nested references in examples.md (`skills/cuopt-numerical-optimization-api-c/SKILL.md`)
- MEDIUM SCHEMA/body_recommended_section: Missing recommended section: '## Instructions' (`skills/cuopt-numerical-optimization-api-c/SKILL.md`)
- LOW QUALITY/quality_discoverability: No '## Purpose' section (`skills/cuopt-numerical-optimization-api-c/SKILL.md`)
- LOW QUALITY/quality_reliability: No prerequisites/requirements documented (`skills/cuopt-numerical-optimization-api-c/SKILL.md`)
- LOW QUALITY/quality_reliability: No limitations documented (`skills/cuopt-numerical-optimization-api-c/SKILL.md`)
- LOW SCHEMA/author_format: Author must be of the form 'Name <email@host>' (`skills/cuopt-numerical-optimization-api-c/SKILL.md`)

## Tier 2: Deduplication Summary

Tier 2 validation passed. NVSkills-Eval ran 2 checks and found 0 total findings.

Notable observations:

- Context Deduplication: Collected 9 file(s)
- Inter-Skill Deduplication: Parsed skill 'cuopt-numerical-optimization-api-c': 105 char description
This tier was not run or did not produce findings in this report.

## Publication Recommendation

1 change: 1 addition & 0 deletions skills/cuopt-numerical-optimization-api-c/SKILL.md
@@ -14,6 +14,7 @@ metadata:
---



# cuOpt Numerical Optimization — C API

Solve LP, MILP, and QP problems via the cuOpt C API. The same library, headers, build pattern, and core calls (`cuOptCreate*Problem`, `cuOptSolve`, `cuOptGetObjectiveValue`) apply across all three; QP extends the API with quadratic-objective creation calls.
24 changes: 15 additions & 9 deletions skills/cuopt-numerical-optimization-api-c/skill-card.md
@@ -7,26 +7,32 @@ This skill is ready for commercial/non-commercial use. <br>
NVIDIA <br>

### License/Terms of Use: <br>
Apache 2.0 <br>
Apache-2.0 <br>
## Use Case: <br>
Developers and engineers embedding LP, MILP, or QP numerical optimization into C/C++ applications using the NVIDIA cuOpt GPU-accelerated solver. <br>
Developers and engineers embedding LP, MILP, or QP optimization in C/C++ applications using the cuOpt GPU-accelerated solver. <br>

### Deployment Geography for Use: <br>
Global <br>

## Requirements / Dependencies: <br>
**Requires API Key or External Credential:** [No] <br>
**Credential Type(s):** [None] <br>

Do not include secrets in prompts/logs/output; use least-privilege credentials; rotate keys as appropriate. <br>

## Known Risks and Mitigations: <br>
Risk: Review before execution as proposals could introduce incorrect or misleading guidance into skills. <br>
Mitigation: Review and scan skill before deployment. <br>

## Reference(s): <br>
- [C API Examples (LP/MILP)](references/examples.md) <br>
- [Assets — Reference C Examples](assets/README.md) <br>
- [cuOpt User Guide](https://docs.nvidia.com/cuopt/user-guide/latest/introduction.html) <br>
- [cuOpt Examples Repository](https://github.com/NVIDIA/cuopt-examples) <br>


## Skill Output: <br>
**Output Type(s):** [Code, Shell commands] <br>
**Output Format:** [Markdown with inline C code blocks] <br>
**Output Format:** [C source code with inline shell build commands] <br>
**Output Parameters:** [1D] <br>
**Other Properties Related to Output:** [None] <br>

@@ -37,7 +43,7 @@ Mitigation: Review and scan skill before deployment. <br>


## Evaluation Tasks: <br>
Evaluated against 4 internal evaluation tasks (positive skill-activation cases) via NVSkills-Eval with the external profile. <br>
Evaluated against 4 internal evaluation tasks (NVSkills-Eval, external profile, astra-sandbox environment). <br>

## Evaluation Metrics Used: <br>
Reported benchmark dimensions: <br>
@@ -62,10 +68,10 @@ Underlying evaluation signals used in this run: <br>
| Dimension | Num | `claude-code` | `codex` |
|---|---:|---:|---:|
| Security | 4 | 100% (+0%) | 100% (+0%) |
| Correctness | 4 | 88% (+16%) | 72% (+16%) |
| Discoverability | 4 | 68% (+46%) | 55% (+36%) |
| Effectiveness | 4 | 92% (+7%) | 70% (+17%) |
| Efficiency | 4 | 66% (+48%) | 62% (+35%) |
| Correctness | 4 | 88% (+31%) | 82% (+23%) |
| Discoverability | 4 | 70% (+52%) | 63% (+45%) |
| Effectiveness | 4 | 94% (+35%) | 94% (+33%) |
| Efficiency | 4 | 69% (+44%) | 69% (+42%) |

## Skill Version(s): <br>
26.08.00 (source: frontmatter) <br>