ENH model capability flags + per-capability deactivation (covariate lift) by GeoffNN · Pull Request #38 · benchopt/benchmark_tsfm

GeoffNN · 2026-05-29T15:13:35Z

What

Flag each forecasting model with the capabilities it supports — multivariate, hist_covariates, future_covariates — and let a user deactivate each covariate capability per run, so the lift from each can be benchmarked.

univariate is deliberately not a flag: it's the floor every model gets. A model that declares (or has enabled) none of the covariate capabilities runs univariate.

How

benchmark_utils/capabilities.py (new) — flag vocabulary + mask_covariates(covariates, active) helper. Masking only ever drops covariates; targets are never touched.
BaseTSFMAdapter.covariate_capabilities — the effective active set for a run. Defaults to empty ⇒ safe-by-default (a new adapter sees no covariates / runs univariate).
Objective._eval_forecasting masks the covariate payload down to the adapter's covariate_capabilities before predict() — a single, guaranteed enforcement point for every forecasting model, present and future.
Solvers declare capabilities (metadata): Chronos/Naive/SeasonalNaive {}, Chronos2/Toto2 {multivariate}, TFC-API {multivariate, hist_covariates, future_covariates}.
TFC-API exposes sweepable toggles use_hist_covars / use_future_covars (default True) and now actually threads covariates to the SDK via historical_variables / future_variables (columns attached in both the batched and per-series paths).
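The masking rule described above can be sketched as follows. This is a minimal illustration, not the code in benchmark_utils/capabilities.py: it assumes the covariate payload is a dict keyed by field name, that deactivated fields are set to None, and that a `static_covariates` key exists. The key names and the `MASKABLE` constant are assumptions made for the example.

```python
# Hypothetical flag names; the real vocabulary lives in benchmark_utils/capabilities.py.
HIST_COVARIATES = "hist_covariates"
FUTURE_COVARIATES = "future_covariates"

# Only these fields are ever masked; anything else (e.g. static) passes through.
MASKABLE = frozenset({HIST_COVARIATES, FUTURE_COVARIATES})


def mask_covariates(covariates, active):
    """Return a copy of `covariates` with every inactive maskable field set to None.

    `covariates` is assumed to be a dict (or None); `active` is an iterable of
    capability names. The input dict is never mutated.
    """
    if covariates is None:
        return None
    active = frozenset(active)
    masked = dict(covariates)  # shallow copy so the caller's dict is untouched
    for name in MASKABLE - active:
        if name in masked:
            masked[name] = None
    return masked


payload = {
    "hist_covariates": [1.0, 2.0],
    "future_covariates": [3.0],
    "static_covariates": {"site": "A"},
}
only_hist = mask_covariates(payload, {HIST_COVARIATES})
univariate = mask_covariates(payload, set())
```

With an empty active set, both covariate fields are dropped and the model effectively runs univariate, which matches the safe-by-default behaviour of an adapter that declares nothing.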

Lift sweep

benchopt run -s 'TFC-API[use_future_covars=[True,False]]'

⇒ two rows; the metric delta is the future-covariate lift.
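As a sketch of how the delta might be read off those two rows, assuming they have been loaded as a list of dicts with a `use_future_covars` column and a lower-is-better metric column: the column names and the `covariate_lift` helper are hypothetical, and the numbers in the usage lines are placeholders rather than benchmark results.

```python
def covariate_lift(rows, toggle="use_future_covars", metric="mae"):
    """Error without the covariates minus error with them.

    For a lower-is-better metric, a positive value means the covariates helped.
    `rows` is assumed to hold exactly one row per toggle value.
    """
    by_toggle = {row[toggle]: row[metric] for row in rows}
    return by_toggle[False] - by_toggle[True]


# Placeholder values for illustration only, not measured results.
rows = [
    {"use_future_covars": True, "mae": 1.0},
    {"use_future_covars": False, "mae": 1.5},
]
lift = covariate_lift(rows)
```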

Design note

multivariate is declarative-only metadata: targets are always passed whole (no channel-splitting), so there's no behavioural toggle for it yet — it exists to describe the model until a multivariate-target dataset and the matching masking land. static_covars is likewise out of scope (no dataset uses it).

Tests

tests/benchmark_utils/test_capabilities.py — mask_covariates drops the right fields per active set; static preserved; input not mutated.
tests/test_objective_capability_masking.py — recording adapter confirms the objective masks per covariate_capabilities before predict.
tests/solvers/test_tfc_api_covariates.py — mocked cross_validate asserts *_variables + columns present when covariates supplied, None/absent when deactivated. No network / API key.
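The recording-adapter idea behind the objective test can be sketched in a self-contained way. This is an illustration under stated assumptions, not the PR's test code: `RecordingAdapter`, the standalone `eval_forecasting` function, and the dict-shaped payload are all hypothetical stand-ins for BaseTSFMAdapter and Objective._eval_forecasting.

```python
MASKABLE = frozenset({"hist_covariates", "future_covariates"})


class RecordingAdapter:
    """Fake adapter that records what predict() receives (hypothetical)."""

    # Declares only historical covariates as active for this run.
    covariate_capabilities = frozenset({"hist_covariates"})

    def __init__(self):
        self.seen = None

    def predict(self, target, covariates):
        self.seen = covariates
        return target[-1:]  # naive last-value forecast, irrelevant to the check


def eval_forecasting(adapter, target, covariates):
    """Stand-in for the objective: mask first, then call predict()."""
    active = frozenset(adapter.covariate_capabilities)
    masked = {
        key: (value if key in active or key not in MASKABLE else None)
        for key, value in covariates.items()
    }
    return adapter.predict(target, masked)


adapter = RecordingAdapter()
covariates = {"hist_covariates": [0.1, 0.2], "future_covariates": [0.3]}
forecast = eval_forecasting(adapter, [10.0, 11.0], covariates)
```

Because the masking happens in the objective rather than in each solver, an adapter cannot receive a covariate it has not declared, whatever its own predict() does.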

12 passed. Independent of the leakage (#24) and Enedis (#26) PRs — built on the covariate plumbing already in main.

🤖 Generated with Claude Code

Declare each forecasting model's capabilities (multivariate, hist/future covariates) and let users deactivate covariate capabilities per run to benchmark the lift each provides.

- benchmark_utils/capabilities.py: flag vocabulary + mask_covariates helper
- BaseTSFMAdapter.covariate_capabilities: effective active set (default empty => univariate, safe by default)
- Objective._eval_forecasting masks the covariate payload to the adapter's capabilities before predict() -- single, guaranteed enforcement point
- Solvers declare `capabilities`; TFC-API exposes use_hist_covars / use_future_covars toggles and now threads covariates to the SDK via historical_variables / future_variables (multivariate stays declarative)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

GeoffNN wants to merge 1 commit into benchopt:main from GeoffNN:feat/model-capability-flags
