
ENH model capability flags + per-capability deactivation (covariate lift) #38

Open
GeoffNN wants to merge 1 commit into
benchopt:main from
GeoffNN:feat/model-capability-flags

Contributor

GeoffNN commented May 29, 2026

What

Flag each forecasting model with the capabilities it supports — multivariate, hist_covariates, future_covariates — and let a user deactivate each covariate capability per run, so the lift from each can be benchmarked.

univariate is deliberately not a flag: it's the floor every model gets. A model that declares (or has enabled) none of the covariate capabilities runs univariate.

How

  • benchmark_utils/capabilities.py (new) — flag vocabulary + mask_covariates(covariates, active) helper. Masking only ever drops covariates; targets are never touched.
  • BaseTSFMAdapter.covariate_capabilities — the effective active set for a run. Defaults to empty ⇒ safe-by-default (a new adapter sees no covariates / runs univariate).
  • Objective._eval_forecasting masks the covariate payload down to the adapter's covariate_capabilities before predict() — a single, guaranteed enforcement point for every forecasting model, present and future.
  • Solvers declare capabilities (metadata): Chronos/Naive/SeasonalNaive {}, Chronos2/Toto2 {multivariate}, TFC-API {multivariate, hist_covariates, future_covariates}.
  • TFC-API exposes sweepable toggles use_hist_covars / use_future_covars (default True) and now actually threads covariates to the SDK via historical_variables / future_variables (columns attached in both the batched and per-series paths).
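The masking and safe-by-default behaviour described in the list above might look roughly like the sketch below. Only the `mask_covariates(covariates, active)` signature, the flag names, and the `use_hist_covars` / `use_future_covars` toggles come from this PR. The dict-shaped payload, the assumption that payload keys match the flag names, the `static` key, and the `effective_capabilities` helper are hypothetical and are not taken from `benchmark_utils/capabilities.py`.

```python
# Flag vocabulary named in the PR description.
MULTIVARIATE = "multivariate"
HIST_COVARIATES = "hist_covariates"
FUTURE_COVARIATES = "future_covariates"

# Only these two flags gate payload fields; multivariate is declarative-only.
COVARIATE_FLAGS = frozenset({HIST_COVARIATES, FUTURE_COVARIATES})


def mask_covariates(covariates, active):
    """Return a copy of `covariates` without the fields whose flag is inactive.

    ASSUMPTION: the payload is a dict keyed by flag name, plus other keys
    (e.g. "static") that are passed through untouched.
    """
    if covariates is None:
        return None
    masked = dict(covariates)  # shallow copy, so the input is not mutated
    for flag in COVARIATE_FLAGS - set(active):
        masked.pop(flag, None)
    return masked


def effective_capabilities(declared, use_hist_covars=True, use_future_covars=True):
    """Hypothetical helper: declared covariate flags minus deactivated ones."""
    active = set(declared) & COVARIATE_FLAGS
    if not use_hist_covars:
        active.discard(HIST_COVARIATES)
    if not use_future_covars:
        active.discard(FUTURE_COVARIATES)
    return frozenset(active)
```

Under these assumptions, a solver that declares `{}` or only `{multivariate}` ends up with an empty active set, so its payload loses both covariate fields and the model runs univariate.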

Lift sweep

benchopt run -s 'TFC-API[use_future_covars=[True,False]]'

⇒ two rows; the metric delta is the future-covariate lift.
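As a minimal sketch of how the delta between those two rows could be read, the helper below computes a signed lift from two metric values. The function name and the sign convention (positive means the covariates helped) are assumptions for illustration; nothing here is part of benchopt or of this PR.

```python
def covariate_lift(metric_with, metric_without, lower_is_better=True):
    """Signed lift from enabling a covariate capability.

    ASSUMPTION: positive means the run with covariates was better.
    For an error metric (lower is better) that is `without - with`;
    for a score (higher is better) the sign is flipped.
    """
    delta = metric_without - metric_with
    return delta if lower_is_better else -delta
```

Here `metric_with` would be taken from the `use_future_covars=True` row and `metric_without` from the `use_future_covars=False` row of the same dataset and metric.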

Design note

multivariate is declarative-only metadata: targets are always passed whole (no channel-splitting), so there's no behavioural toggle for it yet — it exists to describe the model until a multivariate-target dataset and the matching masking land. static_covars is likewise out of scope (no dataset uses it).

Tests

  • tests/benchmark_utils/test_capabilities.py: mask_covariates drops the right fields per active set; static preserved; input not mutated.
  • tests/test_objective_capability_masking.py — recording adapter confirms the objective masks per covariate_capabilities before predict.
  • tests/solvers/test_tfc_api_covariates.py — mocked cross_validate asserts *_variables + columns present when covariates supplied, None/absent when deactivated. No network / API key.
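The recording-adapter idea from the test list above can be illustrated with the self-contained sketch below. It does not reproduce the PR's test files: `RecordingAdapter`, `_mask`, and `eval_forecasting` are hypothetical stand-ins for the real adapter, `mask_covariates`, and `Objective._eval_forecasting`, and the dict payload keyed by flag name is an assumption.

```python
class RecordingAdapter:
    """Hypothetical adapter that records the covariates it is handed."""

    # Effective active set for this run: historical covariates only.
    covariate_capabilities = frozenset({"hist_covariates"})

    def __init__(self):
        self.seen = None

    def predict(self, targets, covariates=None):
        self.seen = covariates
        return targets  # dummy forecast; the values are irrelevant here


def _mask(covariates, active):
    # Stand-in for mask_covariates: drop gated fields that are not active.
    gated = {"hist_covariates", "future_covariates"}
    return {k: v for k, v in covariates.items() if k not in gated or k in active}


def eval_forecasting(adapter, targets, covariates):
    # Stand-in for the objective: mask first, then call predict().
    masked = _mask(covariates, adapter.covariate_capabilities)
    return adapter.predict(targets, covariates=masked)


def test_objective_masks_before_predict():
    adapter = RecordingAdapter()
    targets = [1.0, 2.0, 3.0]
    covariates = {"hist_covariates": [0.1], "future_covariates": [0.2]}

    forecast = eval_forecasting(adapter, targets, covariates)

    assert forecast == targets                      # targets are never touched
    assert adapter.seen == {"hist_covariates": [0.1]}
    assert "future_covariates" in covariates        # caller's payload not mutated
```

The point of routing every model through one masking call is that an adapter cannot see a covariate it has not enabled, whatever its own `predict()` does.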

12 passed. Independent of the leakage (#24) and Enedis (#26) PRs — built on the covariate plumbing already in main.

🤖 Generated with Claude Code

Declare each forecasting model's capabilities (multivariate, hist/future
covariates) and let users deactivate covariate capabilities per run to
benchmark the lift each provides.

- benchmark_utils/capabilities.py: flag vocabulary + mask_covariates helper
- BaseTSFMAdapter.covariate_capabilities: effective active set (default empty
  => univariate, safe by default)
- Objective._eval_forecasting masks the covariate payload to the adapter's
  capabilities before predict() -- single, guaranteed enforcement point
- Solvers declare `capabilities`; TFC-API exposes use_hist_covars /
  use_future_covars toggles and now threads covariates to the SDK via
  historical_variables / future_variables (multivariate stays declarative)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>