
Merge upstream master into GhentAnalysis/master (0.3 release cycle)#121

Open
JulesVandenbroeck wants to merge 308 commits into GhentAnalysis/master from GhentAnalysis/upstream_merge_with_master

JulesVandenbroeck commented Jun 24, 2026

Collaborator

Overview

This PR tracks all changes in columnflow/columnflow:master that are not yet present in GhentAnalysis/master. It is a documentation-only (dummy) PR: its purpose is to give an overview of what an eventual merge from upstream would bring in.

The base branch (GhentAnalysis/master_merge_from_upstream) is identical to GhentAnalysis/master
at the time of this PR's creation.


Migration Guide — Required Changes for Existing Analyses

The following changes are required in any existing analysis to run with the updated columnflow after this PR. All other changes listed in this PR are additive and backwards-compatible.


1. Update patch_bundle_repo_exclude_files in columnflow_patches.py

Why it breaks: BundleRepo.exclude_files now stores absolute paths (built internally via _cf_path() / _repo_path() helpers) instead of relative paths. The old patch re-prefixed every existing entry with the relative path from the analysis base to the CF base — this now produces invalid doubled paths and breaks remote job bundling.

Replace the old function body:

# OLD — breaks with the updated columnflow
def patch_bundle_repo_exclude_files():
    from columnflow.tasks.framework.remote import BundleRepo

    cf_rel = os.path.relpath(os.environ["CF_BASE"], os.environ["MYANALYSIS_BASE"])
    exclude_files = [os.path.join(cf_rel, path) for path in BundleRepo.exclude_files]
    exclude_files.extend([
        "docs", "tests", "data", "assets", ".law", ".setups", ".data", ".github",
    ])
    BundleRepo.exclude_files[:] = exclude_files

    logger.debug("patched exclude_files of cf.BundleRepo")

with:

# NEW — only append analysis-specific extra exclusions
def patch_bundle_repo_exclude_files():
    from columnflow.tasks.framework.remote import BundleRepo

    # add additional files to exclude
    BundleRepo.exclude_files += ["docs"]

    logger.debug("patched exclude_files of cf.BundleRepo")

The standard exclusions (tests, data, .setups, .data, .github, etc.) are now included for both the CF repo and the analysis repo by default. Only append paths that are specific to your analysis and not already covered. Also remove import os at the top of columnflow_patches.py if it is no longer used elsewhere.
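If the patch function might run more than once per process (an assumption, not something stated above), a guarded append avoids duplicate entries. The following is a minimal sketch: `BundleRepoStandIn` and its default paths are hypothetical placeholders for the real `BundleRepo` class, and `add_excludes` is an illustrative helper rather than part of columnflow.

```python
class BundleRepoStandIn:
    # Hypothetical defaults, standing in for the absolute paths
    # that the updated columnflow sets up by itself.
    exclude_files = ["/abs/path/to/cf/tests", "/abs/path/to/analysis/tests"]


def add_excludes(task_cls, extra):
    """Append analysis-specific exclusions, skipping entries already present."""
    for path in extra:
        if path not in task_cls.exclude_files:
            task_cls.exclude_files.append(path)
    return task_cls.exclude_files


# Calling the helper twice leaves a single "docs" entry and keeps the defaults.
add_excludes(BundleRepoStandIn, ["docs"])
add_excludes(BundleRepoStandIn, ["docs"])
```

In a real `columnflow_patches.py`, the same helper could be called with the imported `BundleRepo` class instead of the stand-in.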


2. Add **kwargs to all TAF hook function signatures

Why it breaks: The framework now passes keyword arguments when invoking TAF hook functions. Any hook function without **kwargs that calls super() — or is used as a base via bases= — will raise a TypeError.

# OLD — missing **kwargs, breaks when super() is involved
@my_producer.init
def my_producer_init(self: Producer) -> None:
    ...

@my_producer.requires
def my_producer_requires(self: Producer, task, reqs) -> None:
    ...

# NEW — add **kwargs to every hook signature
@my_producer.init
def my_producer_init(self: Producer, **kwargs) -> None:
    ...

@my_producer.requires
def my_producer_requires(self: Producer, task, reqs, **kwargs) -> None:
    ...

Apply to all @my_taf.init, @my_taf.requires, @my_taf.setup, and @my_taf.teardown functions across calibrators, selectors, and producers.
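To locate hooks that still need updating, a small static check can help. The sketch below uses only the standard library `ast` module and assumes hooks are registered with decorators whose attribute name is `init`, `requires`, `setup`, or `teardown`, as in the examples above. `find_hooks_missing_kwargs` is a hypothetical helper name, and decorators from other libraries with the same attribute names would show up as false positives.

```python
import ast

HOOK_NAMES = {"init", "requires", "setup", "teardown"}


def find_hooks_missing_kwargs(source: str) -> list[tuple[str, int]]:
    """Return (function name, line number) for hook functions without **kwargs."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for dec in node.decorator_list:
            # Handle both "@taf.init" and "@taf.init(...)" decorator forms.
            target = dec.func if isinstance(dec, ast.Call) else dec
            if isinstance(target, ast.Attribute) and target.attr in HOOK_NAMES:
                if node.args.kwarg is None:
                    missing.append((node.name, node.lineno))
                break
    return sorted(missing, key=lambda item: item[1])


SAMPLE = '''
@my_producer.init
def old_hook(self):
    ...

@my_producer.requires
def new_hook(self, task, reqs, **kwargs):
    ...
'''

for name, lineno in find_hooks_missing_kwargs(SAMPLE):
    print(f"{name} (line {lineno}) is missing **kwargs")
```

Running the function over the text of each calibrator, selector, and producer module gives a list of signatures to fix by hand.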


3. Call super() in TAF hooks when using bases=

Why it breaks: If an analysis TAF uses bases=(some_base_taf,) and defines its own hook functions, it must explicitly call the base TAF's hook at the start of each overridden hook. Without the call, the base TAF's initialization, requirement registration, and setup logic (e.g. external file loading, normalization weight table building) is silently skipped.

@my_producer.init
def my_producer_init(self: Producer, **kwargs) -> None:
    super(my_producer, self).init_func(**kwargs)
    # ... analysis-specific init ...

@my_producer.requires
def my_producer_requires(self: Producer, task, reqs, **kwargs) -> None:
    super(my_producer, self).requires_func(task=task, reqs=reqs, **kwargs)
    # ... analysis-specific requirements ...

@my_producer.setup
def my_producer_setup(self, task, reqs, inputs, reader_targets, **kwargs) -> None:
    super(my_producer, self).setup_func(
        task=task, reqs=reqs, inputs=inputs, reader_targets=reader_targets, **kwargs,
    )
    # ... analysis-specific setup ...

This applies to calibrators, selectors, and producers that extend a built-in columnflow TAF via bases=.
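The effect of a missing base call can be illustrated with plain Python classes. This is only an analogy under stated assumptions: `BaseProducer`, `WithoutSuper`, and `WithSuper` are hypothetical stand-ins, and real TAFs use the `super(my_producer, self).init_func(**kwargs)` form shown above.

```python
class BaseProducer:
    def init_func(self, **kwargs):
        # Stand-in for base logic such as registering external files.
        self.base_ready = True


class WithoutSuper(BaseProducer):
    def init_func(self, **kwargs):
        # The base hook is never called, so base_ready is never set.
        self.analysis_ready = True


class WithSuper(BaseProducer):
    def init_func(self, **kwargs):
        # Run the base hook first, then the analysis-specific part.
        super().init_func(**kwargs)
        self.analysis_ready = True


broken, fixed = WithoutSuper(), WithSuper()
broken.init_func()
fixed.init_func()
print(hasattr(broken, "base_ready"), hasattr(fixed, "base_ready"))
```

No exception is raised in the first case, which is why the omission tends to surface only later, when something that the base hook should have prepared is missing.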

4. Small changes

  • Add a rucio_report_access boolean to every file system (fs) section in law.cfg
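A quick way to find sections that still lack the option is to parse the config with the standard library. The sketch below assumes that file system sections can be recognized by a name ending in `fs`. That naming rule, the helper name `fs_sections_missing_key`, and the sample config content (including the value `False`) are illustrative assumptions and should be adapted to the actual law.cfg.

```python
import configparser


def fs_sections_missing_key(cfg_text: str, key: str = "rucio_report_access") -> list[str]:
    """List sections whose name ends in 'fs' and that do not define `key`."""
    # Interpolation is disabled so that "$VAR" or "%" in values is left alone;
    # strict=False tolerates duplicate entries instead of raising.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(cfg_text)
    return [
        name
        for name in parser.sections()
        if name.endswith("fs") and not parser.has_option(name, key)
    ]


# Hypothetical config content, for illustration only.
SAMPLE_CFG = """
[local_fs]
base: /tmp/store

[wlcg_fs]
base: root://example.invalid//store
rucio_report_access: False
"""

print(fs_sections_missing_key(SAMPLE_CFG))
```

For a real file, the text can be read with `pathlib.Path("law.cfg").read_text()` and passed to the same function.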

New Features

Framework & Core

Producers & Calibrators

Selectors & Categorization

Histogramming & Plotting

Inference / Datacards

Tasks & Infrastructure


Bug Fixes

Histograms & Plots

  • Hotfix rewriting of outputs in MergeHistograms
  • Hotfix shift id-to-name conversion in histogram post-processing
  • Hotfix empty histogram issue in inference base task (two separate fixes)
  • Hotfix nominal shift in PlotShiftedVariablesPerShift1D
  • Hotfix hist validation in datacard writer
  • Hotfix sci-style axis label notation in plots / fix horizontal y-offset label position
  • Hotfix single shift selection in plotting
  • Hotfix save_div in plot scale factor
  • Hotfix multi-axis filling with discrete variables
  • Bug fix in fill_hist function (columnflow/columnflow#768)
  • Fix variance of fake data in datacard writer
  • Fix plotting process shift map (columnflow/columnflow#760)
  • Hotfix bad import in plot utils
  • Avoid using flat_np_view for value assignment (columnflow/columnflow#759)
  • Use safe concatenation (columnflow/columnflow#758)
  • Hotfix multi-config lookup via patterns
  • Hotfix process object selection for multi-config datacards
  • Hotfix variable shape type in combine datacard writer
  • Fix skipping data in CreateDatacards
  • Hotfix datacard writing when variables are missing in config
  • Hotfix parameter group cleaning in inference model

Producers & Calibrators

  • Hotfix kwargs for array function super calls
  • Hotfix super() calls in all task array functions
  • Hotfix TAF class attribute inheritance and TAF instance method defaults
  • Hotfix calibrator/producer requirement handling
  • Hotfix btag_wp_weights producer: clamp pt and eta efficiency map axes
  • Hotfix ExternalFile dataclass: drop None values
  • Hotfix sorted_ak_to_root helper for nested arrays
  • Hotfix CMS jet veto map usage; correct bit masks (columnflow/columnflow#778)
  • Hotfix CMS muon calibrator: remove outdated rnd_gen argument
  • Hotfix abs eta in CMS muon weight producer
  • Hotfix CMS tau energy scale variations
  • [cms] Hotfix varied energy errors in EGM calibration
  • [cms] Hotfix egamma calibrator: use same random numbers for all smearing variations
  • [cms] Hotfix tau energy calibration: skip e-fake mask
  • Hotfix TEC calibrator: add back charge column
  • Hotfix inclusive dataset attribute and lookup in norm weight producer
  • Hotfix norm weight logging
  • Hotfix combined jets calibrator
  • Hotfix validation check in stitched normalization weight production
  • Hotfix missing xsecs for stitched weight producer
  • Hotfix nbtags variable in DY weight producer
  • Fix handling of non_zero_mask in murf_envelope (columnflow/columnflow#704)
  • Fix JER application on JEC variations (columnflow/columnflow#665)
  • Add exception when era aux is missing (Fix/jec 2023 columnflow/columnflow#700)
  • Fix scope issue in seed producer
  • Hotfix depth limit of gen particles
  • Hotfix saving of columns in gen_particle lookups
  • Hotfix higgs gen lookup: consider effective gluon/photon decays
  • Hotfix typo in gen_top lookup
  • Skip string columns in finiteness checks (Producing string columns columnflow/columnflow#743)

Tasks & Workflow

  • Hotfix task_namespace of Wrapper tasks
  • Hotfix producer group resolution in ProduceColumnsWrapper; fix brace expansion
  • Hotfix config object resolution and order (deterministic and group lookup)
  • Hotfix process selection when only datasets are given
  • Hotfix default version injection into tasks with same family
  • Hotfix version resolution, pinning, and lookup
  • Hotfix forwarding of known_shifts for instance caching
  • Hotfix required producers/calibrators for workflows
  • Hotfix attributes added by TAF decorators
  • Hotfix: allow brace patterns in TAF shifts
  • Fix duplicate producers and requirements
  • Fix workflow parameter passthrough in pilot mode (columnflow/columnflow#785)
  • Fix ExternalFiles bundle unpacking (columnflow/columnflow#786)
  • Fix total size logged by MergeReductionStats
  • Fix reduction chunk size control
  • Fix task key lookup (Fix task pinning columnflow/columnflow#697)
  • Hotfix requirement order for consistent output removal
  • Hotfix consistent branch reqs between MergeReducedEvents and MergeSelectionStats
  • Raise explicit error in reduction on option type masks

IO & Environment

  • Hotfix ChunkedParquetReader
  • Multiple fixes regarding empty files or almost-empty chunks (columnflow/columnflow#750)
  • Hotfix reduction to skip empty chunks
  • Fix mamba setup
  • Hotfix htcondor/conda software setup in remote jobs
  • Fix missing datasets in MultiConfig
  • Hotfix CAT metadata update check for missing POG dirs
  • Hotfix category flattening
  • Correct JSON file extension for stats file
  • Do not consider empty axis as "missing" (histogram axis fix)
  • Fix opening ROOT files in cf_inspect after coffea update (columnflow/columnflow#753)
  • Log broken parquet file paths
  • Fix warning message in BundleExternalFiles task
  • Hotfix repo bundling: add missing user config
  • Failsafe setting of htcondor requested memory
  • Apply blinding threshold before process scaling
  • Hotfix electron weight producer with nested working points

Small Changes

Dependency & Submodule Updates

  • Multiple law, order, scinum, and boost-histogram submodule/version updates
  • Update default CMSSW versions in CMSSW sandboxes

Code Quality & Cleanup

  • Multiple typo fixes across the codebase
  • Minor code cleanups and consistency changes
  • Cleanup of e/mu id producers; remove unneeded columns in CMS TEC calibrator
  • Consistent handling of kwargs in teardown functions
  • Improve treepath detection in cf_inspect; optimize UniteColumns ROOT compression
  • Improve tmp file check and flag file handling in venv setup
  • Use return code in cf_remove_tmp; add local directory check
  • [cms] Add note on TEC-to-MET propagation
  • Store jec level as attribute on jec correctors
  • Bump version in __version__ file
  • Improve readability of make_jme_keys

Logging & Verbosity

  • More verbose errors for parquet metadata failures, norm weight missing cross sections, memory issues during filling, and broken histogram files
  • Warn about flow content for FlowStrategy.move
  • [cms] Update log in CheckCATUpdates task; add URL to log

Documentation

  • docs: add LennertGriesing, Lara813, aalvesan, Bogdan-Wiederspan, LuSchaller as contributors

Tests

  • Adjust unit tests for new behaviour and deterministic config object ordering

This PR was auto-generated to track upstream changes. It is not intended to be merged directly.

riga and others added 30 commits July 30, 2025 13:58
* update met_phi Calibrator to new format

* use npvsGood

* add npvsGood to uses as well...

* Minor adjustments, apply mask to all inputs.

---------

Co-authored-by: Mathis Frahm <mathis.frahm@uni-hamburg.de>
Co-authored-by: Marcel R. <github.riga@icloud.com>
* Generalize normalization weight producer.

* Add pull warning.

* Add per-dataset weight norm.

* Update.

* Optionally log brs.

* Improve combinatoric treatment, fix single br calculation.

* Helper to fill weight table.

* Minor adjustments before review.
Co-authored-by: Mathis Frahm <mathis.frahm@uni-hamburg.de>
Co-authored-by: Marcel Rieger <riga@users.noreply.github.com>
riga and others added 28 commits April 16, 2026 15:11
* docs: update README.md [skip ci]

* docs: update .all-contributorsrc [skip ci]

---------

Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com>
@JulesVandenbroeck JulesVandenbroeck marked this pull request as ready for review June 24, 2026 14:39
Resolve unresolved merge conflicts in mixins.py (taking upstream version),
restore missing GhentAnalysis classes (DatasetsMixin, SelectorStepsMixin,
MergeCutflowHistograms, MergeHistogramMixin), fix duplicate law.cfg entry,
and clean up F401/F811/E303/F821 flake8 errors across multiple files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
