Skip to content

Fix script bugs from EventDisplay_v4 issue #352#145

Merged
GernotMaier merged 14 commits into
mainfrom
v492-dev18
May 31, 2026
Merged

Fix script bugs from EventDisplay_v4 issue #352#145
GernotMaier merged 14 commits into
mainfrom
v492-dev18

Conversation

@GernotMaier
Copy link
Copy Markdown
Member

@GernotMaier GernotMaier commented May 30, 2026

Bug-Fix Summary — EventDisplay Issue #352

Fixes applied to Eventdisplay_AnalysisScripts_VTS for bugs reported in
VERITAS-Observatory/EventDisplay_v4#352.


Analysis scripts (Section C)

C1 — ANALYSIS.anasum.sh: job submission never executed

File: scripts/ANALYSIS.anasum.sh

SUBC was assigned the path to the helper script as a string instead of
executing it. The submission command was therefore never built, so no jobs
were ever submitted.

-SUBC="$EVNDISPSCRIPTS/UTILITY.readSubmissionCommand.sh"
+SUBC=$("$EVNDISPSCRIPTS/UTILITY.readSubmissionCommand.sh")
 SUBC=$(eval "echo \"$SUBC\"")

C2 — ANALYSIS.mscw_energy_sub.sh: wrong condition for RedHV DispBDT fallback

File: scripts/helper_scripts/ANALYSIS.mscw_energy_sub.sh

The get_disp_dir() function selected a 55° zenith-angle fallback training
directory for RedHV data only when ${SIMTYPE_RUN} == "CARE_RedHV" (exact
match). The function already uses ${HVSETTINGS} for the base directory,
so the condition was changed to match that variable instead:

-elif [[ ${SIMTYPE_RUN} == "CARE_RedHV" ]]; then
+elif [[ ${HVSETTINGS} == "obsLowHV" ]]; then

C5 — ANALYSIS.mscw_energy_sub.sh: missing $ before DISPBDT

File: scripts/helper_scripts/ANALYSIS.mscw_energy_sub.sh

Log message printed the literal string DISPBDT instead of the variable
value:

-if [[ DISPBDT == "1" ]]; then
+if [[ $DISPBDT == "1" ]]; then

C10 — IRF.optimizeTMVAforGammaHadronSeparation.sh: unsafe regex for * ENERGYBINS

File: scripts/IRF.optimizeTMVAforGammaHadronSeparation.sh

Runparameter files use * ENERGYBINS (literal asterisk). The * without
escaping is undefined in POSIX BRE right after ^:

-ENERGYBINS=$(grep "^* ENERGYBINS" …)
+ENERGYBINS=$(grep "^\* ENERGYBINS" …)-| sed 's/* ENERGYBINS 1//'
+| sed 's/\* ENERGYBINS//'

C11 — IRF.optimizeTMVAforGammaHadronSeparation.sh: no exit 1 on missing effective-area file

File: scripts/IRF.optimizeTMVAforGammaHadronSeparation.sh

The script printed an error and continued instead of stopping when the
effective-area file was missing:

 echo "ERROR: effective area file not found: ${EFFAREA}"
+exit 1

C20 — ANALYSIS.v2dl3.sh: LOGIDR typo and undefined $AFILE

File: scripts/ANALYSIS.v2dl3.sh

Two distinct bugs:

  1. ${LOGIDR} (typo) → ${LOGDIR} — the cleanup rm -f targeted the
    wrong (unexpanded/empty) directory.
  2. $AFILE (undefined variable) → $J (the loop variable) — run IDs in
    log output were blank.
-rm -f ${LOGIDR}/x* 2>/dev/null
+rm -f ${LOGDIR}/x* 2>/dev/null-echo "RUN $AFILE JOBID $JOBID"
+echo "RUN $J JOBID $JOBID"

C21 — ANALYSIS.anasum_allcuts.sh: fragile relative ./ invocations

File: scripts/ANALYSIS.anasum_allcuts.sh

The script invoked sibling scripts with ./ANALYSIS.…, which fails unless
the user's working directory is the scripts/ folder:

-./ANALYSIS.anasum_parallel_from_runlist.sh …
+$(dirname "$0")/ANALYSIS.anasum_parallel_from_runlist.sh …-./ANALYSIS.v2dl3.sh …
+$(dirname "$0")/ANALYSIS.v2dl3.sh …

IRF production chain (Section D)

D1+D2 — IRF.production.sh: redundant inner loops and wrong loop variable

File: scripts/IRF.production.sh

The TRAINTMVA/OPTIMIZETMVA block contained extra for VX in $EPOCH and
for ATM in $ATMOS loops nested inside the outer epoch/atmosphere loops,
causing N_epoch × N_atmo times more job submissions than intended. After
removing the redundant inner loops, ${EPOCH:0:2} (the full epoch list)
was replaced with ${VX:0:2} (the correct loop variable):

-for VX in $EPOCH; do
-    for ATM in $ATMOS; do
-        for C in ${CUTTYPES[@]}; do
+for C in ${CUTTYPES[@]}; do-            grep "* sizesecondmax" … | grep ${EPOCH:0:2} …
+            grep "* sizesecondmax" … | grep ${VX:0:2} …-            if [[ ${EPOCH:0:2} == "V4" ]]; then
+            if [[ ${VX:0:2} == "V4" ]]; then-        done
-    done
-done
+done

D3 — IRF.mscw_energy_MC_sub.sh: unquoted glob, can fail on spaces / special chars

File: scripts/helper_scripts/IRF.mscw_energy_MC_sub.sh

-MSCFILES=$(ls ${INDIR}/*[0-9].root 2>/dev/null)
+MSCFILES=$(ls "${INDIR}"/*[0-9].root 2>/dev/null)

D4 — IRF.generate_lookup_table_parts.sh / IRF.mscw_energy_MC.sh: double .sh extension

Files: scripts/IRF.generate_lookup_table_parts.sh, scripts/IRF.mscw_energy_MC.sh

FSCRIPT is defined with the .sh suffix, but submission calls used
$FSCRIPT.sh, producing a nonexistent ….sh.sh path:

 FSCRIPT="$LOGDIR/…name….sh"
-JOBID=`$SUBC $FSCRIPT.sh`
+JOBID=`$SUBC $FSCRIPT`

(Same change applied to all submission/execution branches: SGE, Condor,
GNU parallel, and simple.)


D6 — IRF.combine_lookup_table_parts.sh: existence check on filename alone

File: scripts/IRF.combine_lookup_table_parts.sh

$OFILE is just a filename, not a path — the check always failed when the
current directory differed from $ODIR:

-if [[ -f $OFILE ]]; then
+if [[ -f "$ODIR/$OFILE" ]]; then

D7 — zstd availability check used non-portable which

Files: scripts/helper_scripts/IRF.mscw_energy_MC_sub.sh,
IRF.compress_evndisp_MC_sub.sh, IRF.evndisp_MC_sub.sh,
IRF.lookup_table_parallel_sub.sh

which zstd returns exit 0 even when the command is absent on some
systems. Replaced with POSIX-portable command -v:

-if which zstd &>/dev/null; then
+if command -v zstd &>/dev/null; then

D9 — IRF.optimizeTMVAforGammaHadronSeparation_sub.sh: rm uses MVADIR before it is defined

File: scripts/helper_scripts/IRF.optimizeTMVAforGammaHadronSeparation_sub.sh

rm -f ${MVADIR}/rates.log appeared 11 lines before MVADIR=… was
assigned, so it silently operated on an empty path. Moved to immediately
after the assignment:

-    rm -f ${MVADIR}/rates.log       # ← before MVADIR is set
 …
 MVADIR="$VERITAS_EVNDISP_AUX_DIR/…"
+rm -f "${MVADIR}/rates.log"         # ← now in the correct place

D10 — IRF.production.sh: fragile ./IRF. invocations

File: scripts/IRF.production.sh

Three calls used ./IRF.*.sh, which fails when the script is launched from
outside the scripts/ directory:

-./IRF.dispXGB.sh "stereo_analysis" …
+$(dirname "$0")/IRF.dispXGB.sh "stereo_analysis" …

-./IRF.dispXGB.sh "classification" …
+$(dirname "$0")/IRF.dispXGB.sh "classification" …

-./IRF.trainXGBforGammaHadronSeparationTraining.sh …
+$(dirname "$0")/IRF.trainXGBforGammaHadronSeparationTraining.sh …

Bonus — IRF.mscw_energy_MC_sub.sh: RedHV 55° path was appended, not substituted

File: scripts/helper_scripts/IRF.mscw_energy_MC_sub.sh

Analogous to C2, for MC production: when zenith angle ≥ 58° and the
simulation type is CARE_RedHV*, the script should replace the 60deg
or 65deg training path with 55deg, but instead it appended /55deg/
to the already-constructed path. Fixed by restructuring the if/else to set
the path once for each case.


XGB angular reconstruction (Section E)

E3 — IRF.trainXGBforAngularReconstructionBinned.sh: hard-coded RECID0

File: scripts/IRF.trainXGBforAngularReconstructionBinned.sh

The input directory always pointed to MSCW_RECID0_DISP regardless of the
RECID argument ($6):

-INDIR="…/MSCW_RECID0_DISP"
+INDIR="…/MSCW_RECID${RECID}_DISP"

E7 — IRF.trainXGBforAngularReconstruction_sub.sh: temp dir used undefined $MSCW_FILE

File: scripts/helper_scripts/IRF.trainXGBforAngularReconstruction_sub.sh

MSCW_FILE is never set in this script; LLIST (the input .list file)
is the actual variable. The temp directory name was also not unique enough
for parallel runs:

-TEMPDIR=$TMPDIR/$(basename $MSCW_FILE .root)
+TEMPDIR=$TMPDIR/XGB-$(basename $LLIST .list)-$(uuidgen)

Python / DB scripts (Section C continued)

C12 — ANALYSIS.v2dl3_sub.sh: pip install -e ran on every batch job

File: scripts/helper_scripts/ANALYSIS.v2dl3_sub.sh

Installing the package unconditionally on every worker job caused unnecessary
network traffic and race conditions. Now guarded with pip show:

-pip install -e "${V2DL3SYS%/}-v${V2DL3VERSION}"
+pip show v2dl3-eventdisplay &>/dev/null 2>&1 || pip install -e "${V2DL3SYS%/}-v${V2DL3VERSION}"

C13 — ANALYSIS.anasum_combine.sh: no completeness check before combining

File: scripts/ANALYSIS.anasum_combine.sh

The combine step could silently produce an empty/invalid result when the
parallel anasum jobs had not yet finished. A guard now aborts early:

+NANASUMFILES=$(find "$DDIR" -maxdepth 1 -name "*.anasum.root" | wc -l)
+if [[ $NANASUMFILES -eq 0 ]]; then
+    echo "Error: no .anasum.root files found in $DDIR"
+    echo "Run ANALYSIS.anasum_parallel_from_runlist.sh and wait for completion before combining."
+    exit 1
+fi

C14 — ANALYSIS.mscw_energy_sub.sh: DISPDIR used without existence check

File: scripts/helper_scripts/ANALYSIS.mscw_energy_sub.sh

get_disp_dir() can return a path that does not exist (e.g., wrong ZA bin).
Added a guard matching the existing TABFILE check pattern:

+if [[ ! -d "$DISPDIR" ]]; then
+    echo "Error: DispBDT directory $DISPDIR not found, exiting..."
+    exit 1
+fi

C15 — db_run.sh: index variables uninitialized before parser loops

File: scripts/db_scripts/db_run.sh

start_time_index, end_time_index, and source_index were used in
${a[$index]} subscripts on the first loop iteration before the header
line set them. This caused reading from index 0 (silently wrong) rather
than the correct column:

+start_time_index=0
 while IFS="|" read -ra a; do …

Same fix applied to end_time_index and source_index.


C16 — db_run.sh: tarball presence suppressed laser re-read even when sidecar missing

File: scripts/db_scripts/db_run.sh

The original comment admitted "implementation of testing missing". The
existing get_file_status() function is now used to also verify the
.laserrun sidecar exists inside the tarball:

-# don't test and read if tar file exists
-# (implementation of testing missing)
-if [[ ! -e $(getDBTextFileDirectory ${RUN}).tar.gz ]] || [[ ${OVERWRITE} == "1" ]]; then
+RUND="$(getDBTextFileDirectory ${RUN})"
+if [[ ! -e "${RUND}.tar.gz" ]] || \
+   [[ $(get_file_status ${RUN} "${RUND}/${RUN}.laserrun") == "0" ]] || \
+   [[ ${OVERWRITE} == "1" ]]; then

C17 — db_write_fits.py: convert_table_comment_to_ascii called before None check

File: scripts/db_scripts/db_write_fits.py

When read_file() returns None (unreadable file), the DQM converter was
called on None, raising an AttributeError. Moved inside the None
guard:

-if "dqm" in file_name.lower():
-    convert_table_comment_to_ascii(table)
 if table is not None:
+    if "dqm" in file_name.lower():
+        convert_table_comment_to_ascii(table)
     hdu.append(fits.BinTableHDU(table))

C18 — db_combine_dqm_fits.py: join_type="exact" crashes on schema drift

File: scripts/db_scripts/db_combine_dqm_fits.py

astropy.table.vstack with join_type="exact" raises an exception when
column sets differ across runs (e.g., a new DQM field added mid-season).
Changed to "outer" so missing columns are filled with masked values:

-final_combined_table = vstack(combined_tables, join_type="exact")
+final_combined_table = vstack(combined_tables, join_type="outer")

C19 — db_write_fits.py: unguarded division by zero for L3 rate and dead time

File: scripts/db_scripts/db_write_fits.py

l3_values / time_diffs and busy / ten_mhz could divide by zero when
consecutive timestamps are equal or TenMHzScaler wraps. Both divisions
are now guarded with np.where:

-l3_values /= time_diffs
+with np.errstate(divide="ignore", invalid="ignore"):
+    l3_values = np.where(time_diffs > 0, l3_values / time_diffs, np.nan)
-dead_time = np.mean(busy / ten_mhz)
-dead_time_std = np.std(busy / ten_mhz)
+with np.errstate(divide="ignore", invalid="ignore"):
+    ratio = np.where(ten_mhz > 0, busy / ten_mhz, np.nan)
+dead_time = np.mean(ratio)
+dead_time_std = np.std(ratio)

Issues intentionally skipped

Issue Reason
C3 — FORCEDATMO injection Requires understanding caller contract; no clear minimal fix
C4 — V4/V5 atmo normalisation Design decision, not a clear bug
C6 — Positional parameter parsing Refactor scope, not a minimal fix
C8, C9 — DBTEXT / DB query logic Complex; risk of regressions
D5 — Wrong positional arg Appears already fixed ($8 present)
E1, E2, E5, E6, E8–E11 Design-level or require C++ changes

GernotMaier and others added 5 commits May 30, 2026 18:29
C1  ANALYSIS.anasum.sh: execute UTILITY.readSubmissionCommand.sh instead
    of storing its path as a string (no jobs were ever submitted)
C2  ANALYSIS.mscw_energy_sub.sh: fix RedHV DispBDT fallback condition
    (use HVSETTINGS=obsLowHV instead of exact SIMTYPE_RUN match)
C5  ANALYSIS.mscw_energy_sub.sh: add missing $ before DISPBDT in log guard
C10 IRF.optimizeTMVAforGammaHadronSeparation.sh: escape * in grep/sed
    ENERGYBINS regex (was undefined BRE behaviour)
C11 IRF.optimizeTMVAforGammaHadronSeparation.sh: add exit 1 on missing
    effective-area file
C20 ANALYSIS.v2dl3.sh: fix LOGIDR→LOGDIR typo; fix AFILE→J (undefined var)
C21 ANALYSIS.anasum_allcuts.sh: use $(dirname "$0") for sibling scripts
D1  IRF.production.sh: remove redundant inner epoch/atmo loops inside the
    TRAINTMVA/OPTIMIZETMVA block (caused N²-fold over-submission)
D2  IRF.production.sh: replace ${EPOCH:0:2} with ${VX:0:2} inside the
    per-epoch loop (EPOCH is the full list, VX is the loop variable)
D3  IRF.mscw_energy_MC_sub.sh: quote ${INDIR} in glob expansion
D4  IRF.generate_lookup_table_parts.sh, IRF.mscw_energy_MC.sh: FSCRIPT
    already ends in .sh; remove spurious .sh suffix in submission calls
D6  IRF.combine_lookup_table_parts.sh: check $ODIR/$OFILE, not $OFILE
D7  Four helper scripts: replace 'which zstd' with 'command -v zstd'
    (POSIX-portable; 'which' may return 0 even when absent)
D9  IRF.optimizeTMVAforGammaHadronSeparation_sub.sh: move
    'rm -f ${MVADIR}/rates.log' to after MVADIR is assigned
D10 IRF.production.sh: replace ./IRF.*.sh with $(dirname "$0")/IRF.*.sh
    for dispXGB and trainXGB calls
E3  IRF.trainXGBforAngularReconstructionBinned.sh: use RECID${RECID}_DISP
    instead of hard-coded RECID0_DISP
E7  IRF.trainXGBforAngularReconstruction_sub.sh: replace undefined
    $MSCW_FILE with $LLIST in TEMPDIR; add uuidgen for uniqueness
Bonus IRF.mscw_energy_MC_sub.sh: fix RedHV 55deg path being appended
    instead of replacing the 60/65deg path

Ref: VERITAS-Observatory/EventDisplay_v4#352

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
GernotMaier and others added 6 commits May 31, 2026 10:23
C12: guard pip install in v2dl3_sub.sh to avoid per-job reinstall
C13: check anasum.root files exist before combining in anasum_combine.sh
C14: add DISPDIR existence check in mscw_energy_sub.sh
C15: initialize index variables before while loops in db_run.sh
C16: use get_file_status to check laserrun sidecar before skipping re-read
C17: move convert_table_comment_to_ascii inside None guard in db_write_fits.py
C18: change join_type from exact to outer in db_combine_dqm_fits.py
C19: guard l3_rate and dead-time divisions against zero denominators

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR applies a set of targeted bug fixes across the EventDisplay VTS analysis/IRF production shell scripts and DB/FITS utilities, addressing failures and unsafe behaviors reported in EventDisplay_v4 issue #352.

Changes:

  • Fixes job submission/execution logic, path handling, and portability issues in IRF/analysis shell scripts (submission command parsing, relative ./ invocations, missing guards, command -v checks, .sh.sh paths, etc.).
  • Improves robustness of DB-related scripts (FITS writing/combining, tarball/sidecar checks, division-by-zero guards).
  • Corrects configuration and training script issues (RECID handling for XGB angular reconstruction, schema drift handling in DQM combines).

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
scripts/set_environment.sh Updates the apptainer image reference (rc3 → rc4).
scripts/IRF.trainXGBforAngularReconstructionBinned.sh Fixes hard-coded RECID0 input directory to respect RECID argument.
scripts/IRF.selectRunsForGammaHadronSeparationTraining.sh Adjusts zenith-bin parsing path; currently adds extra stdout output (see comment).
scripts/IRF.production.sh Fixes redundant TMVA training loops and makes IRF script invocations robust to cwd.
scripts/IRF.optimizeTMVAforGammaHadronSeparation.sh Fixes unsafe grep regex for * ENERGYBINS and exits on missing effective-area file.
scripts/IRF.mscw_energy_MC.sh Fixes job submission using $FSCRIPT (avoids .sh.sh).
scripts/IRF.generate_lookup_table_parts.sh Fixes job submission using $FSCRIPT (avoids .sh.sh).
scripts/IRF.combine_lookup_table_parts.sh Fixes output-file existence check to use $ODIR/$OFILE.
scripts/helper_scripts/IRF.trainXGBforAngularReconstruction_sub.sh Improves temp dir uniqueness; needs quoting fixes for robustness (see comment).
scripts/helper_scripts/IRF.optimizeTMVAforGammaHadronSeparation_sub.sh Removes rm that referenced MVADIR before it is set.
scripts/helper_scripts/IRF.mscw_energy_MC_sub.sh Fixes RedHV 55° directory selection, zstd detection, and copying behavior.
scripts/helper_scripts/IRF.lookup_table_parallel_sub.sh Uses command -v instead of non-portable which-style checks.
scripts/helper_scripts/IRF.evndisp_MC_sub.sh Adds nullglob protection and uses command -v for zstd checks.
scripts/helper_scripts/IRF.compress_evndisp_MC_sub.sh Uses command -v for zstd checks.
scripts/helper_scripts/ANALYSIS.v2dl3_sub.sh Avoids per-job pip install -e by checking package presence first.
scripts/helper_scripts/ANALYSIS.mscw_energy_sub.sh Fixes RedHV fallback condition, adds missing $ in DISPBDT check, and validates DispBDT directory exists.
scripts/helper_scripts/ANALYSIS.evndisp_sub.sh Adds error handling and readability checks for unpacked DBTEXT metadata.
scripts/helper_scripts/ANALYSIS.anasum_sub.sh Removes atmosphere “normalization” helper and uses runinfo-derived ATMO directly.
scripts/db_scripts/db_write_fits.py Fixes None-guard ordering and guards divide-by-zero in rate/deadtime computations.
scripts/db_scripts/db_run.sh Initializes header indices and improves tarball/sidecar presence logic.
scripts/db_scripts/db_combine_dqm_fits.py Switches vstack join type to outer to tolerate schema drift.
scripts/ANALYSIS.v2dl3.sh Fixes ${LOGIDR} typo and undefined $AFILE in status logging.
scripts/ANALYSIS.mscw_energy.sh Updates usage/help text and removes unused forced-atmosphere argument parsing.
scripts/ANALYSIS.anasum.sh Fixes submission command acquisition; current syntax needs correction (see comment).
scripts/ANALYSIS.anasum_combine.sh Adds guard to ensure there are .anasum.root files before combining.
scripts/ANALYSIS.anasum_allcuts.sh Makes script invocations robust to being run outside scripts/ directory.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread scripts/ANALYSIS.anasum.sh
Comment thread scripts/helper_scripts/IRF.trainXGBforAngularReconstruction_sub.sh Outdated
Comment thread scripts/helper_scripts/IRF.mscw_energy_MC_sub.sh Outdated
Comment thread scripts/IRF.selectRunsForGammaHadronSeparationTraining.sh
@GernotMaier GernotMaier marked this pull request as ready for review May 31, 2026 11:15
@GernotMaier GernotMaier merged commit acb9685 into main May 31, 2026
1 check passed
@GernotMaier GernotMaier deleted the v492-dev18 branch May 31, 2026 11:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants