feat: pigz #254

Open
rrsettgast wants to merge 3 commits into main from feature/pigz

@rrsettgast
Contributor

Uses parallel gzip (pigz) to create archives.

wrtobin and others added 3 commits September 29, 2025 12:38
Prefer a tar-to-pigz pipeline when packing integrated test baselines so
gzip compression can use the available CPU cores. Keep the existing
Python gztar path as a fallback when tar or pigz is unavailable, and
preserve the existing .tar.gz archive format.
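
A minimal sketch of the prefer-then-fall-back flow this commit message describes, offered as an illustration rather than the PR's actual code. The names `try_pigz_archive` and `pack_baselines` and the default thread count are hypothetical; only `shutil.which`, `subprocess.Popen`, and `shutil.make_archive` are standard-library calls.

```python
import os
import shutil
import subprocess


def try_pigz_archive(src_dir, archive_path, threads="2"):
    """Attempt tar | pigz; return False if a tool is missing or the pipeline fails."""
    tar_bin = shutil.which("tar")
    pigz_bin = shutil.which("pigz")
    if tar_bin is None or pigz_bin is None:
        return False

    tar = None
    ok = False
    try:
        with open(archive_path, "wb") as output:
            tar = subprocess.Popen([tar_bin, "-C", src_dir, "-cf", "-", "."],
                                   stdout=subprocess.PIPE)
            pigz = subprocess.Popen([pigz_bin, "-p", threads],
                                    stdin=tar.stdout, stdout=output)
            tar.stdout.close()  # pigz now holds the only read end of the pipe
            pigz_ok = pigz.wait() == 0
            tar_ok = tar.wait() == 0
            ok = pigz_ok and tar_ok
    except OSError:
        # Starting a process or opening the output failed; stop tar if it is running.
        if tar is not None and tar.poll() is None:
            tar.kill()
            tar.wait()
        ok = False

    if not ok and os.path.exists(archive_path):
        os.remove(archive_path)  # do not leave a partial archive behind
    return ok


def pack_baselines(src_dir, archive_base):
    """Write <archive_base>.tar.gz, preferring pigz and falling back to Python's gztar."""
    archive_path = archive_base + ".tar.gz"
    if try_pigz_archive(src_dir, archive_path):
        return archive_path
    return shutil.make_archive(archive_base, format="gztar", root_dir=src_dir)
```

Either path yields a gzip-compressed tar file with the same `.tar.gz` name, so callers do not need to know which one ran.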
Copilot AI review requested due to automatic review settings May 27, 2026 01:27
@rrsettgast rrsettgast requested a review from castelletto1 May 27, 2026 01:28

Copilot AI left a comment

Pull request overview

This PR introduces an optional fast-path for creating baseline tarballs using external tar + parallel gzip (pigz), while keeping the existing Python shutil.make_archive(..., format="gztar") behavior as a fallback. It also expands the default restart-check exclusion patterns to ignore additional HDF5 paths that are likely unstable across runs.

Changes:

  • Add a tar | pigz pipeline for baseline archive creation, falling back to Python gztar if tools are unavailable or the pipeline fails.
  • Add a helper to determine an appropriate CPU thread count for pigz.
  • Update restart_check.py default exclusion regex list to include dNdX and detJ.
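
One way a thread-count helper like the one in the second bullet could look. This is a sketch under assumptions, not the helper from the PR: the name `pigz_thread_count` and the `max_threads` cap are hypothetical. It returns a string because the value is passed as a command-line argument to `pigz -p`.

```python
import os


def pigz_thread_count(max_threads=None):
    """Return the number of usable CPUs as a string suitable for `pigz -p`."""
    try:
        # Respects CPU affinity masks (e.g. under a batch scheduler); not on all platforms.
        count = len(os.sched_getaffinity(0))
    except AttributeError:
        # os.cpu_count() may return None when the count cannot be determined.
        count = os.cpu_count() or 1
    if max_threads is not None:
        count = min(count, max_threads)
    return str(max(1, count))
```

Preferring the affinity mask over the raw core count avoids oversubscribing when the process has been restricted to a subset of the machine's cores.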

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| geos-ats/src/geos/ats/helpers/restart_check.py | Extends default HDF5 exclusion patterns used during restart comparisons. |
| geos-ats/src/geos/ats/baseline_io.py | Adds an external tar + pigz archiving path to speed up baseline archive creation with a fallback to Python’s gztar. |
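
To illustrate how regex exclusions of this kind typically work, here is a small sketch. The pattern strings, the example HDF5 paths, and the function name `is_excluded` are all hypothetical; only the names `dNdX` and `detJ` come from this PR, and the real defaults in `restart_check.py` may be written differently.

```python
import re

# Hypothetical patterns: skip any dataset whose final path component is dNdX or detJ.
EXCLUSION_PATTERNS = [r"/dNdX$", r"/detJ$"]


def is_excluded(hdf5_path, patterns=EXCLUSION_PATTERNS):
    """Return True if the dataset path matches any exclusion pattern."""
    return any(re.search(pattern, hdf5_path) for pattern in patterns)


# Hypothetical dataset paths, for illustration only.
paths = [
    "/Problem/domain/elements/detJ",
    "/Problem/domain/elements/dNdX",
    "/Problem/domain/elements/pressure",
]
compared = [p for p in paths if not is_excluded(p)]
```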

Comment on lines +181 to +205
    try:
        with open( archive_path, 'wb' ) as output:
            tar_process = subprocess.Popen( [ tar_bin, '-C', baseline_path, '-cf', '-', '.' ],
                                            stdout=subprocess.PIPE )
            if tar_process.stdout is None:
                raise RuntimeError( 'failed to capture tar output' )
            pigz_process = subprocess.Popen( [ pigz_bin, '-9', '-p', threads ],
                                             stdin=tar_process.stdout,
                                             stdout=output )
            tar_process.stdout.close()

            pigz_status = pigz_process.wait()
            tar_status = tar_process.wait()

        if tar_status != 0 or pigz_status != 0:
            try:
                os.remove( archive_path )
            except FileNotFoundError:
                pass
            raise RuntimeError( f'tar exited with {tar_status}; pigz exited with {pigz_status}' )

    except Exception as e:
        logger.warning( 'Parallel baseline archive creation failed; using Python gztar archiver' )
        logger.warning( repr( e ) )
        return False
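
As quoted, the snippet removes the archive only when an exit status is nonzero. If `Popen` itself raises, the file created by `open()` remains and the `tar` process may still be running. A possible hardening is sketched below; it is not part of the PR, and `run_pipeline` is a hypothetical, generic helper that takes the two command lines as arguments.

```python
import os
import subprocess
import sys


def run_pipeline( producer_cmd, consumer_cmd, out_path ):
    """Run `producer | consumer > out_path`; on any failure, stop both and remove out_path."""
    producer = None
    consumer = None
    try:
        with open( out_path, 'wb' ) as output:
            producer = subprocess.Popen( producer_cmd, stdout=subprocess.PIPE )
            consumer = subprocess.Popen( consumer_cmd, stdin=producer.stdout, stdout=output )
            producer.stdout.close()  # lets the producer see a broken pipe if the consumer exits
            consumer_status = consumer.wait()
            producer_status = producer.wait()
        if producer_status != 0 or consumer_status != 0:
            raise RuntimeError( f'producer exited with {producer_status}; '
                                f'consumer exited with {consumer_status}' )
        return True
    except ( OSError, RuntimeError ):
        # Stop anything still running so no process is left blocked on the pipe.
        for proc in ( producer, consumer ):
            if proc is not None and proc.poll() is None:
                proc.kill()
                proc.wait()
        if producer is not None and producer.stdout is not None:
            producer.stdout.close()
        try:
            os.remove( out_path )
        except OSError:
            pass
        return False
```

For the case in this PR, the call would be along the lines of `run_pipeline( [ tar_bin, '-C', baseline_path, '-cf', '-', '.' ], [ pigz_bin, '-9', '-p', threads ], archive_path )`, with the caller logging the warning and falling back when it returns `False`.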
@jafranc changed the title from "Feature/pigz" to "feat: pigz" on May 28, 2026