Modernize packaging, pipeline, and report generation workflows #341
Open
alsmith151 wants to merge 160 commits into
- Updated `report_text.yml` to change comments to proper headings and improved descriptions.
- Removed the `copy_report_template` rule from `statistics.smk` and replaced it with a script call to `make_report.py`.
- Enhanced the `capcruncher_subprocess_env` fixture in `conftest.py` to include the repository root in `PYTHONPATH`.
- Updated `docker.md` to reflect the removal of Quarto from the Docker image.
- Added `plotly` and `pyyaml` to `environment.yml` and the requirements files.
- Updated `pyproject.toml` to include `.py` files for report generation and removed unnecessary ignores.
- Added tests for the new report generation functionality and improved existing tests to ensure proper handling of viewpoint categories.
- Implemented functionality to prune unused viewpoint categories in `interactions_deduplicate` and slice filtering.
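The fixture change above amounts to building a subprocess environment whose `PYTHONPATH` starts with the repository root, so child processes import the local checkout. A minimal sketch, assuming a standalone helper (the name `subprocess_env_with_repo_root` is hypothetical; the real fixture in `conftest.py` may be structured differently):

```python
import os
from pathlib import Path


def subprocess_env_with_repo_root(repo_root, base_env=None):
    """Return a copy of ``base_env`` with ``repo_root`` prepended to PYTHONPATH.

    Hypothetical helper; the actual conftest.py fixture may differ.
    """
    # Copy so the caller's environment (or os.environ) is never mutated
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH", "")
    # Put the checkout first so `import capcruncher` resolves to the local tree
    parts = [str(Path(repo_root).resolve())]
    if existing:
        parts.append(existing)
    env["PYTHONPATH"] = os.pathsep.join(parts)
    return env
```

Prepending rather than appending matters: an installed copy of the package elsewhere on `PYTHONPATH` would otherwise shadow the checkout under test.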
- `bed.py`: recover from `SchemaError` by dropping rows where `end <= start` instead of returning an empty DataFrame; use `pd.Series` wrappers to avoid the `bool.__invert__` deprecation (deprecated since Python 3.12 and slated to raise an error in Python 3.16)
- `genome.py`: filter zero-length fragments before sorting digest output
- `conftest.py`: strip `--compress-prog-args` from the flash2 invocation; forward the exit code from the subprocess
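The `bed.py` and `genome.py` changes both come down to keeping only intervals with `end > start` rather than failing the whole table. `bed.py` works on a DataFrame (in pandas the equivalent is a mask such as `df[df["end"] > df["start"]]`, assuming those column names); this stdlib sketch uses tuples so it runs without pandas, and the function names are hypothetical. On the deprecation: `~` on a pandas boolean Series is element-wise negation, whereas `~` on a plain Python `bool` is integer bitwise inversion (`~True` is `-2`), which is why scalar bools should use `not` instead.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end)


def drop_invalid_intervals(intervals: Iterable[Interval]) -> Tuple[List[Interval], int]:
    """Keep intervals with end > start and count how many were dropped.

    Hypothetical stand-in for the bed.py recovery path: only the offending
    rows are removed instead of returning an empty table.
    """
    kept: List[Interval] = []
    dropped = 0
    for chrom, start, end in intervals:
        if end > start:
            kept.append((chrom, start, end))
        else:
            # end == start is a zero-length fragment; end < start is inverted
            dropped += 1
    return kept, dropped


def filter_and_sort_fragments(fragments: Iterable[Interval]) -> List[Interval]:
    """Filter zero-length fragments first, then sort by (chrom, start, end)."""
    kept, _ = drop_invalid_intervals(fragments)
    return sorted(kept)
```

Filtering before sorting is also why the digest golden row count and fragment IDs shift downstream: removed fragments never receive an ID.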
- Run pre-commit install + pre-push hooks
- Fix broken symlink: `viewpoints.bed` -> `mm9_capture_viewpoints_Slc25A37.bed`
- Exclude cookiecutter config dir from check-yaml (`%` chars in template)
- Add `strict=False` to `zip()` calls in `get_test_data.ipynb` (B905)
- Auto-fix trailing whitespace, end-of-file, ruff, ruff-format, snakefmt
- `conftest.py`: remove flash/gzcat/gsplit/multiqc shims; keep only the capcruncher shim that points at the local checkout. Real tools are now taken directly from the pixi environment.
- `fastq.smk`: invoke flash2 directly; drop `--compress-prog-args pigz`, which flash2 does not support.
- `fastq.py` / `common.smk`: on macOS, accept the pixi-provided GNU split (unprefixed) in addition to gsplit; use `gzip -dc` instead of platform-specific zcat/gzcat for `.gz` decompression.
- `plot.py`: pass `theme=None` to `GenomicFigure` to work around a plotnado 0.3.1 bug where `Theme.apply` raises `AttributeError` on Spacer tracks that have no aesthetics field.
- `test_workflow_scripts.py`: update digest golden row count 303591→303397 to reflect zero-length fragment filtering added in 1949bfc.
- `docs/plotting.ipynb`: add missing `import pyranges1 as pr`.
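The `fastq.py` / `common.smk` change picks a GNU-compatible `split` and a portable decompression command. A minimal sketch with hypothetical helper names (`find_split_binary`, `decompress_command`); it assumes the pixi environment's `bin` directory comes first on `PATH`, so an unprefixed `split` there is GNU rather than the macOS system `split`, which may lack GNU-only options:

```python
import shutil
import sys
from typing import Callable, List, Optional


def find_split_binary(platform: str = sys.platform,
                      which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Pick a `split` executable (hypothetical helper, not the fastq.py API).

    On macOS prefer `gsplit`, but also accept an unprefixed `split` found on
    PATH (assumed to be the GNU coreutils installed by pixi).
    """
    candidates = ["gsplit", "split"] if platform == "darwin" else ["split"]
    for name in candidates:
        path = which(name)
        if path:
            return path
    raise FileNotFoundError(f"none of {candidates} found on PATH")


def decompress_command(path: str) -> List[str]:
    """Build a command that streams a FASTQ file to stdout."""
    # `gzip -dc` works the same on Linux and macOS, unlike zcat vs gzcat
    if path.endswith(".gz"):
        return ["gzip", "-dc", path]
    return ["cat", path]
```

Injecting `which` and `platform` keeps the selection logic testable without touching the real `PATH`.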
- `cookiecutter.json` / `capcruncher_config.yml`: add a `plotting_genes` variable so the genes path is configurable at project creation rather than hardcoded to a placeholder that causes a plotnado error at plot time.
- `plot.py`: guard the genes track with `pathlib.Path.is_file()` so a missing or placeholder path is silently skipped.
- `test_pipeline.py`: pass `mm9_chr14_genes.bed` as `plotting_genes` in both config fixtures; add a `data_path` fixture parameter.
- `pixi.toml`: add `--dist loadscope` to the pipeline/all test tasks so that pytest-xdist keeps all tests in the same module on the same worker. This fixes test_stats_exist / test_bigwigs_exist / test_hub_exists failures caused by module-scoped fixtures being recreated on a different worker.
- `test_workflow_scripts.py`: update viewpoint_bins golden value 169744→169634 to reflect the shift in fragment IDs after zero-length fragment removal (1949bfc).
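The `plot.py` guard only adds the genes track when the configured path points at a real file. In this sketch, plain dicts stand in for plotnado track objects and `build_track_specs` is a hypothetical name; only the `Path.is_file()` check mirrors the described change:

```python
from pathlib import Path
from typing import List, Optional


def build_track_specs(genes_path: Optional[str]) -> List[dict]:
    """Assemble track descriptions, skipping a missing or placeholder genes file.

    Hypothetical: dicts stand in for plotnado tracks.
    """
    tracks: List[dict] = [{"type": "scale"}]
    # is_file() is False for missing paths, placeholders, and directories
    if genes_path and Path(genes_path).is_file():
        tracks.append({"type": "genes", "file": str(Path(genes_path))})
    return tracks
```

Skipping silently keeps plotting working for projects created before `plotting_genes` existed, at the cost of a missing track rather than an explicit error.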
…n pileup function
fix: add check for empty sorted bedgraph file in `bedgraph_to_bigwig` rule
test: update expected output values in `capture_pipeline_golden_outputs` test
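The empty-file check lets the `bedgraph_to_bigwig` rule branch before conversion, since a bigWig conversion step may fail on input with no data. A minimal sketch with a hypothetical `bedgraph_is_empty` helper; treating `track`/`browser`/`#` lines as headers rather than data is an assumption about what "empty" should mean:

```python
import os


def bedgraph_is_empty(path: str) -> bool:
    """Return True if a sorted bedGraph contains no data lines (hypothetical helper)."""
    if os.path.getsize(path) == 0:
        return True
    with open(path) as fh:
        for line in fh:
            stripped = line.strip()
            # Ignore blank lines and UCSC track/browser/comment header lines
            if stripped and not stripped.startswith(("track", "browser", "#")):
                return False
    return True
```

The size check short-circuits the common case; the line scan only runs when the file has some content.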
Add MANIFEST.in to correctly prune non-runtime trees (docs, tests, .github, lock files, Dockerfile) from the source distribution. Tighten pyproject.toml: add an inline comment explaining why the `ray` extra is excluded from `all` (heavy footprint; not needed for the standard pipeline).
Pin matplotlib >=3.10.9 in environment.yml to match pixi.toml. Update pixi.toml and regenerate pixi.lock with current dependency resolutions.
Multi-stage build strips compilers and Rust from the runtime image. Update .dockerignore to exclude tests, docs, lock files, and CI config from the build context so only runtime-necessary files are copied.
- install-methods.yml: new workflow testing Python wheel, conda fallback, Docker, and Apptainer installs on every PR. Uses uv throughout; all smoke tests check `--version` before `--help`.
- repo-health.yml: new weekly scheduled workflow testing the published packages on PyPI, Bioconda (warns gracefully if it is lagging), and the Docker registry.
- CD.yml: add a verify-pypi job that installs the just-published wheel by exact version from PyPI and runs smoke tests.
- container-build.yml: minor smoke test update.
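The smoke-test contract (`--version`, then `--help`, stop at the first failure) can be sketched in Python. The workflows themselves run these as shell steps, so `smoke_test` is a hypothetical rendering; running `--version` first is presumably because it is the cheapest proof that the entry point imports at all:

```python
import subprocess
import sys
from typing import Sequence


def smoke_test(command: Sequence[str]) -> None:
    """Run `<command> --version` then `<command> --help`, failing fast."""
    for flag in ("--version", "--help"):
        result = subprocess.run([*command, flag], capture_output=True, text=True)
        if result.returncode != 0:
            raise RuntimeError(
                f"{' '.join(command)} {flag} exited with "
                f"{result.returncode}: {result.stderr.strip()}"
            )


if __name__ == "__main__":
    # e.g. `python smoke.py capcruncher`; defaults to the current interpreter
    smoke_test(sys.argv[1:] or [sys.executable])
```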
Verify MANIFEST.in pruning, modern license metadata, extras aggregate, pyranges1-only environment, critical dependency bounds across all four manifests, documentation priority ordering, Docker/Apptainer CI coverage, and install-methods CI contract. All tests are static (no network/build).
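A static check of the MANIFEST.in pruning can be done by parsing the file's text, with no build or network access. This sketch handles only whole-line comments and `prune <dir>` directives; the function names are hypothetical, and the directory list in the usage below comes from the MANIFEST.in commit above:

```python
from typing import Iterable, Set


def pruned_dirs(manifest_text: str) -> Set[str]:
    """Collect the directories named in `prune` directives of MANIFEST.in text."""
    dirs: Set[str] = set()
    for raw in manifest_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        parts = line.split()
        if parts[0] == "prune" and len(parts) > 1:
            dirs.add(parts[1].rstrip("/"))
    return dirs


def missing_prunes(manifest_text: str, required: Iterable[str]) -> Set[str]:
    """Return the required directories that MANIFEST.in does not prune."""
    return set(required) - pruned_dirs(manifest_text)
```

A test would then assert `missing_prunes(Path("MANIFEST.in").read_text(), ["docs", "tests", ".github"]) == set()`.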
Swap Apptainer above Docker throughout — most users are on HPC where Apptainer is the native container runtime. Add a decision table at the top of installation.md so non-technical users can identify their route at a glance. Expand Apptainer section with offline .sif fallback workflow. Replace pip with uv in the conda fallback install command.
…iner does not exist
…he mount `rm -rf /home/mambauser/.cache` fails with 'Device or resource busy' because the pip cache is a live BuildKit mount at that path. The home directory is not copied to the runtime stage anyway, so the cleanup was a no-op.
…n docs
Part 1 — fix comparison separator:
- Use `_vs_` instead of `-` to join condition names in comparison filenames,
resolving MissingInputException when conditions themselves contain hyphens
- Update COMPARISON_TRACK_PATTERN regex in make_ucsc_hub.py to match new separator
- Fix bigwig_summarised wildcard constraint (was `comparison=`, should be `group=`)
- Broaden visualise.smk wildcard constraints to allow hyphens and underscores
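The separator fix matters because a name like `WT-day1-KO-day1` cannot be split back into two conditions unambiguously, while `WT-day1_vs_KO-day1` can. The regex below is a hypothetical stand-in; the real `COMPARISON_TRACK_PATTERN` in `make_ucsc_hub.py` may differ. It still assumes no condition itself contains `_vs_`:

```python
import re
from typing import Tuple

SEPARATOR = "_vs_"

# Hypothetical pattern; the non-greedy group stops at the first `_vs_`
COMPARISON_RE = re.compile(r"^(?P<a>.+?)_vs_(?P<b>.+)$")


def comparison_name(cond_a: str, cond_b: str) -> str:
    """Join two condition names into a comparison label."""
    return f"{cond_a}{SEPARATOR}{cond_b}"


def split_comparison(name: str) -> Tuple[str, str]:
    """Recover the two condition names from a comparison label."""
    m = COMPARISON_RE.match(name)
    if m is None:
        raise ValueError(f"not a comparison name: {name!r}")
    return m.group("a"), m.group("b")
```

With `-` as the separator, `"WT-day1-KO-day1".split("-")` yields four tokens, so there is no way to tell where one condition ends and the other begins.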
Part 2 — design matrix:
- Add Pandera DesignSchema (enforces unique sample names and rejects condition names containing a dot)
- Replace get_design_matrix() with infer_design_from_fastqs() using correct
rsplit logic: condition=everything before last `_`, replicate=last token
- Fix FastqSamples.from_files() to delegate to infer_design_from_fastqs()
- Validate design on Snakefile startup; force COMPARE_SAMPLES=False when all
conditions are UNKNOWN (no design provided and inference failed)
- Add `capcruncher pipeline design` subcommand to preview/save inferred design
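The design-inference rule splits each sample name on its last underscore. This plain-Python sketch uses hypothetical function names and dict rows instead of a Pandera-validated DataFrame. It assumes read suffixes such as `_R1` are already stripped, and assigning `UNKNOWN` to names without any `_` is an illustrative assumption:

```python
from collections import Counter
from typing import Dict, List

UNKNOWN = "UNKNOWN"


def infer_condition_replicate(sample: str) -> Dict[str, str]:
    """condition = everything before the last `_`, replicate = last token."""
    if "_" not in sample:
        return {"sample": sample, "condition": UNKNOWN, "replicate": UNKNOWN}
    condition, replicate = sample.rsplit("_", 1)
    return {"sample": sample, "condition": condition, "replicate": replicate}


def validate_design(rows: List[Dict[str, str]]) -> None:
    """Stand-in for the DesignSchema checks: unique samples, no dots in conditions."""
    dupes = [s for s, n in Counter(r["sample"] for r in rows).items() if n > 1]
    if dupes:
        raise ValueError(f"duplicate sample names: {dupes}")
    dotted = sorted({r["condition"] for r in rows if "." in r["condition"]})
    if dotted:
        raise ValueError(f"conditions must not contain '.': {dotted}")


def should_compare(rows: List[Dict[str, str]]) -> bool:
    # Mirrors forcing COMPARE_SAMPLES=False when every condition is UNKNOWN
    return not all(r["condition"] == UNKNOWN for r in rows)
```

Using `rsplit("_", 1)` rather than `split("_")` is what allows conditions such as `WT_day1` to contain underscores themselves.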
Part 3 — genome profiles:
- Add `capcruncher genome add/list/show/remove` for per-genome YAML profiles
stored in ~/.capcruncher/genomes/ (XDG_CONFIG_HOME aware)
- Resolve `genome: {profile: <name>}` in format_config_dict() before validation
- Add `genome_profile` field to cookiecutter config template
- Add `capcruncher pipeline config --list-profiles` shorthand
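Genome profiles involve two steps: locating the profile directory and substituting `genome: {profile: <name>}` before validation. A sketch under stated assumptions: the `$XDG_CONFIG_HOME/capcruncher/genomes` layout is a guess at what "XDG_CONFIG_HOME aware" means, profiles are passed in as already-loaded dicts instead of being read from YAML, and both function names are hypothetical:

```python
import os
from pathlib import Path
from typing import Any, Dict, Mapping, Optional


def genome_profile_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the profile directory (the XDG layout is an assumption)."""
    env = os.environ if env is None else env
    xdg = env.get("XDG_CONFIG_HOME")
    if xdg:
        return Path(xdg) / "capcruncher" / "genomes"
    return Path.home() / ".capcruncher" / "genomes"


def resolve_genome_profile(config: Dict[str, Any],
                           profiles: Mapping[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Replace `genome: {profile: <name>}` with the stored profile's settings."""
    genome = config.get("genome")
    if isinstance(genome, dict) and "profile" in genome:
        name = genome["profile"]
        if name not in profiles:
            raise KeyError(f"unknown genome profile: {name!r}")
        # Return a new dict so the caller's config is left untouched
        return {**config, "genome": dict(profiles[name])}
    return config
```

Resolving before validation means the schema only ever sees a fully expanded `genome` section, whether it came from a profile or was written inline.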
Docs — update all references from deprecated `capcruncher pipeline` /
`pipeline-init` / `pipeline-config` to `capcruncher pipeline run` /
`capcruncher pipeline init` / `capcruncher pipeline config`; document
design matrix convention, validation rules, and genome profiles
…dd snakemake plugins
…lumns in input data
…ovements
- Add run summary scorecard (alignment %, capture efficiency %, cis %, viewpoints detected)
- Add capture efficiency, cis/trans ratio, viewpoint detection summary, and reads-per-viewpoint uniformity sections
- Add alignment filtering dropout (% retained) chart tab
- Add `count_religation.py` script and Snakemake rule to measure re-ligation artefacts and cis interaction distance distributions per viewpoint
- Fix slider label overlapping plots (hide currentvalue overlay, increase pad)
- Reduce left margin (150→80px), cap slice-length histogram at 99th percentile
- Simplify cis/trans chart (remove `pattern_shape`, use `facet_col` instead of `facet_row`)
- Add all-samples box-plot summary tab to pipeline run statistics
- Scale chart heights with sample/viewpoint count, capped at 800px
- Add loguru logging throughout report generation
- Update `report_text.yml` with descriptions for all new sections

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
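Two of the report tweaks are simple arithmetic: chart height grows with the number of samples or viewpoints up to an 800 px cap, and the slice-length histogram is limited to the 99th percentile. In this sketch only the 800 px cap and the 99th percentile come from the commit; `base`, `per_item`, the nearest-rank percentile definition, and filtering (rather than clipping the axis range) are illustrative assumptions, and the function names are hypothetical:

```python
import math
from typing import List, Sequence


def chart_height(n_items: int, per_item: int = 40, base: int = 200, cap: int = 800) -> int:
    """Grow a figure with the item count, but never past `cap` pixels."""
    return min(base + per_item * max(n_items, 0), cap)


def percentile_nearest_rank(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile (one of several common definitions)."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[min(max(rank, 1), len(ordered)) - 1]


def clip_for_histogram(lengths: Sequence[float], pct: float = 99) -> List[float]:
    # Drop the long tail so a few very long slices don't squash the histogram
    upper = percentile_nearest_rank(lengths, pct)
    return [x for x in lengths if x <= upper]
```

For 100 slice lengths `1..100`, the nearest-rank 99th percentile is `99`, so only the single longest value is dropped.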
This pull request introduces significant modernization and infrastructure improvements to the CapCruncher project. The main changes include adding a new Docker build system, updating CI workflows for Python 3.12 and newer GitHub Actions versions, improving documentation for both users and developers, and cleaning up legacy or unused code and dependencies. These changes aim to make the project easier to maintain, more robust across environments, and ready for current best practices in Python and workflow management.
Containerization and Deployment:
- `Dockerfile` based on `micromamba`, supporting both `linux/amd64` and `linux/arm64`, and including key dependencies such as Apptainer and Quarto. This enables robust container-based workflows for both local and HPC environments.
- `.dockerignore` file to optimize Docker builds by excluding unnecessary files and directories.
- … (`container-build.yml`) for automated container builds, smoke testing, and publishing images to GitHub Container Registry.

Continuous Integration and Testing:
Documentation and Developer Guidance:
- `README.md` for clarity, modern usage, and quick-start instructions, including new sections on installation, CLI, and development.
- `AGENTS.md` file with detailed modernization, development, and workflow guidelines for contributors, including conventions, environment notes, and known caveats.

Codebase and Packaging Cleanup:
- `capcruncher/api/__init__.py`, reflecting a move away from exposing a monolithic API surface.
- … `MANIFEST.in`.

These changes collectively modernize CapCruncher’s infrastructure, improve developer and user experience, and set up the project for reliable containerized and CI-driven workflows.