kircherlab/CADD_threshold_app

CADD Threshold APP

A Shiny-for-Python web application to explore and compare distributions of ClinVar variants across different CADD PHRED-score thresholds, filter by gene lists or panels, and export per-gene/per-panel or filtered annotation summaries. The app is primarily intended for investigating the score distribution of known pathogenic and benign variants for different CADD PHRED-score thresholds.

This README explains the repository layout and how to install and run the app locally (pip/conda).

Highlights

  • Interactive visualizations of CADD PHRED-score distributions
  • Compare distributions across CADD/ClinVar versions and genome builds
  • Per-gene filtering (paste a list or upload a file) and exportable summaries
  • Per-panel filtering using panels from PanelApp and exportable summaries
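
To make the threshold comparison concrete, here is a minimal sketch of the kind of summary the app is built around: the fraction of variants per clinical class whose PHRED score reaches a given cutoff. It is not the app's actual code. The column names ClinicalSignificance and PHRED are taken from this README, while the function name, the toy records, and the choice of ">=" for the comparison are assumptions made for illustration.

```python
from collections import defaultdict


def fraction_at_or_above(variants, threshold):
    """Return {label: fraction of variants with PHRED >= threshold}.

    `variants` is an iterable of dicts with the keys
    "ClinicalSignificance" and "PHRED" (names taken from the README).
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for variant in variants:
        label = variant["ClinicalSignificance"]
        totals[label] += 1
        if float(variant["PHRED"]) >= threshold:
            hits[label] += 1
    return {label: hits[label] / totals[label] for label in totals}


# Toy records invented for illustration only (not real ClinVar data).
toy_variants = [
    {"ClinicalSignificance": "Pathogenic", "PHRED": 32.0},
    {"ClinicalSignificance": "Pathogenic", "PHRED": 24.5},
    {"ClinicalSignificance": "Pathogenic", "PHRED": 12.1},
    {"ClinicalSignificance": "Pathogenic", "PHRED": 27.3},
    {"ClinicalSignificance": "Benign", "PHRED": 3.4},
    {"ClinicalSignificance": "Benign", "PHRED": 21.0},
    {"ClinicalSignificance": "Benign", "PHRED": 8.9},
    {"ClinicalSignificance": "Benign", "PHRED": 0.7},
]

# Compare how the two classes separate at a few candidate thresholds.
for threshold in (10, 20, 30):
    print(threshold, fraction_at_or_above(toy_variants, threshold))
```

Raising the threshold lowers the share of benign variants that pass, at the cost of also losing pathogenic ones. This is the trade-off the interactive plots are meant to help explore.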

Requirements

  • Python 3.10+ (3.12 recommended)
  • See requirements.txt or environment.yml for full dependencies
  • Docker (optional) — a Dockerfile is included for containerized runs

Installation

Data preparation

If you install the app as a package from Bioconda or pip, the underlying data for the CADD Threshold App must be downloaded separately. The data can be downloaded here: DOI. The data is versioned separately from the packages. You can also preprocess your own data for the app using this Snakemake workflow: CADD_threshold_analysis.

Data overview

  • data/ - contains preprocessed tables, panel summaries and metrics used by the app.
    • paneldata/ - CSVs summarizing panels and versions used by the UI
    • panel_metrics/ - generated metrics stored by date/version

Notes:

  • Large raw annotation files are typically not tracked in the repository. The app expects prepared/normalized CSV inputs - use https://github.com/kircherlab/CADD_threshold_analysis to regenerate CSV inputs or use the modules/panelapp/ utilities if you need to regenerate panel CSVs from PanelApp.
  • If you use your own data, make sure that the beginning of the file contains an identifier (e.g. GRCh37-v1.7 for our use case).
  • You also need to edit VERSION_GR_CHOICES in ui_components.py (and in calculate_panel_metrics_and_save.py) to include your identifiers.
  • Additionally, change the file names in data_loader.py and adjust the column names that are referenced there (ClinicalSignificance, GeneName, PHRED, Genes, etc.).
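
Before wiring your own tables into the app, a quick pre-flight check can catch naming and column problems early. The sketch below is not part of the repository. It assumes that "the beginning of the file" refers to the file name, that identifiers follow the GRCh37-v1.7 pattern shown above, and that inputs are comma-separated. The file name, the helper names, and the (possibly incomplete) set of required columns are hypothetical.

```python
import csv
import io
import re
from pathlib import Path

# Assumed identifier shape, modelled on the README example "GRCh37-v1.7".
IDENTIFIER_RE = re.compile(r"^(GRCh\d+)-(v\d+(?:\.\d+)*)")

# Subset of the column names mentioned in this README; the app may need more.
REQUIRED_COLUMNS = {"ClinicalSignificance", "GeneName", "PHRED"}


def parse_identifier(file_name):
    """Return (genome_build, cadd_version) or None if the prefix is missing."""
    match = IDENTIFIER_RE.match(Path(file_name).name)
    if match is None:
        return None
    return match.group(1), match.group(2)


def missing_columns(csv_text, required=REQUIRED_COLUMNS):
    """Return the sorted list of required columns absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    present = set(reader.fieldnames or [])
    return sorted(set(required) - present)


# Hypothetical file name and contents, for illustration only.
print(parse_identifier("GRCh37-v1.7_example.csv"))
print(parse_identifier("example.csv"))
print(missing_columns("GeneName,PHRED\nBRCA1,25.3\n"))
```

A file that passes both checks still needs its identifier added to VERSION_GR_CHOICES as described above.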

How to update Panel Data

  • To update the panel data, run modules/panelapp/main_panelapp.py.
  • This updates the Panel Overview and saves the old one as a backup. The metrics for all versions and genome releases are then recalculated (note: this takes several hours).
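
The backup-then-replace step described above can be pictured with the following sketch. It is an illustration of the general pattern, not the code in main_panelapp.py. The file name panel_overview.csv, the backup naming scheme, and the function name are all hypothetical.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def backup_then_write(target, new_text, now=None):
    """Copy an existing file to a timestamped backup, then overwrite it.

    Returns the backup path, or None if there was nothing to back up.
    """
    target = Path(target)
    backup = None
    if target.exists():
        stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
        backup = target.with_name(f"{target.stem}.backup-{stamp}{target.suffix}")
        shutil.copy2(target, backup)  # keeps the old contents and metadata
    target.write_text(new_text, encoding="utf-8")
    return backup


# Demonstration in a throwaway directory with invented contents.
with tempfile.TemporaryDirectory() as tmp:
    overview = Path(tmp) / "panel_overview.csv"  # hypothetical file name
    overview.write_text("panel,version\nold,1\n", encoding="utf-8")
    backup = backup_then_write(
        overview, "panel,version\nnew,2\n", now=datetime(2024, 1, 2, 3, 4, 5)
    )
    backup_name = backup.name
    backup_text = backup.read_text(encoding="utf-8")
    current_text = overview.read_text(encoding="utf-8")

print(backup_name)
```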

Pre-compiled packages

Using conda

conda create -n cadd_threshold_app -c bioconda -c conda-forge cadd-threshold-app
conda activate cadd_threshold_app
cadd-threshold-app --data </path/to/data>

Using pip

pip install cadd-threshold-app
cadd-threshold-app --data </path/to/data>

From source

git clone https://github.com/kircherlab/CADD_threshold_app.git
cd CADD_threshold_app
pip install .
cadd-threshold-app --data data

Install as package (editable, recommended for development)

pip install -e .

Run the app

Option A: run via the package entry point

This requires installing the project as a package (e.g. pip install -e .).

cadd-threshold-app --data </path/to/data>

As an alternative to the --data CLI option, you can set the CADD_THRESHOLD_APP_DATA_DIR environment variable.

export CADD_THRESHOLD_APP_DATA_DIR=data
cadd-threshold-app
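
For readers curious how a command-line option with an environment-variable fallback is commonly implemented, here is a minimal argparse sketch. It is not the app's entry point. The precedence shown (the --data option wins over the environment variable) is an assumption that the README does not confirm, and the program name example-app and the function name are hypothetical.

```python
import argparse

ENV_VAR = "CADD_THRESHOLD_APP_DATA_DIR"  # variable name taken from the README


def resolve_data_dir(argv, environ):
    """Pick the data directory from --data, else from the environment.

    Assumption: the CLI option takes precedence over the environment variable.
    """
    parser = argparse.ArgumentParser(prog="example-app")
    parser.add_argument("--data", default=None, help="path to the data directory")
    args = parser.parse_args(argv)
    data_dir = args.data or environ.get(ENV_VAR)
    if not data_dir:
        # Exits with a usage message when neither source is provided.
        parser.error(f"provide --data or set {ENV_VAR}")
    return data_dir


# A real entry point would pass sys.argv[1:] and os.environ instead.
print(resolve_data_dir(["--data", "data"], {ENV_VAR: "ignored"}))
print(resolve_data_dir([], {ENV_VAR: "/srv/cadd_data"}))
```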

Further CLI options are available to configure the host and port. Run cadd-threshold-app --help for details.

Option B: run from the repository root

Set the CADD_THRESHOLD_APP_DATA_DIR environment variable to point to your data directory (e.g. data/ in the repository) before running.

export CADD_THRESHOLD_APP_DATA_DIR=data
python -m shiny run --port 8080 cadd_threshold_app.app:app

Then open http://localhost:8080 in your browser.

Key files and modules

  • app.py - Shiny app entrypoint and UI wiring
  • server_logic.py - main server-side reactive logic and handlers
  • data_loader.py - helpers to load and preprocess annotation tables
  • ui_components.py - UI layout and input choices (e.g. VERSION_GR_CHOICES)
  • modules/ - plotting helpers, utilities and gene-list/panel parsing helpers
    • basic_plot.py, basic_bar_plot.py, compare_basic_plot.py - plotting factories
    • functions_server_helpers.py, read_genes_from_list_or_file_functions.py - utilities
    • panelapp/ - scripts to interact with PanelApp (CSV generation, comparison)
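
As an illustration of what gene-list parsing involves, the sketch below turns pasted text into a clean list of gene symbols and uses it to filter records. It is not the code in read_genes_from_list_or_file_functions.py. The accepted separators (whitespace, commas, semicolons), the function names, and the toy records are assumptions. Symbols are deliberately left in their original case, since some gene symbols contain lowercase letters.

```python
import re


def parse_gene_list(text):
    """Split pasted text into unique gene symbols, keeping first-seen order."""
    tokens = re.split(r"[\s,;]+", text.strip())
    seen = set()
    genes = []
    for token in tokens:
        if token and token not in seen:
            seen.add(token)
            genes.append(token)
    return genes


def filter_by_genes(records, genes, column="GeneName"):
    """Keep only the records whose gene column is in the requested list."""
    wanted = set(genes)
    return [record for record in records if record.get(column) in wanted]


# Toy input invented for illustration only.
genes = parse_gene_list("BRCA1, BRCA2\nTP53;BRCA1\n")
records = [
    {"GeneName": "BRCA1", "PHRED": 25.3},
    {"GeneName": "TTN", "PHRED": 14.8},
    {"GeneName": "TP53", "PHRED": 31.0},
]
print(genes)
print(filter_by_genes(records, genes))
```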

Development notes

  • To extend plots: add a factory under modules/ and register it in server logic
  • To add data sources: update data_loader.py and ensure column names match the plotting/metric code paths
  • Linting/tests: None included by default. Add unit tests for critical data parsing when making larger refactors.
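
The "add a factory and register it" advice can be pictured as a small registry pattern. This is a hedged sketch of one common way to organize such factories, not a description of how server_logic.py actually registers plots. All names here (PLOT_FACTORIES, register_plot, build_plot, phred_histogram) are hypothetical, and the factory returns plain bin counts instead of a figure so that the example stays dependency-free.

```python
from collections import Counter

# Hypothetical registry mapping a plot name to the factory that builds it.
PLOT_FACTORIES = {}


def register_plot(name):
    """Decorator that stores a factory function under the given name."""
    def decorator(factory):
        PLOT_FACTORIES[name] = factory
        return factory
    return decorator


@register_plot("phred_histogram")
def phred_histogram(phred_scores, bin_width=10):
    """Return {bin_start: count}; a real factory would return a figure."""
    counts = Counter(int(score // bin_width) * bin_width for score in phred_scores)
    return dict(sorted(counts.items()))


def build_plot(name, *args, **kwargs):
    """Look up a registered factory by name and call it."""
    try:
        factory = PLOT_FACTORIES[name]
    except KeyError:
        raise ValueError(f"unknown plot: {name!r}") from None
    return factory(*args, **kwargs)


# Toy scores invented for illustration only.
histogram = build_plot("phred_histogram", [3.4, 12.1, 18.0, 24.5, 27.3, 32.0])
print(histogram)
```

With this layout, adding a new plot means writing one decorated function; the lookup code does not change. Small pure functions like these are also straightforward to cover with unit tests.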

Docker

  • The included Dockerfile builds a minimal image running the app on port 8080.

License & contact

  • See LICENSE for licensing terms.
  • For questions about data sources, interpretation, or contributions, contact the repository maintainers or open an issue.
