A Shiny-for-Python web application to explore and compare distributions of ClinVar variants across different CADD PHRED-score thresholds, filter by gene lists or panels, and export per-gene/per-panel or filtered annotation summaries. The app is primarily intended for investigating the score distribution of known pathogenic and benign variants for different CADD PHRED-score thresholds.
This README explains the repository layout and how to run the app locally (pip/conda).
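To make the idea of a PHRED-score threshold concrete, here is a minimal sketch of the kind of split the app explores. It is not the app's own code: the function name `split_at_threshold`, the label strings and the example scores are all made up for illustration.

```python
# Illustrative only: function name, labels and scores are assumptions,
# not taken from the app's code or from ClinVar/CADD data.
def split_at_threshold(variants, threshold):
    """Count variants per label at/above and below a PHRED threshold."""
    counts = {}
    for label, phred in variants:
        side = "at_or_above" if phred >= threshold else "below"
        # Create the per-label counter on first sight, then increment one side.
        counts.setdefault(label, {"at_or_above": 0, "below": 0})[side] += 1
    return counts


# Made-up (label, PHRED score) pairs.
example = [
    ("pathogenic", 28.4),
    ("pathogenic", 17.2),
    ("pathogenic", 33.0),
    ("benign", 3.1),
    ("benign", 21.0),
]

for threshold in (10, 20, 30):
    print(threshold, split_at_threshold(example, threshold))
```

Raising the threshold moves variants of both classes from the "at or above" side to the "below" side, which is the trade-off the app visualizes.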
Highlights
- Interactive visualizations of CADD PHRED-score distributions
- Compare distributions across CADD/ClinVar versions and genome builds
- Per-gene filtering (paste a list or upload a file) and exportable summaries
- Per-panel filtering using panels from PanelApp and exportable summaries
Requirements
- Python 3.10+ (3.12 recommended)
- See requirements.txt or environment.yml for full dependencies
- Docker (optional): a Dockerfile is included for containerized runs
If you install the app as a package from bioconda or pip, the underlying data for the CADD-ThresholdApp must be downloaded separately. The data can be downloaded here: . The data is also versioned separately from the packages. You can also preprocess your own data for the app using this Snakemake workflow: CADD_threshold_analysis.
- data/ - contains preprocessed tables, panel summaries and metrics used by the app.
  - paneldata/ - CSVs summarizing panels and versions used by the UI
  - panel_metrics/ - generated metrics stored by date/version
Notes:
- Large raw annotation files are typically not tracked in the repository. The app expects prepared/normalized CSV inputs. Use https://github.com/kircherlab/CADD_threshold_analysis to regenerate CSV inputs, or use the modules/panelapp/ utilities if you need to regenerate panel CSVs from PanelApp.
- If you choose to use your own data, make sure that the beginning of the file contains an identifier (e.g. GRCh37-v1.7 for our use case).
- You also need to edit VERSION_GR_CHOICES in the ui_components.py file with your identifiers (and in calculate_panel_metrics_and_save.py).
- Additionally, you need to change the file names in data_loader.py and adjust the column names that are being called (ClinicalSignificance, GeneName, PHRED, Genes etc.).
- If you want to update the panel data, run modules/panelapp/main_panelapp.py.
- Running this script updates the Panel Overview and saves the old one as a backup; the new metrics for all versions and genome releases are then calculated (Note: this takes several hours).
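Before wiring your own tables into the app, a quick sanity check along the lines of the notes above can catch naming problems early. The sketch below is not part of the repository. It assumes that the identifier sits at the start of the file name, that the identifier looks like the GRCh37-v1.7 example, and that the table is comma-separated; the helper name `check_table` and the choice of required columns are illustrative only.

```python
import csv
import io
import re

# Assumption: a subset of the columns named in the notes above.
REQUIRED_COLUMNS = {"ClinicalSignificance", "GeneName", "PHRED"}
# Assumption: identifiers follow the shape of the example "GRCh37-v1.7".
IDENTIFIER_RE = re.compile(r"^GRCh\d+-v\d+\.\d+")


def check_table(filename, handle):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if not IDENTIFIER_RE.match(filename):
        problems.append(f"{filename}: name does not start with an identifier")
    # DictReader reads the header row lazily when fieldnames is accessed.
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        problems.append("missing columns: " + ", ".join(sorted(missing)))
    return problems


# Usage with an in-memory table; a real call would pass an open file instead.
sample = io.StringIO("ClinicalSignificance,GeneName,PHRED\nPathogenic,BRCA1,28.4\n")
print(check_table("GRCh37-v1.7_example.csv", sample))
```

If your tables use different column names or a different delimiter, adjust `REQUIRED_COLUMNS` and the `csv.DictReader` call accordingly.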
Using conda
conda create -n cadd_threshold_app -c bioconda -c conda-forge cadd-threshold-app
conda activate cadd_threshold_app
cadd-threshold-app --data </path/to/data>
Using pip
pip install cadd-threshold-app
cadd-threshold-app --data </path/to/data>
From source
git clone https://github.com/kircherlab/CADD_threshold_app.git
cd CADD_threshold_app
pip install .
cadd-threshold-app --data data
Install as package (editable, recommended for development)
pip install -e .
Option A: run via the package entry point
This requires installing the project as a package (e.g. pip install -e .).
cadd-threshold-app --data </path/to/data>
As an alternative to the --data CLI option, you can set the CADD_THRESHOLD_APP_DATA_DIR environment variable.
export CADD_THRESHOLD_APP_DATA_DIR=data
cadd-threshold-app
Further CLI options are available to configure host and port. Run cadd-threshold-app --help for details.
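The two ways of pointing the app at its data (the --data option and the CADD_THRESHOLD_APP_DATA_DIR variable) can be pictured with the following sketch. It is not the package's actual entry point: the function name `resolve_data_dir` is hypothetical, and the precedence shown (CLI option first, environment variable as fallback) is an assumption rather than documented behavior.

```python
import argparse
import os
from pathlib import Path

ENV_VAR = "CADD_THRESHOLD_APP_DATA_DIR"


def resolve_data_dir(argv=None, environ=None):
    """Pick the data directory from --data, falling back to the env variable."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(prog="cadd-threshold-app")
    parser.add_argument("--data", default=None, help="path to the data directory")
    # parse_known_args ignores options this sketch does not model (host, port).
    args, _unknown = parser.parse_known_args(argv)
    # Assumed precedence: an explicit --data wins over the environment.
    raw = args.data or environ.get(ENV_VAR)
    if not raw:
        raise SystemExit(f"No data directory given: pass --data or set {ENV_VAR}")
    return Path(raw)


# Usage with explicit arguments instead of the real command line/environment.
print(resolve_data_dir(["--data", "data"], {}))
print(resolve_data_dir([], {ENV_VAR: "/srv/cadd/data"}))
```

Passing `argv` and `environ` explicitly keeps the function easy to test without touching the real process environment.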
Option B: run from the repository root. Please set the CADD_THRESHOLD_APP_DATA_DIR environment variable to point to your data directory (e.g. data/ in the repository) before running.
export CADD_THRESHOLD_APP_DATA_DIR=data
python -m shiny run cadd_threshold_app.app:app
Then open http://localhost:8080 in your browser.
- app.py - Shiny app entrypoint and UI wiring
- server_logic.py - main server-side reactive logic and handlers
- data_loader.py - helpers to load and preprocess annotation tables
- ui_components.py - UI
- modules/ - plotting helpers, utilities and gene-list/panel parsing helpers
  - basic_plot.py, basic_bar_plot.py, compare_basic_plot.py - plotting factories
  - functions_server_helpers.py, read_genes_from_list_or_file_functions.py - utilities
  - panelapp/ - scripts to interact with PanelApp (CSV generation, comparison)
- To extend plots: add a factory under modules/ and register it in server logic
- To add data sources: update data_loader.py and ensure column names match the plotting/metric code paths
- Linting/tests: None included by default. Add unit tests for critical data parsing when making larger refactors.
- The included Dockerfile builds a minimal image running the app on port 8080.
- See LICENSE for licensing terms.
- For questions about data sources, interpretation, or contributions, contact the repository maintainers or open an issue.
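As a starting point for the unit tests on data parsing suggested in the development notes, the sketch below pairs a small gene-list parser with a test. The parser `parse_gene_list` is hypothetical and is not the implementation in read_genes_from_list_or_file_functions.py; the accepted separators (commas, semicolons, whitespace) are an assumption.

```python
import re


def parse_gene_list(text):
    """Split pasted text into gene symbols, dropping blanks and duplicates."""
    genes = {}
    # Assumed separators: commas, semicolons and any whitespace (incl. newlines).
    for token in re.split(r"[,;\s]+", text.strip()):
        if token:
            # dict keys keep first-seen order, so duplicates are dropped in place.
            genes.setdefault(token, None)
    return list(genes)


def test_parse_gene_list():
    assert parse_gene_list("BRCA1, BRCA2\nTP53;BRCA1") == ["BRCA1", "BRCA2", "TP53"]
    assert parse_gene_list("   ") == []


test_parse_gene_list()
```

The test function follows the naming convention that pytest discovers automatically, so it could be moved into a test module unchanged if the project adopts pytest.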