Commit 8408f02

update readme and help
1 parent e6c107d commit 8408f02

2 files changed

Lines changed: 77 additions & 72 deletions

README.md

Lines changed: 74 additions & 72 deletions
@@ -1,7 +1,7 @@
 # TronFlow VCF postprocessing
 
-![GitHub tag (latest SemVer)](https://img.shields.io/github/v/release/tron-bioinformatics/tronflow-variant-normalization?sort=semver)
-[![Run tests](https://github.com/TRON-Bioinformatics/tronflow-variant-normalization/actions/workflows/automated_tests.yml/badge.svg?branch=master)](https://github.com/TRON-Bioinformatics/tronflow-variant-normalization/actions/workflows/automated_tests.yml)
+![GitHub tag (latest SemVer)](https://img.shields.io/github/v/release/tron-bioinformatics/tronflow-vcf-postprocessing?sort=semver)
+[![Run tests](https://github.com/TRON-Bioinformatics/tronflow-vcf-postprocessing/actions/workflows/automated_tests.yml/badge.svg?branch=master)](https://github.com/TRON-Bioinformatics/tronflow-vcf-postprocessing/actions/workflows/automated_tests.yml)
 [![DOI](https://zenodo.org/badge/372133189.svg)](https://zenodo.org/badge/latestdoi/372133189)
 [![License](https://img.shields.io/badge/license-MIT-green)](https://opensource.org/licenses/MIT)
 [![Powered by Nextflow](https://img.shields.io/badge/powered%20by-Nextflow-orange.svg?style=flat&colorA=E1523D&colorB=007D8A)](https://www.nextflow.io/)
@@ -18,6 +18,78 @@ This pipeline has several objectives:
 * Technical annotations from different BAM files
 * Functional annotations
 
+## How to run it
+
+Run it from GitHub as follows:
+```
+nextflow run tron-bioinformatics/tronflow-vcf-postprocessing -r v2.1.0 -profile conda --input_vcfs input_vcfs --reference reference.fasta
+```
+
+Otherwise download the project and run as follows:
+```
+nextflow main.nf -profile conda --input_vcfs input_vcfs --reference reference.fasta
+```
+
+Find the help as follows:
+```
+$ nextflow run tron-bioinformatics/tronflow-vcf-postprocessing --help
+
+TronFlow VCF normalization v${VERSION}
+
+Usage:
+nextflow run main.nf --input_vcfs input_vcfs --reference reference.fasta
+
+
+Input:
+* --input_vcf: the path to a single VCF to normalize (not compatible with --input_files)
+* --input_vcfs: the path to a tab-separated values file containing in each row the sample name and path to the VCF file (not compatible with --input_vcf)
+The input file does not have header!
+Example input file:
+sample1 /path/to/your/file.vcf
+sample2 /path/to/your/file2.vcf
+* --reference: path to the FASTA genome reference (indexes expected *.fai, *.dict)
+* --vcf-without-ad: indicate when the VCFs to normalize do not have the FORMAT/AD annotation
+
+Optional input:
+* --output: the folder where to publish output
+* --skip_decompose_complex: flag indicating not to split complex variants (ie: MNVs and combinations of SNVs and indels)
+* --filter: specify the filter to apply if any (e.g.: PASS), only variants with this value will be kept
+* --input_bams: a tab-separated values file containing in each row the sample name, tumor and normal BAM files for annotation with Vafator
+* --snpeff_organism: the SnpEff organism name (eg: hg19, hg38, GRCh37.75, GRCh38.99)
+* --snpeff_datadir: the SnpEff data folder where the reference genomes were previously downloaded. Required if --snpeff_organism is provided
+* --snpeff_args: additional SnpEff arguments
+
+Output:
+* Normalized VCF file
+* Tab-separated values file with the absolute paths to the normalized VCF files, normalized_vcfs.txt
+* Summary statistics before and after normalization
+```
+
+### Input tables
+
+The table with VCF files expects two tab-separated columns without a header
+
+| Patient name | VCF |
+|----------------------|------------------------------------------------------------------------|
+| patient_1 | /path/to/patient_1.vcf |
+| patient_2 | /path/to/patient_2.vcf |
+
+The optional table with BAM files expects two tab-separated columns without a header.
+
+| Patient name | Sample name:BAM |
+|----------------------|---------------------------------|
+| patient_1 | primary_tumor:/path/to/sample_1.primary.bam |
+| patient_1 | metastasis_tumor:/path/to/sample_1.metastasis.bam |
+| patient_1 | normal:/path/to/sample_1.normal.bam |
+| patient_2 | primary_tumor:/path/to/sample_1.primary_1.bam |
+| patient_2 | primary_tumor:/path/to/sample_1.primary_2.bam |
+| patient_2 | metastasis_tumor:/path/to/sample_1.metastasis.bam |
+| patient_2 | normal:/path/to/sample_1.normal.bam |
+
+Each patient can have any number of samples. Any sample can have any number of BAM files, annotations from the
+different BAM files of the same sample will be provided with suffixes _1, _2, etc.
+The aggregated vafator annotations on each sample will also be provided without a suffix.
+
 ## Variant filtering
 
 Optionally, only variants with the value in the column `FILTER` matching the value of parameter `--filter` are kept.
@@ -163,76 +235,6 @@ To provide any additional SnpEff arguments use `--snpeff_args` such as
 otherwise defaults will be used.
 
 
-## How to run it
-
-Run it from GitHub as follows:
-```
-nextflow run tron-bioinformatics/tronflow-variant-normalization -r v2.0.0 -profile conda --input_vcfs input_vcfs --reference reference.fasta
-```
-
-Otherwise download the project and run as follows:
-```
-nextflow main.nf -profile conda --input_vcfs input_vcfs --reference reference.fasta
-```
-
-Find the help as follows:
-```
-$ nextflow run tron-bioinformatics/tronflow-variant-normalization --help
-
-TronFlow VCF normalization v${VERSION}
-
-Usage:
-nextflow run main.nf --input_vcfs input_vcfs --reference reference.fasta
-
-
-Input:
-* --input_vcf: the path to a single VCF to normalize (not compatible with --input_files)
-* --input_vcfs: the path to a tab-separated values file containing in each row the sample name and path to the VCF file (not compatible with --input_vcf)
-The input file does not have header!
-Example input file:
-sample1 /path/to/your/file.vcf
-sample2 /path/to/your/file2.vcf
-* --reference: path to the FASTA genome reference (indexes expected *.fai, *.dict)
-* --vcf-without-ad: indicate when the VCFs to normalize do not have the FORMAT/AD annotation
-
-Optional input:
-* --output: the folder where to publish output
-* --skip_decompose_complex: flag indicating not to split complex variants (ie: MNVs and combinations of SNVs and indels)
-* --filter: specify the filter to apply if any (e.g.: PASS), only variants with this value will be kept
-* --input_bams: a tab-separated values file containing in each row the sample name, tumor and normal BAM files for annotation with Vafator
-
-Output:
-* Normalized VCF file
-* Tab-separated values file with the absolute paths to the normalized VCF files, normalized_vcfs.txt
-* Summary statistics before and after normalization
-```
-
-### Input tables
-
-The table with VCF files expects two tab-separated columns without a header
-
-| Patient name | VCF |
-|----------------------|------------------------------------------------------------------------|
-| patient_1 | /path/to/patient_1.vcf |
-| patient_2 | /path/to/patient_2.vcf |
-
-The optional table with BAM files expects two tab-separated columns without a header.
-
-| Patient name | Sample name:BAM |
-|----------------------|---------------------------------|
-| patient_1 | primary_tumor:/path/to/sample_1.primary.bam |
-| patient_1 | metastasis_tumor:/path/to/sample_1.metastasis.bam |
-| patient_1 | normal:/path/to/sample_1.normal.bam |
-| patient_2 | primary_tumor:/path/to/sample_1.primary_1.bam |
-| patient_2 | primary_tumor:/path/to/sample_1.primary_2.bam |
-| patient_2 | metastasis_tumor:/path/to/sample_1.metastasis.bam |
-| patient_2 | normal:/path/to/sample_1.normal.bam |
-
-Each patient can have any number of samples. Any sample can have any number of BAM files, annotations from the
-different BAM files of the same sample will be provided with suffixes _1, _2, etc.
-The aggregated vafator annotations on each sample will also be provided without a suffix.
-
-
 ## References
 
 * [Adrian Tan, Gonçalo R. Abecasis and Hyun Min Kang. Unified Representation of Genetic Variants. Bioinformatics (2015) 31(13): 2202-2204](http://bioinformatics.oxfordjournals.org/content/31/13/2202) and uses bcftools [Li, H. (2011). A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics (Oxford, England), 27(21), 2987–2993. 10.1093/bioinformatics/btr509]
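
The README section moved in this commit describes two headerless, two-column, tab-separated input tables: one mapping patient names to VCF paths, and an optional one mapping patient names to `sample_name:bam_path` pairs. As a minimal sketch of how such tables might be produced (not part of this commit), the Python below writes both. The function names `write_vcf_table` and `write_bam_table`, the output file names, and all paths are hypothetical and chosen only for illustration.

```python
import csv
import os
import tempfile


def write_vcf_table(rows, path):
    """Write (patient_name, vcf_path) pairs as a headerless TSV."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for patient, vcf in rows:
            writer.writerow([patient, vcf])


def write_bam_table(rows, path):
    """Write (patient_name, sample_name, bam_path) triples as a headerless TSV.

    The second column joins sample name and BAM path with a colon,
    mirroring the "Sample name:BAM" column shown in the README table.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for patient, sample, bam in rows:
            writer.writerow([patient, f"{sample}:{bam}"])


if __name__ == "__main__":
    # Hypothetical example data; replace with real patient names and paths.
    vcfs = [
        ("patient_1", "/path/to/patient_1.vcf"),
        ("patient_2", "/path/to/patient_2.vcf"),
    ]
    bams = [
        ("patient_1", "primary_tumor", "/path/to/sample_1.primary.bam"),
        ("patient_1", "normal", "/path/to/sample_1.normal.bam"),
    ]
    with tempfile.TemporaryDirectory() as workdir:
        vcf_table = os.path.join(workdir, "input_vcfs.tsv")
        bam_table = os.path.join(workdir, "input_bams.tsv")
        write_vcf_table(vcfs, vcf_table)
        write_bam_table(bams, bam_table)
        for table in (vcf_table, bam_table):
            with open(table) as handle:
                print(handle.read(), end="")
```

The resulting files would then be passed to the pipeline through `--input_vcfs` and `--input_bams` respectively. A patient with several BAM files for the same sample simply gets several rows with the same sample name.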

nextflow.config

Lines changed: 3 additions & 0 deletions
@@ -63,6 +63,9 @@ Optional input:
 * --filter: specify a comma-separated list of filters to apply (e.g.: PASS,.), only variants with these values will be kept. If not provided all variants are kept
 * --vcf-without-ad: indicate when the VCFs to normalize do not have the FORMAT/AD annotation
 * --input_bams: a tab-separated values file containing in each row the sample name, tumor and normal BAM files for annotation with Vafator
+* --snpeff_organism: the SnpEff organism name (eg: hg19, hg38, GRCh37.75, GRCh38.99)
+* --snpeff_datadir: the SnpEff data folder where the reference genomes were previously downloaded. Required if --snpeff_organism is provided
+* --snpeff_args: additional SnpEff arguments
 
 Output:
 * Normalized VCF file
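
The `--filter` help text in this hunk describes a comma-separated list of `FILTER` values to keep, with all variants kept when the parameter is absent. The Python below is a rough sketch of that described behaviour only; it is not the pipeline's implementation. It assumes an exact match on the whole `FILTER` column, so records carrying several semicolon-separated filters may be treated differently by the real pipeline. The names `parse_filter_param` and `keep_record` are hypothetical.

```python
def parse_filter_param(value):
    """Split a comma-separated --filter value into a set of allowed values.

    Returns None when no filter was provided, meaning "keep everything".
    """
    if not value:
        return None
    return {item.strip() for item in value.split(",") if item.strip()}


def keep_record(vcf_line, allowed):
    """Return True if a VCF line should be kept under the allowed filters."""
    if vcf_line.startswith("#"):
        return True  # header lines are always kept
    if allowed is None:
        return True  # no filter requested: keep all variants
    fields = vcf_line.rstrip("\n").split("\t")
    # FILTER is the seventh column of a VCF data line (index 6).
    # Assumption: exact match on the whole column value.
    return fields[6] in allowed
```

For example, `parse_filter_param("PASS,.")` yields a set containing `PASS` and `.`, so a record whose `FILTER` column is `LowQual` would be dropped while unfiltered (`.`) and passing records would be kept.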
