Commit d98ebe4

document bcftools csq
1 parent 10cdb58 commit d98ebe4

3 files changed

Lines changed: 28 additions & 10 deletions

README.md

Lines changed: 17 additions & 4 deletions
@@ -36,7 +36,7 @@ Find the help as follows:
 ```
 $ nextflow run tron-bioinformatics/tronflow-vcf-postprocessing --help

-TronFlow VCF normalization v${VERSION}
+TronFlow VCF postprocessing v${VERSION}

 Usage:
 nextflow run main.nf --input_vcfs input_vcfs --reference reference.fasta
@@ -67,7 +67,8 @@ Optional input:
 Example input file:
 sample1 primary:3
 sample1 metastasis:/path/to/metastasis.local_clonalities.bed
-* --reference: path to the FASTA genome reference (indexes expected *.fai, *.dict) [required for normalization]
+* --reference: absolute path to the FASTA genome reference (indexes expected *.fai, *.dict) [required for normalization and for functional annotation with BCFtools]
+* --gff: absolute path to a GFF gene annotations file [required for functional annotation with BCFtools, only Ensembl-like GFF files]
 * --vcf-without-ad: indicate when the VCFs to normalize do not have the FORMAT/AD annotation
 * --output: the folder where to publish output
 * --skip_normalization: flag indicating to skip all normalization steps
@@ -269,7 +270,10 @@ No technical annotations are performed if the parameter `--input_bams` is not pa
 ## Functional annotations

 The functional annotations provide a biological context for every variant. Such as the overlapping genes or the effect
-of the variant in a protein. These annotations are provided by SnpEff (Cingolani, 2012).
+of the variant in a protein. These annotations are provided by SnpEff (Cingolani, 2012) or by BCFtools csq (Danecek, 2017).
+Only one of the previous can be used.
+
+### Using SnpEff

 The SnpEff available human annotations are:
 * GRCh37.75
@@ -289,7 +293,15 @@ To provide any additional SnpEff arguments use `--snpeff_args` such as
 `--snpeff_args "-noStats -no-downstream -no-upstream -no-intergenic -no-intron -onlyProtein -hgvs1LetterAa -noShiftHgvs"`,
 otherwise defaults will be used.

-No functional annotations are performed if the parameters `--snpeff_organism` and `--snpeff_datadir` are not passed.
+No SnpEff functional annotations are performed if the parameters `--snpeff_organism` and `--snpeff_datadir` are not passed.
+
+### Using BCFtools csq
+
+BCFtools does not require any previous preparation. It expects two parameters:
+* `--reference`: absolute path to the FASTA reference genome
+* `--gff`: absolute path to the Ensembl-like GFF annotations (ie: Gencode GFF files do not work https://github.com/samtools/bcftools/issues/1078)
+
+Importantly, BCFtools does use the available phasing information to evaluate all mutations affecting any given transcript together.


 ## References
@@ -298,3 +310,4 @@ No functional annotations are performed if the parameters `--snpeff_organism` an
 * Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, Li H. Twelve years of SAMtools and BCFtools. Gigascience. 2021 Feb 16;10(2):giab008. doi: 10.1093/gigascience/giab008. PMID: 33590861; PMCID: PMC7931819.
 * Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., & Notredame, C. (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316–319. 10.1038/nbt.3820
 * Cingolani P, Platts A, Wang le L, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM. (2012) A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3.". Fly (Austin). 2012 Apr-Jun;6(2):80-92. PMID: 22728672
+* Danecek, P., & McCarthy, S. A. (2017). BCFtools/csq: haplotype-aware variant consequences. Bioinformatics, 33(13), 2037–2039. https://doi.org/10.1093/bioinformatics/btx100
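
As an illustration of the BCFtools csq options documented in the README change above, the following is a minimal sketch of an invocation guarded by a pre-flight check on the GFF. The paths, the `input_vcfs.tsv` file name and the `is_ensembl_like_gff` helper are hypothetical and not part of the pipeline. The helper only looks for the `ID=transcript:` attribute prefix used by Ensembl GFF3 files, which is a rough heuristic and not a full validation.

```shell
# Hypothetical helper (not part of the pipeline): succeeds when the GFF
# contains Ensembl-style transcript IDs such as "ID=transcript:ENST...".
is_ensembl_like_gff() {
    # A missing or unreadable file is treated as "not Ensembl-like"
    [ -r "$1" ] || return 1
    # Read plain or gzip-compressed files and search for the ID prefix
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac | grep -q 'ID=transcript:'
}

# Hypothetical absolute paths; replace them with your own files
REFERENCE="/path/to/reference.fasta"
GFF="/path/to/ensembl.gff3"

if is_ensembl_like_gff "$GFF"; then
    # Functional annotation with BCFtools csq needs --reference and --gff
    nextflow run tron-bioinformatics/tronflow-vcf-postprocessing \
        --input_vcfs input_vcfs.tsv \
        --reference "$REFERENCE" \
        --gff "$GFF" \
        --output output
else
    echo "GFF is missing or does not look Ensembl-like: $GFF" >&2
fi
```

A Gencode GFF, whose transcript IDs lack the `transcript:` prefix, would fail this check and the pipeline would not be launched.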

main.nf

Lines changed: 4 additions & 0 deletions
@@ -43,6 +43,10 @@ if ( params.snpeff_organism && ! params.snpeff_datadir) {
     exit 1, "To run snpEff, please, provide your snpEff data folder with --snpeff_datadir"
 }

+if (params.snpeff_organism && params.gff) {
+    exit 1, "Please use either SnpEff (--snpeff_organism) or BCFtools csq (--gff), but not both"
+}
+
 if (params.skip_normalization && ! params.input_bams && ! params.snpeff_organism) {
     exit -1, "Neither normalization, VAFator annotation or SnpEff annotation enabled! Nothing to do..."
 }
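
The check added in this hunk makes SnpEff and BCFtools csq mutually exclusive. A wrapper script could catch the same condition before launching Nextflow. The sketch below mirrors that rule in shell; the `check_annotation_choice` function and its arguments are hypothetical and not part of the pipeline. Note that, as far as this hunk shows, the unchanged "Nothing to do" check does not consider `--gff`, so it may still trigger when only `--skip_normalization` and `--gff` are passed.

```shell
# Hypothetical wrapper-side check mirroring the rule added to main.nf:
# SnpEff (--snpeff_organism) and BCFtools csq (--gff) cannot be combined.
check_annotation_choice() {
    snpeff_organism="$1"   # empty string when SnpEff is not requested
    gff="$2"               # empty string when BCFtools csq is not requested
    if [ -n "$snpeff_organism" ] && [ -n "$gff" ]; then
        echo "Please use either SnpEff (--snpeff_organism) or BCFtools csq (--gff), but not both" >&2
        return 1
    fi
    return 0
}

# Example: only BCFtools csq is requested, so the check passes
check_annotation_choice "" "/path/to/ensembl.gff3" && echo "ok"
```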

nextflow.config

Lines changed: 7 additions & 6 deletions
@@ -40,15 +40,15 @@ manifest {
 }

 params.help_message = """
-TronFlow variant normalization v${VERSION}
+TronFlow VCF postprocessing v${VERSION}

 Usage:
-nextflow run main.nf -profile conda --input_vcfs input_vcfs --reference reference.fasta
+nextflow run main.nf --input_vcfs input_vcfs --reference reference.fasta


 Input:
 * --input_vcf: the path to a single VCF to normalize (not compatible with --input_files)
-* --input_vcfs: a tab-separated values file containing in each row the sample name and path to the VCF file (not compatible with --input_vcf)
+* --input_vcfs: the path to a tab-separated values file containing in each row the sample name and path to the VCF file (not compatible with --input_vcf)
 The input file does not have header!
 Example input file:
 sample1 /path/to/your/file.vcf
@@ -71,12 +71,13 @@ Optional input:
 Example input file:
 sample1 primary:3
 sample1 metastasis:/path/to/metastasis.local_clonalities.bed
-* --reference: path to the FASTA genome reference (indexes expected *.fai, *.dict) [required for normalization]
+* --reference: absolute path to the FASTA genome reference (indexes expected *.fai, *.dict) [required for normalization and for functional annotation with BCFtools]
+* --gff: absolute path to a GFF gene annotations file [required for functional annotation with BCFtools, only Ensembl-like GFF files]
+* --vcf-without-ad: indicate when the VCFs to normalize do not have the FORMAT/AD annotation
 * --output: the folder where to publish output
 * --skip_normalization: flag indicating to skip all normalization steps
 * --skip_decompose_complex: flag indicating not to split complex variants (ie: MNVs and combinations of SNVs and indels)
-* --filter: specify a comma-separated list of filters to apply (e.g.: PASS,.), only variants with these values will be kept. If not provided all varianst are kept
-* --vcf-without-ad: indicate when the VCFs to normalize do not have the FORMAT/AD annotation
+* --filter: specify the filter to apply if any (e.g.: PASS), only variants with this value will be kept
 * --input_bams: a tab-separated values file containing in each row the sample name, tumor and normal BAM files for annotation with Vafator
 * --skip_multiallelic_filter: after VAFator annotations if any multiallelic variant is present (ie: two different
 mutations in the same position) only the highest VAF variant is kept unless this flag is passed
