Skip to content
This repository was archived by the owner on Mar 10, 2023. It is now read-only.

PSEUDO PR - DO NOT MERGE#1

Draft
ewels wants to merge 13 commits into
initial_commitfrom
dev
Draft

PSEUDO PR - DO NOT MERGE#1
ewels wants to merge 13 commits into
initial_commitfrom
dev

Conversation

@ewels
Copy link
Copy Markdown
Member

@ewels ewels commented Dec 20, 2021

DO NOT MERGE THIS PULL REQUEST

This PR is a pseudo-PR designed to be used for a community review of the pipeline prior to first release. Feel free to add comments and review content here - as new commits are pushed to dev the PR will update automatically.

@ewels ewels added the WIP Work in progress label Dec 20, 2021
Copy link
Copy Markdown

@mashehu mashehu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a first look through it. Good work! Some smaller things and then I'll give my 👍🏻. The output docs could use a bit more love (it is not so clear to me what the important files are, from where in the workflow they come, some more explanation of the SSDS plots in the multiqc report would be helpful).

Comment thread CITATIONS.md

* [Trimgalore](https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/)

* [BWA](https://github.com/lh3/bwa)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check for these tools if there is really no citeable publication. BWA for example lists some in their readme.

Comment thread README.md
## Introduction

<!-- TODO nf-core: Write a 1-2 sentence summary of what data the pipeline is for and what it does -->
**nf-core/ssds** is a bioinformatics best-practice analysis pipeline for Single-Stranded DNA Sequencing (SSDS) pipeline.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
**nf-core/ssds** is a bioinformatics best-practice analysis pipeline for Single-Stranded DNA Sequencing (SSDS) pipeline.
**nf-core/ssds** is a bioinformatics best-practice analysis pipeline for Single-Stranded DNA Sequencing (SSDS).

Comment thread conf/base.config
}

withName:SAMPLESHEET_CHECK {
executor = 'local'
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you need this executor?

Comment thread conf/base.config
}

withName:PICARD_SORTSAM_QRY {
cpus = { check_max( 12 * task.attempt, 'cpus' ) }
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't you use the cpu and memory specifications from the process_* labels?

Comment thread conf/mod.config
@@ -0,0 +1,121 @@
/*
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a remnant of an older nf-core template? Should be merged with modules.config imo.

Comment thread nextflow_schema.json
},
"parse_extra_types": {
"type": "boolean",
"description": "SSDS identifies ssDNA using the micro-homology structure at the 5' end of reads. When set to true, the SSDS pipeline also infers three additional, but low confidence fragment types; importantly, reads of these thress types are a best guess, but could be incorrect (i.e. fragments classified as dsDNA may instead come from ssDNA in a sufficiently enriched library).",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"description": "SSDS identifies ssDNA using the micro-homology structure at the 5' end of reads. When set to true, the SSDS pipeline also infers three additional, but low confidence fragment types; importantly, reads of these thress types are a best guess, but could be incorrect (i.e. fragments classified as dsDNA may instead come from ssDNA in a sufficiently enriched library).",
"description": "SSDS identifies ssDNA using the micro-homology structure at the 5' end of reads. When set to true, the SSDS pipeline also infers three additional, but low confidence fragment types; importantly, reads of these three types are a best guess, but could be incorrect (i.e. fragments classified as dsDNA may instead come from ssDNA in a sufficiently enriched library).",

Comment thread nextflow_schema.json
"format": "file-path",
"mimetype": "text/plain",
"description": "Path to the BWA index folder.",
"help_text": "This parameter is *mandatory* if `--genome` is not specified. If you don't have a BWA index available this will be generated for you automatically. Combine with `--save_reference` to save BWA index for future runs.",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the save_reference parameter implemented in your workflow. Either update this section (and the one for --fasta) or don't mention it here.

@@ -0,0 +1,42 @@
//
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this file a left over and can be deleted? 🙂

ch_versions = ch_versions.mix(UCSC_BEDGRAPHTOBIGWIG.out.versions.first())

emit:
bigwig = UCSC_BEDGRAPHTOBIGWIG.out.bigwig // channel: [ val(meta), [ bai ] ]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bigwig = UCSC_BEDGRAPHTOBIGWIG.out.bigwig // channel: [ val(meta), [ bai ] ]
bigwig = UCSC_BEDGRAPHTOBIGWIG.out.bigwig // channel: [ val(meta), [ bigwig ] ]

emit:
bam = PICARD_MARKDUPLICATES.out.bam // channel: [ val(meta), [ bam ] ]
bai = SAMTOOLS_INDEX.out.bai // channel: [ val(meta), [ bai ] ]
mdmetrics = PICARD_MARKDUPLICATES.out.metrics // channel: [ val(meta), [ bam ] ]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
mdmetrics = PICARD_MARKDUPLICATES.out.metrics // channel: [ val(meta), [ bam ] ]
mdmetrics = PICARD_MARKDUPLICATES.out.metrics // channel: [ val(meta), [ metrics ] ]

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

WIP Work in progress

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants