FastaGuard is a fast, explainable FASTA QC tool for validating assembly FASTA files before expensive downstream analysis.
The assembly FASTA gate before expensive QC.
It is not intended to compete with QUAST, BUSCO, BlobToolKit, FastQC, or MultiQC. FastaGuard is the earlier preflight and triage layer: the first command that answers whether a FASTA file is valid, sane, interpretable, and ready for downstream tools.
Before QUAST. Before BUSCO. Before BlobToolKit. Before annotation.
Run FastaGuard first.
Recommended bioinformatics install:

```bash
mamba install -c conda-forge -c bioconda fastaguard
```

Verify the installed CLI:

```bash
fastaguard --version
fastaguard --schema
```

GitHub release binaries are also available for Linux and macOS:

```bash
tar -xzf fastaguard-v0.2.0-x86_64-unknown-linux-gnu.tar.gz
./fastaguard-v0.2.0-x86_64-unknown-linux-gnu/fastaguard --help
```

The v0.2.0 GitHub release binaries and source archive are published. Bioconda serves v0.2.0 for Linux x86_64, Linux ARM64, macOS Intel, and macOS Apple Silicon.

Local development build:

```bash
cargo build --release --locked
```

Run the assembly preflight check:
```bash
fastaguard sample.fa \
  --profile assembly \
  --out fastaguard_report.html \
  --json fastaguard.json \
  --tsv fastaguard.tsv \
  --multiqc fastaguard_mqc.json
```

Pipeline gate example:

```bash
fastaguard sample.fa --profile assembly --gate pipeline
```

The pipeline gate is the v0.3 assembly preset for workflow stop/go decisions.
It fails on duplicate IDs, invalid characters, invalid FASTA structure, and
high-N content. GC and length outliers remain advisory by default because they
are routing signals, not proof of contamination or misassembly. To make an
advisory finding block a pipeline, add it explicitly with --fail-on.
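The stop/go rule described above can be sketched as a set intersection. This is a hypothetical illustration, not FastaGuard's implementation: apart from high_n_rate (the ID used with --explain-finding), the finding IDs below are invented placeholders, and the exact --fail-on syntax is not modeled; fail_on simply stands in for whatever IDs a user opts in.

```python
# Hypothetical sketch of the v0.3 pipeline gate rule; not FastaGuard source code.
# Only "high_n_rate" appears in FastaGuard's docs; the other IDs are placeholders.
PIPELINE_BLOCKERS = frozenset({
    "duplicate_ids",            # placeholder ID
    "invalid_characters",       # placeholder ID
    "invalid_fasta_structure",  # placeholder ID
    "high_n_rate",
})


def gate_decision(finding_ids, fail_on=()):
    """Return ("stop" or "go", sorted list of blocking IDs).

    Default blockers always stop the pipeline. Advisory findings (for example a
    GC or length outlier) stop it only when listed in fail_on, mirroring the
    opt-in --fail-on behavior.
    """
    blocking_set = PIPELINE_BLOCKERS | frozenset(fail_on)
    blockers = sorted(set(finding_ids) & blocking_set)
    return ("stop" if blockers else "go"), blockers
```

With these placeholders, gate_decision(["gc_outlier"]) lets the pipeline continue, while gate_decision(["gc_outlier"], fail_on=["gc_outlier"]) stops it, which is the advisory-by-default behavior the gate describes.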
Inspect the machine-readable contract:
```bash
fastaguard --schema
fastaguard --finding-catalog
fastaguard --explain-finding high_n_rate
```

Build and run the local Docker image:

```bash
docker build -t fastaguard:local .
docker run --rm -v "$PWD:/data" fastaguard:local /data/sample.fa \
  --profile assembly \
  --out /data/fastaguard_report.html \
  --json /data/fastaguard.json \
  --tsv /data/fastaguard.tsv \
  --multiqc /data/fastaguard_mqc.json
```

The published BioContainers image is currently v0.2.0, which does not yet include the v0.3 gate behavior:

```bash
docker pull quay.io/biocontainers/fastaguard:0.2.0--hfa8f182_0
```

Exit codes:
- 0 = pass
- 1 = warnings above configured threshold
- 2 = hard QC failure
- 3 = invalid input / tool error
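A workflow wrapper can branch on the exit codes listed above. The sketch below assumes fastaguard is on PATH; interpret_exit and run_fastaguard are hypothetical helper names, and optionally treating exit code 1 as non-fatal is a policy choice for the wrapper, not documented FastaGuard behavior.

```python
import subprocess

# Exit-code meanings as documented for FastaGuard.
EXIT_MEANINGS = {
    0: "pass",
    1: "warnings above configured threshold",
    2: "hard QC failure",
    3: "invalid input / tool error",
}


def interpret_exit(code, allow_warnings=False):
    """Return (meaning, continue_pipeline) for a FastaGuard exit code."""
    if code not in EXIT_MEANINGS:
        raise ValueError(f"unexpected fastaguard exit code: {code}")
    # Hypothetical policy: always continue on 0, and on 1 only if allowed.
    proceed = code == 0 or (allow_warnings and code == 1)
    return EXIT_MEANINGS[code], proceed


def run_fastaguard(fasta, *extra_args, allow_warnings=False):
    """Run fastaguard on one FASTA file and interpret its exit status."""
    proc = subprocess.run(
        ["fastaguard", fasta, "--profile", "assembly", *extra_args],
        check=False,  # inspect the return code ourselves instead of raising
    )
    return interpret_exit(proc.returncode, allow_warnings)
```

For example, run_fastaguard("sample.fa", "--gate", "pipeline") returns the meaning and a continue flag. Keeping 3 separate from 2 matters in practice: a tool error says nothing about the FASTA itself, so a wrapper may want to retry or alert rather than record a QC failure.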
FASTA files are everywhere, but FASTA QC is fragmented across ad hoc scripts, seqkit stats, assembly QC tools, completeness tools, contamination workflows, and pipeline-specific checks. Each is useful, but none is the simple default first command that answers:
Is this FASTA file valid, sane, interpretable, and ready for downstream tools?
FastaGuard fills that gap:
FastaGuard is a fast, explainable FASTA QC tool that validates assembly FASTA files, detects structural and composition red flags, and produces pipeline-ready reports before expensive downstream analysis.
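To make the structural red flags concrete, here is a minimal sketch of how sequence-before-header errors, duplicate IDs, empty records, and invalid characters could be detected. It is not FastaGuard's parser: the ID rule (first whitespace-delimited token after '>'), the accepted alphabet (IUPAC nucleotide codes plus '-'), and the function names are assumptions.

```python
from collections import Counter

# Assumed alphabet: IUPAC nucleotide codes plus '-' for gaps, case-insensitive.
IUPAC = frozenset("ACGTURYSWKMBDHVN-")


def parse_fasta(text):
    """Return a list of (id, sequence) pairs.

    Raises ValueError when sequence data appears before the first header.
    """
    records = []
    current_id, chunks = None, []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # blank lines are ignored
        if line.startswith(">"):
            if current_id is not None:
                records.append((current_id, "".join(chunks)))
            parts = line[1:].split()  # ID = first token after '>'
            current_id = parts[0] if parts else ""
            chunks = []
        elif current_id is None:
            raise ValueError(f"line {lineno}: sequence data before first header")
        else:
            chunks.append(line)
    if current_id is not None:
        records.append((current_id, "".join(chunks)))
    return records


def validate(records):
    """Summarize duplicate IDs, empty records, and invalid characters."""
    id_counts = Counter(rid for rid, _ in records)
    invalid = []
    for rid, seq in records:
        bad = set(seq.upper()) - IUPAC
        if bad:
            invalid.append((rid, sorted(bad)))
    return {
        "duplicate_ids": sorted(rid for rid, n in id_counts.items() if n > 1),
        "empty_records": [rid for rid, seq in records if not seq],
        "invalid_characters": invalid,
    }
```

A list of (id, characters) pairs is used for invalid characters so that two records sharing a duplicate ID are both reported rather than one overwriting the other.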
FastaGuard is assembly-first.
```bash
fastaguard sample.fa \
  --profile assembly \
  --gate pipeline \
  --out fastaguard_report.html \
  --json fastaguard.json \
  --tsv fastaguard.tsv \
  --multiqc fastaguard_mqc.json
```

The MVP focuses on:
- FASTA validity
- invalid FASTA structure reports with explainable FAIL verdicts
- duplicate IDs
- duplicate sequences
- invalid nucleotide/IUPAC characters
- empty records
- core assembly stats
- N50, N90, L50, L90
- GC, AT, N, and ambiguity rates
- high-N scaffolds
- gap runs
- suspicious tiny contigs
- explainable PASS / WARN / FAIL verdicts
- machine-readable summaries, actions, scope, and provenance
- stable JSON, TSV, HTML, and MultiQC-compatible outputs
- length histogram and GC-vs-length plot data in JSON and HTML
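The core assembly statistics above follow standard definitions: Nx is the length of the shortest contig in the smallest set of longest contigs that together cover at least x% of the total assembly length, and Lx is the number of contigs in that set. The sketch below computes them alongside simple composition rates and N-run gap detection. It illustrates the definitions rather than reproducing FastaGuard's code; computing GC/AT/N rates over all bases and the 10 bp minimum gap length are assumptions.

```python
import re

# Runs of N are treated as assembly gaps (sequences are upper-cased first).
N_RUN = re.compile(r"N+")


def nx_lx(lengths, x):
    """Return (Nx, Lx) for contig lengths and a percentage x (e.g. 50 or 90)."""
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    target = sum(ordered) * x / 100
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    raise ValueError("cannot compute Nx/Lx for an empty assembly")


def composition(seq, min_gap=10):
    """Return GC/AT/N rates over all bases and (start, length) of N runs >= min_gap."""
    s = seq.upper()
    total = len(s) or 1  # avoid division by zero for empty sequences
    return {
        "gc_rate": (s.count("G") + s.count("C")) / total,
        "at_rate": (s.count("A") + s.count("T")) / total,
        "n_rate": s.count("N") / total,
        "gap_runs": [
            (m.start(), m.end() - m.start())
            for m in N_RUN.finditer(s)
            if m.end() - m.start() >= min_gap
        ],
    }
```

For contig lengths 10, 8, 6, 4, 2 (total 30), the two longest contigs reach 18 of 30 bases, so N50 = 8 and L50 = 2; reaching 90% (27 bases) takes four contigs, so N90 = 4 and L90 = 4.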
v0.2 expands the assembly preflight layer with:
- composition outliers
- richer provenance, taxonomy context, and routing hints
- hardened MultiQC and pipeline adoption material
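The scoring rule behind composition outliers is not specified here. One common robust approach, shown as a hypothetical sketch rather than FastaGuard's method, flags contigs whose GC rate lies more than k scaled median absolute deviations (MAD) from the assembly median; the function name and the default k = 3 are assumptions. In keeping with the advisory framing of GC outliers, a hit is a cue to route follow-up QC (for example to BlobToolKit), not proof of contamination.

```python
from statistics import median


def gc_outliers(gc_by_contig, k=3.0):
    """Return sorted contig IDs whose GC rate is more than k scaled MADs from the median."""
    values = list(gc_by_contig.values())
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # Degenerate case: most contigs share the same GC rate. This sketch
        # flags nothing; a real tool would need a fallback rule here.
        return []
    scale = 1.4826 * mad  # makes MAD comparable to a standard deviation under normality
    return sorted(cid for cid, v in gc_by_contig.items() if abs(v - center) / scale > k)
```

Median and MAD are used instead of mean and standard deviation so that a handful of divergent contigs cannot drag the baseline toward themselves.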
v0.3 adds the assembly gate contract:
- --gate pipeline for default workflow blocking behavior
- gate.blocking_findings for machine stop/go decisions
- checksum provenance with provenance.input_sha256
- explicit advisory findings for evidence that should route follow-up QC rather than stop a pipeline by default
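A downstream step can consume the v0.3 contract fields named above. The sketch assumes the dotted names map to nested JSON objects (report["gate"]["blocking_findings"] and report["provenance"]["input_sha256"]), that the checksum is a hex-encoded SHA-256 of the input file's bytes exactly as read, and that blocking_findings is a list. Those details and the helper names are assumptions to check against fastaguard --schema.

```python
import hashlib
import json


def sha256_file(path, chunk_size=1 << 20):
    """Hex SHA-256 of a file's raw bytes, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def load_report(path):
    """Parse a FastaGuard JSON report from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def check_report(report, actual_sha256):
    """Return (ok, reasons) for a parsed report using the assumed field layout."""
    reasons = []
    recorded = report.get("provenance", {}).get("input_sha256")
    if recorded is None:
        reasons.append("report has no provenance.input_sha256")
    elif recorded.lower() != actual_sha256.lower():
        reasons.append("provenance.input_sha256 does not match the input file")
    # Treated as an opaque list; entries may be IDs or objects in the real schema.
    blocking = report.get("gate", {}).get("blocking_findings", [])
    if blocking:
        reasons.append(f"gate.blocking_findings is non-empty: {blocking}")
    return not reasons, reasons
```

Typical use: ok, reasons = check_report(load_report("fastaguard.json"), sha256_file("sample.fa")). The checksum comparison guards against a stale report being reused after the FASTA has been regenerated.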
FastaGuard should recommend deeper tools when they are appropriate:
- QUAST for assembly quality evaluation
- BUSCO for biological completeness
- BlobToolKit for contamination and cobiont exploration
- CheckM for microbial genome completeness and contamination
- seqkit for ad hoc sequence operations
The strategic wedge is earlier:
FastaGuard catches FASTA-level assembly problems before expensive assembly QC.
Further documentation:
- Example reports
- Product thesis
- Vision plan
- MVP spec
- Output contract
- Tool landscape
- Adoption plan
- LLM and tooling vision
- Benchmarking
- v0.2 evidence pack
- Packaging
- v0.2.0 release notes
- v0.1.1 release notes
- v0.1.0 release notes
- Roadmap
- First-release design
v0.2.0 is published on GitHub with Linux and macOS release binaries. Bioconda
serves v0.2.0 for linux-64, linux-aarch64, osx-64, and osx-arm64.
BioContainers also publishes the pinned workflow image
quay.io/biocontainers/fastaguard:0.2.0--hfa8f182_0.
The current development milestone is v0.3: evidence, checksum provenance, and the assembly gate contract. Published Bioconda and BioContainers packages remain v0.2.0 until a v0.3 release is cut.