Strainify is an accurate strain-level abundance analysis tool for short-read metagenomics.
```shell
conda create -n strainify -c conda-forge -c bioconda strainify
conda activate strainify
strainify --help
```

Or, to run from a clone of the repository:

```shell
git clone https://github.com/treangenlab/Strainify.git
cd Strainify
# Create the conda environment (Linux / macOS)
conda env create -f environment.yml
conda activate strainify
# Run directly from the repository root
./strainify --help
```

macOS (Apple Silicon) note: `parsnp` and `harvesttools` do not yet have native `osx-arm64` builds, so conda must resolve them through Rosetta 2 using the `osx-64` sub-architecture. Set `CONDA_SUBDIR` before creating the environment:

```shell
export CONDA_SUBDIR=osx-64
conda env create -f environment.yml
```

To persist the setting across sessions, add `export CONDA_SUBDIR=osx-64` to your `~/.zshrc` (or `~/.bash_profile`). Rosetta 2 must be installed (`softwareupdate --install-rosetta`).
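If one shell profile is shared between Apple Silicon and other machines, a guard like the following sets `CONDA_SUBDIR` only where it is needed. This is a sketch, not part of Strainify: `conda_subdir_for` is a hypothetical helper name, and it assumes `uname -m` reports `arm64` on Apple Silicon (a shell that is itself running under Rosetta reports `x86_64`, in which case the variable is left unset).

```shell
# Sketch, not part of Strainify: export CONDA_SUBDIR=osx-64 only on Apple
# Silicon Macs, so one shell-profile snippet also works on Linux and Intel Macs.
# conda_subdir_for is a hypothetical helper name.
conda_subdir_for() {
  # $1 = kernel name (from uname -s), $2 = machine architecture (from uname -m)
  if [ "$1" = "Darwin" ] && [ "$2" = "arm64" ]; then
    echo "osx-64"
  fi
}

subdir=$(conda_subdir_for "$(uname -s)" "$(uname -m)")
if [ -n "$subdir" ]; then
  export CONDA_SUBDIR="$subdir"
fi
```

Place it in `~/.zshrc` (or `~/.bash_profile`) so it takes effect before `conda env create -f environment.yml`.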
Note: if you installed the bioconda package, replace `./strainify` with `strainify` in the following commands.

Paired-end reads (the default):

```shell
./strainify \
  --genome_folder path/to/genomes \
  --fastq_folder path/to/fastqs \
  --outdir results
```

Single-end reads:

```shell
./strainify \
  --genome_folder path/to/genomes \
  --fastq_folder path/to/fastqs \
  --read_type single \
  --outdir results
```

If you suspect some reference genomes are absent from the metagenome, use `filter-run`. It runs MAGNET to keep only the genomes called present in each sample, then runs Strainify on those genomes, all in a single command. Point `--fastq_folder` at a folder of read sets (they share one build-once index), or give a single set with `--fastq1`/`--fastq2`:
```shell
./strainify filter-run \
  --genome_folder path/to/genomes \
  --fastq_folder path/to/fastqs \
  --magnet_ref_dir shared_panel \
  --jobs 1 \
  --outdir results
```

Each sample's results go under `results/<sample>/`, and MAGNET's present/absent calls and the genomes it kept go under `results/<sample>/magnet/<sample>/`.

`--magnet_ref_dir` is optional (it defaults to a path under the launch directory), but setting it is recommended so that every set reuses the same build-once index. `--jobs` is also optional; it sets how many samples are processed concurrently (default 1, i.e. samples run serially). Each sample's heavy steps use at most about min(12, `--max_cpus`) cores, so setting `--jobs` to roughly cores / 12 fills a large node without oversubscribing it.
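The `--jobs` sizing rule can be written as a small helper. This is a sketch: `suggest_jobs` is a hypothetical name, and the divisor of 12 comes from the per-sample core cap described above.

```shell
# Sketch of the --jobs sizing rule: each sample's heavy steps use up to
# ~min(12, max_cpus) cores, so run about cores / 12 samples at once.
# suggest_jobs is a hypothetical helper, not a Strainify command.
suggest_jobs() {
  _jobs=$(( $1 / 12 ))   # integer division rounds down
  if [ "$_jobs" -lt 1 ]; then
    _jobs=1              # always run at least one sample
  fi
  echo "$_jobs"
}
```

For example, pass `--jobs "$(suggest_jobs "$(nproc)")"` on Linux, or `--jobs "$(suggest_jobs "$(sysctl -n hw.ncpu)")"` on macOS.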
Strainify bundles a modified version of MAGNET (the
Treangen Lab's whole-genome read-mapping presence/absence caller), adapted for strain-level
filtering. Beyond a "bring-your-own-genomes" input mode and a build-once shared reference reused
across samples, it adds two presence-call refinements that are enabled by default in
filter-run:
- an absolute evidence gate, which removes false positives where a genome reaches a saturated breadth ÷ expected-breadth ratio from only a handful of mapped reads (it downgrades present → absent unless there is enough absolute primary evidence), and
- a primary-evidence rescue, which recovers genuinely present low-abundance strains that the coverage-score threshold would otherwise drop, using uniquely mapped (primary) read support plus consensus ANI (it upgrades absent → present).
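To make the direction of each refinement concrete, here is a deliberately simplified sketch of how they change a single genome's call. The argument names and thresholds are hypothetical placeholders, not MAGNET's parameters or defaults, and the real calls use MAGNET's own evidence measures; see `bin/magnet/README.md` for the actual logic.

```shell
# Simplified sketch of the two presence-call refinements. Argument names and
# thresholds are hypothetical placeholders, NOT MAGNET's parameters/defaults.
#   $1  initial call from the coverage-score threshold (present|absent)
#   $2  primary (uniquely mapped) read count
#   $3  consensus ANI, in percent
#   $4  minimum primary reads to keep a "present" call (evidence gate)
#   $5  minimum primary reads to rescue an "absent" call
#   $6  minimum consensus ANI to rescue an "absent" call
refine_call() {
  awk -v call="$1" -v reads="$2" -v ani="$3" \
      -v gate_min="$4" -v rescue_min="$5" -v ani_min="$6" 'BEGIN {
    if (call == "present" && reads + 0 < gate_min + 0) {
      call = "absent"    # gate: too little absolute evidence
    } else if (call == "absent" && reads + 0 >= rescue_min + 0 && ani + 0 >= ani_min + 0) {
      call = "present"   # rescue: enough primary reads at high ANI
    }
    print call
  }'
}
```

For instance, `refine_call present 3 99.9 10 20 99` prints `absent` (the gate fires), while `refine_call absent 25 99.5 10 20 99` prints `present` (the rescue fires).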
See `bin/magnet/README.md` for the full description, attribution, and the `--magnet_*` parameters that tune presence sensitivity (and their defaults). If too few genomes pass (Strainify needs at least 2 references), loosen those thresholds, e.g. lower `--magnet_min_covscore` or `--magnet_rescue_min_reads`.
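After a `filter-run`, you can check whether at least two references survived for a sample. This is a sketch: `count_genomes` and `check_min_genomes` are hypothetical helpers, and pointing them at `results/<sample>/magnet/<sample>/` assumes the kept genomes are stored there (searched recursively) as FASTA files with the extensions accepted by `--genome_folder`.

```shell
# Sketch: count reference FASTA files (.fna, .fa, .fasta) under a directory,
# searching recursively, and fail if fewer than 2 are found, since Strainify
# needs at least 2 references. Both helpers are hypothetical.
count_genomes() {
  find "$1" -type f \( -name '*.fna' -o -name '*.fa' -o -name '*.fasta' \) | wc -l | tr -d ' '
}

check_min_genomes() {
  _n=$(count_genomes "$1")
  if [ "$_n" -lt 2 ]; then
    echo "only $_n genome(s) in $1; consider loosening the --magnet_* thresholds" >&2
    return 1
  fi
  return 0
}
```

Usage, with a hypothetical sample name: `check_min_genomes results/sample1/magnet/sample1`.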
If you already have filtered_variant_matrix.csv, reference.fna, and sites.txt from a
previous run, skip the parsnp step:
```shell
./strainify \
  --genome_folder path/to/genomes \
  --fastq_folder path/to/new_fastqs \
  --use_precomputed_variants \
  --precomputed_dir path/to/previous_results \
  --outdir new_results
```

| Parameter | Description | Default |
|---|---|---|
| `--genome_folder` | (required) Directory of reference genome FASTA files (`.fna`, `.fa`, `.fasta`) | none |
| `--fastq_folder` | (required) Directory of FASTQ files. Paired-end: `*_r1.fq[.gz]` / `*_r2.fq[.gz]`. Single-end: `*.fq[.gz]` | none |
| `--outdir` | Output directory | `results` |
| `--read_type` | `paired` or `single` | `paired` |
| `--parsnp_flags` | Extra flags passed to parsnp | `-c` |
| `--window_size` | Window size for variant filtering (positive integer or `average_LCB_length`) | `500` |
| `--window_overlap` | Overlap fraction between windows (0–1) | `0` |
| `--filter_off` | Skip the recombination filter (recommended when genomes differ by fewer than 500 variants) | `false` |
| `--weight_by_entropy` | Weight variants by Shannon entropy when estimating abundances | `false` |
| `--bootstrap` | Compute 95% bootstrap confidence intervals | `false` |
| `--bootstrap_iterations` | Number of bootstrap iterations | `100` |
| `--use_precomputed_variants` | Skip parsnp and variant filtering; use existing outputs | `false` |
| `--precomputed_dir` | Directory containing `filtered_variant_matrix.csv`, `reference.fna`, `sites.txt` | none |
| `--prefilter` | Enable the MAGNET present/absent genome prefilter (set automatically by the `filter-run` command) | `false` |
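Two quick pre-run checks follow from the parameter table: every paired-end `*_r1` file needs an `*_r2` mate, and `--precomputed_dir` must contain all three precomputed files. This is a sketch; `check_pairs` and `check_precomputed` are hypothetical helpers, not Strainify options, and `check_pairs` only covers the lowercase `*_r1.fq[.gz]` naming listed for `--fastq_folder`.

```shell
# Sketch of two pre-run checks; the helper names are hypothetical.

# Paired-end: report each *_r1.fq[.gz] file in a folder that has no *_r2 mate.
check_pairs() {
  _missing=0
  for r1 in "$1"/*_r1.fq "$1"/*_r1.fq.gz; do
    [ -e "$r1" ] || continue               # glob matched nothing
    case "$r1" in
      *.gz) r2="${r1%_r1.fq.gz}_r2.fq.gz" ;;
      *)    r2="${r1%_r1.fq}_r2.fq" ;;
    esac
    if [ ! -e "$r2" ]; then
      echo "missing mate for $r1" >&2
      _missing=1
    fi
  done
  return "$_missing"
}

# --use_precomputed_variants: confirm --precomputed_dir has all three inputs.
check_precomputed() {
  _status=0
  for f in filtered_variant_matrix.csv reference.fna sites.txt; do
    if [ ! -f "$1/$f" ]; then
      echo "missing $1/$f" >&2
      _status=1
    fi
  done
  return "$_status"
}
```

Usage: `check_pairs path/to/fastqs && check_precomputed path/to/previous_results`.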
Strainify runs its tools from your active conda environment (built once from `environment.yml` during installation), so a normal run needs no `-profile` at all.
| Profile | Description |
|---|---|
| `standard` | Local execution from your active environment (default; applied automatically) |
| `test` | Runs against the bundled `example/` data |
```shell
# Cap to 8 CPUs and 64 GB RAM
./strainify ... --max_cpus 8 --max_memory 64.GB
```

For step-by-step instructions using the example data, see:
Run the test profile to verify your installation (from your activated environment):
```shell
NXF_ANSI_LOG=false ./strainify -profile test
```

See the scripts and documentation at: https://github.com/treangenlab/Strainify_paper
Strainify: Strain-Level Microbiome Profiling for Low-Coverage Short-Read Metagenomic Datasets
https://www.biorxiv.org/content/10.1101/2025.10.10.681738v2
For questions or suggestions, open an issue or contact Rossie Luo at rl152@rice.edu.

