Clarification on size parameter in moving_garud_h (variant count vs physical bp distance) #453

@GSYH

Hi scikit-allel team,

I would like to clarify the exact definition and unit of the size parameter in the moving_* family of functions (e.g., moving_garud_h, moving_patterson_f3, and moving_haplotype_diversity).

### My Context
I have a simulated haplotype dataset representing a 5,000,000 bp region containing ~2000 SNPs.
My goal is to calculate H12 and f3 statistics across this region using 5 non-overlapping physical windows of 1,000,000 bp each.

### My Question

If I use moving_garud_h(h, size=1000000):

  • It seems size is interpreted as the number of variants (array rows) rather than base pairs (bp). As a result, all 2000 SNPs are processed in the first window, and the subsequent windows are empty.
  • This is different from windowed_diversity where size takes base pairs.

Could you please confirm whether size in the moving_* functions is indeed a variant count rather than a physical distance?
If so, is manually slicing the haplotype array by coordinate ranges the recommended way to get bp-based physical windows for Garud's H and Patterson's f3?
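
For concreteness, here is a minimal sketch of the manual slicing I have in mind. It is not taken from scikit-allel itself: the helper name `bp_window_slices` and the toy positions are made up for illustration, and it assumes positions are sorted, 1-based, and that windows are inclusive at both ends.

```python
from bisect import bisect_left


def bp_window_slices(pos, region_start, region_stop, size):
    """Return (win_start, win_stop, row_slice) for non-overlapping bp windows.

    Hypothetical helper, not part of scikit-allel. `pos` must be sorted in
    ascending order; each window covers [win_start, win_stop] inclusive.
    """
    windows = []
    for win_start in range(region_start, region_stop + 1, size):
        win_stop = min(win_start + size - 1, region_stop)
        # First row with position >= win_start.
        i = bisect_left(pos, win_start)
        # First row with position > win_stop.
        j = bisect_left(pos, win_stop + 1)
        windows.append((win_start, win_stop, slice(i, j)))
    return windows


# Toy variant positions (made up) in a 5,000,000 bp region.
pos = [10, 999_999, 1_000_000, 1_000_001, 2_500_000, 4_999_999, 5_000_000]

windows = bp_window_slices(pos, 1, 5_000_000, 1_000_000)
for win_start, win_stop, sl in windows:
    n_variants = sl.stop - sl.start
    print(win_start, win_stop, n_variants)
    # With the real data, each non-empty slice would select rows of the
    # haplotype array, e.g. allel.garud_h(h[sl]); empty windows need to be
    # skipped or filled with NaN.
```

Each slice indexes the variant (row) axis, so the same slices could be applied to the allele count arrays passed to the f3 calculation. Windows with no variants come back as empty slices and have to be handled explicitly.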

Thank you!
