Clarification on size parameter in moving_garud_h (variant count vs physical bp distance)

Hi scikit-allel team,

I would like to clarify the exact definition and unit of the `size` parameter in the `moving_*` family of functions (e.g., `moving_garud_h`, `moving_patterson_f3`, and `moving_haplotype_diversity`).

### My Context
I have a simulated haplotype dataset representing a 5,000,000 bp region containing ~2,000 SNPs.
My goal is to calculate H12 and f3 statistics across this region using 5 non-overlapping physical windows of 1,000,000 bp each.

### My Question
If I use `moving_garud_h(h, size=1000000)`:
* It seems `size` is interpreted as the **number of variants (array rows)** rather than **base pairs (bp)**. As a result, all 2000 SNPs are processed in the first window, and the subsequent windows are empty.
* This is different from `windowed_diversity` where `size` takes base pairs.

Could you please confirm if `size` in `moving_*` functions indeed represents the variant count? 
If so, is manually slicing the haplotype array by coordinate ranges the recommended way to get bp-based physical windows for Garud's H and Patterson's f3?
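
For reference, here is a minimal sketch of the manual slicing I have in mind. It only uses the standard library to turn a sorted position list into row ranges, one per physical window. The helper name `physical_window_slices`, the toy positions, and the half-open `[win_start, win_stop)` convention are my own assumptions for illustration, not part of scikit-allel.

```python
from bisect import bisect_left


def physical_window_slices(pos, size, start, stop):
    """Return (win_start, win_stop, i, j) for non-overlapping bp windows.

    Hypothetical helper. `pos` must be sorted ascending. Each window covers
    the half-open interval [win_start, win_stop), and rows i:j of the
    haplotype array are the variants whose position falls inside it.
    """
    out = []
    for win_start in range(start, stop, size):
        win_stop = min(win_start + size, stop)
        i = bisect_left(pos, win_start)  # first row with pos >= win_start
        j = bisect_left(pos, win_stop)   # first row with pos >= win_stop
        out.append((win_start, win_stop, i, j))
    return out


# Toy positions standing in for the ~2,000 simulated SNPs.
pos = [10, 250_000, 1_200_000, 1_900_000, 4_999_999]
windows = physical_window_slices(pos, size=1_000_000, start=0, stop=5_000_000)

for win_start, win_stop, i, j in windows:
    n_variants = j - i
    print(win_start, win_stop, n_variants)
    # With scikit-allel installed and `h` a haplotype array aligned to `pos`,
    # I would then skip empty windows and compute the statistic per slice:
    # if n_variants > 0:
    #     h1, h12, h123, h2_h1 = allel.garud_h(h[i:j])
```

With this approach each window holds however many variants happen to fall in that 1,000,000 bp interval, so windows can contain very different variant counts or none at all.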

Thank you!
