Hi scikit-allel team,
I would like to clarify the exact definition and unit of the `size` parameter in the `moving_*` family of functions (e.g., `moving_garud_h`, `moving_patterson_f3`, and `moving_haplotype_diversity`).
### My Context
I have a simulated haplotype dataset representing a 5,000,000 bp region containing ~2000 SNPs.
My goal is to calculate H12 and f3 statistics across this region using 5 non-overlapping physical windows of 1,000,000 bp each.
### My Question
If I use `moving_garud_h(h, size=1000000)`:
- It seems `size` is interpreted as the number of variants (array rows) rather than base pairs (bp). As a result, all 2000 SNPs are processed in the first window, and the subsequent windows are empty.
- This is different from `windowed_diversity`, where `size` takes base pairs.
Could you please confirm whether `size` in the `moving_*` functions indeed represents the variant count?
If so, is manually slicing the haplotype array by coordinate ranges the recommended way to get bp-based physical windows for Garud's H and Patterson's f3?
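
For reference, here is a minimal sketch of the manual slicing I have in mind. It assumes a sorted position array `pos` with one entry per row of `h`. The helper `physical_windows` is my own hypothetical function, not part of scikit-allel, and the toy data only stands in for my simulation.

```python
import numpy as np

def physical_windows(pos, size, start, stop):
    """Yield (win_start, win_stop, row_slice) for non-overlapping bp windows.

    `pos` must be sorted in ascending order. Windows are half-open:
    [win_start, win_stop).
    """
    pos = np.asarray(pos)
    for win_start in range(start, stop, size):
        win_stop = min(win_start + size, stop)
        # Binary search for the first row at or after each window boundary.
        lo = np.searchsorted(pos, win_start, side="left")
        hi = np.searchsorted(pos, win_stop, side="left")
        yield win_start, win_stop, slice(int(lo), int(hi))

# Toy data: 2000 sorted positions in a 5 Mb region, 20 haplotypes.
rng = np.random.default_rng(0)
pos = np.sort(rng.integers(1, 5_000_001, size=2000))
h = rng.integers(0, 2, size=(2000, 20), dtype=np.int8)

windows = list(physical_windows(pos, size=1_000_000, start=1, stop=5_000_001))
for win_start, win_stop, rows in windows:
    h_win = h[rows]  # rows (variants) whose position falls in this window
    # In the real analysis the per-window statistic would be computed here,
    # e.g. allel.garud_h(h_win), after checking the window is not empty.
    print(win_start, win_stop, h_win.shape[0])
```

Each window is a contiguous block of rows because `pos` is sorted, so a plain slice is enough and no boolean mask is needed. Is this the intended approach, or is there a built-in way to do it?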
Thank you!