Skip to content

Parallelize in_place_bit_reverse_permute#562

Draft
jotabulacios wants to merge 3 commits into
mainfrom
perf/parallel-bit-reverse
Draft

Parallelize in_place_bit_reverse_permute#562
jotabulacios wants to merge 3 commits into
mainfrom
perf/parallel-bit-reverse

Conversation

@jotabulacios
Copy link
Copy Markdown
Collaborator

No description provided.

@jotabulacios
Copy link
Copy Markdown
Collaborator Author

/bench

@github-actions
Copy link
Copy Markdown

Codex Code Review

Found one issue:

  • Low / Potential bug: crypto/math/src/fft/cpu/bit_reversing.rs:2 adds E: Send to the public in_place_bit_reverse_permute API even when the parallel feature is disabled, and even for small inputs that use the sequential fallback. The old function worked for any &mut [E]; this now breaks valid callers with non-Send element types such as Rc<_>. Keep the public sequential API unconstrained, or at least gate the bound with #[cfg(feature = "parallel")] and preserve the unconstrained signature for non-parallel builds.

No security issues found in the unsafe parallel swap logic given the documented power-of-two precondition.

I could not run cargo check because rustup tried to write under /home/runner/.rustup, which is read-only in this environment.

if br > i {
// SAFETY: (i, br) uniquely identifies this pair (smaller index is owner),
// so no two threads race on the same `ptr.0.add(k)` slot. Both indices
// are in-bounds since i < n and br < n.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Low: redundant alias

SendPtr<E> is Copy, so ptr is already freely copyable inside the closure — the let p = ptr; binding serves no purpose.

Suggested change
// are in-bounds since i < n and br < n.
unsafe {
core::ptr::swap(ptr.0.add(i), ptr.0.add(br));
}

for i in 0..input.len() {
let bit_reversed_index = reverse_index(i, input.len() as u64);
pub fn in_place_bit_reverse_permute<E: Send>(input: &mut [E]) {
let n = input.len();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Low: missing invariant assertion before unsafe code

The entire function — including the doc comment — requires n to be a power of two. Violating that causes reverse_index to produce out-of-bounds indices, which is a silent panic in the serial path but undefined behaviour in the parallel path (raw pointer dereference out of bounds).

All call-sites in polynomial.rs and bowers_fft.rs already guard with is_power_of_two() before calling this function, but a debug_assert here makes the safety invariant explicit and catches future misuse:

Suggested change
let n = input.len();
let n = input.len();
debug_assert!(n == 0 || n.is_power_of_two(), "input length must be a power of two");

@claude
Copy link
Copy Markdown

claude Bot commented Apr 24, 2026

Review: Parallelize in_place_bit_reverse_permute

Overview: Adds a Rayon-based parallel fast path under #[cfg(feature = \"parallel\")] when the input exceeds 16 384 elements, using a SendPtr wrapper to share a raw pointer across threads.

Safety correctness ✓

The unsafe is sound. Bit-reverse permutation is a bijection, so each pair (i, br(i)) with i < br(i) appears exactly once in the iteration. Because br(br(i)) = i, the thread for index j = br(i) sees br(j) = i < j and skips the swap — so every memory location is written by at most one thread. The invariant is correctly stated in the SAFETY comment.

Issues

Medium — parallel path is never exercised by the test suite

PARALLEL_BITREV_THRESHOLD is 1 << 14 = 16 384. Every existing test (bit_reverse_permutation_works uses size 16; test_parallel_matches_sequential in bowers_fft_tests.rs uses size 1 024) stays well below the threshold. The new unsafe parallel branch has zero test coverage.

Suggested addition to the #[cfg(all(test, feature = "alloc"))] block (or a new #[cfg(all(test, feature = "parallel"))] block):

#[cfg(all(test, feature = "parallel"))]
#[test]
fn bit_reverse_permutation_parallel_matches_serial() {
    let n = 1 << 15; // above PARALLEL_BITREV_THRESHOLD
    let mut serial: Vec<usize> = (0..n).collect();
    let mut parallel: Vec<usize> = (0..n).collect();

    // serial baseline
    for i in 0..n {
        let br = reverse_index(i, n as u64);
        if br > i { serial.swap(i, br); }
    }

    in_place_bit_reverse_permute(&mut parallel);
    assert_eq!(serial, parallel);
}

Low — redundant alias and missing invariant assertion

See inline comments.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 24, 2026

Benchmark — fib_iterative_8M (median of 5)

Table parallelism: auto (cores / 3)

Metric main PR Δ
Peak heap 56567 MB 58186 MB +1619 MB (+2.9%) ⚪
Prove time 26.636s 27.642s +1.006s (+3.8%) ⚪

✅ No significant change.

⚠️ Baseline heap spread: 7.6% (60341 MB / 57943 MB / 56039 MB / 56567 MB / 56400 MB) — comparison may be less reliable

Commit: 3f8beac · Baseline: built from main · Runner: self-hosted bench

@jotabulacios
Copy link
Copy Markdown
Collaborator Author

/bench 10

1 similar comment
@jotabulacios
Copy link
Copy Markdown
Collaborator Author

/bench 10

@jotabulacios
Copy link
Copy Markdown
Collaborator Author

/bench k=1

@jotabulacios
Copy link
Copy Markdown
Collaborator Author

/bench 5

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant