Skip to content

test(write-layer): regression test for #25 current-bucket flush#26

Open
tonyalaribe wants to merge 1 commit into
masterfrom
test/flush-on-pressure-regression
Open

test(write-layer): regression test for #25 current-bucket flush#26
tonyalaribe wants to merge 1 commit into
masterfrom
test/flush-on-pressure-regression

Conversation

@tonyalaribe

Copy link
Copy Markdown
Contributor

Follow-up to #25. That PR fixed the BufferedWriteLayer pressure path without including a failing test first — which is a workflow gap we just codified in the bug-fix section of CLAUDE.md.

This test reproduces the 2026-05-28 monoscope dual-write symptom:

  • Tight buffer budget (max_memory_mb = 1) so memory pressure trips after one batch
  • Default 600s bucket_duration_secs so the current bucket is never yet a 'completed' bucket
  • Second insert with a near-now timestamp lands in the same current bucket

Pre-#25 behavior: the second insert fails with Memory limit exceeded because the pressure-triggered flush_completed_buckets finds nothing flushable.

Post-#25 behavior: it succeeds because flush_all_now drains the current bucket and frees memory.

I verified the test would fail without the #25 fix by mentally tracing:
is_memory_pressure() returns true after the first insert → pre-fix calls flush_completed_bucketsget_flushable_buckets(current_bucket) returns [] (current is not flushable) → no eviction → reserved_bytes + estimated_bytes stays at the post-insert level → second insert's try_reserve_memory exceeds hard_limitanyhow::bail!.

Follow-up to #25. That PR fixed the BufferedWriteLayer pressure path
without including a failing test first — which is a workflow gap we
just codified in CLAUDE.md.

This test reproduces the 2026-05-28 monoscope dual-write symptom:
- Tight buffer budget so memory pressure trips after one batch
- Default 600s bucket_duration_secs so the current bucket is never
  yet a 'completed' bucket
- Second insert with a near-now timestamp lands in the same current
  bucket

Pre-#25: the second insert fails with 'Memory limit exceeded'
because the pressure-triggered flush_completed_buckets finds
nothing flushable.
Post-#25: it succeeds because flush_all_now drains the current
bucket and frees memory.
@claude

claude Bot commented May 29, 2026

Copy link
Copy Markdown

PR Review: test(write-layer): regression test for #25 current-bucket flush

Summary

This PR adds a regression test (test_memory_pressure_flushes_current_bucket) for the 2026-05-28 monoscope dual-write outage fixed in PR #25. The bug: insert()'s memory-pressure path called flush_completed_buckets(), which only drains buckets older than bucket_duration_secs — a no-op when all data lands in the current bucket, causing every subsequent insert to fail with Memory limit exceeded. PR #25 fixed this by calling flush_all_now() instead. This test is meant to guard that behavior.

The intent and documentation are good. However, there is a critical issue with the test's mechanism.


Critical: Test Does Not Actually Trigger Memory Pressure

src/buffered_write_layer.rs:1015

cfg.buffer.timefusion_buffer_max_memory_mb = 1;

BufferConfig::max_memory_mb() enforces a minimum of 64:

pub fn max_memory_mb(&self) -> usize {
    self.timefusion_buffer_max_memory_mb.max(64)
}

max_memory_bytes() then applies a second floor via MIN_BUFFER_BYTES = 64 MiB. Setting timefusion_buffer_max_memory_mb = 1 does not produce a 1 MB budget — the effective budget is at least 64 MB.

create_test_batch produces 3 tiny rows. estimate_batch_size on such a batch is on the order of a few KB — far below 64 MB. As a result, is_memory_pressure() most likely returns false after the first insert, the pressure path is never exercised, and the test passes regardless of whether the PR #25 fix is present or reverted.

This test would likely pass on a pre-#25 build, defeating its purpose as a regression guard.

To reliably exercise the pressure path, one of these approaches is needed:

  • Insert enough batches in a loop to genuinely exceed the effective 64 MB floor before the second insert, or
  • Use a test-only injection mechanism to force is_memory_pressure() to return true (e.g., manipulate reserved_bytes directly), or
  • Add an assertion that confirms the pressure path was actually reached (e.g., check that reserved_bytes exceeded max_memory_bytes() at some point between the two inserts).

Minor: Attribute Ordering

src/buffered_write_layer.rs:1009

#[tokio::test]
#[serial]

Existing tests in this module (e.g., test_recovery at line 942) use #[serial] before #[tokio::test]. Both orderings work but consistency is preferred.


Minor: Prefer create_test_config() Helper

src/buffered_write_layer.rs:1012–1013

The test manually constructs AppConfig::default() and sets timefusion_data_dir. The three other tests in this module all use create_test_config() for the base config. Prefer:

let mut cfg = (*create_test_config(dir.path().to_path_buf())).clone();
cfg.buffer.timefusion_buffer_max_memory_mb = 1; // note: clamped to 64 MB — see above
let cfg = Arc::new(cfg);

Test Coverage Gap

The test checks that the second insert succeeds (happy path) but does not assert:

  1. That is_memory_pressure() was true between the two inserts — without this, a passing test only proves the insert didn't error, not that the flush path ran.
  2. That the buffer was actually drained after the second insert (e.g., assert reserved_bytes dropped or that the MemBuffer is below the pressure threshold).

Overall Assessment: Request Changes

The doc-comment is thorough and the regression scenario is important to guard. But in its current form the test does not reliably exercise the code path it documents. Please fix the budget/volume issue so the test would actually fail on a pre-#25 build.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant