Skip to content

Introduce kv slo workload for userver#621

Open
Shfdis wants to merge 30 commits into
mainfrom
slo-refactor-v2
Open

Introduce kv slo workload for userver#621
Shfdis wants to merge 30 commits into
mainfrom
slo-refactor-v2

Conversation

@Shfdis
Copy link
Copy Markdown
Collaborator

@Shfdis Shfdis commented May 31, 2026

No description provided.

@Shfdis Shfdis added the SLO label May 31, 2026
Copy link
Copy Markdown

@robot-vibe-db robot-vibe-db Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI Review Summary

Verdict: \u274c 1 critical issue(s) found

Critical issues

  • Critical | Medium: Division by zero when maxId is 0 \u2014 tests/slo_workloads/userver/key_value/run.cpp:155

Other findings

  • Major | High: Signal handlers call non-async-signal-safe functions (TStringBuilder, Cout) \u2014 tests/slo_workloads/userver/key_value/run.cpp:130-138
  • Major | High: Blocking OS-level sleep (SleepUntil) inside userver coroutines starves worker threads \u2014 tests/slo_workloads/userver/key_value/run.cpp:186 (via TRpsProvider::Use())
  • Major | Medium: Unbounded task vector growth during long runs (600s x 1000 rps = ~600k handles) \u2014 tests/slo_workloads/userver/key_value/run.cpp:180
  • Minor | High: Uninitialized variable result \u2014 tests/slo_workloads/utils/utils_main.cpp:121
  • Minor | Medium: Division by zero in PrintStatistics when timePassed is 0 \u2014 tests/slo_workloads/utils/statistics.cpp:121 (pre-existing, not in diff)

This review was generated automatically. Critical issues require attention; other findings are advisory.
If this comment was useful, please give it a \U0001f44d \u2014 it helps us improve the review bot.

auto readRps = runOptions.Read_rps;
auto readTimeout = runOptions.ReadTimeout;
auto maxInfly = runOptions.CommonOptions.MaxInfly;
auto objectIdRange = static_cast<std::uint32_t>(maxId * 1.25);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Severity: Critical | Confidence: Medium\n\nIf maxId is 0 (table empty or initial population failed), objectIdRange becomes 0, and RandomNumber<std::uint32_t>() % objectIdRange on line 183 is a division by zero (undefined behavior, typically SIGFPE crash).\n\nSuggested fix: auto objectIdRange = std::max(1u, static_cast<std::uint32_t>(maxId * 1.25));

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its impossible

ShouldStop.store(false);
signal(SIGUSR1, [](int) {
Cout << TInstant::Now().ToRfc822StringLocal() << " SIGUSR1 handle" << Endl;
if (GlobalReporter) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Severity: Major | Confidence: High\n\nThe SIGUSR1 handler calls GlobalReporter->ShowProgress() which uses TStringBuilder, Cout, and accesses shared atomics/data. These are not async-signal-safe functions. The SIGINT handler on line 137 also uses Cerr. This is undefined behavior per POSIX.\n\nConsider limiting signal handlers to setting an atomic flag, then checking it in the main loop. Alternatively, use a self-pipe or signalfd to handle signals within the userver event loop.

std::uint32_t idToSelect = RandomNumber<std::uint32_t>() % objectIdRange;
const std::uint32_t objectIdKey = GetHash(idToSelect);

rpsProvider.Use();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Severity: Major | Confidence: High\n\nTRpsProvider::Use() (from utils.cpp) calls SleepUntil() which is a blocking OS-level sleep. This lambda runs inside userver::engine::AsyncNoSpan, so it's a userver coroutine. A blocking sleep blocks the entire worker thread instead of yielding the coroutine. With only 4 worker threads (hardcoded in userver_utils.cpp:141), the read and write RPS limiter loops will each block one thread, leaving only 2 threads for actual request dispatch.\n\nSuggested fix: Replace SleepUntil calls in the RPS limiter with userver::engine::SleepUntil or userver::engine::SleepFor so the coroutine yields instead of blocking the OS thread. This likely requires a userver-specific RPS provider implementation.


semaphore.lock_shared();

tasks.push_back(userver::engine::AsyncNoSpan(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Severity: Major | Confidence: Medium\n\nBoth the read loop (lines 180-208) and write loop (lines 256-283) push TaskWithResult<void> into a tasks vector without ever removing completed tasks during execution. They are only waited on at the end. For a 600-second run at 1000 read rps, this accumulates ~600,000 task handles in memory, each holding engine-internal state.\n\nConsider periodically draining completed tasks (e.g., batch-wait and clear every N iterations), or use a counter + semaphore pattern instead of accumulating handles.

StartStatCollecting(driver, statConfigFile);

TDatabaseOptions dbOptions{ driver, prefix };
int result;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Severity: Minor | Confidence: High\n\nint result; is uninitialized. While every code path through the switch sets it before use, some compilers will emit -Wmaybe-uninitialized warnings. The userver variant (userver_utils.cpp:133) correctly initializes this to EXIT_FAILURE.\n\nSuggested fix: int result = EXIT_FAILURE;

@robot-vibe-db
Copy link
Copy Markdown

robot-vibe-db Bot commented Jun 1, 2026

Full analysis log

Analysis performed by claude, claude-opus-4-6.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants