DB row lock contention in organizationonboardingtask on cache miss #4373

Description

@vitka

Environment

self-hosted (https://develop.sentry.dev/self-hosted/)

Steps to Reproduce

OrganizationOnboardingTaskManager.record() can create severe PostgreSQL row-lock contention when many workers miss the onboarding-task cache for the same (organization_id, task) key at the same time.

In my case it produced a thundering herd of workers queued on the same small set of sentry_organizationonboardingtask rows. My self-hosted Sentry hit a production database incident where queries like SELECT ... FROM sentry_organizationonboardingtask ... FOR UPDATE dominated the PostgreSQL master. These queries consumed around 95% of all DB CPU time, with a mean execution time of around 1.85s.

The problematic workload came from onboarding-task recording on high-volume signal paths, especially events with sourcemaps. Once a hot onboarding task row already exists and is complete, repeated cache misses still drive workers into update_or_create() instead of taking a non-locking no-op path.
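To illustrate the shape of the problem, here is a minimal simulation, not Sentry code. It assumes a single hot row guarded by a lock (a `threading.Lock` standing in for the PostgreSQL row lock) and an empty cache. `FakeRow`, `simulate_herd`, and the `(1, "sourcemaps")` key are hypothetical names chosen for the sketch.

```python
import threading


class FakeRow:
    """Stand-in for one sentry_organizationonboardingtask row (hypothetical)."""

    def __init__(self):
        self.lock = threading.Lock()  # plays the role of the PostgreSQL row lock
        self.status = "complete"      # the task is already in a terminal state
        self.lock_acquisitions = 0


def simulate_herd(num_workers):
    """Return how many workers took the row lock after a simultaneous cache miss."""
    cache = {}                   # empty cache: every worker starts with a miss
    row = FakeRow()
    key = (1, "sourcemaps")      # hypothetical (organization_id, task) key
    all_missed = threading.Barrier(num_workers)

    def worker():
        if key in cache:
            return               # cache hit: no database work at all
        all_missed.wait()        # hold until every worker has observed the miss
        with row.lock:           # like SELECT ... FOR UPDATE: one holder at a time
            row.lock_acquisitions += 1
            row.status = "complete"  # effectively a no-op write
        cache[key] = True        # refill the cache only after the locked section

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return row.lock_acquisitions


if __name__ == "__main__":
    print(simulate_herd(8))
```

Every worker that misses the cache queues on the same lock, even though none of them changes the row. The lock wait time therefore grows with the number of concurrent misses.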

Mitigations have been added since 25.5.1 (a large cache TTL and a completed-onboarding bypass), but a cache miss can still fall through to update_or_create(), which performs a locking read and update of the target row. Also, the skip option defaults to False and only helps once the organization option onboarding:complete exists.

Expected Result

When the onboarding task row already exists in terminal state, record() should be able to return without issuing a SELECT ... FOR UPDATE or updating the row.

For an already-complete task, a cache miss should be a cheap non-locking read followed by a cache refill, not a locking write path.
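The expected behavior could be sketched roughly as follows. This is a self-contained model, not the actual Sentry implementation: `FakeTaskStore`, `read_status`, the status constants, and this `record` function are hypothetical stand-ins. The store's `update_or_create` represents the locking path and `read_status` represents a plain SELECT without FOR UPDATE.

```python
from dataclasses import dataclass, field

COMPLETE = "complete"  # hypothetical terminal state
PENDING = "pending"    # hypothetical non-terminal state


@dataclass
class FakeTaskStore:
    """In-memory stand-in for the onboarding-task table (hypothetical)."""

    rows: dict = field(default_factory=dict)
    plain_reads: int = 0
    locking_writes: int = 0

    def read_status(self, key):
        # Models a plain SELECT: no row lock is taken.
        self.plain_reads += 1
        return self.rows.get(key)

    def update_or_create(self, key, status):
        # Models the locking read/update path (SELECT ... FOR UPDATE + write).
        self.locking_writes += 1
        self.rows[key] = status


def record(store, cache, key, status=COMPLETE):
    """Record a task; return True only if the locking write path was used."""
    if key in cache:
        return False                      # cache hit: nothing to do
    if store.read_status(key) == COMPLETE:
        cache[key] = True                 # refill the cache after a cheap read
        return False                      # terminal row: skip the locking path
    store.update_or_create(key, status)   # missing or non-terminal: lock and write
    cache[key] = True
    return True


if __name__ == "__main__":
    key = (1, "sourcemaps")               # hypothetical (organization_id, task)
    store = FakeTaskStore(rows={key: COMPLETE})
    cache = {}
    print(record(store, cache, key), store.locking_writes)
```

If the row is missing or not yet terminal, the call still falls through to the locking path, so correctness should not depend on the unlocked read. The unlocked read only short-circuits the case where no write is needed.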

Actual Result

Heavy DB load and lock contention on sentry_organizationonboardingtask.

Product Area

Other

Link

No response

DSN

No response

Version

25.5.1

    Status
    Waiting for: Community