Skip to content

feat(processing_errors): roll out TaskProducer to EAP producer#118585

Open
bmckerry wants to merge 1 commit into
ben/uptime-eap-taskproducerfrom
ben/processing-errors-taskproducer
Open

feat(processing_errors): roll out TaskProducer to EAP producer#118585
bmckerry wants to merge 1 commit into
ben/uptime-eap-taskproducerfrom
ben/processing-errors-taskproducer

Conversation

@bmckerry

Copy link
Copy Markdown
Member

Followup to #118584

This PR updates the processing errors eap_producer to use TaskProducer in tasks.

The reason for this change is that SingletonProducer doesn't guarantee at-least-once delivery when used inside a task. TaskProducer keeps track of any producer futures generated as a part of task execution, and only marks a task as successful if all producer futures succeed.

To ensure a safe rollout, workers will instantiate an instance of the old and new producers, and cut over to the new producers based on a rollout option. This means that task futures won't be properly tracked while the rollout option is <1.0, but since our base state is to just ignore producer futures that's acceptable.

@bmckerry bmckerry requested review from a team as code owners June 26, 2026 19:18
@github-actions github-actions Bot added the Scope: Backend Automatically applied to PRs that change backend components label Jun 26, 2026

@cursor cursor Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 19d4c68. Configure here.

_eap_producer.produce(ArroyoTopic(topic), payload)
if settings.TASKWORKER_USE_TASK_PRODUCER and in_random_rollout(
"tasks.producer.processing_errors.rollout"
):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rollout option name mismatch

High Severity

in_random_rollout is called with tasks.producer.processing_errors.rollout, but the option registered in defaults is tasks.producer.processing-errors.rollout. With TASKWORKER_USE_TASK_PRODUCER enabled, options.get raises UnknownOption, the produce path hits the broad exception handler, and processing errors are not sent to EAP while the configured rollout value is ignored.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 19d4c68. Configure here.

Comment on lines +137 to +139
if settings.TASKWORKER_USE_TASK_PRODUCER and in_random_rollout(
"tasks.producer.processing_errors.rollout"
):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The option tasks.producer.processing-errors.rollout is registered with a hyphen but looked up with an underscore, causing the feature to silently fail when enabled.
Severity: HIGH

Suggested Fix

Update the option lookup in producer.py to use a hyphen instead of an underscore to match its registration. Change in_random_rollout("tasks.producer.processing_errors.rollout") to in_random_rollout("tasks.producer.processing-errors.rollout").

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: src/sentry/processing_errors/eap/producer.py#L137-L139

Potential issue: The option for EAP processing error rollout is registered as
`tasks.producer.processing-errors.rollout` (with a hyphen) in
`sentry/options/defaults.py`, but it is looked up as
`tasks.producer.processing_errors.rollout` (with an underscore) in
`sentry/processing_errors/eap/producer.py`. The `options.get` call within
`in_random_rollout` will raise an `UnknownOption` exception because the key with the
underscore does not exist and there is no normalization between hyphens and underscores.
This exception is caught by a broad `except Exception` block, causing the feature to
fail silently while logging an error. This will occur whenever
`settings.TASKWORKER_USE_TASK_PRODUCER` is enabled, effectively disabling the EAP
processing error production feature.

Also affects:

  • src/sentry/options/defaults.py:3780~3783

Did we get this right? 👍 / 👎 to inform future reviews.

payload = KafkaPayload(None, EAP_ITEMS_CODEC.encode(trace_item), [])
_eap_producer.produce(ArroyoTopic(topic), payload)
if settings.TASKWORKER_USE_TASK_PRODUCER and in_random_rollout(
"tasks.producer.processing_errors.rollout"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"tasks.producer.processing_errors.rollout"
"tasks.producer.processing-errors.rollout"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Scope: Backend Automatically applied to PRs that change backend components

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants