feat(processing_errors): roll out TaskProducer to EAP producer#118585
feat(processing_errors): roll out TaskProducer to EAP producer#118585bmckerry wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 19d4c68. Configure here.
| _eap_producer.produce(ArroyoTopic(topic), payload) | ||
| if settings.TASKWORKER_USE_TASK_PRODUCER and in_random_rollout( | ||
| "tasks.producer.processing_errors.rollout" | ||
| ): |
There was a problem hiding this comment.
Rollout option name mismatch
High Severity
in_random_rollout is called with tasks.producer.processing_errors.rollout, but the option registered in defaults is tasks.producer.processing-errors.rollout. With TASKWORKER_USE_TASK_PRODUCER enabled, options.get raises UnknownOption, the produce path hits the broad exception handler, and processing errors are not sent to EAP while the configured rollout value is ignored.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 19d4c68. Configure here.
| if settings.TASKWORKER_USE_TASK_PRODUCER and in_random_rollout( | ||
| "tasks.producer.processing_errors.rollout" | ||
| ): |
There was a problem hiding this comment.
Bug: The option tasks.producer.processing-errors.rollout is registered with a hyphen but looked up with an underscore, causing the feature to silently fail when enabled.
Severity: HIGH
Suggested Fix
Update the option lookup in producer.py to use a hyphen instead of an underscore to match its registration. Change in_random_rollout("tasks.producer.processing_errors.rollout") to in_random_rollout("tasks.producer.processing-errors.rollout").
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: src/sentry/processing_errors/eap/producer.py#L137-L139
Potential issue: The option for EAP processing error rollout is registered as
`tasks.producer.processing-errors.rollout` (with a hyphen) in
`sentry/options/defaults.py`, but it is looked up as
`tasks.producer.processing_errors.rollout` (with an underscore) in
`sentry/processing_errors/eap/producer.py`. The `options.get` call within
`in_random_rollout` will raise an `UnknownOption` exception because the key with the
underscore does not exist and there is no normalization between hyphens and underscores.
This exception is caught by a broad `except Exception` block, causing the feature to
fail silently while logging an error. This will occur whenever
`settings.TASKWORKER_USE_TASK_PRODUCER` is enabled, effectively disabling the EAP
processing error production feature.
Also affects:
src/sentry/options/defaults.py:3780~3783
Did we get this right? 👍 / 👎 to inform future reviews.
| payload = KafkaPayload(None, EAP_ITEMS_CODEC.encode(trace_item), []) | ||
| _eap_producer.produce(ArroyoTopic(topic), payload) | ||
| if settings.TASKWORKER_USE_TASK_PRODUCER and in_random_rollout( | ||
| "tasks.producer.processing_errors.rollout" |
There was a problem hiding this comment.
| "tasks.producer.processing_errors.rollout" | |
| "tasks.producer.processing-errors.rollout" |


Followup to #118584
This PR updates the processing errors eap_producer to use TaskProducer in tasks.
The reason for this change is that SingletonProducer doesn't guarantee at-least-once delivery when used inside a task. TaskProducer keeps track of any producer futures generated as a part of task execution, and only marks a task as successful if all producer futures succeed.
To ensure a safe rollout, workers will instantiate an instance of the old and new producers, and cut over to the new producers based on a rollout option. This means that task futures won't be properly tracked while the rollout option is <1.0, but since our base state is to just ignore producer futures that's acceptable.