Skip to content

Enrich error/success status spans with failure context#253

Open
morgan-wowk wants to merge 1 commit into
masterfrom
execution-tracing-error-data
Open

Enrich error/success status spans with failure context#253
morgan-wowk wants to merge 1 commit into
masterfrom
execution-tracing-error-data

Conversation

@morgan-wowk
Copy link
Copy Markdown
Collaborator

@morgan-wowk morgan-wowk commented May 22, 2026

Enrich execution trace spans with error details and root span status

FAILED span: Attaches error.message from orchestration_error_message in extra_data and execution.exit_code from the associated ContainerExecution when available.

SYSTEM_ERROR span: Attaches error.message from the system error exception message and exception.stacktrace from the full exception traceback stored in extra_data.

SUCCEEDED span: Attaches execution.exit_code from the associated ContainerExecution when available.

Root span: Calls set_status(ERROR) when the terminal status is FAILED or SYSTEM_ERROR, leaving the root span unmarked for successful executions.

Screenshots

Screenshot 2026-05-22 at 8.14.49 PM.png

@morgan-wowk morgan-wowk force-pushed the execution-tracing-core branch from ebcd9a2 to e9fe6e0 Compare May 23, 2026 00:28
@morgan-wowk morgan-wowk force-pushed the execution-tracing-error-data branch from 6581758 to c73509f Compare May 23, 2026 00:31
@morgan-wowk morgan-wowk force-pushed the execution-tracing-core branch from e9fe6e0 to 059c47b Compare May 23, 2026 02:57
@morgan-wowk morgan-wowk force-pushed the execution-tracing-error-data branch from c73509f to 057544f Compare May 23, 2026 02:57
@morgan-wowk morgan-wowk marked this pull request as ready for review May 23, 2026 03:22
@morgan-wowk morgan-wowk changed the base branch from execution-tracing-core to graphite-base/253 May 26, 2026 23:16
@morgan-wowk morgan-wowk force-pushed the execution-tracing-error-data branch from 057544f to 04630ed Compare May 26, 2026 23:27
@morgan-wowk morgan-wowk requested a review from Ark-kun as a code owner May 26, 2026 23:27
@morgan-wowk morgan-wowk changed the base branch from graphite-base/253 to execution-tracing-core May 26, 2026 23:48
@morgan-wowk morgan-wowk changed the base branch from execution-tracing-core to graphite-base/253 May 27, 2026 00:21
@morgan-wowk morgan-wowk force-pushed the execution-tracing-error-data branch from 04630ed to 2ac38c8 Compare May 27, 2026 00:21
@graphite-app graphite-app Bot changed the base branch from graphite-base/253 to master May 27, 2026 00:22
@morgan-wowk morgan-wowk force-pushed the execution-tracing-error-data branch from 2ac38c8 to 2025a32 Compare May 27, 2026 00:22
Comment on lines +23 to +29
_ERROR_TERMINAL_STATUSES = frozenset(
s.value
for s in (
bts.ContainerExecutionStatus.FAILED,
bts.ContainerExecutionStatus.SYSTEM_ERROR,
)
)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI

_ERROR_TERMINAL_STATUSES = frozenset({
    bts.ContainerExecutionStatus.FAILED,
    bts.ContainerExecutionStatus.SYSTEM_ERROR,
})

Copy link
Copy Markdown
Collaborator Author

@morgan-wowk morgan-wowk May 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Adopted — switched to frozenset({...}) with the enum members directly. Simpler since ContainerExecutionStatus is a str enum.

).end(end_time=_ns(dt=t_end))

if history[-1]["status"] in _ERROR_TERMINAL_STATUSES:
root.set_status(status=StatusCode.ERROR)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious, if you don't set the Status, assuming it's "successful" by default?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Correct — OTel default status is UNSET which means "no error". We only explicitly set ERROR for failed/system_error executions.

Comment on lines +185 to +186
assert err_span.attributes["error.message"] == "RuntimeError"
assert err_span.attributes["exception.stacktrace"] == "Traceback..."
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering, would

  • exception.message
  • exception.stacktrace
    Make more sense since that aligns with the SQL keys, or is "error" not from an "exception"?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Good call — renamed to exception.message to align with OTel semantic conventions.

Comment on lines +40 to +43
if execution.container_execution_id is not None:
ce = execution.container_execution
if ce is not None and ce.exit_code is not None:
attrs["execution.exit_code"] = ce.exit_code
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw you using walrus operator in the upstream PR and it got me thinking

def _error_attrs(*, execution: bts.ExecutionNode, status: str) -> dict[str, object]:

    """Extra attributes for terminal error/success status spans."""
    extra = execution.extra_data or {}
    attrs: dict[str, object] = {}

    def _set_exit_code() -> None:
        if (ce := execution.container_execution) and ce.exit_code is not None:
            attrs["execution.exit_code"] = ce.exit_code

    if status == bts.ContainerExecutionStatus.FAILED:
        msg = extra.get(bts.EXECUTION_NODE_EXTRA_DATA_ORCHESTRATION_ERROR_MESSAGE_KEY)
        if msg is not None:
            attrs["error.message"] = msg
        _set_exit_code()
    elif status == bts.ContainerExecutionStatus.SYSTEM_ERROR:
        msg = extra.get(
            bts.EXECUTION_NODE_EXTRA_DATA_SYSTEM_ERROR_EXCEPTION_MESSAGE_KEY
        )
        if msg is not None:
            attrs["error.message"] = msg
        tb = extra.get(bts.EXECUTION_NODE_EXTRA_DATA_SYSTEM_ERROR_EXCEPTION_FULL_KEY)
        if tb is not None:
            attrs["exception.stacktrace"] = tb
    elif status == bts.ContainerExecutionStatus.SUCCEEDED:
        _set_exit_code()
        
    return attrs

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Adopted this — cleaner with the walrus operator and _set_exit_code helper.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PS: This entire stack is written by AI with supervision. To make your suggestions permanent adoptions, please add to the CLAUDE.md.

@morgan-wowk morgan-wowk force-pushed the execution-tracing-error-data branch from 2025a32 to d9fcf6b Compare May 29, 2026 18:52
FAILED span: error.message from orchestration_error_message, exit_code
SYSTEM_ERROR span: error.message + exception.stacktrace from extra_data
SUCCEEDED span: exit_code from ContainerExecution
Root span: set_status(ERROR) for FAILED and SYSTEM_ERROR terminals
@morgan-wowk morgan-wowk force-pushed the execution-tracing-error-data branch from d9fcf6b to b9f8147 Compare May 29, 2026 19:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants