Add launcher type and cluster URL to root execution span #255

Open
morgan-wowk wants to merge 1 commit into execution-tracing-launcher-data from execution-tracing-launcher-type

Collaborator

@morgan-wowk morgan-wowk commented May 22, 2026

Add execution.launcher and execution.cloud_provider OTel trace attributes to root execution span

execution.launcher is derived from the top-level key of launcher_data (e.g. kubernetes, kubernetes_job, skypilot) and distinguishes the launcher mechanism used for a given execution.

k8s.cluster.url on the root span is populated from cluster_server inside the launcher's data block. It lets GKE and Nebius clusters be told apart by URL pattern in oasis-backend's multi-cloud setup.
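A minimal sketch of the derivation described above, not the code in this PR. It assumes launcher_data is a mapping with a single top-level key naming the launcher, and that the nested block may carry a cluster_server entry; the helper name derive_launcher_attrs is hypothetical.

```python
from typing import Any, Dict, Mapping, Optional


def derive_launcher_attrs(
    launcher_data: Optional[Mapping[str, Any]],
) -> Dict[str, str]:
    """Hypothetical helper: build root-span attributes from launcher_data."""
    attrs: Dict[str, str] = {}
    if not launcher_data:
        # No container execution / no launcher data: emit nothing.
        return attrs

    # Assumption: the first (only) top-level key names the launcher,
    # e.g. "kubernetes", "kubernetes_job", "skypilot".
    launcher_name = next(iter(launcher_data))
    attrs["execution.launcher"] = str(launcher_name)

    # Assumption: the cluster URL lives under "cluster_server" in that block.
    block = launcher_data[launcher_name]
    if isinstance(block, Mapping):
        cluster_server = block.get("cluster_server")
        if cluster_server:
            attrs["k8s.cluster.url"] = str(cluster_server)
    return attrs


example = derive_launcher_attrs(
    {"kubernetes": {"cluster_server": "https://cluster.example.invalid"}}
)
```

Returning an empty dict when launcher_data is missing keeps both attributes off the span instead of setting them to placeholder values.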

execution.cloud_provider is read from the cloud-pipelines.net/orchestration/cloud_provider task_spec annotation, set at routing time by callers such as MultiLauncherContainerLauncher. This enables traces to be searched by cloud provider (e.g. gke, nebius) without relying on URL pattern matching against cluster hostnames. Launchers with a fixed cloud affinity can also set this annotation directly.

The CLOUD_PROVIDER_ANNOTATION_KEY constant is defined in common_annotations so it can be shared across launchers and the tracing layer without duplication.
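As a rough illustration of how the shared constant might be consumed, the sketch below reads the annotation from a plain dict and turns it into a span attribute. The key string is taken from the description above; the function name cloud_provider_attr and the dict-shaped annotations argument are assumptions, not the actual task_spec API.

```python
from typing import Dict, Mapping, Optional

# Key as stated in the PR description; in the PR it lives in common_annotations.
CLOUD_PROVIDER_ANNOTATION_KEY = "cloud-pipelines.net/orchestration/cloud_provider"


def cloud_provider_attr(
    annotations: Optional[Mapping[str, str]],
) -> Dict[str, str]:
    """Hypothetical helper: map the routing annotation to a span attribute."""
    value = (annotations or {}).get(CLOUD_PROVIDER_ANNOTATION_KEY)
    # Omit the attribute entirely when no provider was recorded at routing time.
    return {"execution.cloud_provider": value} if value else {}


routed = cloud_provider_attr({CLOUD_PROVIDER_ANNOTATION_KEY: "nebius"})
unrouted = cloud_provider_attr({})
```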

Screenshots

Screenshot 2026-05-22 at 8.19.10 PM.png


@morgan-wowk morgan-wowk force-pushed the execution-tracing-launcher-type branch 2 times, most recently from 57ba606 to 54e0001 Compare May 23, 2026 00:05
@morgan-wowk morgan-wowk force-pushed the execution-tracing-launcher-data branch from 3aa355d to 3e85d85 Compare May 23, 2026 00:31
@morgan-wowk morgan-wowk force-pushed the execution-tracing-launcher-type branch from 54e0001 to 6f8a30a Compare May 23, 2026 00:31
@morgan-wowk morgan-wowk force-pushed the execution-tracing-launcher-data branch from 3e85d85 to 239a70c Compare May 23, 2026 02:57
@morgan-wowk morgan-wowk force-pushed the execution-tracing-launcher-type branch from 6f8a30a to a8e9b03 Compare May 23, 2026 02:57
@morgan-wowk morgan-wowk marked this pull request as ready for review May 23, 2026 03:22
@morgan-wowk morgan-wowk requested a review from Ark-kun as a code owner May 23, 2026 03:22
@morgan-wowk morgan-wowk force-pushed the execution-tracing-launcher-data branch from 239a70c to 6521dc7 Compare May 26, 2026 23:27
@morgan-wowk morgan-wowk force-pushed the execution-tracing-launcher-type branch from a8e9b03 to 5a111d0 Compare May 26, 2026 23:27
@morgan-wowk morgan-wowk force-pushed the execution-tracing-launcher-data branch from 6521dc7 to 374f9cf Compare May 27, 2026 00:22
@morgan-wowk morgan-wowk force-pushed the execution-tracing-launcher-type branch from 5a111d0 to 3baf255 Compare May 27, 2026 00:22
Collaborator

@yuechao-qin yuechao-qin left a comment

Doesn't this PR also distinguish Nebius from non-Nebius executions? If so, can you filter on

cloud_provider = Nebius && status = <SOMETHING, maybe failures>?



class TestLauncherTypeAttrs:
    def test_root_span_carries_launcher_type(
Collaborator

This is not testing for launcher type?

Collaborator Author

🤖 Added: test_root_span_carries_launcher_type_and_cluster_url now sets up a ContainerExecution with kubernetes launcher_data and asserts that execution.launcher == "kubernetes" and that k8s.cluster.url is set on the root span.

@morgan-wowk morgan-wowk force-pushed the execution-tracing-launcher-type branch from 3baf255 to d7605f3 Compare May 29, 2026 18:52
@morgan-wowk morgan-wowk force-pushed the execution-tracing-launcher-data branch from 374f9cf to 22df75e Compare May 29, 2026 18:52
@morgan-wowk
Collaborator Author

🤖 Exactly — with this PR you can filter traces on execution.cloud_provider = "nebius" && execution.status = "FAILED" (or any status). The cloud_provider comes from the task_spec annotation set at routing time by the MultiLauncherContainerLauncher in oasis-backend.
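For illustration only, the same filter can be expressed client-side over exported span attributes. This is a sketch over plain dicts, assuming the attribute names quoted above; the actual query syntax depends on whichever tracing backend is in use, and filter_spans is a hypothetical helper.

```python
from typing import Any, Iterable, List, Mapping


def filter_spans(
    spans: Iterable[Mapping[str, Any]],
    expected: Mapping[str, Any],
) -> List[Mapping[str, Any]]:
    """Keep spans whose attributes match every expected key/value pair."""
    return [
        attrs
        for attrs in spans
        if all(attrs.get(key) == value for key, value in expected.items())
    ]


# Made-up attribute sets standing in for root execution spans.
root_span_attrs = [
    {"execution.cloud_provider": "nebius", "execution.status": "FAILED"},
    {"execution.cloud_provider": "gke", "execution.status": "FAILED"},
    {"execution.cloud_provider": "nebius", "execution.status": "SUCCEEDED"},
]

failed_on_nebius = filter_spans(
    root_span_attrs,
    {"execution.cloud_provider": "nebius", "execution.status": "FAILED"},
)
```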

execution.launcher = top-level key from launcher_data (e.g. 'kubernetes',
'kubernetes_job', 'skypilot') distinguishes launcher mechanism.
k8s.cluster.url on root span allows GKE vs Nebius cluster identification
by URL pattern in oasis-backend's multi-cloud setup.
@morgan-wowk morgan-wowk force-pushed the execution-tracing-launcher-data branch from 22df75e to b7aa164 Compare May 29, 2026 19:03
@morgan-wowk morgan-wowk force-pushed the execution-tracing-launcher-type branch from d7605f3 to 4871348 Compare May 29, 2026 19:04
Comment on lines +296 to +307
    def test_root_span_omits_launcher_without_container_execution(
        self, span_exporter: InMemorySpanExporter
    ) -> None:
        execution = _make_execution(statuses=["QUEUED", "SUCCEEDED"])
        execution_tracing.emit_execution_trace(execution=execution)

        root = next(
            s for s in span_exporter.get_finished_spans() if s.name == "execution"
        )
        assert "execution.launcher" not in (root.attributes or {})

    def test_root_span_no_launcher_attrs_without_container_execution(
Collaborator

Is my understanding correct that these K8s attributes will only be available once the span has ended (i.e. the execution has reached a terminal state)?

What's the difference between

  • test_root_span_omits_launcher_without_container_execution
  • test_root_span_no_launcher_attrs_without_container_execution

Just testing two different terminal states?

2 participants