feat(overlay): add generate-metadata workflow for Package entity creation by davidfestal · Pull Request #22 · redhat-developer/rhdh-skill

davidfestal · 2026-05-07T22:38:48Z

Summary

- Add workflows/generate-metadata.md: a 6-phase workflow that generates missing Package metadata YAML files and audits existing metadata for consistency within overlay workspaces
- Add scripts/derive-metadata.py: a Python helper script for deterministic metadata derivation (name shortening, OCI URL, supportedVersions, plugins-list parsing, env var extraction)
- Add <path_resolution> and <shell_permissions> directives to SKILL.md for reliable script invocation and sandbox-safe GitHub API calls across all workflows
- Rewrite references/metadata-format.md with real-world examples, correct catalog-entities/extensions/ paths, and comprehensive field documentation
- Delegate onboard-plugin Phase 4 to the new generate-metadata workflow
- Add tests/unit/test_derive_metadata.py with 47 tests covering all public functions

Workflow Phases

1. Workspace Identification: resolve target workspace
2. Scan for Missing Metadata: detect plugins without metadata files, flag consistency issues
3. Fetch Upstream Source: retrieve package.json + config.d.ts via gh api (agent-direct, no subprocess)
4. Analyze Config & Wiring: generate appConfigExamples from config schemas, delegate frontend wiring
5. Plugin Entity Resolution: determine partOf references or Plugin entity creation
6. Write Files & Report: assemble YAML, update smoke-tests/test.env, audit existing metadata, propose commit/PR

Test plan

289 tests pass (47 new + 242 existing)
Run generate-metadata workflow against a real overlay workspace (e.g., analytics) to validate end-to-end
Verify onboard-plugin workflow still works with Phase 4 delegation

Made with Cursor

…tion

Add a 6-phase workflow that generates missing Package metadata files and audits existing ones for consistency.

Key capabilities:
- Scans workspaces for plugins missing metadata files
- Derives deterministic fields (name, OCI URL, supportedVersions) from source.json, plugins-list.yaml, and upstream package.json
- Fetches config.d.ts from upstream to generate appConfigExamples
- Audits supportedVersions consistency and empty appConfigExamples
- Updates smoke-tests/test.env with placeholder variables
- Delegates from onboard-plugin Phase 4

Also adds path_resolution and shell_permissions directives to SKILL.md for reliable script invocation across all workflows, and rewrites the metadata-format reference with real examples and correct paths.

Co-authored-by: Cursor <cursoragent@cursor.com>

Copilot

Pull request overview

This PR extends the overlay skill with a new workflow and helper script to generate/audit Backstage Package metadata for overlay workspaces, and updates related documentation and onboarding guidance.

Changes:

Add workflows/generate-metadata.md to define a phased process for scanning, generating, and auditing metadata.
Add scripts/derive-metadata.py plus new unit tests to support deterministic metadata derivation and audit checks.
Update overlay skill docs (SKILL.md, references/metadata-format.md, and onboard-plugin.md) to route Phase 4 to the new workflow and document the metadata format.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 10 comments.


| File | Description |
| --- | --- |
| tests/unit/test_derive_metadata.py | Adds unit tests for the new derive-metadata helper script. |
| skills/overlay/workflows/onboard-plugin.md | Delegates Phase 4 metadata work to the new generate-metadata workflow. |
| skills/overlay/workflows/generate-metadata.md | New workflow describing scan/derive/audit steps and expected outputs. |
| skills/overlay/SKILL.md | Adds path resolution + shell permission guidance; adds routing entry for metadata workflow. |
| skills/overlay/scripts/derive-metadata.py | New CLI script to scan workspaces, derive fields, and perform audits. |
| skills/overlay/references/metadata-format.md | Rewrites metadata format reference with updated paths and examples. |


+
+import importlib.util
+import json
+import textwrap


+    python scripts/derive-metadata.py --workspace argocd
+    python scripts/derive-metadata.py --workspace argocd --package-json '{"name":"@backstage-community/plugin-argocd","version":"2.8.0","backstage":{"role":"frontend-plugin"}}'
+    python scripts/derive-metadata.py --extract-env-vars metadata-file.yaml


+def shorten_name(name: str) -> str:
+    """Apply shortening rules only if name exceeds K8S_NAME_LIMIT."""
+    if len(name) <= K8S_NAME_LIMIT:
+        return name
+    shortened = name
+    for old, new in SHORTEN_RULES:
+        shortened = shortened.replace(old, new)
+    return shortened


+def find_missing_metadata(workspace_dir: Path, plugins: list[dict]) -> list[dict]:
+    """Identify plugins that lack metadata files.
+
+    Uses a heuristic: for each plugin path, check if any existing metadata file's
+    packageName corresponds to that path. Falls back to filename pattern matching.
+    """
+    metadata_dir = workspace_dir / "metadata"
+    existing_files = list(metadata_dir.glob("*.yaml")) if metadata_dir.exists() else []
+    existing_names = {f.stem for f in existing_files}
+
+    missing = []
+    for plugin in plugins:
+        path = plugin["path"]
+        path_suffix = path.rstrip("/").split("/")[-1] if path != "." else ""
+        found = any(path_suffix and path_suffix in name for name in existing_names)
+        if not found and path != ".":
+            missing.append(plugin)
+        elif path == "." and not existing_names:
+            missing.append(plugin)
+    return missing


+        if args.package_json:
+            pkg = json.loads(args.package_json)
+            fields = derive_plugin_fields(
+                pkg, args.workspace, args.plugin_path, source,
+                supported_versions, existing,
+            )
+            print(json.dumps(fields, indent=2 if is_tty else None))


+</path_resolution>
+
+<shell_permissions>
+Prefer running `gh api` and `gh search code` as **direct shell commands** rather than via Python subprocess. Direct `gh` calls go through the user's command allowlist without triggering permission prompts. Python scripts that only do local work (file I/O, JSON processing, field derivation) also need no extra permissions. Only request `full_network` for Python scripts that internally spawn `gh` as a subprocess — the sandbox blocks network access from child processes.


+Run the scan command from the overlay repo root (no network needed):
+
+```bash
+python3 scripts/derive-metadata.py scan --workspace <workspace>


+  version: 10.17.0
+  backstage:
+    role: frontend-plugin
+    supportedVersions: 1.45.3


+  version: 1.4.0
+  backstage:
+    role: backend-plugin
+    supportedVersions: 1.48.3


+    is_tty = os.isatty(sys.stdout.fileno())
+
+    if args.command == "extract-env-vars":
+        content = Path(args.file).read_text()
+        env_vars = extract_env_vars(content)
+        output = {"env_vars": env_vars}
+        print(json.dumps(output, indent=2 if is_tty else None))


durandom

Review: Script Architecture & Quality

Nice work on the overall concept — extracting deterministic field derivation (name shortening, OCI URLs, version consistency) into a script is exactly the right pattern for this project. The workflow phases are logical and the delegation from onboard-plugin Phase 4 makes sense. The metadata-format.md rewrite with real-world examples is a big improvement.

However, the script has some architectural issues and quality gaps compared to the existing scripts (analyze-pr.py, triage-prs.py). See inline comments for specifics.

Architectural

1. fetch-and-derive mixes concerns — remove or split it

The existing scripts in this project call gh to fetch data, but keep fetching and transformation cleanly separated. fetch-and-derive does both: it shells out to gh api via subprocess AND runs the derive logic. This creates the <shell_permissions> contradiction (see inline comment) and makes the subcommand harder to test (requires mocking subprocess).

The workflow already describes the gh api calls as direct shell commands in Phase 3.1. The script should only do local, deterministic work: scan (read local files) and derive (pure computation). Let the agent or the workflow handle the gh api calls, which keeps fetching and transformation separate, as the existing scripts do.

2. extract-env-vars doesn't need to be a subcommand

This is a one-liner: grep -oP '\$\{[A-Z_][A-Z0-9_]*\}' file | sort -u. Adding it as a Python subcommand adds code to maintain with no benefit.

Functional Bugs

3. shorten_name has no safety net (see inline)

4. find_missing_metadata uses substring matching (see inline)

Quality — Match Existing Patterns

5. run_gh is inconsistent with existing scripts

Both analyze-pr.py and triage-prs.py have run_gh that:

- Returns parsed JSON (not raw strings)
- Catches FileNotFoundError for missing gh CLI
- Catches json.JSONDecodeError

This script's run_gh does none of those. If you keep run_gh, follow the established pattern.

6. Use --json flag instead of TTY auto-detection (see inline)

Minor

7. Module docstring shows old usage — The usage examples don't show subcommands (scan, derive, etc.) which is confusing.

8. Numbering gap in intake menu — Options jump from 4 to 8 (see inline).

9. supportedVersions mismatch in reference examples — Both examples in metadata-format.md show inconsistent versions (see inline).

TL;DR: The derive logic (name shortening, OCI URLs, consistency checks) is genuinely valuable as a script. The main asks: (1) remove fetch-and-derive — let the workflow handle gh api calls, keep the script doing only local/deterministic work, (2) fix the shorten_name safety net and find_missing_metadata matching, (3) align run_gh and output format with existing scripts.

durandom · 2026-05-13T11:50:57Z

+def shorten_name(name: str) -> str:
+    """Apply shortening rules only if name exceeds K8S_NAME_LIMIT."""
+    if len(name) <= K8S_NAME_LIMIT:
+        return name


If the shortening rules don't reduce the name to 63 characters or fewer, this returns a name that exceeds the Kubernetes limit. The upstream shorten-component-name.sh likely has a final truncation step.

Add a safety net — e.g., truncate to 55 chars + - + 7-char hash suffix:

    if len(shortened) > K8S_NAME_LIMIT:
        import hashlib
        h = hashlib.sha256(name.encode()).hexdigest()[:7]
        shortened = shortened[:K8S_NAME_LIMIT - 8] + '-' + h
    return shortened

The test test_long_name_shortened only passes because the specific test string happens to get short enough — it doesn't cover names where the rules aren't sufficient.
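To make the gap concrete, here is a minimal, self-contained sketch of shorten_name with the suggested fallback. The entries in SHORTEN_RULES are hypothetical placeholders (the real rules are not shown in this thread), K8S_NAME_LIMIT is assumed to be 63, and the rstrip("-") is an extra tweak not present in the suggestion above, added only to avoid a doubled hyphen before the hash.

```python
import hashlib

K8S_NAME_LIMIT = 63  # assumed value, matching the 63-char limit discussed above

# Hypothetical rules for illustration only; the script's real SHORTEN_RULES
# are not visible in this review.
SHORTEN_RULES = [
    ("backstage-community-plugin-", "bcp-"),
    ("red-hat-developer-hub-", "rhdh-"),
]


def shorten_name(name: str) -> str:
    """Shorten a name and guarantee the result fits within K8S_NAME_LIMIT."""
    if len(name) <= K8S_NAME_LIMIT:
        return name
    shortened = name
    for old, new in SHORTEN_RULES:
        shortened = shortened.replace(old, new)
    if len(shortened) > K8S_NAME_LIMIT:
        # Safety net: at most 55 chars + "-" + 7-char hash of the ORIGINAL name.
        digest = hashlib.sha256(name.encode()).hexdigest()[:7]
        shortened = shortened[: K8S_NAME_LIMIT - 8].rstrip("-") + "-" + digest
    return shortened
```

Hashing the original name rather than the shortened one keeps the suffix stable even if the rules change later. A test for the uncovered case could feed in a long name that none of the rules match and assert that the result is still at most 63 characters.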

durandom · 2026-05-13T11:50:57Z

+        path = plugin["path"]
+        path_suffix = path.rstrip("/").split("/")[-1] if path != "." else ""
+        found = any(path_suffix and path_suffix in name for name in existing_names)
+        if not found and path != ".":


Substring matching: "argocd" in "something-argocd-backend" is True. If the workspace has both plugins/argocd and plugins/argocd-backend, and a metadata file x-argocd-backend.yaml exists, the check for plugins/argocd falsely matches.

Either:

Use exact suffix matching with a separator: name.endswith(path_suffix) or name == expected_metadata_name

Or better: derive the expected metadata name via package_name_to_metadata_name and check for exact file stem match

The Copilot comment (#4) flagged the same issue.
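A rough sketch of the exact-stem variant follows. The body of package_name_to_metadata_name below is a hypothetical stand-in (the real helper's mapping is not shown in this thread), and the "package_name" key on each plugin dict is likewise an assumption; the script's plugin dicts are only known to carry "path".

```python
from pathlib import Path


def package_name_to_metadata_name(package_name: str) -> str:
    """Hypothetical stand-in: '@scope/plugin-x' -> 'scope-plugin-x'."""
    return package_name.lstrip("@").replace("/", "-")


def find_missing_metadata(metadata_dir: Path, plugins: list[dict]) -> list[dict]:
    """Return plugins whose expected metadata file stem does not exist."""
    if metadata_dir.is_dir():
        existing = {f.stem for f in metadata_dir.glob("*.yaml")}
    else:
        existing = set()
    # Exact set membership: 'x-argocd' can no longer match 'x-argocd-backend'.
    return [
        p for p in plugins
        if package_name_to_metadata_name(p["package_name"]) not in existing
    ]
```

With only a metadata file for the argocd-backend package on disk, this reports the argocd frontend as missing, which is the case the substring check gets wrong.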

durandom · 2026-05-13T11:50:57Z

+
+
+def run_gh(args, check=True):
+    """Run a gh CLI command and return stdout as string."""


This run_gh doesn't match the pattern from analyze-pr.py and triage-prs.py:

- Missing FileNotFoundError handler (crashes if gh isn't installed)
- Returns raw stdout string instead of parsed JSON
- No json.JSONDecodeError handling

If you keep run_gh (see my comment about removing fetch-and-derive), align with the existing pattern.
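For reference, a run_gh covering the three points above might look roughly like this. It is a sketch, not a copy of analyze-pr.py or triage-prs.py: the None-on-failure convention, the error messages, and the gh_bin parameter (added so the function can be exercised without the real CLI) are all assumptions.

```python
import json
import subprocess
import sys


def run_gh(args: list[str], gh_bin: str = "gh"):
    """Run a gh command and return its stdout parsed as JSON, or None on failure."""
    try:
        result = subprocess.run(
            [gh_bin, *args], capture_output=True, text=True, check=True
        )
    except FileNotFoundError:
        # The executable is not on PATH.
        print(f"error: '{gh_bin}' not found; is the GitHub CLI installed?",
              file=sys.stderr)
        return None
    except subprocess.CalledProcessError as exc:
        # Non-zero exit status from the command.
        print(f"error: command exited with {exc.returncode}: {exc.stderr.strip()}",
              file=sys.stderr)
        return None
    try:
        return json.loads(result.stdout)
    except json.JSONDecodeError as exc:
        print(f"error: command returned invalid JSON: {exc}", file=sys.stderr)
        return None
```

Callers then deal with one return type (parsed data or None) instead of re-parsing raw strings at each call site.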

durandom · 2026-05-13T11:50:57Z

+
+
+def fetch_and_derive_all(
+    workspace_dir: Path, workspace: str, source: dict, missing_paths: list[str]


This subcommand mixes network I/O (gh api via subprocess) with pure computation (field derivation). The workflow already describes the gh api calls as shell commands in Phase 3.1 — having the script also do them via subprocess is redundant.

This is also what creates the <shell_permissions> contradiction in SKILL.md: the directive says "prefer direct gh calls over Python subprocess" but this function does exactly the opposite.

Recommendation: remove fetch-and-derive. The workflow should call gh api directly (agent-friendly, no sandbox issues), then pipe the results to derive for each plugin. The scan + derive subcommands are sufficient.
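As an illustration of that split, the derive side could accept the upstream package.json on any text stream and stay free of network code. This is only a sketch: derive_from_stream and the output keys are hypothetical, and the input fields (name, version, backstage.role) are taken from the --package-json example in the module docstring.

```python
import json
from typing import TextIO


def derive_from_stream(stream: TextIO) -> dict:
    """Derive a few metadata fields from package.json text read from a stream."""
    pkg = json.load(stream)
    return {
        # Output key names are illustrative, not the script's real schema.
        "packageName": pkg["name"],
        "version": pkg["version"],
        "role": pkg.get("backstage", {}).get("role"),
    }
```

Because the function takes a stream instead of calling gh itself, a test can pass io.StringIO with a literal package.json and no subprocess mocking is needed.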

durandom · 2026-05-13T11:50:57Z

+        sys.exit(1)
+
+    is_tty = os.isatty(sys.stdout.fileno())
+


os.isatty(sys.stdout.fileno()) raises io.UnsupportedOperation in environments without a real fd (some test runners, CI wrappers). Use sys.stdout.isatty() instead.

Also: the existing scripts use an explicit --json flag rather than TTY auto-detection. That's more predictable — consider matching that pattern.
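A small sketch of the explicit-flag approach, using only argparse and json from the standard library. How the existing scripts render their non-JSON output is not shown in this thread, so the "key: value" text form below is an assumption.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    """Parser with an explicit --json switch instead of TTY detection."""
    parser = argparse.ArgumentParser(prog="derive-metadata.py")
    parser.add_argument(
        "--json", action="store_true", dest="as_json",
        help="emit machine-readable JSON instead of text",
    )
    return parser


def render(data: dict, as_json: bool) -> str:
    """Format output according to the flag only, never according to stdout."""
    if as_json:
        return json.dumps(data)
    # Assumed human-readable form: one "key: value" line per entry.
    return "\n".join(f"{key}: {value}" for key, value in data.items())
```

The output format then depends only on the command line, so the same invocation behaves identically in a terminal, a pipe, or a test runner that swaps out sys.stdout.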

durandom · 2026-05-13T11:50:57Z

+      title: Bugs
+    - title: Source Code
+      url: https://github.com/backstage/community-plugins/tree/main/workspaces/dynatrace/plugins/dynatrace
+  annotations:


supportedVersions: 1.45.3 but the dynamicArtifact tag above uses bs_1.49.4. These should match — this is the reference doc that agents will follow when generating metadata.

Same issue with the backend example below (supportedVersions: 1.48.3 vs bs_1.49.4 in the tag).
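A check along these lines could catch the mismatch automatically. It is a sketch under one assumption: that the Backstage version appears in the tag as "bs_" followed by a three-part version, as in the bs_1.49.4 tag quoted above. The function name is hypothetical.

```python
import re

# Assumed tag convention: "bs_<major>.<minor>.<patch>" somewhere in the artifact string.
BS_TAG_RE = re.compile(r"bs_(\d+\.\d+\.\d+)")


def tag_matches_supported_version(dynamic_artifact: str, supported_versions: str) -> bool:
    """True only if the bs_ version in the artifact tag equals supportedVersions."""
    match = BS_TAG_RE.search(dynamic_artifact)
    return match is not None and match.group(1) == supported_versions
```

Run against the two reference examples, this would return False for both (1.45.3 and 1.48.3 against a bs_1.49.4 tag), flagging them before an agent copies the pattern.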

durandom · 2026-05-13T11:50:58Z

+
+<shell_permissions>
+Prefer running `gh api` and `gh search code` as **direct shell commands** rather than via Python subprocess. Direct `gh` calls go through the user's command allowlist without triggering permission prompts. Python scripts that only do local work (file I/O, JSON processing, field derivation) also need no extra permissions. Only request `full_network` for Python scripts that internally spawn `gh` as a subprocess — the sandbox blocks network access from child processes.
+</shell_permissions>


This directive says "prefer direct gh shell commands over Python subprocess" but derive-metadata.py's fetch-and-derive subcommand does exactly the opposite — it calls gh api via subprocess.run.

Either:

Remove fetch-and-derive from the script (recommended — see my comment on the script), or

Rewrite this to explain that the script handles gh calls internally and agents should use the script's subcommands rather than calling gh directly for metadata tasks

durandom · 2026-05-13T11:50:58Z

 2. **Update plugin version** — Bump to newer upstream commit/tag
 3. **Check plugin status** — Verify health and compatibility
 4. **Fix build failure** — Debug CI/publish issues
+8. **Generate or audit metadata** — Add missing Package metadata or fix inconsistencies in existing metadata


Nit: jumps from option 4 to option 8 in the user-facing menu. Should be 5 (or renumber the Core Team section to leave room).

durandom requested a review from Copilot May 8, 2026 10:58

Copilot started reviewing on behalf of durandom May 8, 2026 10:59

Copilot AI reviewed May 8, 2026


durandom requested changes May 13, 2026

davidfestal wants to merge 1 commit into redhat-developer:main from davidfestal:feat/generate-metadata-workflow
