
feat(overlay): add generate-metadata workflow for Package entity creation #22

Open
davidfestal wants to merge 1 commit into redhat-developer:main from davidfestal:feat/generate-metadata-workflow

@davidfestal
Member

Summary

  • Add workflows/generate-metadata.md — a 6-phase workflow that generates missing Package metadata YAML files and audits existing metadata for consistency within overlay workspaces
  • Add scripts/derive-metadata.py — a Python helper script for deterministic metadata derivation (name shortening, OCI URL, supportedVersions, plugins-list parsing, env var extraction)
  • Add <path_resolution> and <shell_permissions> directives to SKILL.md for reliable script invocation and sandbox-safe GitHub API calls across all workflows
  • Rewrite references/metadata-format.md with real-world examples, correct catalog-entities/extensions/ paths, and comprehensive field documentation
  • Delegate onboard-plugin Phase 4 to the new generate-metadata workflow
  • Add tests/unit/test_derive_metadata.py with 47 tests covering all public functions

Workflow Phases

  1. Workspace Identification — resolve target workspace
  2. Scan for Missing Metadata — detect plugins without metadata files, flag consistency issues
  3. Fetch Upstream Source — retrieve package.json + config.d.ts via gh api (agent-direct, no subprocess)
  4. Analyze Config & Wiring — generate appConfigExamples from config schemas, delegate frontend wiring
  5. Plugin Entity Resolution — determine partOf references or Plugin entity creation
  6. Write Files & Report — assemble YAML, update smoke-tests/test.env, audit existing metadata, propose commit/PR
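Phase 6 updates smoke-tests/test.env. A minimal sketch of that step, assuming test.env holds plain NAME=value lines and using a hypothetical placeholder value "changeme" (the PR does not show the file format or the placeholder it actually writes):

```python
import re

# Assumption: test.env contains plain NAME=value lines (comments start with #).
ENV_LINE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)=")


def add_placeholders(test_env: str, env_vars: list[str], placeholder: str = "changeme") -> str:
    """Return test_env with NAME=placeholder appended for each undefined variable."""
    defined = set()
    for line in test_env.splitlines():
        match = ENV_LINE.match(line)
        if match:
            defined.add(match.group(1))

    # dict.fromkeys de-duplicates while keeping first-seen order.
    missing = [name for name in dict.fromkeys(env_vars) if name not in defined]
    if not missing:
        return test_env

    # Make sure appended lines start on a fresh line.
    if test_env and not test_env.endswith("\n"):
        test_env += "\n"
    return test_env + "".join(f"{name}={placeholder}\n" for name in missing)
```

Appending only names that are not already defined keeps the step idempotent, so re-running the workflow does not duplicate lines.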

Test plan

  • 289 tests pass (47 new + 242 existing)
  • Run the generate-metadata workflow against a real overlay workspace (e.g., analytics) to validate end-to-end
  • Verify the onboard-plugin workflow still works with Phase 4 delegation

Made with Cursor


Add a 6-phase workflow that generates missing Package metadata files and
audits existing ones for consistency. Key capabilities:

- Scans workspaces for plugins missing metadata files
- Derives deterministic fields (name, OCI URL, supportedVersions) from
  source.json, plugins-list.yaml, and upstream package.json
- Fetches config.d.ts from upstream to generate appConfigExamples
- Audits supportedVersions consistency and empty appConfigExamples
- Updates smoke-tests/test.env with placeholder variables
- Delegates from onboard-plugin Phase 4

Also adds path_resolution and shell_permissions directives to SKILL.md
for reliable script invocation across all workflows, and rewrites the
metadata-format reference with real examples and correct paths.

Co-authored-by: Cursor <cursoragent@cursor.com>

Copilot AI left a comment


Pull request overview

This PR extends the overlay skill with a new workflow and helper script to generate/audit Backstage Package metadata for overlay workspaces, and updates related documentation and onboarding guidance.

Changes:

  • Add workflows/generate-metadata.md to define a phased process for scanning, generating, and auditing metadata.
  • Add scripts/derive-metadata.py plus new unit tests to support deterministic metadata derivation and audit checks.
  • Update overlay skill docs (SKILL.md, references/metadata-format.md, and onboard-plugin.md) to route Phase 4 to the new workflow and document the metadata format.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| tests/unit/test_derive_metadata.py | Adds unit tests for the new derive-metadata helper script. |
| skills/overlay/workflows/onboard-plugin.md | Delegates Phase 4 metadata work to the new generate-metadata workflow. |
| skills/overlay/workflows/generate-metadata.md | New workflow describing scan/derive/audit steps and expected outputs. |
| skills/overlay/SKILL.md | Adds path resolution + shell permission guidance; adds routing entry for metadata workflow. |
| skills/overlay/scripts/derive-metadata.py | New CLI script to scan workspaces, derive fields, and perform audits. |
| skills/overlay/references/metadata-format.md | Rewrites metadata format reference with updated paths and examples. |



```python
import importlib.util
import json
import textwrap
```
Comment on lines +10 to +12
```bash
python scripts/derive-metadata.py --workspace argocd
python scripts/derive-metadata.py --workspace argocd --package-json '{"name":"@backstage-community/plugin-argocd","version":"2.8.0","backstage":{"role":"frontend-plugin"}}'
python scripts/derive-metadata.py --extract-env-vars metadata-file.yaml
```
Comment on lines +45 to +52
```python
def shorten_name(name: str) -> str:
    """Apply shortening rules only if name exceeds K8S_NAME_LIMIT."""
    if len(name) <= K8S_NAME_LIMIT:
        return name
    shortened = name
    for old, new in SHORTEN_RULES:
        shortened = shortened.replace(old, new)
    return shortened
```
Comment on lines +121 to +140
```python
def find_missing_metadata(workspace_dir: Path, plugins: list[dict]) -> list[dict]:
    """Identify plugins that lack metadata files.

    Uses a heuristic: for each plugin path, check if any existing metadata file's
    packageName corresponds to that path. Falls back to filename pattern matching.
    """
    metadata_dir = workspace_dir / "metadata"
    existing_files = list(metadata_dir.glob("*.yaml")) if metadata_dir.exists() else []
    existing_names = {f.stem for f in existing_files}

    missing = []
    for plugin in plugins:
        path = plugin["path"]
        path_suffix = path.rstrip("/").split("/")[-1] if path != "." else ""
        found = any(path_suffix and path_suffix in name for name in existing_names)
        if not found and path != ".":
            missing.append(plugin)
        elif path == "." and not existing_names:
            missing.append(plugin)
    return missing
```
Comment on lines +536 to +542
```python
if args.package_json:
    pkg = json.loads(args.package_json)
    fields = derive_plugin_fields(
        pkg, args.workspace, args.plugin_path, source,
        supported_versions, existing,
    )
    print(json.dumps(fields, indent=2 if is_tty else None))
```
Comment thread skills/overlay/SKILL.md
</path_resolution>

<shell_permissions>
Prefer running `gh api` and `gh search code` as **direct shell commands** rather than via Python subprocess. Direct `gh` calls go through the user's command allowlist without triggering permission prompts. Python scripts that only do local work (file I/O, JSON processing, field derivation) also need no extra permissions. Only request `full_network` for Python scripts that internally spawn `gh` as a subprocess — the sandbox blocks network access from child processes.
Comment on lines +38 to +41
Run the scan command from the overlay repo root (no network needed):

```bash
python3 scripts/derive-metadata.py scan --workspace <workspace>
```

```yaml
version: 10.17.0
backstage:
  role: frontend-plugin
supportedVersions: 1.45.3
```

```yaml
version: 1.4.0
backstage:
  role: backend-plugin
supportedVersions: 1.48.3
```
Comment on lines +431 to +437
```python
is_tty = os.isatty(sys.stdout.fileno())

if args.command == "extract-env-vars":
    content = Path(args.file).read_text()
    env_vars = extract_env_vars(content)
    output = {"env_vars": env_vars}
    print(json.dumps(output, indent=2 if is_tty else None))
```
Member

@durandom left a comment


Review: Script Architecture & Quality

Nice work on the overall concept — extracting deterministic field derivation (name shortening, OCI URLs, version consistency) into a script is exactly the right pattern for this project. The workflow phases are logical and the delegation from onboard-plugin Phase 4 makes sense. The metadata-format.md rewrite with real-world examples is a big improvement.

However, the script has some architectural issues and quality gaps compared to the existing scripts (analyze-pr.py, triage-prs.py). See inline comments for specifics.

Architectural

1. fetch-and-derive mixes concerns — remove or split it

The existing scripts in this project call gh to fetch data, but keep fetching and transformation cleanly separated. fetch-and-derive does both: it shells out to gh api via subprocess AND runs the derive logic. This creates the <shell_permissions> contradiction (see inline comment) and makes the subcommand harder to test (requires mocking subprocess).

The workflow already describes the gh api calls as direct shell commands in Phase 3.1. The script should only do local/deterministic work: scan (read local files) and derive (pure computation). Let the agent or the workflow handle the gh api calls — that's what the existing scripts' patterns do.

2. extract-env-vars doesn't need to be a subcommand

This is a one-liner: grep -oP '\$\{[A-Z_][A-Z0-9_]*\}' file | sort -u. Adding it as a Python subcommand adds code to maintain with no benefit.
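For comparison, a minimal Python equivalent of that one-liner (a sketch, not the PR's extract_env_vars; the name find_env_var_refs is hypothetical) is a single regex call, which supports the point that a dedicated subcommand adds little:

```python
import re

# Same pattern as the grep one-liner: ${NAME} where NAME is upper-case.
ENV_VAR_REF = re.compile(r"\$\{[A-Z_][A-Z0-9_]*\}")


def find_env_var_refs(text: str) -> list[str]:
    """Return the unique ${VAR} references that the grep pattern selects, sorted."""
    return sorted(set(ENV_VAR_REF.findall(text)))
```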

Functional Bugs

3. shorten_name has no safety net (see inline)

4. find_missing_metadata uses substring matching (see inline)

Quality — Match Existing Patterns

5. run_gh is inconsistent with existing scripts

Both analyze-pr.py and triage-prs.py have run_gh that:

  • Returns parsed JSON (not raw strings)
  • Catches FileNotFoundError for missing gh CLI
  • Catches json.JSONDecodeError

This script's run_gh does none of those. If you keep run_gh, follow the established pattern.

6. Use --json flag instead of TTY auto-detection (see inline)

Minor

7. Module docstring shows old usage — The usage examples don't show subcommands (scan, derive, etc.) which is confusing.

8. Numbering gap in intake menu — Options jump from 4 to 8 (see inline).

9. supportedVersions mismatch in reference examples — Both examples in metadata-format.md show inconsistent versions (see inline).


TL;DR: The derive logic (name shortening, OCI URLs, consistency checks) is genuinely valuable as a script. The main asks: (1) remove fetch-and-derive — let the workflow handle gh api calls, keep the script doing only local/deterministic work, (2) fix the shorten_name safety net and find_missing_metadata matching, (3) align run_gh and output format with existing scripts.

```python
def shorten_name(name: str) -> str:
    """Apply shortening rules only if name exceeds K8S_NAME_LIMIT."""
    if len(name) <= K8S_NAME_LIMIT:
        return name
```

If the shortening rules don't bring the name down to 63 chars or fewer, this returns a name that exceeds the Kubernetes limit. The upstream shorten-component-name.sh likely has a final truncation step.

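Add a safety net, e.g. truncate to 55 chars, then append a hyphen and a 7-char hash suffix (55 + 1 + 7 = 63):

```python
# Safety net after the rule loop: if the rules were not enough, keep a
# 55-char prefix and append a hash of the original name (55 + 1 + 7 = 63).
if len(shortened) > K8S_NAME_LIMIT:
    import hashlib  # or move this to the module-level imports
    h = hashlib.sha256(name.encode()).hexdigest()[:7]
    shortened = shortened[:K8S_NAME_LIMIT - 8] + '-' + h
return shortened
```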

The test test_long_name_shortened only passes because the specific test string happens to get short enough — it doesn't cover names where the rules aren't sufficient.
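A self-contained sketch of shorten_name with the suggested safety net, plus the kind of regression test the comment asks for. K8S_NAME_LIMIT = 63 matches the limit cited above; the two SHORTEN_RULES entries are hypothetical placeholders, since the PR's actual rules are not shown in this thread:

```python
import hashlib

K8S_NAME_LIMIT = 63  # Kubernetes name limit cited in the review

# Hypothetical placeholder rules; the real SHORTEN_RULES live in derive-metadata.py.
SHORTEN_RULES = [
    ("backstage-community-", "bc-"),
    ("-plugin-", "-"),
]


def shorten_name(name: str) -> str:
    """Shorten name so the result never exceeds K8S_NAME_LIMIT characters."""
    if len(name) <= K8S_NAME_LIMIT:
        return name
    shortened = name
    for old, new in SHORTEN_RULES:
        shortened = shortened.replace(old, new)
    if len(shortened) > K8S_NAME_LIMIT:
        # Rules were not enough: keep a 55-char prefix and append a hash of
        # the original name, giving exactly 63 characters.
        digest = hashlib.sha256(name.encode()).hexdigest()[:7]
        shortened = shortened[: K8S_NAME_LIMIT - 8] + "-" + digest
    return shortened
```

The hash is taken from the original name rather than the truncated one, so two long names that share the same 55-char prefix still shorten to different results.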

```python
path = plugin["path"]
path_suffix = path.rstrip("/").split("/")[-1] if path != "." else ""
found = any(path_suffix and path_suffix in name for name in existing_names)
if not found and path != ".":
```

Substring matching: "argocd" in "something-argocd-backend" is True. If the workspace has both plugins/argocd and plugins/argocd-backend, and a metadata file x-argocd-backend.yaml exists, the check for plugins/argocd falsely matches.

Either:

  • Use exact suffix matching with a separator: name.endswith("-" + path_suffix) or name == expected_metadata_name
  • Or better: derive the expected metadata name via package_name_to_metadata_name and check for exact file stem match

The Copilot comment (#4) flagged the same issue.
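A sketch of the second option, assuming each plugin dict carries a packageName key and that package_name_to_metadata_name maps @scope/name to scope-name (both are assumptions; the real helper's rules may differ). It replaces the substring test with an exact file-stem match:

```python
from pathlib import Path


def package_name_to_metadata_name(package_name: str) -> str:
    # Hypothetical stand-in for the script's helper of the same name:
    # "@backstage-community/plugin-argocd" -> "backstage-community-plugin-argocd"
    return package_name.lstrip("@").replace("/", "-")


def find_missing_metadata(workspace_dir: Path, plugins: list[dict]) -> list[dict]:
    """Return plugins whose expected metadata file stem does not exist exactly."""
    metadata_dir = workspace_dir / "metadata"
    existing_stems = (
        {f.stem for f in metadata_dir.glob("*.yaml")} if metadata_dir.is_dir() else set()
    )
    return [
        plugin
        for plugin in plugins
        if package_name_to_metadata_name(plugin["packageName"]) not in existing_stems
    ]
```

Because the lookup keys off the package name rather than the directory, the root plugin (path ".") no longer needs a special case.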



```python
def run_gh(args, check=True):
    """Run a gh CLI command and return stdout as string."""
```

This run_gh doesn't match the pattern from analyze-pr.py and triage-prs.py:

  • Missing FileNotFoundError handler (crash if gh isn't installed)
  • Returns raw stdout string instead of parsed JSON
  • No json.JSONDecodeError handling

If you keep run_gh (see my comment about removing fetch-and-derive), align with the existing pattern.
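For reference, a sketch of a run_gh that covers the three points above. The exact messages and exit codes in analyze-pr.py and triage-prs.py are not shown in this thread, so exiting with status 1 is an assumption; the gh parameter is a hypothetical hook added only so the function can be exercised without the GitHub CLI installed:

```python
import json
import subprocess
import sys
from typing import Any


def run_gh(args: list[str], check: bool = True, gh: str = "gh") -> Any:
    """Run a gh CLI command and return its stdout parsed as JSON.

    When check is False, a non-zero exit returns None instead of exiting.
    """
    try:
        result = subprocess.run([gh, *args], capture_output=True, text=True)
    except FileNotFoundError:
        print(f"Error: '{gh}' not found; install the GitHub CLI.", file=sys.stderr)
        sys.exit(1)

    if result.returncode != 0:
        if not check:
            return None
        print(f"Error: {gh} {' '.join(args)} failed: {result.stderr.strip()}", file=sys.stderr)
        sys.exit(1)

    if not result.stdout.strip():
        return None  # some commands print nothing on success
    try:
        return json.loads(result.stdout)
    except json.JSONDecodeError as exc:
        print(f"Error: could not parse gh output as JSON: {exc}", file=sys.stderr)
        sys.exit(1)
```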



```python
def fetch_and_derive_all(
    workspace_dir: Path, workspace: str, source: dict, missing_paths: list[str]
```

This subcommand mixes network I/O (gh api via subprocess) with pure computation (field derivation). The workflow already describes the gh api calls as shell commands in Phase 3.1 — having the script also do them via subprocess is redundant.

This is also what creates the <shell_permissions> contradiction in SKILL.md: the directive says "prefer direct gh calls over Python subprocess" but this function does exactly the opposite.

Recommendation: remove fetch-and-derive. The workflow should call gh api directly (agent-friendly, no sandbox issues), then pipe the results to derive for each plugin. The scan + derive subcommands are sufficient.
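A minimal sketch of the split this recommends: the agent fetches package.json with a direct gh api call (as in Phase 3.1) and pipes the JSON into code that only parses and derives. The function names and output keys here are illustrative assumptions, not the script's real derive contract:

```python
import json
from typing import TextIO


def derive_fields(pkg: dict, workspace: str, plugin_path: str) -> dict:
    """Pure computation: no network access and no subprocess.

    The output keys are illustrative, not the PR's actual field set.
    """
    backstage = pkg.get("backstage", {})
    return {
        "workspace": workspace,
        "pluginPath": plugin_path,
        "packageName": pkg["name"],
        "version": pkg["version"],
        "role": backstage.get("role"),
    }


def derive_from_stream(stream: TextIO, workspace: str, plugin_path: str) -> dict:
    """Read package.json from a stream, e.g. stdin piped from a gh api call."""
    return derive_fields(json.load(stream), workspace, plugin_path)
```

Because derive_from_stream accepts any text stream, tests can pass io.StringIO instead of mocking subprocess, and a CLI wrapper only needs to hand it sys.stdin.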

```python
    sys.exit(1)

is_tty = os.isatty(sys.stdout.fileno())
```


os.isatty(sys.stdout.fileno()) raises io.UnsupportedOperation in environments without a real fd (some test runners, CI wrappers). Use sys.stdout.isatty() instead.

Also: the existing scripts use an explicit --json flag rather than TTY auto-detection. That's more predictable — consider matching that pattern.
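A sketch of flag-driven output, assuming --json selects compact machine-readable JSON and the default is indented JSON (the existing scripts' exact semantics for the flag are not shown here). Because emit never inspects a file descriptor, it also works when stdout is an io.StringIO, which is exactly where os.isatty(sys.stdout.fileno()) fails:

```python
import argparse
import json
from typing import Any


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="derive-metadata.py")
    parser.add_argument(
        "--json", action="store_true", help="print compact, machine-readable JSON"
    )
    return parser


def emit(data: Any, as_json: bool) -> None:
    """Print data in a format chosen by the flag, never by the terminal type."""
    if as_json:
        print(json.dumps(data, separators=(",", ":")))
    else:
        print(json.dumps(data, indent=2))
```

If TTY detection is kept anyway, sys.stdout.isatty() returns False for an io.StringIO instead of raising.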

```yaml
  title: Bugs
- title: Source Code
  url: https://github.com/backstage/community-plugins/tree/main/workspaces/dynatrace/plugins/dynatrace
annotations:
```

supportedVersions: 1.45.3 but the dynamicArtifact tag above uses bs_1.49.4. These should match — this is the reference doc that agents will follow when generating metadata.

Same issue with the backend example below (supportedVersions: 1.48.3 vs bs_1.49.4 in the tag).
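A sketch of an audit check for this mismatch, assuming the dynamicArtifact reference embeds the Backstage version as bs_<major>.<minor>.<patch> (as bs_1.49.4 does in the example) and that supportedVersions is a single version string rather than a range; the function name and the OCI reference used in the usage note are hypothetical:

```python
import re
from typing import Optional

# Assumption: the Backstage version appears in the tag as bs_X.Y.Z.
BS_TAG = re.compile(r"bs_(\d+\.\d+\.\d+)")


def check_supported_versions(dynamic_artifact: str, supported_versions: str) -> Optional[str]:
    """Return a description of the problem, or None if the versions agree."""
    match = BS_TAG.search(dynamic_artifact)
    if match is None:
        return f"no bs_<version> tag found in {dynamic_artifact!r}"
    tag_version = match.group(1)
    if tag_version != supported_versions:
        return (
            f"supportedVersions {supported_versions} does not match "
            f"Backstage version {tag_version} in the dynamicArtifact tag"
        )
    return None
```

For example, check_supported_versions("oci://ghcr.io/example/dynatrace:bs_1.49.4", "1.45.3") reports both versions, which is the inconsistency flagged in the reference examples.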

Comment thread skills/overlay/SKILL.md

<shell_permissions>
Prefer running `gh api` and `gh search code` as **direct shell commands** rather than via Python subprocess. Direct `gh` calls go through the user's command allowlist without triggering permission prompts. Python scripts that only do local work (file I/O, JSON processing, field derivation) also need no extra permissions. Only request `full_network` for Python scripts that internally spawn `gh` as a subprocess — the sandbox blocks network access from child processes.
</shell_permissions>

This directive says "prefer direct gh shell commands over Python subprocess" but derive-metadata.py's fetch-and-derive subcommand does exactly the opposite — it calls gh api via subprocess.run.

Either:

  • Remove fetch-and-derive from the script (recommended — see my comment on the script), or
  • Rewrite this to explain that the script handles gh calls internally and agents should use the script's subcommands rather than calling gh directly for metadata tasks

Comment thread skills/overlay/SKILL.md
2. **Update plugin version** — Bump to newer upstream commit/tag
3. **Check plugin status** — Verify health and compatibility
4. **Fix build failure** — Debug CI/publish issues
8. **Generate or audit metadata** — Add missing Package metadata or fix inconsistencies in existing metadata

Nit: jumps from option 4 to option 8 in the user-facing menu. Should be 5 (or renumber the Core Team section to leave room).


3 participants