feat(about_app): catalog entry for GitHub repo memory source (#3047 follow-up) #3114
…ansai#3047 follow-up)

tinyhumansai#3047 reworked the GitHub memory source: a repo-grouped browsable raw archive (`raw/github-com-<owner>-<repo>/{commits,issues,prs}/`), per-type sync limits (default 2000, overridable via `max_commits`/`max_issues`/`max_prs`), contributor `@handle` entities, and priority boosting for commits + closed/merged issues & PRs. Its body, however, explicitly deferred the `about_app` capability-catalog entry to a follow-up. This is that entry.

Per the repo rule (CLAUDE.md "Capability catalog: when a change adds/removes/renames a user-facing feature, update src/openhuman/about_app/"), a user-facing capability the agent should be able to describe ("can you read my GitHub repo?") needs a catalog row so settings search, the /about dump, and the Privacy surface can find it.

Added:
- `intelligence.github_repo_memory_source` capability. Domain `memory_sources` (the Rust domain) under the Intelligence UI umbrella, the same domain/category split the embeddings entries use. The description spells out that it ingests project *activity* (commits/issues/PRs), not source code, plus the archive layout, entity/priority enrichment, and the 2000-per-type default limit. `how_to` points at Settings > Memory & Data > Memory Sources and the `openhuman.memory_sources_add` RPC.
- `GITHUB_REPO_SOURCE` privacy constant. The reader queries the GitHub API directly (`gh` CLI / public REST), not the managed backend, so it reports `leaves_device = true` with `data_kind: Metadata` and a GitHub destination, mirroring the existing `GITHUB_RELEASES_METADATA` shape. Fetched content is archived locally; only its embeddings travel onward (already covered by the embedding-provider capability).

Tests (catalog_tests.rs):
- Registered the id in `catalog_includes_additional_user_facing_surfaces`.
- `github_repo_memory_source_is_registered_with_expected_shape`: pins domain/category/status, the Settings + RPC `how_to`, and the "activity, not source code" framing.
- `github_repo_memory_source_reports_github_destination`: pins `leaves_device = true` + a GitHub destination and asserts it is NOT mis-attributed to the OpenHuman backend (the under-reporting failure mode tinyhumansai#2656's review flagged for the embeddings probe).

26/26 about_app lib tests pass; `cargo fmt --check` clean; integration targets compile. No coverage-matrix feature IDs affected (catalog ids are separate from `scripts/feature-ids.json`'s numeric E2E ids).

Refs tinyhumansai#3047.
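To make the split between the Rust domain, the UI category, and the privacy record concrete, here is a minimal Rust sketch of what the new catalog row and its `GITHUB_REPO_SOURCE` privacy constant might look like. The type names (`Capability`, `Privacy`, `Category`, `Status`, `DataKind`) and the `GITHUB_REPO_MEMORY_SOURCE` constant are hypothetical stand-ins, not the actual `about_app` definitions, and the description/`how_to` strings are paraphrases. Only the field values (id, `memory_sources` domain, Intelligence category, Beta status, `leaves_device = true`, `Metadata`, GitHub destination) come from this PR.

```rust
// Hypothetical stand-ins for the about_app catalog types; the real
// definitions in src/openhuman/about_app/ may use different names and fields.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataKind {
    Metadata,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Category {
    Intelligence,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Beta,
}

#[derive(Debug)]
pub struct Privacy {
    pub leaves_device: bool,
    pub data_kind: DataKind,
    pub destination: &'static str,
}

#[derive(Debug)]
pub struct Capability {
    pub id: &'static str,
    pub domain: &'static str,
    pub category: Category,
    pub status: Status,
    pub description: &'static str,
    pub how_to: &'static str,
    pub privacy: &'static Privacy,
}

/// Egress goes straight to GitHub (gh CLI / public REST), not the OpenHuman backend.
pub const GITHUB_REPO_SOURCE: Privacy = Privacy {
    leaves_device: true,
    data_kind: DataKind::Metadata,
    destination: "GitHub API (api.github.com)",
};

/// Hypothetical catalog row for the GitHub repo memory source.
pub const GITHUB_REPO_MEMORY_SOURCE: Capability = Capability {
    id: "intelligence.github_repo_memory_source",
    domain: "memory_sources",         // Rust domain (module name)
    category: Category::Intelligence, // UI umbrella shown in settings
    status: Status::Beta,
    description: "Ingests GitHub project activity (commits, issues, PRs), not source code, \
                  into raw/github-com-<owner>-<repo>/{commits,issues,prs}/; default 2000 items per type.",
    how_to: "Settings > Memory & Data > Memory Sources, or the openhuman.memory_sources_add RPC.",
    privacy: &GITHUB_REPO_SOURCE,
};
```

Keeping `domain` and `category` as separate fields is what lets the Rust module name (`memory_sources`) differ from the settings umbrella the user actually sees (Intelligence).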
No actionable comments were generated in the recent review. 🎉
📝 Walkthrough: Adds a new GitHub repository memory source to the capability catalog: defines its privacy model (third-party GitHub API egress, metadata-class), registers the Beta capability under …
Changes: GitHub Repo Memory Source Catalog Entry
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 5 passed
Heads-up for reviewers: the 4 red checks here are pre-existing breakage on main, not caused by this PR. This PR is Rust-only (…
CodeRabbit reviewed and approved with no actionable comments. Happy to rebase once the coverage gate is fixed on main (#3061).
sanil-23 left a comment:
@justinhsu1477 the code looks good: this is a clean, well-scoped follow-up to #3047. The catalog entry mirrors the existing `GITHUB_RELEASES_METADATA` privacy shape (third-party host, metadata-class outbound, `leaves_device = true`), and using the `memory_sources` Rust domain under the Intelligence UI umbrella is consistent with how the embedding entries split domain vs category. The two new tests are the right ones to pin: the shape/`how_to` breadcrumb and the not-the-backend destination assertion that guards against the under-reporting failure mode. The description's clear framing of this as project activity (not source code) is a nice touch.
One thing before I can approve: CI isn't green. The coverage gates are red (Rust Core Coverage, Frontend Coverage, and two E2E lanes). Worth noting the underlying tests pass and Frontend Coverage fails despite this PR touching zero frontend code — so these look like project-wide coverage-threshold/infra failures rather than anything in your diff. Could you re-run the failing jobs / rebase on latest main to confirm? Once CI is green I'll come back and approve. Let me know if you need a hand chasing down the coverage gate.
graycyrus left a comment:
@justinhsu1477 hey! The code looks good: a clean follow-up to #3047.
The `GITHUB_REPO_SOURCE` privacy constant is the right shape: `leaves_device = true`, `data_kind: Metadata`, destination `GitHub API (api.github.com)`, exactly mirroring `GITHUB_RELEASES_METADATA` as intended. The `domain: "memory_sources"` / `category: Intelligence` split is consistent with how the embeddings entries handle the Rust-domain-vs-UI-umbrella distinction. The description is accurate (activity not source code, the archive layout, the 2000-per-type default), and `how_to` cites both the live Settings breadcrumb and the programmatic RPC, which is exactly what you want to catch a future nav rename before it silently strands users.
The two tests pin the right invariants: shape/breadcrumb and the not-the-backend destination assertion that guards against the under-reporting failure mode #2656 flagged. That's the correct level of coverage for catalog code.
CI has 4 failing checks (E2E lanes 1-2, Frontend Coverage, Rust Core Coverage) but this PR touches zero frontend code and no E2E paths, so these look like project-wide coverage threshold or infra failures rather than anything introduced here. A rebase on main or a re-run of those jobs should confirm. Once CI is green I'll come back and approve.
Summary
#3047 reworked the GitHub memory source: a repo-grouped browsable raw archive (`raw/github-com-<owner>-<repo>/{commits,issues,prs}/`), per-type sync limits (default 2000, overridable via `max_commits`/`max_issues`/`max_prs`), contributor `@handle` entities, and a priority boost for commit messages + closed/merged issues & PRs. Its description explicitly deferred one item, the `about_app` capability-catalog entry, and this PR ships that entry. (The per-source-limit Settings UI is a separate UI follow-up, out of scope here.)
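A rough Rust sketch of the #3047 behaviors summarized above: the per-repo archive path, the per-type sync limit with its 2000 default, and the boost rule (commits always; issues and PRs only once closed or merged). Every name here (`ItemKind`, `ItemState`, `SyncLimits`, `archive_dir`, `is_boosted`) is hypothetical. The sketch assumes owner and repo are used verbatim in the directory name, while the real reader may normalize them; it also models the boost as a yes/no decision, whereas the actual implementation presumably applies a weight.

```rust
// Hypothetical helpers illustrating the #3047 behavior; names and signatures
// are invented for illustration and are not the openhuman reader's API.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ItemKind {
    Commit,
    Issue,
    Pr,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ItemState {
    Open,
    Closed,
    Merged,
}

/// Default per-type sync limit stated in the PR.
pub const DEFAULT_LIMIT: u32 = 2000;

/// Optional per-type overrides, mirroring max_commits / max_issues / max_prs.
#[derive(Debug, Default)]
pub struct SyncLimits {
    pub max_commits: Option<u32>,
    pub max_issues: Option<u32>,
    pub max_prs: Option<u32>,
}

impl SyncLimits {
    /// Effective limit for one item type: the override if set, else 2000.
    pub fn limit_for(&self, kind: ItemKind) -> u32 {
        let over = match kind {
            ItemKind::Commit => self.max_commits,
            ItemKind::Issue => self.max_issues,
            ItemKind::Pr => self.max_prs,
        };
        over.unwrap_or(DEFAULT_LIMIT)
    }
}

/// Archive directory for one repo and item type, following the layout
/// raw/github-com-<owner>-<repo>/{commits,issues,prs}/ (no normalization).
pub fn archive_dir(owner: &str, repo: &str, kind: ItemKind) -> String {
    let sub = match kind {
        ItemKind::Commit => "commits",
        ItemKind::Issue => "issues",
        ItemKind::Pr => "prs",
    };
    format!("raw/github-com-{owner}-{repo}/{sub}/")
}

/// Boost rule as described: commits always; issues/PRs only when closed or merged.
/// Commits carry no state, so `state` is optional.
pub fn is_boosted(kind: ItemKind, state: Option<ItemState>) -> bool {
    match kind {
        ItemKind::Commit => true,
        ItemKind::Issue | ItemKind::Pr => {
            matches!(state, Some(ItemState::Closed | ItemState::Merged))
        }
    }
}
```

For example, `SyncLimits { max_issues: Some(500), ..Default::default() }` keeps commits and PRs at 2000 while capping issues at 500.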
Per CLAUDE.md ("Capability catalog: when a change adds/removes/renames a user-facing feature, update `src/openhuman/about_app/`"), a user-facing capability the agent should be able to describe ("can you read my GitHub repo?") needs a catalog row so settings search, the `/about` dump, and the Privacy surface can find it.

Changes
`catalog_data.rs`
- `intelligence.github_repo_memory_source` capability. Domain `memory_sources` (the Rust domain) under the Intelligence UI umbrella, the same `domain`/`category` split the embeddings entries (#2656) use. The description spells out that it ingests project *activity* (commits/issues/PRs), not source code, plus the archive layout, entity/priority enrichment, and the 2000-per-type default. `how_to` points at Settings > Memory & Data > Memory Sources and the `openhuman.memory_sources_add` RPC.
- `GITHUB_REPO_SOURCE` privacy constant. The reader queries the GitHub API directly (`gh` CLI / public REST), not the managed backend, so it reports `leaves_device = true`, `data_kind: Metadata`, and destination `GitHub API (api.github.com)`, mirroring the existing `GITHUB_RELEASES_METADATA` shape. Fetched content is archived locally; only its embeddings travel onward (already covered by the embedding-provider capability).

`catalog_tests.rs`
- Registered the id in `catalog_includes_additional_user_facing_surfaces`.
- `github_repo_memory_source_is_registered_with_expected_shape`: pins domain/category/status, the Settings + RPC `how_to`, and the "activity, not source code" framing.
- `github_repo_memory_source_reports_github_destination`: pins `leaves_device = true` + a GitHub destination, and asserts it is *not* mis-attributed to the OpenHuman backend (the under-reporting failure mode #2656's review flagged for the embeddings probe).

Test plan
- `cargo test --lib about_app` → 26/26 pass (24 prior + 2 new)
- `cargo check --lib` clean
- `cargo fmt --check` clean
- `cargo test --tests --no-run`: integration targets compile
- `about_app` capability ids are separate from `scripts/feature-ids.json`'s numeric E2E ids (matches #3047's own "no matrix feature IDs affected" note)

Refs #3047.
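The invariant behind `github_repo_memory_source_reports_github_destination` can be expressed as a small predicate. This is a hypothetical sketch with a trimmed-down `Privacy` type and an invented `reports_github_egress` helper, not the test's actual code: it accepts a privacy record only if data leaves the device, the destination names GitHub, and nothing attributes the egress to the OpenHuman backend (the under-reporting failure mode the test guards against).

```rust
// Hypothetical, trimmed-down privacy record; the real about_app type has more fields.
pub struct Privacy {
    pub leaves_device: bool,
    pub destination: &'static str,
}

pub const GITHUB_REPO_SOURCE: Privacy = Privacy {
    leaves_device: true,
    destination: "GitHub API (api.github.com)",
};

/// True when a record honestly reports third-party GitHub egress:
/// data leaves the device, the destination names GitHub, and the
/// destination is not attributed to the OpenHuman backend.
pub fn reports_github_egress(p: &Privacy) -> bool {
    let dest = p.destination.to_ascii_lowercase();
    p.leaves_device && dest.contains("github") && !dest.contains("openhuman")
}
```

Checking the negative cases matters as much as the positive one: a record that claims the data stays local, or that names the managed backend, must fail the predicate.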