feat(about_app): catalog entry for GitHub repo memory source (#3047 follow-up)#3114

Open
justinhsu1477 wants to merge 2 commits into
tinyhumansai:mainfrom
justinhsu1477:feat/about-app-github-raw-archive-capability
@justinhsu1477 justinhsu1477 commented Jun 1, 2026

Summary

#3047 reworked the GitHub memory source — repo-grouped browsable raw archive (raw/github-com-<owner>-<repo>/{commits,issues,prs}/), per-type sync limits (default 2000, overridable via max_commits/max_issues/max_prs), contributor @handle entities, and a priority boost for commit messages + closed/merged issues & PRs. Its description explicitly deferred one item:

Follow-up PR(s)/TODOs: per-source limit override on the update/patch path + Settings UI; about_app capability-catalog entry.

This PR ships that catalog entry. (The per-source-limit Settings UI is a separate UI follow-up — out of scope here.)

Per CLAUDE.md — "Capability catalog: when a change adds/removes/renames a user-facing feature, update src/openhuman/about_app/" — a user-facing capability the agent should be able to describe ("can you read my GitHub repo?") needs a catalog row so settings search, the /about dump, and the Privacy surface can find it.
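For context, the archive layout and per-type limits that #3047 introduced can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the project's code: `DEFAULT_LIMIT`, `SyncLimits`, and `archive_dir` are hypothetical names, and the real implementation may normalize owner/repo names differently.

```rust
/// Default per-type sync limit described in #3047.
const DEFAULT_LIMIT: usize = 2000;

/// Hypothetical per-source settings. The field names follow the
/// max_commits / max_issues / max_prs overrides named in the PR text.
#[derive(Default)]
struct SyncLimits {
    max_commits: Option<usize>,
    max_issues: Option<usize>,
    max_prs: Option<usize>,
}

impl SyncLimits {
    /// Resolve (commits, issues, prs), falling back to the default when unset.
    fn resolved(&self) -> (usize, usize, usize) {
        (
            self.max_commits.unwrap_or(DEFAULT_LIMIT),
            self.max_issues.unwrap_or(DEFAULT_LIMIT),
            self.max_prs.unwrap_or(DEFAULT_LIMIT),
        )
    }
}

/// Build the repo-grouped archive directory for one record kind
/// ("commits", "issues", or "prs"). Assumes owner and repo are already
/// safe to use as path components.
fn archive_dir(owner: &str, repo: &str, kind: &str) -> String {
    format!("raw/github-com-{owner}-{repo}/{kind}")
}
```

With these assumptions, a source that only sets `max_issues` keeps the 2000 default for commits and PRs.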

Changes

catalog_data.rs

  • New intelligence.github_repo_memory_source capability. Domain memory_sources (the Rust domain) under the Intelligence UI umbrella, the same domain/category split the embeddings entries (#2656) use. The description spells out that it ingests project activity (commits/issues/PRs), not source code, plus the archive layout, entity/priority enrichment, and the 2000-per-type default. how_to points at Settings > Memory & Data > Memory Sources and the openhuman.memory_sources_add RPC.
  • New GITHUB_REPO_SOURCE privacy constant. The reader queries the GitHub API directly (gh CLI / public REST), not the managed backend, so it reports leaves_device = true, data_kind: Metadata, destination GitHub API (api.github.com) — mirroring the existing GITHUB_RELEASES_METADATA shape. Fetched content is archived locally; only its embeddings travel onward (already covered by the embedding-provider capability).
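As a rough sketch of the shape these two additions take (assuming simplified stand-in types; the real `CapabilityPrivacy` and capability structs in src/openhuman/about_app/ may have different fields, and `Capability`, `DataKind`, `Status`, and `GITHUB_REPO_MEMORY_SOURCE` here are hypothetical):

```rust
/// Hypothetical stand-ins for the about_app catalog types; the real
/// definitions live in src/openhuman/about_app/ and may differ.
#[derive(Debug, PartialEq)]
enum DataKind {
    Metadata,
}

#[derive(Debug, PartialEq)]
enum Status {
    Beta,
}

struct CapabilityPrivacy {
    leaves_device: bool,
    data_kind: DataKind,
    destinations: &'static [&'static str],
}

struct Capability {
    id: &'static str,
    domain: &'static str,
    category: &'static str,
    status: Status,
    how_to: &'static str,
    privacy: &'static CapabilityPrivacy,
}

/// The reader talks to the GitHub API directly, so data leaves the device
/// and the destination is GitHub rather than the managed backend.
const GITHUB_REPO_SOURCE: CapabilityPrivacy = CapabilityPrivacy {
    leaves_device: true,
    data_kind: DataKind::Metadata,
    destinations: &["GitHub API (api.github.com)"],
};

/// Catalog row wired to the privacy constant above (illustrative values
/// taken from the PR description).
const GITHUB_REPO_MEMORY_SOURCE: Capability = Capability {
    id: "intelligence.github_repo_memory_source",
    domain: "memory_sources",
    category: "Intelligence",
    status: Status::Beta,
    how_to: "Settings > Memory & Data > Memory Sources, or the openhuman.memory_sources_add RPC",
    privacy: &GITHUB_REPO_SOURCE,
};
```

Keeping the privacy model as a separate constant lets several capabilities share one egress description, which is presumably why the PR mirrors the existing GITHUB_RELEASES_METADATA shape.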

catalog_tests.rs

  • Registered the id in catalog_includes_additional_user_facing_surfaces.
  • github_repo_memory_source_is_registered_with_expected_shape: pins domain/category/status, the Settings + RPC how_to, and the "activity, not source code" framing.
  • github_repo_memory_source_reports_github_destination: pins leaves_device = true + a GitHub destination, and asserts it is not mis-attributed to the OpenHuman backend (the under-reporting failure mode that #2656's review flagged for the embeddings probe).
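The destination invariant the second test pins can be sketched as a small predicate. This is an assumption-laden illustration: `reports_github_not_backend` is a hypothetical helper, and the substring matching on "github" and "openhuman" is a guess at how the real test distinguishes destinations.

```rust
/// Hypothetical check mirroring the destination assertion: data leaves the
/// device, at least one destination names GitHub, and none names the
/// OpenHuman backend.
fn reports_github_not_backend(leaves_device: bool, destinations: &[&str]) -> bool {
    // Compare case-insensitively so "GitHub API" and "github" both match.
    let lower: Vec<String> = destinations.iter().map(|d| d.to_lowercase()).collect();
    leaves_device
        && lower.iter().any(|d| d.contains("github"))
        && !lower.iter().any(|d| d.contains("openhuman"))
}
```

Checking both the positive (GitHub present) and negative (backend absent) conditions is what guards against the under-reporting case, where egress is attributed to the wrong host.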

Test plan

  • cargo test --lib about_app: 26/26 pass (24 prior + 2 new)
  • cargo check --lib clean
  • cargo fmt --check clean
  • cargo test --tests --no-run — integration targets compile
  • No coverage-matrix rows affected: about_app capability ids are separate from scripts/feature-ids.json's numeric E2E ids (matches #3047's own "no matrix feature IDs affected" note)

Refs #3047.

Summary by CodeRabbit

  • New Features

    • GitHub repository memory source (Beta) added to Memory Sources settings.
    • Syncs GitHub commits, issues, and pull requests into your memory vault for contextual recall.
    • Privacy transparency: outbound activity goes to the GitHub API; fetched content is archived locally and outbound payloads are marked as metadata.
  • Tests

    • Catalog and privacy behavior validated to ensure correct labeling and settings surface guidance.

…ansai#3047 follow-up)

tinyhumansai#3047 reworked the GitHub memory source — repo-grouped browsable raw
archive (`raw/github-com-<owner>-<repo>/{commits,issues,prs}/`), per-type
sync limits (default 2000, overridable via max_commits/max_issues/max_prs),
contributor `@handle` entities, and priority boosting for commits +
closed/merged issues & PRs — but its body explicitly deferred the
`about_app` capability-catalog entry to a follow-up. This is that entry.

Per the repo rule (CLAUDE.md "Capability catalog: when a change
adds/removes/renames a user-facing feature, update src/openhuman/about_app/"),
a user-facing capability the agent should be able to describe ("can you
read my GitHub repo?") needs a catalog row so settings search, the /about
dump, and the Privacy surface can find it.

Added:

- `intelligence.github_repo_memory_source` capability. Domain
  `memory_sources` (the Rust domain) under the Intelligence UI umbrella —
  the same domain/category split the embeddings entries use. Description
  spells out that it ingests project *activity* (commits/issues/PRs), not
  source code, plus the archive layout, entity/priority enrichment, and
  the 2000-per-type default limit. `how_to` points at
  Settings > Memory & Data > Memory Sources and the
  `openhuman.memory_sources_add` RPC.

- `GITHUB_REPO_SOURCE` privacy constant. The reader queries the GitHub
  API directly (`gh` CLI / public REST), not the managed backend, so it
  reports `leaves_device = true` with `data_kind: Metadata` and a GitHub
  destination — mirroring the existing `GITHUB_RELEASES_METADATA` shape.
  Fetched content is archived locally; only its embeddings travel onward
  (already covered by the embedding-provider capability).

Tests (catalog_tests.rs):

- Registered the id in `catalog_includes_additional_user_facing_surfaces`.
- `github_repo_memory_source_is_registered_with_expected_shape` — pins
  domain/category/status, the Settings + RPC `how_to`, and the
  "activity, not source code" framing.
- `github_repo_memory_source_reports_github_destination` — pins
  `leaves_device = true` + a GitHub destination and asserts it is NOT
  mis-attributed to the OpenHuman backend (the under-reporting failure
  mode tinyhumansai#2656's review flagged for the embeddings probe).

26/26 about_app lib tests pass; `cargo fmt --check` clean; integration
targets compile. No coverage-matrix feature IDs affected (catalog ids are
separate from `scripts/feature-ids.json`'s numeric E2E ids).

Refs tinyhumansai#3047.
@justinhsu1477 justinhsu1477 requested a review from a team June 1, 2026 03:16
coderabbitai Bot commented Jun 1, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a7a655ea-5a4c-48e2-bb6c-1211d8eac472

📥 Commits

Reviewing files that changed from the base of the PR and between 6be0424 and 13e220a.

📒 Files selected for processing (1)
  • src/openhuman/about_app/catalog_data.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/openhuman/about_app/catalog_data.rs

Walkthrough

Adds a new GitHub repository memory source to the capability catalog: defines its privacy model (third-party GitHub API egress, metadata-class), registers the Beta capability under memory_sources, and adds tests validating catalog metadata and privacy destinations.

Changes

GitHub Repo Memory Source Catalog Entry

| Layer / File(s) | Summary |
| --- | --- |
| Privacy model and capability registration (src/openhuman/about_app/catalog_data.rs) | Introduces GITHUB_REPO_SOURCE CapabilityPrivacy constant modeling GitHub repo-activity queries as third-party egress, and adds intelligence.github_repo_memory_source (Beta) under memory_sources wired to this privacy model. |
| Capability shape and privacy validation tests (src/openhuman/about_app/catalog_tests.rs) | Adds the capability to the user-facing ID list and tests its domain/category/status/how_to text, plus privacy tests asserting leaves_device = true and that destinations include "github" while excluding the openhuman backend. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

feature, rust-core, working

Suggested reviewers

  • graycyrus
  • sanil-23
  • oxoxDev

Poem

🐰 I nibble at lines of repo lore,
I fetch commits and note each chore,
A privacy badge, a catalog song,
Tests hum softly—nothing goes wrong. 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat(about_app): catalog entry for GitHub repo memory source (#3047 follow-up)' clearly and specifically describes the main change: adding a catalog entry for the GitHub repo memory source capability. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot added feature Net-new user-facing capability or product behavior. rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. working A PR that is being worked on by the team. labels Jun 1, 2026
@justinhsu1477
Contributor Author

Heads-up for reviewers: the 4 red checks here are pre-existing main breakage, not from this PR.

This PR is Rust-only (about_app catalog data + tests) — it can't affect frontend lanes, and the lib suite passes clean in the coverage run (test result: ok. 10711 passed; 0 failed).

CodeRabbit reviewed and approved with no actionable comments. Happy to rebase once the coverage gate is fixed on main (#3061).

@sanil-23 sanil-23 left a comment

@justinhsu1477 the code looks good — this is a clean, well-scoped follow-up to #3047. The catalog entry mirrors the existing GITHUB_RELEASES_METADATA privacy shape (third-party host, metadata-class outbound, leaves_device = true), and using the memory_sources Rust domain under the Intelligence UI umbrella is consistent with how the embedding entries split domain vs category. The two new tests are the right ones to pin: shape/how_to breadcrumb + the not-the-backend destination assertion that guards against the under-reporting failure mode. Description clearly framing this as project activity (not source code) is a nice touch.

One thing before I can approve: CI isn't green. The coverage gates are red (Rust Core Coverage, Frontend Coverage, and two E2E lanes). Worth noting the underlying tests pass and Frontend Coverage fails despite this PR touching zero frontend code — so these look like project-wide coverage-threshold/infra failures rather than anything in your diff. Could you re-run the failing jobs / rebase on latest main to confirm? Once CI is green I'll come back and approve. Let me know if you need a hand chasing down the coverage gate.

@graycyrus graycyrus left a comment

@justinhsu1477 hey! the code looks good — clean follow-up to #3047.

The GITHUB_REPO_SOURCE privacy constant is the right shape: leaves_device = true, data_kind: Metadata, destination GitHub API (api.github.com) — exactly mirrors GITHUB_RELEASES_METADATA as intended. The domain: "memory_sources" / category: Intelligence split is consistent with how the embeddings entries handle the Rust-domain-vs-UI-umbrella distinction. Description is accurate (activity not source code, the archive layout, the 2000-per-type default), and how_to cites both the live Settings breadcrumb and the programmatic RPC which is exactly what you want to catch a future nav rename before it silently strands users.

The two tests pin the right invariants: shape/breadcrumb and the not-the-backend destination assertion that guards against the under-reporting failure mode #2656 flagged. That's the correct level of coverage for catalog code.

CI has 4 failing checks (E2E lanes 1-2, Frontend Coverage, Rust Core Coverage) but this PR touches zero frontend code and no E2E paths, so these look like project-wide coverage threshold or infra failures rather than anything introduced here. A rebase on main or a re-run of those jobs should confirm. Once CI is green I'll come back and approve.
