feat(hub): native companion (FCM) push channel + device registry + pairing QR#803
feat(hub): native companion (FCM) push channel + device registry + pairing QR#803heavygee wants to merge 14 commits into
Conversation
Adds opt-in FCM HTTP v1 notification delivery so a companion mobile/wearable
app can receive permission, ready, and task notifications end-to-end. The
channel is gated entirely on FCM_SERVICE_ACCOUNT_PATH + FCM_PROJECT_ID being
set; operators not running a companion see zero behavior change.
What lands:
- POST/DELETE /api/devices/register — JWT-authed FCM token registry,
upsert on (namespace, deviceId, platform), platforms `phone` | `wear`.
- Sqlite v9 → v10 migration adds `fcm_devices` (idx on namespace + token).
- FcmService — minimal HTTP v1 client, RS256 service-account JWT via
jose (dep already in tree), 5-minute access-token cache, 401 retry.
- FcmNotificationChannel — implements NotificationChannel, sends data-only
FCM (so companion can route to phone+watch surfaces). Body composition
parses an optional trailing `AGENT_NOTIFY_SUMMARY {json}` line for richer
ready summaries; truncates plain assistant text to 280 chars otherwise.
Tags each payload with `severity` (info/warning/success/error) so clients
can color/categorise the notification.
- PushNotificationChannel gains a NativeFallbackProbe — when a namespace
has at least one registered FCM device, web-push and SSE in-page toast
are skipped so the operator does not double-notify on phone+browser.
Probe is no-op when no FCM device is registered; PWA-only setups
unchanged. Branch trace gated on HAPI_NOTIFY_DEBUG=1.
- shared/src/messages.ts — `extractAssistantPlainText` (codex + Claude SDK
shapes) and `extractNotifySummary` (strict end-anchored line parser).
- hub/src/notifications/toolArgs.ts — tool-arg formatters lifted out of
telegram/sessionView (kept duplicated there in this PR; refactor of
Telegram is a follow-up).
- docs/api/native-companion-contract.md — payload + endpoints + env vars,
versioned at contract v1.
Test coverage:
- 260 hub tests pass (incl. 23 new across FCM channel, push dedup,
v10 migration, devices route).
- 60 shared tests pass (messages parsers).
Notes for reviewers:
- Reference companion implementation lives in a separate Android repo
(Kotlin, phone APK + Wear OS APK) — this PR is hub-side only.
- No new runtime deps (`jose` and `zod` already declared in hub).
Co-authored-by: Cursor <cursoragent@cursor.com>
…ub-on-phone Adds a Scope section to the native-companion contract so anyone implementing it knows the audience: operators running the hub on a server who want phone/watch as a notification surface, not users expecting a Termux-bundled hub. Mirrors the framing now in heavygee/hapi-companion README. Co-authored-by: Cursor <cursoragent@cursor.com>
Removes the prior framing that referenced a non-existent 'Termux hub-on-phone' alternative. This contract describes a native client to the same hub the PWA talks to; it does not change where the hub runs. Co-authored-by: Cursor <cursoragent@cursor.com>
Companion section in Settings renders a QR code encoding the deeplink hapicompanion://bind?hub=<base>&code=<token>. Scanning it from the HAPI companion app (Android phone or Wear OS) auto-fills the bind form and authenticates against this hub - no manual URL/token paste. QR is gated behind a Show button so the access token doesn't sit visible on screen by default; a Copy link affordance and the textual deeplink are also exposed for manual onboarding. Adds qrcode + @types/qrcode to web/ (already a hub dep, no new resolved package - just a workspace declaration). Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
Findings
- [Major] Web push is suppressed even when no FCM channel is active —
nativeFallbackProbereturns true solely from stored device rows, but the FCM notification channel is only registered whenresolveFcmConfig()succeeds. If an operator has anfcm_devicesrow and then runs the hub withoutFCM_SERVICE_ACCOUNT_PATH/project config,PushNotificationChannelreturns early and no FCM channel exists to deliver anything, so ready/permission/task notifications disappear. Evidencehub/src/startHub.ts:206.
Suggested fix:const fcmConfig = resolveFcmConfig() const nativeFallbackProbe = (namespace: string): boolean => Boolean(fcmConfig) && store.fcm.getDevicesByNamespace(namespace).length > 0 const notificationChannels: NotificationChannel[] = [ new PushNotificationChannel( pushService, sseManager, visibilityTracker, config.publicUrl, nativeFallbackProbe ) ]
Summary
- Review mode: initial
- Found one notification delivery regression in the FCM/web-push fallback interaction.
Testing
- Not run (automation)
HAPI Bot
After the existing PWA access QR is rendered on tunnel start, also print the hapicompanion://bind?hub=...&code=... deeplink and a matching QR. Same tunnel + token, different scheme: phones with the companion app installed pick up the deeplink via the manifest intent filter; phones without it ignore it and fall back to the PWA QR above. QR rendering failure is non-fatal in both cases - the textual deeplink above the QR is sufficient for manual paste. Co-authored-by: Cursor <cursoragent@cursor.com>
9179ca9 to
ae33e74
Compare
There was a problem hiding this comment.
Findings
- [Major] Web push is suppressed when FCM is not configured —
nativeFallbackProbeonly checks stored device rows, but the FCM channel is registered only whenresolveFcmConfig()succeeds. A stale companion row plus a hub started withoutFCM_SERVICE_ACCOUNT_PATHmakesPushNotificationChannelreturn before SSE/web-push while no FCM channel exists to deliver the notification. Evidencehub/src/startHub.ts:207.
Suggested fix:const fcmConfig = resolveFcmConfig() const nativeFallbackProbe = (namespace: string): boolean => Boolean(fcmConfig) && store.fcm.getDevicesByNamespace(namespace).length > 0
- [Major] Transient FCM failures delete registered devices —
sendToToken()returnsfalsefor every non-OK response, butsendToNamespace()treats everyfalseas an invalid token and removes it fromfcm_devices. A 429/500/503 from Google or a temporary auth/project problem can permanently unregister the phone/watch, after which web-push also stays suppressed because registration state changed unexpectedly. Evidencehub/src/fcm/fcmService.ts:73.
Suggested fix:type FcmTokenSendResult = 'sent' | 'invalid' | 'failed' const result = await this.sendToToken(accessToken, device.token, payload, device.platform) if (result === 'sent') { sent += 1 return } failed += 1 if (result === 'invalid') { invalidTokens.push(device.token) this.store.fcm.removeDeviceByToken(namespace, device.token) }
Summary
- Review mode: follow-up after new commits
- Prior bot finding is still present at the current head, and the FCM send path adds a separate device-registration data-loss risk.
Testing
- Not run (automation). Missing coverage for stale
fcm_devicesrows when FCM config is absent, and for retaining devices on transient FCM 429/5xx responses.
HAPI Bot
Two bugs surfaced by the upstream review bot:
1) Web Push silently dropped when FCM is not actually configured.
The native-fallback probe only checked the device registry; it did
not check whether resolveFcmConfig() actually succeeded. So an
operator who previously enabled FCM, registered a phone, then later
started the hub WITHOUT FCM_SERVICE_ACCOUNT_PATH would see the probe
return true (devices still in DB) -> Web Push suppressed -> no FCM
channel registered -> notifications go to /dev/null.
Fix: extracted the probe construction into buildNativeFallbackProbe()
which short-circuits to () => false when fcmConfig is missing. Probe
never even consults the device store in the no-config branch, so
stale rows can never matter.
2) Transient FCM failures permanently unregistered devices.
sendToToken() returned a single boolean and sendToNamespace() removed
any device whose send returned false. A 429 (rate limit), 503
(server error), 401 (auth glitch), or even an ECONNREFUSED would
delete the device row, after which the user would need to re-pair to
get notifications again. The bot caught it; the fix is the obvious
one.
Fix: sendToToken() now returns 'sent' | 'invalid' | 'failed'.
- 'invalid' is reserved for the responses that genuinely indicate a
dead token: HTTP 404 with UNREGISTERED/NOT_FOUND, and HTTP 400
with INVALID_ARGUMENT explicitly referencing the token field.
- Everything else (429, 5xx, 401, 403, network errors) is 'failed'
and counts toward the failed tally without removing the device.
sendToNamespace() only calls removeDeviceByToken() on 'invalid'.
Tests: 11 new tests across two new files. fcmService.test.ts covers
all six branches (200, 404 unregistered, 429, 503, 401, network error)
plus a mixed-batch case that proves invalid tokens get removed in the
same call where transient-failure tokens survive. nativeFallbackProbe
.test.ts covers both no-config and configured branches plus the
explicit "no-config never touches the store" guarantee.
Hub test count: 273 -> 284 (all passing).
Co-authored-by: Cursor <cursoragent@cursor.com>
|
Both findings addressed in Finding 1 (web-push gated on fcmConfig): extracted the probe construction into Finding 2 (transient FCM errors no longer unregister devices): 11 new tests across Thanks to the bot for catching both - especially the device-unregistration one, which would have been a really mysterious "I have to re-pair every couple of days" bug in production. |
There was a problem hiding this comment.
Findings
- [Minor] Contract documents the wrong FCM visibility rule — the docs say FCM only sends when web push would fire and the PWA is not visible, but
FcmNotificationChannel.deliver()always calls FCM and the new test asserts delivery even with a visible PWA. Companion apps built from this contract can suppress foreground handling and miss notifications. Evidencedocs/api/native-companion-contract.md:46.
Suggested fix:Hub sends FCM HTTP v1 whenever a notification event is emitted for a namespace with registered native devices and FCM is configured. Web Push is suppressed for that namespace to avoid duplicate OS notifications.
- [Minor] Contract advertises an unsupported
session-completedevent —NotificationHubonly calls channels withsendSessionCompletion, and the newFcmNotificationChanneldoes not implement it, so native companions will never receive the documentedsession-completedtype. Evidencedocs/api/native-companion-contract.md:52.
Suggested fix:| `type` | `ready` | `ready`, `permission-request`, `task-notification` |
Summary
- Review mode: follow-up after new commits
- Prior bot findings are addressed at the current head. Remaining issues are contract accuracy gaps in the new native companion docs.
Testing
- Not run (automation):
bunis not installed in this runner (bun: command not found).
HAPI Bot
…ent type HAPI Bot review on PR tiann#803 caught two contract-doc accuracy gaps: 1) Visibility rule was wrong. Doc said "FCM fires when Web Push would fire AND client not visible via SSE", but FcmNotificationChannel ALWAYS fires regardless of PWA visibility (deliberately - native companion is the canonical wrist-first surface, and there is a passing test asserting this). Companion app implementers reading the contract would have built foreground-suppression logic and then dropped notifications when the PWA tab was open. 2) Documented `session-completed` event doesn't exist. NotificationHub never calls into a 'session-completed' channel method on FcmNotificationChannel; the type would never reach a native client. Removed from the documented enum, leaving only the three actual events: ready, permission-request, task-notification. Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
Findings
- [Nit] Remove trailing whitespace —
git diff --check origin/main...HEADfails because the new Audience line ends with Markdown hard-break spaces. Evidencedocs/api/native-companion-contract.md:3.
Suggested fix:**Audience:** Implementers of native companion apps (Android phone + Wear OS, iOS, etc.) that pair with a hapi hub via FCM. **Auth:** Same JWT as the web client (`POST /api/bind` → `Authorization: Bearer`).
Summary
Review mode: follow-up after new commits
Prior bot findings are addressed at the current head. One whitespace issue remains from the latest full diff review.
Testing
Not run (automation): bun is not installed in this runner (command -v bun returned non-zero). Static check run: git diff --check origin/main...HEAD fails as noted above.
HAPI Bot
…h break Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
Findings
- [Major] Persist bind token before rendering the pairing QR —
CompanionPairingonly builds the deeplink fromhapi_access_token::<baseUrl>in localStorage, but Telegram binding authenticates viabind()and never stores the CLI token entered on the bind screen (web/src/App.tsx:364,web/src/hooks/useAuth.ts:147). A Telegram user can successfully bind the Mini App, open Settings, and still hit the “signed in via Telegram...” fallback, so the new Settings companion pairing QR is unavailable for a primary entry path. Evidenceweb/src/components/settings/CompanionPairing.tsx:31.
Suggested fix:// web/src/hooks/useAuth.ts const ACCESS_TOKEN_PREFIX = 'hapi_access_token::' function rememberAccessToken(baseUrl: string, accessToken: string): void { try { localStorage.setItem(`${ACCESS_TOKEN_PREFIX}${baseUrl}`, accessToken) } catch { // Ignore storage errors } } // inside bind(), after successful client.bind(...) rememberAccessToken(baseUrl, accessToken)
Summary
Review mode: follow-up after new commits
Prior whitespace finding is fixed at the current head. One functional issue remains in the Settings companion pairing flow.
Testing
Not run (automation): bun is not installed in this runner (command -v bun returned no path). Static check: git diff --check origin/main...HEAD passed.
HAPI Bot
…works The Settings -> Companion pairing QR reads the original CLI access token from localStorage (hapi_access_token::<baseUrl>) so it can be encoded into the hapicompanion://bind deeplink. For browser/CLI logins useAuthSource already persists the token via setAccessToken, but the Telegram Mini App bind path went through useAuth.bind() which exchanged the typed CLI token for a JWT and never persisted it. Telegram users therefore always saw the "signed in via Telegram..." fallback and got no usable QR. After a successful client.bind() we now mirror useAuthSource's behavior and write the same accessToken to the same localStorage key, restoring parity between the two auth paths. No change for browser/CLI users. Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
Findings
- [Major] Do not suppress Web Push before FCM actually succeeds — the new native fallback probe skips the PWA/SSE path whenever a device row exists, but FCM delivery can still fail transiently or during OAuth token exchange.
FcmService.sendToNamespace()explicitly treats 401/429/5xx/network errors as failures while keeping the device registered, andgetFcmAccessToken()can throw before any send. In those cases this branch returns before Web Push/SSE runs, so permission/ready/task notifications disappear for the namespace. Evidencehub/src/push/pushNotificationChannel.ts:124, related FCM failure pathshub/src/fcm/fcmService.ts:70andhub/src/fcm/fcmService.ts:144.
Suggested fix:// Make native delivery own the fallback decision instead of suppressing // Web Push from a separate channel based only on registration. private async sendNativeFirst( session: Session, fcmPayload: FcmSendPayload, sendWebFallback: () => Promise<void> ): Promise<void> { const result = await this.fcmService.sendToNamespace(session.namespace, fcmPayload) .catch(() => ({ sent: 0, failed: 1, invalidTokens: [] })) if (result.sent === 0) { await sendWebFallback() } }
Questions
- None.
Summary
- Review mode: follow-up after new commits
- One Major issue remains: native registration can suppress all web notification fallbacks even when FCM does not deliver. Prior Telegram bind-token finding appears fixed in
web/src/hooks/useAuth.ts.
Testing
- Not run (automation):
bunis not installed in this runner. - Static check:
git diff --check origin/main...HEADpassed.
HAPI Bot
The native-fallback probe previously returned true whenever FCM was
configured AND devices were registered, which suppressed web-push for
the namespace. The HAPI Bot correctly pointed out the gap: if the FCM
pipeline silently breaks (expired service-account key, sustained 5xx,
OAuth token-fetch failure, network blackhole) the operator gets nothing
on either channel until they manually intervene.
Approach (deliberate, not the bot's exact suggested fix):
- FcmService now keeps a small rolling window (last 8 outcomes) of send
attempts and exposes `isHealthy()`. The threshold is 5+/8 failures =
unhealthy; the buffer starts empty so a freshly-booted hub is
optimistic ("innocent until proven guilty") and does not double-fire
on event #1.
- Token-fetch failure (`getFcmAccessToken` throws) now records exactly
one health-failure (not one per device), short-circuits the send
loop, and returns a result so `sendToNamespace` no longer leaks the
exception.
- `invalid` token responses are explicitly excluded from the health
buffer because they are per-device facts (rotated/uninstalled token),
not pipeline failures - FCM was reachable, it just rejected one
stale token.
- `buildNativeFallbackProbe` now optionally accepts the FcmService and
short-circuits to "let web-push fire" when health is bad, before it
even queries the device registry. The single-arg call shape is still
supported for back-compat.
Why not the bot's exact suggestion ("invert: call FCM first, fall back
on result.sent === 0"):
- Couples PushNotificationChannel to FcmService and FcmSendPayload,
reversing the clean parallel-channel architecture established earlier
in this PR.
- Treats every transient single-event failure as fallback-worthy, which
re-opens the duplicate-notification race that the suppression logic
was added to close (FCM HTTP timeout that delivers later + the web
push we sent in the meantime = two pings).
- A rolling health window only flips on sustained breakage, which is
the actual operational scenario the bot is worried about.
The wrist-first design intent ("FCM fires unconditionally, web-push is
suppressed for the same namespace") documented in
docs/api/native-companion-contract.md is preserved on the happy path.
The probe only re-enables web-push when there is concrete evidence the
native pipeline is not delivering.
Tests:
- New FcmService.isHealthy suite covers empty-buffer, threshold flip,
recovery as failures age out of the window, invalid-token exclusion,
and network-error path.
- nativeFallbackProbe gains coverage for the unhealthy-but-registered,
healthy-and-registered, and absent-fcmService (back-compat) cases.
- All 292 hub tests still pass; typecheck clean.
Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
Findings
- [Major] FCM health gate still drops first failed native notifications —
PushNotificationChannelreturns before the FCM channel has attempted delivery, while the new health gate starts healthy and stays healthy until five recorded failures. After a hub restart or credential/OAuth/network outage, the first several permission/ready/task notifications for a namespace with a registered device still skip SSE/Web Push and then fail in FCM, so the operator receives nothing. Evidencehub/src/push/pushNotificationChannel.ts:76, related health thresholdhub/src/fcm/fcmService.ts:85.
Suggested fix:private async deliverNativeFirst( session: Session, webPayload: PushPayload, fcmPayload: FcmSendPayload ): Promise<void> { const result = await this.fcmService .sendToNamespace(session.namespace, fcmPayload) .catch(() => ({ sent: 0, failed: 1, invalidTokens: [] })) if (result.sent > 0) return await this.deliverWebOrToast(session, webPayload) }
Questions
- None.
Summary
- Review mode: follow-up after new commits
- One Major issue remains: the new health probe reduces prolonged notification loss, but it still suppresses web/SSE fallback for the first failed FCM notifications after startup or a fresh outage.
Testing
- Not run (automation):
bunis not installed in this runner. - Static check:
git diff --check origin/main...HEADpassed.
HAPI Bot
Status on
|
| Scenario | Bot's fix (sent === 0 -> fallback) |
Health probe (5+/8 failures -> fallback) |
Current PR (probe + health) |
|---|---|---|---|
| FCM happy path | ✓ no duplicates | ✓ no duplicates | ✓ no duplicates |
| FCM 503 burst (transient) | every event misses+fallback fires; if FCM later catches up = duplicate | first ~5 events missed, then web-push covers | first ~5 events missed, then web-push covers |
| FCM credentials expired | every event has fallback; but if FCM HTTP times out and Google retries it later -> duplicate | first ~5 events missed, then web-push covers | first ~5 events missed, then web-push covers |
| FCM cold start, no history | always tries FCM first (sequential) -> latency on every event | optimistic, no first-event penalty | optimistic, no first-event penalty |
| Architectural cost | PushNotificationChannel now imports FcmService + FcmSendPayload; channels lose independence |
one shared isHealthy() signal; channels stay parallel |
one shared isHealthy() signal; channels stay parallel |
There is no design that gives both "no first-failure missed" and "zero duplicates on FCM HTTP-timeout-then-retry-delivers." The wrist-first contract documented in docs/api/native-companion-contract.md was an explicit choice to prioritise no-duplicates over no-misses. The health probe softens the "credentials silently expired" worst case (which was the bot's original concern) without giving up the duplicate-free guarantee on the happy path.
Two questions for @tiann
- Are you OK with the current trade-off (~5 missed events on a sustained outage, never duplicate on transient timing weirdness), or would you rather take the bot's invert-and-fall-back approach (no first-event miss, occasional duplicate on FCM-timeout-then-Google-retries)?
- If you prefer the latter, happy to refactor channel orchestration up to
NotificationHubsoPushNotificationChanneldoesn't grow an import onFcmService.
CI is green either way. Defer to your call.
…push-api Co-authored-by: Cursor <cursoragent@cursor.com> # Conflicts: # web/src/routes/settings/index.tsx
Status on
|
There was a problem hiding this comment.
Findings
- [Major] Native fallback still drops notifications on initial FCM failures —
PushNotificationChannelsuppresses both SSE toast and Web Push as soon as the native probe returns true, before the FCM channel has attempted this notification. The probe starts healthy and remains healthy until five failures, so after a hub restart with bad credentials/OAuth/network outage, the first failed ready/permission/task notifications still return here and FCM only records the failure later in the next channel pass. Evidencehub/src/push/pushNotificationChannel.ts:76, related health gatehub/src/fcm/fcmService.ts:85and channel orderhub/src/startHub.ts:227.
Suggested fix:type NativeDelivery = (namespace: string, payload: PushPayload) => Promise<{ sent: number }> private async deliverNativeFirst( session: Session, webPayload: PushPayload, nativePayload: PushPayload, deliverNative?: NativeDelivery ): Promise<void> { if (deliverNative) { const result = await deliverNative(session.namespace, nativePayload).catch(() => ({ sent: 0 })) if (result.sent > 0) return } await this.deliverWebOrToast(session, webPayload) }
Questions
- None.
Summary
- Review mode: follow-up after new commits
- One Major issue remains from the prior bot pass: the health probe helps after enough FCM failures have been recorded, but the current push channel can still mute all non-native surfaces for the first failed notifications after startup or a fresh outage.
Testing
- Not run (automation):
bunis not available in this runner PATH. - Static check:
git diff --check origin/main...HEADpassed.
HAPI Bot
…dule The Telegram session view had its own copy of formatToolArgumentsDetailed identical to the one in hub/src/notifications/toolArgs.ts (already used by the FCM channel). Replace the local copy with an import. Removes ~70 lines of duplication, plus the now-unused MAX_TOOL_ARGS_LENGTH constant and `truncate` import. The shared signature accepts an optional opts arg whose default maxArgLength is 150 - matching the prior constant - so the call site is unchanged. Two benign upgrades come along for the ride from the shared module: ?? instead of || on field fallbacks (no real-world difference; permission arguments never carry empty-string fields), and String(...) wrapping plus a typeof object guard that makes non-string values render gracefully instead of throwing into the catch block. Hub tests: 311 pass / 0 fail. Telegram subset: 5 pass / 0 fail. typecheck green. Cold-reviewed by an out-of-context Claude Opus peer before push. Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
Findings
- [Major] Native fallback still drops notifications on initial FCM failures —
PushNotificationChannelsuppresses SSE/Web Push before the FCM channel attempts the same notification. The probe is treated as healthy with an empty failure window, so after a hub restart with bad credentials/OAuth/network outage, this branch returns first and the later FCM channel only records the failure after all user-visible fallback surfaces have already been skipped. Evidencehub/src/push/pushNotificationChannel.ts:76, related channel orderhub/src/startHub.ts:227and empty-buffer health gatehub/src/fcm/fcmService.ts:85.
Suggested fix:type NativeDelivery = (namespace: string, payload: FcmSendPayload) => Promise<{ sent: number }> private async deliverNativeFirst( session: Session, nativePayload: FcmSendPayload, webPayload: PushPayload, deliverNative?: NativeDelivery ): Promise<void> { if (deliverNative) { const result = await deliverNative(session.namespace, nativePayload).catch(() => ({ sent: 0 })) if (result.sent > 0) return } await this.deliverWebOrToast(session, webPayload) }
Questions
- None.
Summary
- Review mode: follow-up after new commits
- One Major issue remains from the prior bot pass: the new health probe helps after enough failures have been recorded, but it still mutes all non-native surfaces for the first failed native notifications after startup or a fresh outage.
Testing
- Not run (automation/security policy: no PR code execution). Static check:
git diff --check origin/main...HEADpassed.
HAPI Bot
…ng web-push Addresses HAPI Bot Major review on PR tiann#803. The previous health gate treated an empty outcome buffer as healthy ("innocent until proven guilty"). That created a silent-blackhole window on cold start with broken FCM credentials: the push channel suppressed SSE/Web Push for the first ~5 events while the FCM channel attempted each delivery and recorded failures, until enough stacked to flip the threshold. Every notification in that gap was silently lost. New invariant: isHealthy() requires at least one successful FCM send in the recent window (HEALTH_WINDOW=8) AND failures below threshold (HEALTH_FAILURE_THRESHOLD=5). Both conditions are necessary; either alone is insufficient evidence to safely suppress web-push fallback. Trade-off: one duplicated notification per hub restart per namespace. On the first event after restart, web-push fires alongside FCM (because the gate has no positive evidence yet). Once FCM records that first success, the gate engages and subsequent events are FCM-only. Worth it for guaranteed delivery during cold-start outages. Tests reworked to match new semantics: - "starts UNHEALTHY with empty buffer" (was: healthy) - "flips to healthy after first successful send" (new) - "stays unhealthy across failures-only run" (new, exercises the exact blackhole scenario the bot flagged) - "flips back to unhealthy after threshold breach with prior successes" (renamed, establishes successes first) - "invalid tokens don't count against health" (reworked: send a mixed batch first to establish health, then verify invalids don't flip it) - "network errors count as failures" (reworked: establish health first) Hub tests: 313 pass / 0 fail. typecheck green. Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
Findings
- [Major] Native fallback still drops notifications during a fresh FCM outage after a prior success —
PushNotificationChannelsuppresses SSE/Web Push before the FCM send for the same notification. The new cold-start guard helps whileFcmServicehas no successes, but after any successful send,isHealthy()stays true until five failures accumulate, so the first failed permission/ready/task notifications in a new OAuth/network/5xx outage are skipped by web-push athub/src/push/pushNotificationChannel.ts:76and then fail in FCM. Evidence:hub/src/push/pushNotificationChannel.ts:76,hub/src/fcm/fcmService.ts:99.
Suggested fix:type NativeDelivery = (namespace: string, payload: FcmSendPayload) => Promise<{ sent: number }> private async deliverNativeFirst( session: Session, nativePayload: FcmSendPayload, webPayload: PushPayload, deliverNative?: NativeDelivery ): Promise<void> { if (deliverNative) { const result = await deliverNative(session.namespace, nativePayload).catch(() => ({ sent: 0 })) if (result.sent > 0) return } await this.deliverWebOrToast(session, webPayload) }
Questions
- None.
Summary
- Review mode: follow-up after new commits
- One Major issue remains: the cold-start gap from the previous pass is covered, but the same pre-send suppression still drops notifications during a fresh FCM outage after the service has previously been healthy.
Testing
- Not run (automation environment:
bunis not installed). Static check:git diff --check origin/main...HEADpassed.
HAPI Bot
Status on
|
Summary
Adds an opt-in, contract-only path for native companion apps (Android phone + Wear OS) to register for FCM push and receive the same notifications HAPI already sends via Web Push - without changing default behavior for any existing user.
Companion to discussion #746.
The companion app source itself is not in this PR; it lives at heavygee/hapi-companion (currently dogfood, eventual home is wherever the maintainer prefers - happy to transfer / mirror). This PR is just the hub-side surface the app talks to, plus a PWA Settings UI + terminal QR for pairing.
Hub additions
POST /api/devices/register+DELETE /api/devices/register- FCM token registration scoped per (namespace, deviceId, platform).platformisphoneorwear.hub/src/fcm/- optionalFcmService+FcmNotificationChannel, gated entirely byFCM_SERVICE_ACCOUNT_PATHenv. If unset, no FCM code paths run, no behavior changes.fcm_devicestable (id, namespace, token, platform, deviceId, timestamps, unique on (namespace, deviceId, platform)).shared/messages.ts-extractAssistantPlainText()andextractNotifySummary()helpers used by the FCM channel to build human-readable notification bodies (and parse optionalAGENT_NOTIFY_SUMMARY {...}JSON tail for richer summaries when agents emit it).docs/api/native-companion-contract.md- documents the data field schema, REST endpoints, severity values, and env knobs.PWA additions
hapicompanion://bind?hub=<base>&code=<token>. QR is gated behind a Show button so the access token isn't always on screen. Copy-link + textual deeplink are also exposed.qrcodetoweb/(already a hub dep; bun lockfile diff is just the workspace declaration, no new resolved packages).Behavior gates / opt-in story
The Web Push suppression is the only existing-user-visible change, and only triggers when an FCM device is registered. Happy to gate it behind an additional config flag (e.g.
HAPI_PUSH_PREFER_NATIVE) if you'd prefer Web Push remain literally untouched until the operator opts in twice. Disclosed up front so it's not a surprise.What this PR is not
FCM_SERVICE_ACCOUNT_PATHpoints at it. Distribution-tier discussion lives in the companion repo'sdocs/distribution-spectrum.md.Test plan
bun run typecheckpasses (hub, web, shared)bun run testpasses - 273 hub tests, 736 web tests, no regressionshub/src/store/migration-v10.test.ts)hub/src/web/routes/devices.test.ts)hub/src/fcm/fcmNotificationChannel.test.ts)hub/src/push/pushNotificationChannel.test.ts)extractAssistantPlainText+extractNotifySummaryunit-tested (shared/src/messages.test.ts)FCM_SERVICE_ACCOUNT_PATHunset, no FCM code paths execute (please confirm at your leisure)Companion app status
Going forward (out of PR scope, your call)