fix: tolerate /userinfo responses without an email claim#68
Conversation
InitiateBrowserBasedLogin previously treated a missing "email" field in the /userinfo response as fatal, aborting the entire OAuth flow even though the token exchange had already succeeded. In practice Notehub's /userinfo can return HTTP 200 containing only the OIDC bookkeeping claims (sub, iss, aud, iat, ...), which made every browser-based sign-in fail with "could not retrieve email" -- and because the handler does not check the HTTP status code, the actual response was hidden from the user. Fall back to the subject identifier when email is absent so the caller still receives a populated AccessToken. The Email field is used only as a display/persistence identifier, so the fallback is sufficient for downstream consumers. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR adjusts the browser-based OAuth login flow to tolerate /userinfo responses that omit the email claim by falling back to another identifier, preventing otherwise-successful logins from failing during the final userinfo parsing step.
Changes:
- Treat missing
emailin/userinfoas non-fatal. - Fall back to the OIDC
subclaim (and currently a hard-coded placeholder) to ensureAccessToken.Emailis populated.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Addresses PR review feedback: - Check the /userinfo HTTP status code and fail with the body included if it isn't 200. Previously the handler went straight to JSON-decoding, so any non-200 response surfaced as a generic "could not retrieve email" with no detail. - Remove the "(oauth)" placeholder fallback. OIDC requires sub to be present in a userinfo response, so treating its absence as an error surfaces server misbehavior instead of masking it with a non-stable identifier. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Addresses PR review feedback: errHandler writes its message into the localhost callback HTTP response without setting Content-Type, so a browser renders it as HTML. The previous change embedded the raw /userinfo body into that message, which could in principle execute injected markup from a hostile or misconfigured response. Split the audiences at the status-check callsite: the browser-facing response contains only the status code, while the returned Go error and the local log still include the full body for debugging. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Addresses PR review feedback: the response body is untrusted input and was being interpolated into the returned error and local log with %s, so an IdP that returned bytes like \x1b[ could have its escape sequences interpreted by the user's terminal when the diagnostic string was printed. Use %q so control characters are rendered as their printable escape forms instead. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…en status Addresses PR review feedback: errHandler writes its message into the localhost callback response with no Content-Type set, so a browser will render it as HTML by default. The previous round fixed only the /userinfo status-error callsite, but other callers (OAuth error_description from /oauth2/token, JSON unmarshal errors, ...) still pass server- controlled strings through the same Fprintf. Set Content-Type to text/plain at the errHandler level so the response is treated as text regardless of which callsite produced the message, eliminating the XSS class without touching individual sites. Separately, the /oauth2/token exchange did not check StatusCode -- a non-200 response with a non-JSON body fell into the generic "could not unmarshal" branch, hiding the underlying failure exactly the way the original /userinfo regression did. Surface the HTTP status and quoted body when unmarshal fails on a non-200 response. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces the per-callsite error handling in InitiateBrowserBasedLogin with a small set of invariants, addressing the class of issues raised in review (XSS via the localhost callback, terminal escape-sequence injection, unbounded reads / log flooding, and missing status checks) rather than patching each path individually: - Single fail(summary, detail) sink: the browser callback receives only `summary` -- a constant, developer-authored string, never server-controlled data -- always as text/plain. No callsite writes to the ResponseWriter directly, so injected markup can never be reflected, regardless of which path fails. - safeDetail() truncates (to 1 KiB) and strconv.Quote()s every untrusted value before it reaches the log or returned error, so control characters can't be interpreted by a terminal and a hostile body can't flood output. - readResponseBody() bounds every response read to 1 MiB via io.LimitReader, preventing memory amplification at the source. - Both /oauth2/token and /userinfo are status-checked. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Reworked the handler in 50bf924 to address this class of feedback structurally rather than per-callsite:
The invariant is now auditable in one place: every browser write is a constant; every fail detail arg is wrapped in safeDetail; every body read goes through readResponseBody. |
Addresses PR review feedback. The status check was only reached when JSON unmarshalling failed, so a non-200 response with a valid JSON body that lacked both "error" and "access_token" would slip past it and fall through to the generic "no access token" error, hiding the real HTTP status -- and a non-200 response that did contain a token would have been consumed. Check the status immediately after reading the body, before consuming any fields, mirroring the /userinfo path. Both endpoints now share the same read -> status-check -> parse -> consume structure. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Addresses PR review feedback: - Correct the fail() doc comment: `summary` is not purely a constant string -- some call sites include the numeric HTTP status code -- so describe it as developer-authored text plus non-injectable scalars rather than implying it never derives from the response at all. - On /oauth2/token and /userinfo JSON parse failures, include the unmarshal error alongside the response body so malformed or empty responses are easier to diagnose. The combined detail still goes through safeDetail, since Go's json errors can echo input bytes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Addresses PR review feedback. Deadlock: fail() set accessTokenErr but never signaled the shutdown goroutine, so InitiateBrowserBasedLogin blocked forever on <-done for any error path -- the server only stopped when quit received a signal, which happened only on the success path. fail() now does a non-blocking send to quit so an unsuccessful callback shuts the server down and the function returns its error. (This is the hang that previously required a manual ^C on any sign-in failure.) Credential leak: the /oauth2/token error paths logged/returned the raw response body, which for a token endpoint can contain access_token / refresh_token / id_token if an IdP ever returns them on an error status. redactSensitive() masks known credential fields in a JSON body before it reaches the log or returned error. Applied to the /userinfo paths too for uniformity (a no-op there, since userinfo carries no tokens). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
A review of the callback handler surfaced three robustness bugs beyond the deadlock fixed earlier: 1. The handler is registered on "/", so it matched any request -- favicon, prefetch, a bare visit to the root. Such a request carries no valid state, hit the "state mismatch" path, and (now that error paths trigger shutdown) tore the flow down before the real redirect arrived. classifyCallback() now ignores any request without an authorization code, answering it with 204 and leaving the flow running. 2. Concurrent or duplicate callbacks each wrote the shared result and signaled shutdown. A sync.Once now admits exactly one callback; duplicates get a benign response. 3. An OS interrupt during sign-in shut the server down with neither result set, so InitiateBrowserBasedLogin returned (nil, nil) and the caller dereferenced a nil token. It now returns an error when no callback completed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Addresses PR review feedback on the diagnostic helpers: - redactSensitive: when the body cannot be parsed as JSON it can no longer be redacted field-by-field, yet a truncated/malformed token response may still contain a secret. Suppress the whole body if it mentions any credential-bearing field name rather than returning it raw, honoring the "must never be written" guarantee. - readResponseBody: read one byte past the cap and return an explicit "response exceeds limit" error instead of silently truncating, which otherwise surfaces later as a misleading parse failure and hides the real cause (too large vs. malformed). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
1e73cad to
45afd13
Compare
classifyCallback treated any request without an authorization code as "not the callback" and ignored it (204, no shutdown). But per RFC 6749 an authorization failure -- most commonly the user denying consent -- redirects to the callback with an "error" parameter and no code. That redirect was therefore ignored, the awaited code never arrived, and InitiateBrowserBasedLogin blocked until the user interrupted it, without ever reporting why. Classify a request as the callback when it carries either a code or an error, validate state in both cases, and surface the provider's error (and error_description) via fail() so the flow ends promptly with a meaningful message. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
A callback whose state didn't match was routed through fail(), which consumed the single-callback slot and signaled shutdown. Because the callback ports are a predictable, hard-coded list, any local process -- or a webpage issuing requests to http://localhost:<port>/?... -- could reliably abort a legitimate sign-in (or burn the sync.Once so the real redirect got the "already completed" response). That's a local denial of service on the login flow. Handle a state mismatch before the sync.Once gate and respond benignly (400, no error recorded, no shutdown), so only a redirect whose state matches the value this attempt generated can consume the slot or end the flow. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
| // markup or scripts. `detail` (already passed through safeDetail by | ||
| // the caller) carries diagnostics to the local log and the returned | ||
| // Go error only. | ||
| fail := func(summary, detail string) { |
There was a problem hiding this comment.
this isn't something I actually care about, but this function feels like it'd be better as:
fail := func(format string, args ...any) {
msg := fmt.Sprintf(format, args...)
// ...
}There was a problem hiding this comment.
so you're trying to separate summary which will be sent back on the HTTP connection, and detail which will be included in the error returned from the top-level function. I think in the grand scheme of things both of those errors end up going to the same place: somewhere local on the user's machine. In one case, it's their web browser. In the other case it's the Notehub CLI (or whatever is calling this top-function)
That's a long way to say that I don't think we need to distinguish summary and detail as separate entities
There was a problem hiding this comment.
Claude says:
Right on both, and they're connected. The split exists for one reason: summary is written back over the HTTP connection to the browser (which renders it), while detail only goes to the local log and the returned Go error. The intent was to keep server-controlled bytes (an OAuth error_description, a response body) out of the browser-rendered response.
But your point is the sharper one: both outputs land on the same user's machine — one in their browser, one in their terminal — so there's no third party to protect from reflected content. The trust boundary I was implicitly drawing doesn't really exist here, so the distinction buys very little. Collapsing fail to a single fail(format string, args ...any) that writes the same message to both places would be simpler and just as safe in this context.
I'm glad you raised it, because it generalizes: this error-handling apparatus (safeDetail, redaction, the summary/detail split) grew largely in response to an automated reviewer flagging each place a server byte touches a sink — under a threat model where Notehub is treated as hostile. For a first-party CLI talking to Notehub over TLS via a loopback redirect, that's arguably the wrong model. I'm happy to right-size the whole thing to what you think is warranted — collapsing fail is a good first cut.
| // answer any duplicate benignly so a second request cannot race the | ||
| // shared result or signal shutdown twice. | ||
| handled := false | ||
| once.Do(func() { handled = true }) |
There was a problem hiding this comment.
I'm having a hard time understanding what this is protecting against. So the client code (like the Notehub CLI) launches this web server and wants to guard against multiple valid OAuth HTTP request hitting that endpoint? Does that actually happen in practice?
There was a problem hiding this comment.
Claude says:
Two things, one observed and one theoretical:
-
(observed) During testing the localhost callback received more than one request per sign-in — browsers issue favicon/prefetch/connection-reuse probes to
http://localhost:<port>/. Without a guard, a stray second request re-enters the handler and re-runs the token exchange / re-writes the shared accessToken/accessTokenErr. -
(theoretical) Two concurrent requests would race on those shared variables.
After the latest change (b2bede4), state-mismatched requests are rejected before the once gate, so once now only gates requests carrying our exact state. That narrows its real job to: if the browser hits the success callback twice with the same code+state (refresh/retry), process it once instead of running a second doomed exchange (the auth code is single-use anyway).
Honest assessment: it's defensive, not essential — a duplicate would fail harmlessly on its own. I leaned toward process-exactly-once for predictability and to kill the data race. If you'd rather drop it and let a duplicate fail naturally, that's reasonable; your call.
There was a problem hiding this comment.
browsers issue favicon/prefetch/connection-reuse probes to http://localhost:/. Without a guard, a stray second request re-enters the handler and re-runs the token exchange
But won't that those particular requests you mentioned be filtered out by classifyCallback returning callbackIgnore?
| fail("/userinfo response missing both email and sub", "") | ||
| return | ||
| } | ||
| email = sub |
There was a problem hiding this comment.
I'm confused about this because we control the endpoint that generates userinfoData. That's on our API server. Shouldn't that mean that we could effectively guarantee that the "email" field is present and that means we wouldn't have to look at "sub"?
There was a problem hiding this comment.
Claude says:
This is the most important question in the review, because it points at the actual root cause of the outage. The fallback exists because of an empirical observation: as of testing on 2026-06-05, the live /userinfo on api.notefile.net was returning HTTP 200 with no email field — only the standard OIDC claims:
{"aud":["notehub_cli"],"auth_time":1780680916,"iat":1780680916, "iss":"https://notehub.io","rat":1780680916, "sub":"55256f04-14e4-4c2c-8f1d-01f05f4dc688"}
The pre-existing code did email, ok := userinfoData["email"].(string); if !ok { errHandler("could not retrieve email") } and aborted — which is exactly why browser sign-in was failing for everyone with could not retrieve email.
You're right that the cleaner fix is server-side: if /userinfo reliably includes email, the client never needs sub. I'd actually flag that as the real bug to chase — it looks like /userinfo regressed and dropped email, and that's the true source of the outage. If that's restored server-side, this fallback becomes belt-and-suspenders.
I kept the client fallback for two reasons: (1) it unblocks the CLI without waiting on an API deploy, and (2) sub is the one claim OIDC requires userinfo to return, so it's a stable identifier if email is ever absent again. But it's genuinely optional — if you can confirm /userinfo will always carry email, I'm happy to drop the sub path entirely.
There was a problem hiding this comment.
/userinfo only returns an "email" field for OAuth tokens if that token has the email scope. Maybe the real fix here is to make it so /userinfo requires the token to have the email scope and we should make sure all of our existing clients include it.
I don't like saying email = sub because sub can be anything -- in our case it'd likely be an account UID, but we're telling the upstream code that it's an email address.
InitiateBrowserBasedLogin previously treated a missing "email" field in the /userinfo response as fatal, aborting the entire OAuth flow even though the token exchange had already succeeded. In practice Notehub's /userinfo can return HTTP 200 containing only the OIDC bookkeeping claims (sub, iss, aud, iat, ...), which made every browser-based sign-in fail with "could not retrieve email" -- and because the handler does not check the HTTP status code, the actual response was hidden from the user.
Fall back to the subject identifier when email is absent so the caller still receives a populated AccessToken. The Email field is used only as a display/persistence identifier, so the fallback is sufficient for downstream consumers.