Skip to content

Fix Sentry SF50-TOLD-20 (~14s launch hang) and SF50-TOLD-29 (ENOSPC mis-detection)#7

Merged
RISCfuture merged 7 commits into
mainfrom
fix/sentry-20-and-29
May 18, 2026
Merged

Fix Sentry SF50-TOLD-20 (~14s launch hang) and SF50-TOLD-29 (ENOSPC mis-detection)#7
RISCfuture merged 7 commits into
mainfrom
fix/sentry-20-and-29

Conversation

@RISCfuture
Copy link
Copy Markdown
Contributor

Summary

Fixes two production Sentry issues, each independently developed and integrated together (full unit suite green on both).

SF50-TOLD-20 — Fixes SF50-TOLD-20 — ~14s fully-blocked main thread (AppHang) at launch.
NavDataLoaderViewModel (@MainActor) still performed synchronous SwiftData work on container.mainContext (init → recalculate, the .schemaVersion observer, load()'s completion block, and clearCycles()). While the NavDataLoader @ModelActor holds the persistent store coordinator during the multi-batch nav-data import, those main-context accesses block in -[NSManagedObjectContext performBlockAndWait:] waiting for store-coordinator ownership — exactly the reported stack. Prior fixes (5093289, c15f8ca) addressed adjacent paths but missed these. All view-model SwiftData reads/writes now run off-main via ModelContext(container) / the model actor; load() only writes UserDefaults on the main actor. Intent of the earlier fixes preserved; data integrity unchanged.

SF50-TOLD-29 — Fixes SF50-TOLD-29 — out-of-disk-space mis-classified as decompression failure.
Error.isOutOfDiskSpace's string fallback checked localizedDescription, which for StreamingLZMA.LZMAError.internalError returns the localized "Internal Error" — not the errno text. ENOSPC was therefore thrown as .decompressionFailed: reported to Sentry as noise and shown to users as "Check your internet connection and try again." It now also inspects String(describing:) / failureReason, where the "…No space left on device" text actually lives, so out-of-space is correctly suppressed from Sentry and given the "free up space; the download will resume" recovery suggestion. Typed NSPOSIXErrorDomain / POSIXError / NSUnderlyingErrorKey checks retained.

Test Plan

  • xcodebuild test -scheme "SF50 TOLD" -testPlan "Unit Tests" on iPhone 17 sim — 10/10 pass (6 TerrainDataLoaderErrorTests + 4 NavDataStateHelperTests)
  • swift format + swiftlint clean on all changed files; no new compiler warnings
  • NavDataStateHelperTests pins the off-main state-computation contract (regression guard for a return to a main-context fetch)
  • TerrainDataLoaderErrorTests covers ENOSPC via LZMAError, typed POSIX ENOSPC, non-ENOSPC POSIX, and .outOfDiskSpace non-reportability
  • Manual: confirm no AppHang on cold launch with a pending nav-data cycle (device)
  • Manual: confirm out-of-storage during terrain decompress shows the storage recovery suggestion and is absent from Sentry

🤖 Generated with Claude Code

RISCfuture and others added 3 commits May 16, 2026 23:30
LZMAError.internalError flattens errno to a string message, so its
localizedDescription returns "Internal Error" (not the errno text),
causing isOutOfDiskSpace to miss the out-of-disk-space condition.
The string fallback now also checks String(describing:) and failureReason,
where the errno message ("Failed to write: No space left on device")
actually lives. Fixes SF50-TOLD-29.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fixes SF50-TOLD-20: ~14s fully-blocked main thread (AppHang) firing
shortly after launch on 3.5.3+44.

Root cause: NavDataLoaderViewModel (an @observable @mainactor class)
still performed synchronous SwiftData operations on
container.mainContext from the MainActor:

- init() called recalculate() -> NavDataStateHelper.fetchState() ->
  mainContext.fetch (invoked from ContentView.onAppear at launch)
- the .schemaVersion observer task called recalculate() on
  mainContext, firing exactly when load() set Defaults[.schemaVersion]
  immediately after the bulk import
- load()'s completion block did mainContext.insert/save and a
  synchronous recalculate() on mainContext right after the import
- clearCycles() deleted + saved on mainContext

While the NavDataLoader @Modelactor holds the persistent store
coordinator during the multi-batch nav-data import (resetData()
cascade deletes plus tens of thousands of inserts), any of these
mainContext accesses block in -[NSManagedObjectContext
performBlockAndWait:] waiting for store-coordinator ownership,
matching the Sentry stack (_dispatch_main_queue_drain ->
swift_job_runImpl -> performBlockAndWait -> __DISPATCH_WAIT_FOR_QUEUE__).
Commit c15f8ca moved the polling loop off-main and 5093289 removed
the synchronous launch purge, but these synchronous mainContext paths
remained.

Fix (extends, does not regress, c15f8ca/5093289 intent):
- init() no longer fetches synchronously; loader state is computed
  exclusively off-main via ModelContext(container) by the existing
  detached poller (first iteration runs immediately) and a detached
  schema-version observer, applied back on the MainActor.
- Cycle persistence and the pre-load Cycle reset moved onto the
  NavDataLoader @Modelactor (writeCycles/clearCycles) so they run on
  its background context, never the main context.
- load() now only writes cheap UserDefaults values on the MainActor;
  no SwiftData touches the main context anywhere in the view model.

Added NavDataStateHelperTests: exercises the off-main state
computation (NavDataStateHelper.fetchState) against a background
ModelContext(container) across empty / current / expired-cycle /
stale-schema scenarios, pinning the contract the fix relies on so a
regression back to a main-context fetch is caught. A true unit test
of the watchdog hang itself isn't practical (it requires real
store-coordinator contention during a multi-minute network import),
so coverage targets the extracted off-main unit instead.

Wired the existing SF50 TOLDTests/Unit Tests.xctestplan into the
SF50 TOLD scheme so the SF50 TOLDTests target's unit tests run.

Verified: build-for-testing succeeded; NavDataStateHelperTests 4/4
passed; SF50 Shared Unit Tests plan fully green; swift-format and
swiftlint clean on changed files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…able

Removes the test-only `extension TerrainDataLoaderError: Equatable`
(retroactive conformance of an imported type to an imported protocol,
which the compiler warns about) in favor of an `if case` pattern match
for the single equality assertion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
insertCycle(cycles.nasr, source: .nasr)
insertCycle(cycles.cifp, source: .cifp)
insertCycle(cycles.dof, source: .dof)
try? modelContext.save()
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it ok to swallow errors here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No — fixed in ef1fe35. writeCycles no longer uses try?; it is now throws and propagates through load(), so a failed Cycle save surfaces and is captured by the existing Sentry catch in NavDataLoaderViewModel instead of silently leaving cycle metadata unwritten. Consistent with the sibling clearCycles().

/// `StreamingLZMA` wraps the underlying `errno` string in a String-typed
/// `LZMAError.internalError` case, so the string fallback is needed when
/// the typed POSIX error has been lost.
/// `LZMAError.internalError` case, so a string fallback is needed when
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StreamingLZMA lives at ~/Repositories/Libraries/StreamingLZMA -- if we can fix this there and cut a new version, that would be preferred

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done at the source: RISCfuture/StreamingLZMA#1 adds LZMAError.ioFailure(operation:code:) carrying the raw errno, and conforms LZMAError/XZError to CustomNSError so an out-of-space failure bridges to an NSError with an NSPOSIXErrorDomain underlying error — detectable by type, no string matching. (216 lib tests pass.)

Once that is released and this app's StreamingLZMA pin is bumped, the existing typed NSUnderlyingErrorKey/NSPOSIXErrorDomain branch in isOutOfDiskSpace will catch it and this String(describing:)/failureReason fallback becomes redundant (removable). I left the app-side detection in place for now because this PR still depends on StreamingLZMA 1.1.0 — bumping the pin requires the library release first. Flagging that sequencing for you to decide whether to land #7 with the workaround now and simplify after the StreamingLZMA release.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fully resolved in 804c98f. StreamingLZMA 1.2.0 is released (RISCfuture/StreamingLZMA@1.2.0) — it throws LZMAError.ioFailure(operation:code:) and bridges via CustomNSError to an NSError with an NSPOSIXErrorDomain underlying error. The pin is bumped 1.1.0 → 1.2.0; the existing typed NSUnderlyingErrorKey branch in isOutOfDiskSpace now catches ENOSPC directly, so the string-matching fallback (and its tests + the test-plan scheme wiring) are removed and isOutOfDiskSpace is back to private. App builds against 1.2.0; full suite (307 tests) green, 0 failures. No string-based error detection remains.

/// container's main context, so the view model never has to touch the main
/// context to decide whether to show the loader.
@MainActor
struct NavDataStateHelperTests {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests seem too specific to internal workings and should be removed (and the private-ness of the struct restored). Unit tests should be black box (inputs and outputs only). This could be tested with a higher-level unit or UI test, or if that's not reasonable, left untested.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed NavDataStateHelperTests.swift and restored NavDataStateHelper to private (ef1fe35). The off-main guarantee cannot be exercised black-box without real store-coordinator contention during a multi-minute import, so per your note it is left untested rather than adding a contrived higher-level test. The SF50 TOLDTests target / Unit Tests plan wiring stays (still used by TerrainDataLoaderErrorTests).

RISCfuture and others added 4 commits May 17, 2026 15:20
- NavDataLoader.writeCycles: stop swallowing the save error with try?;
  it now throws and propagates through load(), so a failed Cycle save
  surfaces (and is captured by the existing Sentry catch in
  NavDataLoaderViewModel) instead of silently corrupting cycle state.
  Consistent with the sibling clearCycles().
- Remove SF50 TOLDTests/NavDataStateHelperTests.swift (white-box tests
  of an internal helper) and restore NavDataStateHelper to private.
  Per review: unit tests should be black box; the off-main guarantee
  is not reasonably black-box testable, so it is left untested.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
StreamingLZMA 1.2.0 throws LZMAError.ioFailure carrying the raw errno
and bridges it (via CustomNSError) to an NSError with an
NSPOSIXErrorDomain underlying error. The existing typed
NSUnderlyingErrorKey branch in isOutOfDiskSpace now detects ENOSPC
directly, so the String(describing:)/failureReason fallback is no
longer needed.

- Bump StreamingLZMA pin 1.1.0 -> 1.2.0
- isOutOfDiskSpace: remove the string fallback, restore to private
- Remove TerrainDataLoaderErrorTests (covered the removed workaround;
  the remaining typed detection is trivial/obvious composition over
  Foundation + StreamingLZMA behavior)
- Revert the SF50 TOLDTests test-plan wiring added for those tests
  (scheme now matches main)

Fully resolves the StreamingLZMA review comment on #7.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
testNOTAMClearMultipleResetsBadge failed on iPhone 16 Pro / iOS 18.4
("Failed to determine hittability of Water/Slush Button: Activation
point invalid"): selectContamination tapped the picker menu option
before the menu finished presenting, so the option's frame was still
invalid and both isHittable and the coordinate fallback failed.

Add BasePage.waitForHittable(_:timeout:) and wait for the option to be
hittable before tapping. Fails with a clear assertion if the menu
never presents, rather than a cryptic hittability error.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
testScenarioFieldsPersistAfterNavigation failed intermittently on
iPad Pro 13 (M5)/iOS latest ("OAT delta should persist, got: +1"):
clearAndType occasionally dropped a keystroke on slower configs, so
the MeasurementField (a FormatStyle TextField that commits on
editing-end) persisted +1 instead of +15.

Add a verifying clear/type for the numeric scenario fields: end
editing to force the commit, read the field back, and retry if the
shown digits don't reflect the input.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@RISCfuture RISCfuture merged commit 7ed1146 into main May 18, 2026
10 checks passed
@RISCfuture RISCfuture deleted the fix/sentry-20-and-29 branch May 18, 2026 08:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant