Skip to content

feat(react-native): Android crash/ANR symbolication, bundle id, and native stack delivery#116

Open
yduartep wants to merge 17 commits into
mainfrom
feat/android-crash-symbols-and-bundle-id
Open

feat(react-native): Android crash/ANR symbolication, bundle id, and native stack delivery#116
yduartep wants to merge 17 commits into
mainfrom
feat/android-crash-symbols-and-bundle-id

Conversation

@yduartep

@yduartep yduartep commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Summary

Android release builds can upload R8 mapping.txt and native symbols, but the SDK must send the same bundle id at runtime and ship native Java/Kotlin stack text in a shape the collector can retrace. This PR wires bundle identity, Gradle upload, previous-session crash replay, and ANR reporting end to end.

Runtime bundle id (appBuildIdentity)

  • Derives meta.app.bundleId as {applicationId}@{versionCode}@{versionName} from react-native-device-info.
  • Metro preamble (__faroBundleId_<appName>) and explicit config.app.bundleId take precedence.
  • Plumbed through initializeFaromakeRNConfig.

Gradle composed source map upload (android/build.gradle)

  • Reads bundle id from app/build/faro/bundle-id-release.txt written by faroWriteBundleIdRelease.
  • Validates applicationId@versionCode@versionName before upload.
  • Uses FARO_SOURCEMAP_* env vars, with sourcemaps.config.json as fallback for endpoint/appId/stackId/apiKey.
  • Upload task depends on faroWriteBundleIdRelease when the Faro Gradle plugin is applied.

Previous-session native crashes

Problem: After a fatal crash, ApplicationExitInfo on the next launch often has no traceInputStream (seen on emulators and Android 16). Without a stack, ingest cannot R8-retrace and the plugin shows a generic crash with no useful title.

FaroUncaughtExceptionHandler — installs as the default Thread.UncaughtExceptionHandler. On an uncaught exception it serializes Log.getStackTraceString(throwable) (exception class, message, and at … frames) and writes it to disk before the process is killed, then forwards to the previous handler so crash behaviour is unchanged.

FaroCrashTraceCache — stores that serialized trace plus a timestamp in SharedPreferences using commit() (synchronous) so the write survives immediate process death. On the next launch, FaroCrashReporter matches the cached trace to the matching ApplicationExitInfo row (±10s) when the OS trace is missing or lacks an exception header, then clears the cache after use.

FaroCrashReporter — reads getHistoricalProcessExitReasons for crash exits (not ANRs), builds JSON for CrashReportingInstrumentation, merges OS trace + cached trace, and drops rows that are ANR timeouts or already covered by ANR detection (avoids a second generic crash line for the same incident).

parseAndroidCrashTrace — parses ApplicationExitInfo / Log.getStackTraceString text into:

  • native frames with module (R8 retrace input), and
  • embedded Hermes/Metro frames (JS source-map input when a fatal JS error includes both).

crashErrorMessage — builds the error title from parsed trace, exception class, or ApplicationExitInfo description; skips rows with no stack and no useful title.

CrashReportingInstrumentation — sends type: 'crash' with parsed stackFrames, nativeExceptionType in context, and error.stack cleared so the JS reporter stack does not override native frames.

Previous-session ANRs

Problem: A live ANR watchdog snapshot can capture the wrong thread (crash-killer frames) or be lost if the process dies before the React bridge sends telemetry.

FaroAnrReporter — on API 30+, reads ApplicationExitInfo with REASON_ANR, pulls traceInputStream, and exports one JSON report per new ANR (deduped by timestamp). Exposed to JS as getHistoricalAnrReports.

AnrThreadDumpParser — extracts the "main" thread at … frames from the multi-thread OS dump (the blocked UI thread, not a background watchdog thread).

FaroAnrCache — when ANRTracker fires, persists the ANR JSON with commit() before the process can be killed; acknowledgeANRs removes entries after Faro sends them; detection timestamps prevent FaroCrashReporter from emitting a duplicate crash for the same ANR.

ANRInstrumentation — on init: drain getHistoricalAnrReports first, then pending tracker cache as fallback; on API 30+ defer live onANRDetected to the next-launch AppExitInfo path; dedupe by timestamp window.

anrStack — filters tracker stacks that contain Android/Faro crash-handler frames (handleApplicationCrash, KillApplicationHandler, FaroUncaughtExceptionHandler) instead of the blocked app code.

Tests

parseAndroidCrashTrace.test.ts, crashErrorMessage.test.ts, appBuildIdentity.test.ts, initialize.test.ts, anrStack.test.ts.

Dependencies

Requires faro-android-gradle-plugin for bundle-id-release.txt and symbol upload, collector Android retrace (androidretrace), and a release app with mapping.txt uploaded.

Verification

cd packages/react-native
yarn quality:test
yarn quality:lint
yarn build

Release build with symbols uploaded: trigger native crash or ANR, relaunch app. Crash rows should show retraced Kotlin/Java frames and a concrete exception title. ANR rows should show type: ANR with retraced main-thread frames and no extra generic crash row for the same trigger.

Screenshots

  1. FE O11y Errors — ANR detail with deobfuscated *.kt file:line frames.
RN - ANR
  1. FE O11y Errors — list with a single ANR row (no paired generic crash).
RN Table Errors
  1. Native RuntimeException with R8-retraced Kotlin frames.
RN - NullPointerException
  1. Native Crash:
RN Native Crash
  1. Handled Exception
RN Handled Exception
  1. Unhandled Exception
RN Unhadled Exception

@yduartep yduartep requested a review from a team as a code owner June 5, 2026 19:10
Comment thread packages/react-native/src/instrumentations/crashReporting/instrumentation.ts Outdated
Comment thread packages/react-native/android/build.gradle
Comment thread packages/react-native/src/instrumentations/crashReporting/crashErrorMessage.ts Outdated
* Human-readable exception value for ApplicationExitInfo crash reports.
* Priority: parsed trace message → system description → exception class → Flutter-style fallback.
*/
export function resolveCrashErrorMessage(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this android specific crash-error-message logic?
If yes, did we guard it so only applies to Android?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question! This logic is Android-specific and is only invoked from Android native modules. The CrashReportingInstrumentation class is only instantiated on Android (via FaroCrashReporter.getCrashReports() in native Java code). iOS uses a different crash reporting path. The functions are in shared TS files for easier testing, but the entry point is platform-gated at the native module level.

private sendCrashReport(crash: CrashReport): void {
const errorMessage = this.getErrorMessage(crash);
const parsedTrace = crash.trace
? parseAndroidCrashTrace(crash.trace, {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like, it looks like we also call this for iOS. Or maybe I am missing something

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right to check! The parseAndroidCrashTrace function is Android-specific and is only called from the Android code path. The CrashReportingInstrumentation is only instantiated when FaroCrashReporter.getCrashReports() (Android native module) returns crash data. iOS doesn't call into this instrumentation at all - it uses a different crash reporting mechanism. The Android-specific parsing stays in shared TS for testing convenience, but the native module layer acts as the platform guard.


if (!stackFrames.length && crash.trace) {
this.logWarn(
'Native crash trace present but no frames parsed; check Android trace format or UncaughtExceptionHandler cache'

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again here, looks like we assume this is only going to run for Android

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct observation! This warning message is Android-specific because it references Android's ApplicationExitInfo traceInputStream and the UncaughtExceptionHandler cache. This code path is only reached when processing Android crash reports from FaroCrashReporter.getCrashReports() (Android native module). iOS doesn't call into this instrumentation - it has a separate crash reporting implementation. The platform guard happens at the native module level where getCrashReports() only exists on Android.

@yduartep yduartep changed the title feat(react-native): Android crash stack parsing, symbols bundle id, and Gradle bundle-id upload eat(react-native): Android crash/ANR symbolication, bundle id, and native stack delivery Jun 11, 2026
@yduartep yduartep changed the title eat(react-native): Android crash/ANR symbolication, bundle id, and native stack delivery feat(react-native): Android crash/ANR symbolication, bundle id, and native stack delivery Jun 11, 2026
yduartep and others added 8 commits June 11, 2026 19:00
…eshold

When a cached crash trace is older than 10s relative to the ApplicationExitInfo
timestamp, it's now automatically cleared from SharedPreferences instead of
persisting indefinitely. This prevents stale crash data from accumulating
across multiple app sessions.

Co-authored-by: Cursor <cursoragent@cursor.com>
Verifies that parseAndroidCrashTrace captures the deepest 'Caused by:'
exception type and message when parsing wrapped exception chains. This
documents the intentional behavior: we prefer the root cause over the
outer wrapper exception for more actionable error titles in the UI.

Co-authored-by: Cursor <cursoragent@cursor.com>
Changed warning from suggesting to 'install UncaughtExceptionHandler cache'
to accurately reflect that the trace was unavailable at crash time. The
handler is already installed automatically by the SDK.

Co-authored-by: Cursor <cursoragent@cursor.com>
Changed visibility from public to internal as this is an implementation
detail that should not be part of the public API. Matches the visibility
pattern used by FaroCrashTraceCache.

Co-authored-by: Cursor <cursoragent@cursor.com>
…ignore entry

Added inline comment in build.gradle warning that sourcemaps.config.json
is for local dev only and credentials should use env vars in CI/production.
Also added sourcemaps.config.json to .gitignore to prevent accidental
commit of API keys.

Co-authored-by: Cursor <cursoragent@cursor.com>
…sages

Simplified documentation by removing 'Flutter-style' and 'Flutter-aligned'
references. Changed to 'generic fallback' which is clearer and doesn't
require knowledge of other SDK implementations.

Co-authored-by: Cursor <cursoragent@cursor.com>
…rm folders

Reorganized crash reporting code to make platform-specific logic explicit:

- Created android/ and ios/ subfolders under crashReporting/
- Moved Android-specific files (parseAndroidCrashTrace, crashErrorMessage) to android/
- Implemented iOS crash reporting (parseIosCrashTrace, crashErrorMessage)
- Updated instrumentation.ts to use Platform.OS for platform detection
- Added platform-specific parsing and error message resolution

iOS implementation:
- Parses PLCrashReporter stack format (frame number, library, instruction pointer)
- Maps BSD signals (SIGSEGV, SIGABRT, etc.) to human-readable messages
- Handles uncaught NSException with name and reason
- Includes comprehensive test coverage

This addresses concerns about mixing platform-specific code and prepares
for iOS crash reporting when PLCrashReporter dependency is added.

Co-authored-by: Cursor <cursoragent@cursor.com>
Refactor crash reporting to eliminate conditional platform logic by using
the Strategy Pattern with platform-specific implementations:

- Create abstract BaseCrashReportingInstrumentation extending BaseInstrumentation
- Implement AndroidCrashReportingInstrumentation for Android-specific logic
  (ApplicationExitInfo API, R8 retracing, Java/Kotlin stack parsing)
- Implement IosCrashReportingInstrumentation for iOS-specific logic
  (PLCrashReporter, signal mapping, native frame parsing)
- Export platform-aware CrashReportingInstrumentation via Platform.OS check
- Move platform-specific code to android/ and ios/ subfolders
- Update ANR instrumentation to import from android/ subfolder
- Add comprehensive test coverage for iOS crash error messages
- Add reason/status fields to CrashReport type for iOS compatibility

Benefits:
- Eliminates all if/else Platform.OS checks from instrumentation code
- Provides clear separation between Android and iOS implementations
- Makes it easier to add iOS crash reporting in the future
- Improves code organization and maintainability
- Each platform can evolve independently

Usage remains unchanged - users instantiate CrashReportingInstrumentation
and the correct platform implementation is selected automatically at module load time.

Co-authored-by: Cursor <cursoragent@cursor.com>
@yduartep

Copy link
Copy Markdown
Contributor Author

Platform-Specific Crash Reporting Refactor

@robert-northmind Thank you for the excellent feedback on code organization! I've refactored the crash reporting instrumentation to address your concerns about platform-specific code clarity.

What Changed

Before: Single instrumentation.ts with conditional Platform.OS checks scattered throughout the code, making Android-specific logic (like parseAndroidCrashTrace) appear generic.

After: Clean separation using the Strategy Pattern:

crashReporting/
├── BaseCrashReportingInstrumentation.ts  ← Abstract base class
├── android/
│   └── AndroidCrashReportingInstrumentation.ts
└── ios/
    └── IosCrashReportingInstrumentation.ts

Key Improvements

  1. No more if/else platform checks - Each platform has its own implementation class
  2. Clear file organization - Android-specific code lives in android/, iOS in ios/
  3. Easier to extend - Adding iOS crash reporting just means implementing the abstract methods
  4. Better maintainability - Each platform can evolve independently
  5. Type safety - Platform-specific parsed trace types are properly separated

Implementation Details

  • BaseCrashReportingInstrumentation extends BaseInstrumentation and defines abstract methods for platform-specific behavior
  • AndroidCrashReportingInstrumentation implements Android-specific logic (ApplicationExitInfo, R8 retracing, Java stack parsing)
  • IosCrashReportingInstrumentation implements iOS-specific logic (PLCrashReporter format, signal mapping) with comprehensive tests
  • Main index.ts exports the correct implementation based on Platform.OS at module load time
  • Usage for consumers remains unchanged: new CrashReportingInstrumentation()

Testing

  • ✅ All quality checks pass (lint, tests, build, circular deps)
  • ✅ Added comprehensive test coverage for iOS crash error messages
  • ✅ Tested in React Native demo app - crash reporting still works correctly

This architecture makes it explicit which code is platform-specific and provides a clear path for future iOS crash reporting implementation.

yduartep and others added 2 commits June 11, 2026 20:09
Integrated changes from main into the Android crash symbols feature branch:

- Adopted PreloadedMobileMeta structure for structured mobile meta fields
- Added fatal: true flag to ANR and crash error reports (matching Flutter pattern)
- Added device attribute collection comments for clarity
- Merged DeviceInfo mock methods (getBundleId, getBuildNumber, getApiLevel, getBuildId)
- Preserved all crash reporting refactoring (BaseCrashReportingInstrumentation and platform-specific implementations)

Key changes from main:
- Structured mobile meta fields (device, os, app) replacing flat session attributes
- loadMobileMetaForInit() replacing loadSessionDeviceAttributesForInit()
- Fatal flag on errors for better crash/ANR severity classification

Key changes from feature branch:
- appSymbolsBundleId parameter for server-side symbol retrace
- Platform-specific crash reporting architecture (Android/iOS separation)
- Enhanced crash error message resolution

All conflicts resolved to preserve both sets of functionality.

Co-authored-by: Cursor <cursoragent@cursor.com>
Fixed test failures introduced by the PreloadedMobileMeta structure changes:

1. Updated crash reporting test to import from index instead of non-existent instrumentation file
2. Refactored symbolsBundleIdMeta to resolveAppBundleId helper that:
   - Properly merges bundleId from config, Metro preamble, or DeviceInfo with preloaded app meta
   - Includes Metro preamble bundleId in metas instead of relying on faro-core registration
3. Added app meta to metas array to ensure preloaded installationId appears in faro.metas.value
4. Merged structured mobile meta (device, os) into metas while preserving app meta separately

All tests now passing.

Co-authored-by: Cursor <cursoragent@cursor.com>

@aranhave aranhave left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retested the latest head locally. Release build works, runtime bundle id is coming through and native crash + ANR payloads both include the expected fatal=true signal and source stack. I only left two small questions: one around passing the API key as a CLI arg, and one around repeated ANR detections for the same blocked period. Everything else I checked looks good to me :)

"--endpoint", faroEndpoint,
"--app-id", faroAppId,
"--stack-id", faroStackId,
"--api-key", faroApiKey,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we avoid passing the API key as a CLI arg here? Gradle --info / process listings can expose command-line args. env forwarding would keep the key out of the command line.


// Persist first: survives process kill before JS/network can run.
applicationContext?.let { context ->
FaroAnrCache.savePendingAnr(context, payload)

@aranhave aranhave Jun 11, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in my local run one long ANR replayed as multiple ANR exception payloads plus an anr_count measurement. Should we collapse repeated tracker detections for the same blocked period?

return JSONObject().apply {
put("type", "ANR")
put("timestamp", exitInfo.timestamp)
put("duration", ANRTracker.timeout)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This timeout object seems to be inited to a 5 second value and never updated. Is this what we want?

return null
}

if (FaroAnrCache.hasNearbyAnrDetection(context, exitInfo.timestamp)) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a nearby ANR does not mean app crashed for the same reason. Not a blocker, but worth tracking it somewhere.

@robert-northmind robert-northmind left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 🤩
I left some minor comments. But all of mine are optional.
I think thought that maybe you should tackle the feedback from Ben and Vishwan. They know Andorid the best

* threshold fires, so a process kill before the React bridge can emit still
* leaves a recoverable record for the next launch.
*/
internal object FaroAnrCache {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this class has no unit tests? Should we consider adding some?
Seems like a good way to ensure it works as expected.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure though how easy it is to add tests for kotlin code in a RN project.

* Kept separate from [FaroCrashReporter] so ANR timeout rows are not replayed as
* generic previous-session `crash` events (no message / duplicate rows).
*/
object FaroAnrReporter {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think same here. This seems like a pretty testable class. Maybe we would need to break out some dependencies and pass them in. Not sure how to best write/setup tests for Kotlin

* ```
*/
export const CrashReportingInstrumentation =
Platform.OS === 'android' ? AndroidCrashReportingInstrumentation : IosCrashReportingInstrumentation;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does ReactNative more or less only run on Android and iOS/iPadOS etc?
If not, should we be a bit careful here any maybe do something like:

if (Platform.OS === 'android') {
    return AndroidCrashReportingInstrumentation;
} else if (Platform.OS === 'ios') {
    // RN reports both iOS and iPadOS under the "ios" name
    return IosCrashReportingInstrumentation;
} else {
     return NoOpCrashReportingInstrumentation;
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants