Skip to content

fix: allow app to start when read replica is unreachable#3136

Open
danoswaltCL wants to merge 4 commits into
release/6.4from
bugfix/read-replica-config
Open

fix: allow app to start when read replica is unreachable#3136
danoswaltCL wants to merge 4 commits into
release/6.4from
bugfix/read-replica-config

Conversation

@danoswaltCL
Copy link
Copy Markdown
Collaborator

when read-replica env var is not empty and has a bad value and can't connect, the app doesn't start. this has caused issues in staging with name changes; this should be a rare occurrence but it could still definitely happen in prod some day, so we can allow the app to start and log in background. any calls that would normally use read-replica would fail (instead of re-routing to main db, which may be undesirable).

- Replace Promise.all with Promise.allSettled in typeormLoader so a
  replica connection failure no longer prevents the app from booting.
  Main DB failure still throws (critical). Replica failure is logged
  and the app continues; affected analytics/export requests will error
  at query time rather than at startup.
- Resolve two TODO comments by wiring up UpgradeLogger (winstonLoader
  runs before typeormLoader so Winston is always ready).
- Add 22 unit tests covering all key scenarios: both succeed, replica
  only fails, main only fails (DB_UNREACHABLE), migration fails
  (MIGRATION_ERROR), both fail, and edge cases (undefined settings,
  synchronize mode, ECS mode).

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the backend TypeORM bootstrap logic so a failure to connect to the read-replica/export DataSource no longer prevents the application from starting, and adds unit tests to validate the new startup behavior.

Changes:

  • Make replica/export DataSource initialization non-fatal by logging failures instead of aborting boot.
  • Add structured error logging in typeormLoader for DB startup failures.
  • Add unit tests covering main DB failures, replica failures, and migration failure classification.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
packages/backend/src/loaders/typeormLoader.ts Switches replica init handling to be optional (log on failure) and adds loader-level logging/error classification.
packages/backend/test/unit/loaders/typeormLoader.test.ts Adds focused unit tests for startup behavior across main DB vs replica failures and migration errors.
Comments suppressed due to low confidence (1)

packages/backend/src/loaders/typeormLoader.ts:115

  • The catch block logs { message: 'Database connection failed' } for all failures, including migration errors (code === '42P07') and any other runtime errors thrown in the loader. This makes the log message misleading and harder to triage. Consider logging a message that matches the classified error (e.g., "Database migration failed" for 42P07) or a more general "Database initialization failed".
    const error = err as any;
    log.error({ message: 'Database connection failed', error });
    if (error.code === 'ECONNREFUSED') {
      error.type = SERVER_ERROR.DB_UNREACHABLE;
      throw error;
    } else if (error.code === '42P07') {
      error.type = SERVER_ERROR.MIGRATION_ERROR;
      throw error;

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/backend/src/loaders/typeormLoader.ts
Comment on lines +77 to +93
const [mainResult, replicaResult] = await Promise.allSettled([
appDataSourceInstance.initialize(),
exportDataSourceInstance.initialize(),
]);

// Main DB is required — rethrow so the outer catch can classify and re-throw.
if (mainResult.status === 'rejected') {
throw mainResult.reason;
}

// Read replica is optional — log the failure but let the app start.
// Requests that attempt to use the replica connection will receive an error
// at query time instead of preventing the whole app from booting.
if (replicaResult.status === 'rejected') {
const replicaErr = replicaResult.reason as any;
log.error({ message: 'Read replica connection failed — continuing without replica', error: replicaErr });
}
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bcb37 i think this suggestion makes sense, but I'm not sure how to actually test it, or if it is worth fussing over to try to test. worst case scenario is a slight delay in startup. what do you think.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants