fix(os-migration): harden startup validation, add emergency halt path, fix thread-safe formatter#35632
Conversation
…, fix thread-safe formatter Closes #35631 - IndexStartupValidator: promote OS connectivity failures to hard errors (throws DotRuntimeException); validateIndexingConfig() now returns boolean so callers can react instead of only catching exceptions. - IndexConfigHelper: add MigrationPhase.reset() and haltMigration() for runtime emergency rollback — writes ordinal 0 back to FEATURE_FLAG_OPEN_SEARCH_PHASE without a restart. - ContentletIndexAPIImpl: wire haltMigration() when validateIndexingConfig() returns false; fix hasEmptyIndices() to be phase-aware so an empty OS index in phases 1/2 does not trigger an unintended full ES reindex. - ESIndexResource: replace non-thread-safe SimpleDateFormat (timestampFormatter) with immutable DateTimeFormatter (threadSafeTimestampFormatter) to prevent corrupted index names under concurrent requests. - docs: add OPENSEARCH_MIGRATION_TEST_PLAN.md (530-line test plan covering all 6 migration phases, rollback scenarios, and dual-write validation). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Claude finished @fabrizzio-dotCMS's task in 2m 7s —— View job Review
The shape is good — version + endpoint checks gate boot, halt path is runtime-only so testers can recover, and the formatter swap is correct. A few things look wrong or worth a second look: 1. Phase 3 "abort loudly" doesn't actually abort — the surrounding try/catch swallows it
End state in Phase 3 + OS unreachable today:
That's effectively the same outcome as the non-Phase-3 path, just with a different log line. If the intent is genuinely "do not boot with stale ES," either let the exception escape, narrow the outer catch, or set a flag that disables traffic. Otherwise the Phase 3 branch is documentation, not behavior. 2. The thread-safe formatter swap is incompletePR says
The race in 3. Possible behavior regression in
|
- ESIndexResource: fix LocalTime.now() → LocalDateTime.now() so the yyyyMMddHHmmss formatter no longer throws UnsupportedTemporalTypeException at runtime; drop unused java.util.Date import - IndexStartupValidator: fix stale class Javadoc (remove 'demoted to warning' claim), fix validateOSVersion Javadoc (remove 'JVM exit'), fix log message to say 'halting OS migration / fall back to ES-only' instead of 'cannot start' - IndexConfigHelper.MigrationPhase.reset(): add Javadoc warning that system-table config source shadows Config.setProperty(), making reset() a no-op when the phase flag was set via system-table - ContentletIndexAPIImpl.checkAndInitializeIndex(): add phase-3 guard — refuse to auto-rollback to ES when OS validation fails in PHASE_3_OPENSEARCH_ONLY (ES may be decommissioned/stale); throw DotRuntimeException so operator sees a clear fatal log instead of a silent demotion to a stale ES index - ContentletIndexAPIImpl.hasEmptyIndices(): collapse redundant phase-0 branch (phases 0–2 all fall through to isEsWorkingIndexEmpty()) - ContentletIndexAPIImpl: remove leading spaces from two error/log strings (lines 791, 796) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rdening Adds Scenarios G–K (TC-001–TC-025) covering startup validation, haltMigration() emergency halt, phase-aware index init, thread-safe formatter, multi-node cluster Phase 1, content/CT lifecycle sync, index delete behavior, and REST security. Updates Scenario B-7 to reflect breaking change from PR #35632 (haltMigration() now triggered on OS connectivity failure instead of silent boot). Updates Scenario C-3 similarly. Adds QA Epic and Fix PR links to header. Renumbers sections to accommodate new scenarios (F moved to 15, DB Queries→16, Bugs→17, Execution Order→18, Out of Scope→19). Bugs table gains 3 new rows for PR #35632. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…thub.com/dotCMS/core into fix/os-migration-startup-hardening-35631
…rtupValidator Logger.fatal was halting startup unnecessarily for a non-fatal OS config error — the fallback to ES-only is intentional and should not surface as a fatal-level log entry. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…etIndexAPIImpl Consistent with the same fix in IndexStartupValidator — halting the migration is a controlled fallback, not a fatal condition, so Logger.error is the correct severity. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…eference The file had been overwritten with a QA test plan by the design-fix-test-plan skill. Restores it to its canonical architecture documentation content. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Closes #35631
Summary
DotRuntimeException);validateIndexingConfig()returnsbooleanso callers can react without catching exceptions.MigrationPhase.reset()andhaltMigration()— runtime-only emergency rollback that writes ordinal0back toFEATURE_FLAG_OPEN_SEARCH_PHASEwithout a restart.haltMigration()when startup validation fails;hasEmptyIndices()is now phase-aware (empty OS index in phases 1/2 no longer triggers unintended full ES reindex).SimpleDateFormatwith immutableDateTimeFormatterto prevent corrupted index names under concurrent requests.OPENSEARCH_MIGRATION_TEST_PLAN.md— 530-line test plan covering all migration phases, rollback scenarios, and dual-write validation.Test plan
checkAndInitializeIndex()skips OS validator.validateIndexingConfig()returnsfalse→haltMigration()resets phase to 0 → fatal log emitted → dotCMS falls back to ES-only./api/v1/esindex/create: index names are well-formed (noSimpleDateFormatrace).docs/backend/OPENSEARCH_MIGRATION_TEST_PLAN.mdfor full test matrix.🤖 Generated with Claude Code