
Feature: active archive table #3

Open
tinchoz49 wants to merge 12 commits into main from feature/active-archive-and-time-travel

Conversation

Contributor

@tinchoz49 tinchoz49 commented Apr 27, 2026

Summary

This PR implements an Active/Archive split for Duron's PostgreSQL adapter to eliminate hot-path bloat and improve query performance.

Breaking Changes

  • New database schema with active/archive tables
  • New migration required: 20260421153337_large_nitro
  • Spans use a single spans table (no active/archive split)

What's Changed

Core Architecture:

  • Split jobs into jobs_active and jobs_archive
  • Split job_steps into job_steps_active and job_steps_archive
  • Single spans table with no FK constraints to avoid cascade issues
  • Status-based query routing: created/active → active, completed/failed/cancelled → archive
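The status-based routing above can be sketched as a small pure function. This is an illustrative assumption, not the adapter's actual API; the status and table names mirror the list above.

```typescript
// Sketch of status-based query routing: "live" statuses hit the active
// table, terminal statuses hit the archive. Names are assumptions.
type JobStatus = "created" | "active" | "completed" | "failed" | "cancelled";

const ACTIVE_STATUSES: ReadonlySet<JobStatus> = new Set(["created", "active"]);

/** Route a status filter to the table that holds jobs in that status. */
function tableForStatus(status: JobStatus): "jobs_active" | "jobs_archive" {
  return ACTIVE_STATUSES.has(status) ? "jobs_active" : "jobs_archive";
}

console.log(tableForStatus("created"));   // jobs_active
console.log(tableForStatus("completed")); // jobs_archive
```

Because a job's status fully determines which table it lives in, each query only scans one (usually small) table unless the caller explicitly asks for both.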

Archive Management:

  • _pruneArchive - scheduled cleanup with advisory locks
  • _truncateArchive - clear all archive data
  • _getArchiveStats - archive metrics
  • REST API endpoints for archive operations
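A minimal sketch of how a scheduled prune guarded by a Postgres advisory lock might look. The lock key, table and column names, and the helper itself are assumptions for illustration, not the adapter's real internals.

```typescript
// Hedged sketch of _pruneArchive-style cleanup: a pg_try_advisory_lock
// guard ensures only one worker prunes at a time. Names are assumptions.
const PRUNE_LOCK_KEY = 724_001; // arbitrary app-wide advisory lock id (assumed)

function buildPruneSql(retentionDays: number): string {
  // retentionDays is a trusted number here; real code should bind it
  // as a parameter rather than interpolate.
  return `
    WITH pruned AS (
      DELETE FROM jobs_archive
      WHERE archived_at < now() - interval '${retentionDays} days'
      RETURNING id
    )
    DELETE FROM job_steps_archive s USING pruned p WHERE s.job_id = p.id;
  `;
}

// In the adapter this would run roughly as:
//   const { rows } = await client.query(
//     "SELECT pg_try_advisory_lock($1) AS got", [PRUNE_LOCK_KEY]);
//   if (rows[0].got) {
//     try { await client.query(buildPruneSql(30)); }
//     finally { await client.query("SELECT pg_advisory_unlock($1)", [PRUNE_LOCK_KEY]); }
//   }
```

Using `pg_try_advisory_lock` (rather than the blocking variant) lets workers that lose the race skip the prune instead of queuing behind it.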

Bug Fixes:

  • Fixed BigInt serialization issue in span inserts
  • Fixed FK constraint ordering during archive operations
  • Added orphan span cleanup in prune
  • Optimized prune to use SQL-native CTEs instead of JS round-trips

Dashboard:

  • "Live Jobs" default tab
  • "Archive" tab with filter toggles
  • "All Jobs" toggle using UNION ALL

Developer Experience:

  • Updated AGENTS.md: do not use git worktrees

Testing

  • 163 tests passing
  • Added test/archive.test.ts with 18 archive-specific tests
  • Added packages/shared-actions/test/process-order.test.ts integration test

- Split jobs/steps into active and archive tables
- Jobs move from active to archive on complete/fail/cancel
- Single spans table without FK constraints
- SQL-native archive operations with USING joins
- Prune with orphan span cleanup
- Time travel restores archived jobs to active
- Dashboard archive page and filter toggles
- Add archive-specific tests
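The "SQL-native archive operations with USING joins" item can be sketched as a single statement that moves a job and its steps in one round-trip via data-modifying CTEs. The column lists and parameter shape are assumptions for illustration.

```typescript
// Hedged sketch of a one-statement active -> archive move: DELETE ...
// RETURNING feeds the archive INSERTs, and the steps DELETE joins the
// moved job with USING. Column names are assumptions.
const ARCHIVE_JOB_SQL = `
  WITH moved AS (
    DELETE FROM jobs_active WHERE id = $1
    RETURNING id, status, payload, created_at
  ), moved_steps AS (
    DELETE FROM job_steps_active s USING moved m
    WHERE s.job_id = m.id
    RETURNING s.*
  ), archived_steps AS (
    -- assumes job_steps_archive shares job_steps_active's column order
    INSERT INTO job_steps_archive SELECT * FROM moved_steps
  )
  INSERT INTO jobs_archive (id, status, payload, created_at, archived_at)
  SELECT id, status, payload, created_at, now() FROM moved;
`;
```

Doing the whole move in one statement keeps it atomic without an explicit transaction and avoids the JS round-trips the prune optimization above also eliminates.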
The test was importing from the 'duron' package, which failed in CI because workspace resolution doesn't work during test execution. Moving it into the duron package allows using relative imports.
- Exclude expired active jobs from verify_concurrency CTE count
- _recoverJobs now archives expired jobs as failed (not reset to created)
- Non-expired jobs from unresponsive owners still reset to created
- Add recoverJobsInterval option (default 60s) for periodic recovery
- Recovery runs before fetch when interval has elapsed
- Add test: expired jobs no longer block new job selection
- Update docs: client-api.mdx and multi-worker.mdx
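The recovery policy in this commit splits orphaned jobs into two outcomes, which can be sketched as a classifier. Field names and the function itself are illustrative assumptions.

```typescript
// Sketch of the recovery decision: expired jobs are archived as failed;
// non-expired jobs whose owner stopped heartbeating are reset to
// 'created'. Field names are assumptions.
interface ActiveJob {
  expiresAt: number;   // epoch ms deadline for the job itself
  ownerSeenAt: number; // last heartbeat from the owning process, epoch ms
}

function recoverAction(
  job: ActiveJob,
  now: number,
  processTimeoutMs: number,
): "archive_as_failed" | "reset_to_created" | "leave" {
  if (now >= job.expiresAt) return "archive_as_failed";
  if (now - job.ownerSeenAt > processTimeoutMs) return "reset_to_created";
  return "leave";
}
```

The key change is the first branch: an expired job is terminal, so it goes straight to the archive as failed instead of cycling back to 'created' and blocking new job selection.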
…ecovery test

- Change processTimeout default from 5000ms to 500ms for faster orphan detection
- Update docs (client-api, examples, multi-worker) with new default
- Rename and improve test: 'should recover expired orphan jobs from dead processes'
- Add multiProcessMode + recoverJobsInterval + fake client_id to simulate crash
- Add detailed comment explaining the multi-process orphan recovery scenario
- Keep fetch() recovery only in multiProcessMode to avoid single-process overhead
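The last two points can be sketched as a small gate that fetch() consults. The option names mirror the PR; the function and its shape are illustrative assumptions.

```typescript
// Sketch of "recovery runs before fetch when the interval has elapsed",
// gated to multiProcessMode. Option names mirror the PR description.
function shouldRecover(opts: {
  multiProcessMode: boolean;
  recoverJobsInterval: number; // ms; the PR's default is 60s
  lastRecoveryAt: number;      // epoch ms of the last recovery pass
  now: number;                 // epoch ms
}): boolean {
  // Single-process deployments skip recovery entirely to avoid overhead.
  if (!opts.multiProcessMode) return false;
  return opts.now - opts.lastRecoveryAt >= opts.recoverJobsInterval;
}
```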
