CMR-11155: Version Control Index Set [NOT READY FOR REVIEW]#2454

Draft
jmaeng72 wants to merge 18 commits into master from CMR-11155

@jmaeng72 jmaeng72 commented Jun 24, 2026

Contributor

Overview

What is the objective?

To allow disaster recovery of index-sets and to keep track of their historical changes.

What are the changes?

  • Created a new index-set table in the database as the new source of truth
  • Created a new /sync-with-database API to recover index-sets from the database if we lose the index in Elastic
  • Created a new /index-sets/:id?revision_id=X endpoint to retrieve a specific revision of an index-set for historical reference
  • Index-set is now treated like a concept: a new revision is created for every create, update, and delete, and old revisions are pruned so that up to 10 versions are retained. A deleted index-set is tombstoned in the db
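
To illustrate the revision scheme described in the list above, here is a minimal in-memory sketch. It is not the code in this PR: the store shape, `max-revisions`, `save-revision`, and `get-revision` are hypothetical names, and the sketch assumes that pruning keeps the 10 newest revisions and that a delete writes a tombstone revision instead of removing rows.

```clojure
;; Hypothetical in-memory stand-in for the new index-set table.
;; The store maps an index-set id to a vector of revisions, oldest first.
(def max-revisions 10)

(defn save-revision
  "Appends a new revision for `id` and keeps only the newest `max-revisions`.
  When `deleted?` is true the revision is a tombstone with no index-set body."
  [store id index-set deleted?]
  (let [revisions (get store id [])
        next-rev  (inc (:revision-id (peek revisions) 0))
        revision  {:revision-id next-rev
                   :deleted     deleted?
                   :index-set   (when-not deleted? index-set)}]
    (assoc store id (vec (take-last max-revisions (conj revisions revision))))))

(defn get-revision
  "Returns the revision with `revision-id`, or the latest one when it is nil."
  [store id revision-id]
  (let [revisions (get store id [])]
    (if revision-id
      (first (filter #(= revision-id (:revision-id %)) revisions))
      (peek revisions))))

;; Create, update, then delete index-set 1 (the names are made up).
(def example-store
  (-> {}
      (save-revision 1 {:name "example-index-set"} false)
      (save-revision 1 {:name "example-index-set-v2"} false)
      (save-revision 1 nil true)))
```

With this shape, `(get-revision example-store 1 nil)` returns the tombstone, while `(get-revision example-store 1 1)` still returns the first revision, which is roughly what the `revision_id` query parameter is meant to expose.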

What areas of the application does this impact?

Indexer

Required Checklist

  • New and existing unit and int tests pass locally and remotely
  • clj-kondo has been run locally and all errors in changed files are corrected
  • I have commented my code, particularly in hard-to-understand areas
  • I have made changes to the documentation (if necessary)
  • My changes generate no new warnings

Additional Checklist

  • I have removed unnecessary/dead code and imports in files I have changed
  • I have cleaned up integration tests by doing one or more of the following:
    • migrated any are2 tests to are3 in files I have changed
    • de-duped, consolidated, removed dead int tests
    • transformed applicable int tests into unit tests
    • reduced number of system state resets by updating fixtures. Ex) (use-fixtures :each (ingest/reset-fixture {})) to be :once instead of :each

codecov-commenter commented Jun 24, 2026

Codecov Report

❌ Patch coverage is 22.34432% with 212 lines in your changes missing coverage. Please review.
✅ Project coverage is 29.33%. Comparing base (0bcadbd) to head (ac4ffa5).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...app/src/cmr/indexer/services/index_set_service.clj | 4.95% | 115 Missing ⚠️ |
| indexer-app/src/cmr/indexer/data/elasticsearch.clj | 0.00% | 24 Missing ⚠️ |
| indexer-app/src/cmr/indexer/api/routes.clj | 0.00% | 17 Missing ⚠️ |
| ...-test/src/cmr/system_int_test/utils/index_util.clj | 52.17% | 11 Missing ⚠️ |
| ...adata_db/migrations/093_setup_index_sets_table.clj | 47.36% | 10 Missing ⚠️ |
| .../migrations/092_update_cmr_subscriptions_table.clj | 0.00% | 8 Missing ⚠️ |
| ...cmr/metadata_db/data/oracle/concepts/index_set.clj | 30.00% | 7 Missing ⚠️ |
| metadata-db-app/src/config/mdb_migrate_helper.clj | 14.28% | 6 Missing ⚠️ |
| ...-test/src/cmr/system_int_test/utils/url_helper.clj | 71.42% | 4 Missing ⚠️ |
| ...exer-app/src/cmr/indexer/common/index_set_util.clj | 0.00% | 2 Missing ⚠️ |

... and 6 more

❗ There is a different number of reports uploaded between BASE (0bcadbd) and HEAD (ac4ffa5).

HEAD has 19 uploads less than BASE

| Flag | BASE (0bcadbd) | HEAD (ac4ffa5) |
|---|---|---|
| | 23 | 4 |

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #2454       +/-   ##
===========================================
- Coverage   58.20%   29.33%   -28.88%     
===========================================
  Files        1069     1011       -58     
  Lines       74278    71019     -3259     
  Branches     2166     1198      -968     
===========================================
- Hits        43233    20830    -22403     
- Misses      29028    49053    +20025     
+ Partials     2017     1136      -881     

collection-response (ingest/bulk-update-task-status "PROV1" (:task-id response))
collection-statuses (:collection-statuses collection-response)]
(is (= "COMPLETE" (:task-status collection-response)))
(is (= "Collection with concept-id [C1200000009-PROV1] is deleted. Can not be updated."

Contributor Author

Because we added a new index-set table that is treated like a concept, index-sets now draw their ids from the same concept_seq_id sequence as other concepts. These tests hardcode the expected concept id based on how many times concept_seq_id has been incremented, so the expected id went up by one in all of them.
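
A small sketch of why the hardcoded ids shift, under stated assumptions: `make-sequence`, `concept-id`, and `first-collection-id` are hypothetical helpers, the starting value 1200000000 is made up, and an atom stands in for the shared database sequence. It assumes each created index-set draws one value from that sequence.

```clojure
;; Hypothetical stand-in for the shared concept_seq_id sequence.
(defn make-sequence
  "Returns a function that yields the next id on every call."
  [start]
  (let [counter (atom start)]
    (fn [] (swap! counter inc))))

(defn concept-id
  "Builds a concept id such as C1200000001-PROV1."
  [prefix seq-id provider]
  (str prefix seq-id "-" provider))

(defn first-collection-id
  "Concept id of the first collection ingested after `index-sets-created`
  index-sets have each drawn an id from the same sequence."
  [index-sets-created]
  (let [next-id! (make-sequence 1200000000)]
    (dotimes [_ index-sets-created] (next-id!))
    (concept-id "C" (next-id!) "PROV1")))

;; Before this PR no index-set used the sequence:
(first-collection-id 0)
;; With one index-set stored first, the same collection lands one id later:
(first-collection-id 1)
```

Any test that asserts on a literal concept id therefore has to move up by however many ids the index-set consumed during setup.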

;; instead.
[cmr.metadata-db.data.oracle.concepts.generic-documents]))
[cmr.metadata-db.data.oracle.concepts.generic-documents]
[cmr.metadata-db.data.oracle.concepts.index-set]))

Contributor Author

Added so that metadata db can see the index-set concept during the sys tests later.

(println "cmr.metadata-db.migrations.092-update-cmr-sub-notifications-table up...")
(h/sql "ALTER TABLE METADATA_DB.CMR_SUB_NOTIFICATIONS DROP COLUMN AWS_ARN")
(h/sql "ALTER TABLE METADATA_DB.CMR_SUBSCRIPTIONS ADD AWS_ARN VARCHAR(2048) NULL"))
(when (h/column-exists? "CMR_SUB_NOTIFICATIONS" "AWS_ARN")

Contributor Author

I had to add this because my local database had the wrong version of AWS_ARN, which had to be replaced. This should help any other dev with local db issues. It will not have adverse effects, since the logic remains the same.
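
The idea behind the guard can be sketched without a database. This is an illustration only: the schema is a plain map of table name to column set, `move-column` is a hypothetical helper, the `ID` column is made up, and `column-exists?` here is a local stand-in, not the helper used in the migration. It also guards the add step, which the migration in this PR may not do.

```clojure
;; Hypothetical in-memory schema: table name -> set of column names.
(defn column-exists?
  [schema table column]
  (contains? (get schema table #{}) column))

(defn move-column
  "Moves `column` from `from-table` to `to-table`, skipping any step that
  has already been applied so the migration can be re-run safely."
  [schema from-table to-table column]
  (cond-> schema
    ;; Only drop the column if it is still on the old table.
    (column-exists? schema from-table column)
    (update from-table disj column)

    ;; Only add the column if the new table does not have it yet.
    (not (column-exists? schema to-table column))
    (update to-table (fnil conj #{}) column)))

(def schema-before
  {"CMR_SUB_NOTIFICATIONS" #{"ID" "AWS_ARN"}
   "CMR_SUBSCRIPTIONS"     #{"ID"}})

(def schema-after
  (move-column schema-before "CMR_SUB_NOTIFICATIONS" "CMR_SUBSCRIPTIONS" "AWS_ARN"))
```

Running `move-column` again on `schema-after` returns the same schema, which is the property that makes the guarded migration safe on a local db that is already partly migrated.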

(def ^:private index-sets-column-sql
"id NUMBER,
concept_id VARCHAR(255) NOT NULL,
native_id VARCHAR(1030) NOT NULL,

Contributor Author

native_id is the id of the index-set

(let [reshard-start-resp (bootstrap/start-reshard-index "1_small_collections_100_shards" {:synchronous true :num-shards 50 :elastic-name gran-elastic-name})
task-id (:task-id reshard-start-resp)]
task-id (:task-id reshard-start-resp)
status-check-attempts (range 3)]

Contributor Author

Some runs of this may be slow and the reshard can take time to finish, so I allow 3 status-check attempts here to prevent unnecessary failures when the reshard does not complete immediately.

:reshard-status "COMPLETE"}
(bootstrap/get-reshard-status "1_small_collections_100_shards" {:elastic-name gran-elastic-name :task-id task-id})))))
(run! (fn [i]
(Thread/sleep 2000) ;; wait for 2 secs

Contributor Author

put a wait here in case the reshard takes longer than expected
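
For comparison, a bounded poll that stops as soon as the status is complete could look like the sketch below. This is a different shape from the `run!` over `(range 3)` in the diff, not a copy of it. `poll-until` and `make-fake-status` are hypothetical helpers, and the fake status function stands in for `bootstrap/get-reshard-status`.

```clojure
(defn poll-until
  "Calls `check-fn` up to `max-attempts` times, sleeping `wait-ms` before each
  call. Returns the first result that satisfies `done?`, or the last result."
  [check-fn done? max-attempts wait-ms]
  (loop [attempt 1]
    (Thread/sleep (long wait-ms))
    (let [result (check-fn)]
      (if (or (done? result) (>= attempt max-attempts))
        result
        (recur (inc attempt))))))

(defn make-fake-status
  "Returns a function that reports each status in `responses` in turn."
  [responses]
  (let [remaining (atom responses)]
    (fn []
      (let [status (first @remaining)]
        (swap! remaining rest)
        {:reshard-status status}))))

(defn complete?
  [status-resp]
  (= "COMPLETE" (:reshard-status status-resp)))

;; Completes on the third check, so 3 attempts are enough here.
(poll-until (make-fake-status ["IN_PROGRESS" "IN_PROGRESS" "COMPLETE"])
            complete? 3 10)
```

A test can then assert once on the returned status, and the total wait is bounded by `max-attempts` times `wait-ms`.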

@jmaeng72 jmaeng72 marked this pull request as draft June 24, 2026 19:11
@jmaeng72 jmaeng72 self-assigned this Jun 24, 2026
@jmaeng72 jmaeng72 changed the title CMR-11155: Version Control Index Set CMR-11155: Version Control Index Set [NOT READY FOR REVIEW] Jun 24, 2026