Skip to content

DAOS-18899 control: Prevent duplicate rank entries for same engine#18127

Open
tanabarr wants to merge 12 commits into
masterfrom
tanabarr/control-join-no-dupes
Open

DAOS-18899 control: Prevent duplicate rank entries for same engine#18127
tanabarr wants to merge 12 commits into
masterfrom
tanabarr/control-join-no-dupes

Conversation

@tanabarr

@tanabarr tanabarr commented Apr 28, 2026

Copy link
Copy Markdown
Contributor

Prevent a rank from joining the DAOS system if all management service
database fields or the joining rank match apart from the UUID. This
avoids the situation where we have multiple ranks matching the same
engine specification. If a rank needs to be added back to an engine
that matches entries in the database then it needs to be formatted
with the dmg storage format --replace command.

Features: control

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

@tanabarr tanabarr self-assigned this Apr 28, 2026
@daosbuild3

Copy link
Copy Markdown
Collaborator

@daosbuild3

Copy link
Copy Markdown
Collaborator

@github-actions

github-actions Bot commented Apr 28, 2026

Copy link
Copy Markdown

Ticket title is 'Prevent multiple rank entries for the same engine'
Status is 'In Review'
https://daosio.atlassian.net/browse/DAOS-18899

@kjacque kjacque left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty straightforward. Nice job.

Comment thread src/control/system/membership.go Outdated
}

if len(missing) != 0 {
if !memberFieldsMatch(cm, req) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the helper function could return the missing values as well, to avoid having to check twice?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread src/control/system/membership_test.go Outdated
Comment on lines +1068 to +1069
// Fixed DAOS-15947: Now refused as duplicate addresses/URIs with different UUID
// requiring --replace option.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor - I'd pull out the ticket number reference at this point

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

tanabarr added 3 commits June 10, 2026 12:25
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
@tanabarr tanabarr changed the base branch from master to tanabarr/control-fmtreplace-rank-erase June 15, 2026 13:08
@tanabarr tanabarr requested a review from kjacque June 15, 2026 13:09
@daosbuild3

Copy link
Copy Markdown
Collaborator

@daosbuild3

Copy link
Copy Markdown
Collaborator

tanabarr added 2 commits June 16, 2026 14:07
Features: control
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
…treplace-rank-erase

Features: control
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
@tanabarr tanabarr force-pushed the tanabarr/control-join-no-dupes branch from 6428f4f to df1b32e Compare June 16, 2026 13:17
@tanabarr tanabarr requested review from knard38 and mjmac June 16, 2026 13:19
@tanabarr tanabarr marked this pull request as ready for review June 16, 2026 13:19
@tanabarr tanabarr requested review from a team as code owners June 16, 2026 13:19
Features: control
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
@tanabarr tanabarr force-pushed the tanabarr/control-join-no-dupes branch 2 times, most recently from a5ff888 to 3a802ac Compare June 16, 2026 19:51
tanabarr added 4 commits June 18, 2026 12:35
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
…treplace-rank-erase

Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
Features: control
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
@tanabarr tanabarr force-pushed the tanabarr/control-join-no-dupes branch from 3a802ac to acd0383 Compare June 18, 2026 12:59
mjmac
mjmac previously approved these changes Jun 18, 2026
knard38
knard38 previously approved these changes Jun 23, 2026

@knard38 knard38 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing blocking from my side, just a minor fix which could be done in a follow-up PR if the author considers it as relevant.

if matched, _ := memberFieldsMatch(cm, req); !matched {
continue // Fields don't match, skip
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a matching member is in AdminExcluded state, the operator is directed to use --replace, but joinReplace will also fail with ErrJoinAdminExcluded. Would it make sense to branch on the state and return a dedicated fault with the correct two-step resolution?

if cm.State == MemberStateAdminExcluded {
    return FaultJoinMemberExistsAdminExcluded(req.UUID, cm.UUID)
}
return FaultJoinMemberExists(req.UUID, cm.UUID)
func FaultJoinMemberExistsAdminExcluded(newUUID, dbUUID uuid.UUID) *fault.Fault {
    return systemFault(
        code.SystemJoinMemberExists,
        fmt.Sprintf("... already exists with UUID %s and is admin-excluded", dbUUID),
        "clear the exclusion first with 'dmg system exclude --clear <rank>', "+
            "then reformat with 'dmg storage format --replace'",
    )
}

No issue to keep as-is if this scenario is considered out-of-scope.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good suggestion I've added the change to #18142

@daosbuild3

Copy link
Copy Markdown
Collaborator

Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18127/7/testReport/

Base automatically changed from tanabarr/control-fmtreplace-rank-erase to master June 26, 2026 17:59
@daltonbohning daltonbohning dismissed stale reviews from knard38 and mjmac June 26, 2026 17:59

The base branch was changed.

@daltonbohning daltonbohning requested a review from a team as a code owner June 26, 2026 17:59
…in-no-dupes

Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
@tanabarr tanabarr requested review from knard38 and mjmac June 26, 2026 19:24
@tanabarr

Copy link
Copy Markdown
Contributor Author

annoyingly it cleared the reviews even though I only merged master, apologies @knard38 @mjmac could I please have re-reviews, nothing changed

…in-no-dupes

Features: control
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
@tanabarr tanabarr requested a review from a team June 30, 2026 10:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

5 participants