
KEP-6011: CSI ControllerGetNodeInfo#6016

Open
huww98 wants to merge 5 commits into kubernetes:master from huww98:controller-get-node-info

@huww98 (Contributor) commented Apr 15, 2026

  • One-line PR description: adding new KEP-6011: CSI ControllerGetNodeInfo
  • Other comments:

huww98 and others added 3 commits April 14, 2026 23:53
…ller-side node info retrieval

This KEP proposes two new CSI RPCs to enable node topology and capacity
information retrieval from the controller side, eliminating the need for
cloud API credentials on worker nodes:

- NodeGetID: Returns only the node identifier without cloud API access
- ControllerGetNodeInfo: Fetches topology and capacity from controller side

Benefits:
- Security: Node components no longer require cloud API credentials
- Scalability: Controller can aggregate and cache API calls for large clusters
- Accuracy: Controller-side VolumeAttachment info enables accurate non-CSI attachment detection

Depends on CSI spec PR container-storage-interface/spec#603
Signed-off-by: Eddie Torres <torredil@amazon.com>
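The split between the two RPCs can be sketched from the CO's point of view. This is a minimal illustration under stated assumptions: the `NodeServer`/`ControllerServer` interfaces, the `NodeInfo` fields (`MaxVolumesPerNode`, `Topology`), `discoverNode`, and the fake implementations are hypothetical, modeled loosely on today's CSI `NodeGetInfo` response. They are not the messages defined in container-storage-interface/spec#603.

```go
package main

import "fmt"

// NodeInfo is a hypothetical stand-in for the ControllerGetNodeInfo response.
type NodeInfo struct {
	MaxVolumesPerNode int64
	Topology          map[string]string
}

// NodeServer runs on the worker node and needs no cloud credentials.
type NodeServer interface {
	NodeGetID() (string, error)
}

// ControllerServer runs in the controller, which holds the cloud credentials.
type ControllerServer interface {
	ControllerGetNodeInfo(nodeID string) (NodeInfo, error)
}

type fakeNode struct{ id string }

func (n fakeNode) NodeGetID() (string, error) { return n.id, nil }

// fakeController maps node IDs to zones in place of a real cloud API lookup.
type fakeController struct{ zones map[string]string }

func (c fakeController) ControllerGetNodeInfo(nodeID string) (NodeInfo, error) {
	zone, ok := c.zones[nodeID]
	if !ok {
		return NodeInfo{}, fmt.Errorf("unknown node %q", nodeID)
	}
	return NodeInfo{MaxVolumesPerNode: 16, Topology: map[string]string{"zone": zone}}, nil
}

// discoverNode asks the node only for its ID, then asks the controller for topology and capacity.
func discoverNode(n NodeServer, c ControllerServer) (string, NodeInfo, error) {
	id, err := n.NodeGetID()
	if err != nil {
		return "", NodeInfo{}, err
	}
	info, err := c.ControllerGetNodeInfo(id)
	return id, info, err
}

func main() {
	id, info, err := discoverNode(fakeNode{"i-123"}, fakeController{map[string]string{"i-123": "zone-a"}})
	fmt.Println(id, info.Topology["zone"], info.MaxVolumesPerNode, err) // i-123 zone-a 16 <nil>
}
```

The point of the shape is that only the controller-side implementation ever touches cloud data; the node plugin returns a bare identifier.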
k8s-ci-robot added the kind/kep and sig/storage labels Apr 15, 2026
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: huww98
Once this PR has been reviewed and has the lgtm label, please assign jsafrane, soltysh for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the size/XL (500-999 changed lines, ignoring generated files) and cncf-cla: yes labels Apr 15, 2026
huww98 mentioned this pull request Apr 15, 2026

#### Race Condition Mitigation

A race exists between `ControllerGetNodeInfo` and concurrent attach/detach: if an attach completes between listing `VolumeAttachment` objects and the cloud API query, the newly attached volume appears in cloud results but not in `published_volume_ids`, causing the SP to misclassify it as non-CSI.
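The race can be replayed deterministically with plain sets. In this sketch the `classifyNonCSI` helper and the volume IDs are hypothetical; the steps follow the order described above: the VolumeAttachment snapshot is taken, an attach of `vol-b` completes, and only then is the cloud queried. `vol-root` stands in for a genuinely non-CSI volume.

```go
package main

import (
	"fmt"
	"sort"
)

// classifyNonCSI returns volumes the cloud reports as attached that are
// missing from the VolumeAttachment snapshot.
func classifyNonCSI(cloudAttached, csiPublished map[string]bool) []string {
	var out []string
	for id := range cloudAttached {
		if !csiPublished[id] {
			out = append(out, id)
		}
	}
	sort.Strings(out) // map iteration order is random; sort for stable output
	return out
}

func main() {
	cloud := map[string]bool{"vol-a": true, "vol-root": true}

	// Step 1: snapshot of VolumeAttachment objects for the node.
	csiPublished := map[string]bool{"vol-a": true}

	// Step 2: an attach of vol-b completes after the snapshot.
	cloud["vol-b"] = true

	// Step 3: the cloud query includes vol-b, so it is wrongly counted as non-CSI.
	fmt.Println(classifyNonCSI(cloud, csiPublished)) // [vol-b vol-root]
}
```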
Contributor Author

I'm considering an alternative approach: how about letting the SP return the list of published_volume_ids and letting the CO (external-attacher) calculate the final volume count available to CSI?

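```go
import (
	"sync"

	storagev1 "k8s.io/api/storage/v1"
	"k8s.io/apimachinery/pkg/util/sets"
)

type pendingNode struct {
	mu  sync.Mutex       // recordPublish may insert while processNode reads
	ids sets.Set[string] // volume IDs treated as CSI-managed for this node
}

type nodeInfoProcessor struct{ pendingNodes sync.Map } // nodeName -> *pendingNode

func (p *nodeInfoProcessor) processNode(nodeName, nodeID string) {
	// listVolumeAttachments, ControllerGetNodeInfo, updateCSINode and the info fields are placeholders.
	pending := &pendingNode{ids: listVolumeAttachments(nodeName)}
	p.pendingNodes.Store(nodeName, pending)
	defer p.pendingNodes.Delete(nodeName)
	info := ControllerGetNodeInfo(nodeID)
	pending.mu.Lock()
	nonCSI := info.publishedVolumeIDs.Difference(pending.ids) // assumes a sets.Set[string]
	pending.mu.Unlock()
	info.maxVolumesPerNode -= int64(nonCSI.Len())
	updateCSINode(nodeName, info)
}

func (p *nodeInfoProcessor) recordPublish(nodeName, volumeID string) {
	if v, ok := p.pendingNodes.Load(nodeName); ok {
		n := v.(*pendingNode)
		n.mu.Lock()
		n.ids.Insert(volumeID)
		n.mu.Unlock()
	}
}

func (h *csiHandler) syncAttach(va *storagev1.VolumeAttachment) {
	h.nodeInfoProcessor.recordPublish(va.Spec.NodeName, volumeHandle(va)) // placeholder: CSI volume handle
}
```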

This mitigates the race by recording every volume ID the attacher processes while the ControllerGetNodeInfo call is in flight and treating those IDs as CSI-managed too. With this approach, attach operations never need to be paused.
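A self-contained sketch of the mitigation, assuming the VolumeAttachment listing reflects every attachment created before the call. `pendingNode`, `availableForCSI`, and its callback parameters are hypothetical; the fake `getNodeInfo` calls `recordPublish` for `vol-b` mid-call to force the interleaving deterministically.

```go
package main

import (
	"fmt"
	"sync"
)

// pendingNode collects volume IDs treated as CSI-managed while a
// ControllerGetNodeInfo call for the node is in flight.
type pendingNode struct {
	mu  sync.Mutex
	ids map[string]bool
}

func (n *pendingNode) add(ids ...string) {
	n.mu.Lock()
	defer n.mu.Unlock()
	for _, id := range ids {
		n.ids[id] = true
	}
}

func (n *pendingNode) has(id string) bool {
	n.mu.Lock()
	defer n.mu.Unlock()
	return n.ids[id]
}

type nodeInfoProcessor struct {
	pending sync.Map // node name -> *pendingNode
}

// recordPublish is called by the attach path before it publishes a volume.
// It is a no-op when no node info call is in flight for the node.
func (p *nodeInfoProcessor) recordPublish(node, volumeID string) {
	if v, ok := p.pending.Load(node); ok {
		v.(*pendingNode).add(volumeID)
	}
}

// availableForCSI subtracts from maxVolumes every cloud-reported volume that is
// neither in the VolumeAttachment list nor recorded during the call.
func (p *nodeInfoProcessor) availableForCSI(node string, listVAs func() []string,
	getNodeInfo func() (maxVolumes int64, published []string)) int64 {
	n := &pendingNode{ids: map[string]bool{}}
	p.pending.Store(node, n) // register before listing so no attach slips through
	defer p.pending.Delete(node)
	n.add(listVAs()...)

	maxVolumes, published := getNodeInfo()
	var nonCSI int64
	for _, id := range published {
		if !n.has(id) {
			nonCSI++
		}
	}
	return maxVolumes - nonCSI
}

func main() {
	p := &nodeInfoProcessor{}
	got := p.availableForCSI("node-1",
		func() []string { return []string{"vol-a"} },
		func() (int64, []string) {
			p.recordPublish("node-1", "vol-b") // attach completing during the cloud query
			return 16, []string{"vol-a", "vol-root", "vol-b"}
		})
	fmt.Println(got) // 15: only vol-root counts as non-CSI
}
```

This sketch registers the node before listing VolumeAttachments: any attach whose `recordPublish` runs before registration belongs to a VolumeAttachment that already exists, so it shows up in the list. The per-node mutex is there because `recordPublish` can run concurrently with the difference computation.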

Member

Yeah, I like this a lot better. Classification is naturally the CO's job since it has the VA context, and having the SP just report what the cloud says is attached is cleaner.

Contributor Author

Updated the KEP and the CSI spec PR.

3 participants