KEP-6011: CSI ControllerGetNodeInfo #6016
huww98 commented Apr 15, 2026
- One-line PR description: adding new KEP-6011: CSI ControllerGetNodeInfo
- Issue link: CSI ControllerGetNodeInfo #6011
- Other comments:
…ller-side node info retrieval

This KEP proposes two new CSI RPCs that allow node topology and capacity information to be retrieved from the controller side, eliminating the need for cloud API credentials on worker nodes:
- `NodeGetID`: returns only the node identifier, without cloud API access
- `ControllerGetNodeInfo`: fetches topology and capacity on the controller side

Benefits:
- Security: node components no longer require cloud API credentials
- Scalability: the controller can aggregate and cache cloud API calls for large clusters
- Accuracy: controller-side VolumeAttachment information enables accurate detection of non-CSI attachments

Depends on CSI spec PR #603: container-storage-interface/spec#603
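A minimal Go sketch of the split described above, for illustration only. The message fields, the `cloudAPI` interface, `DescribeInstance`, the topology key, and the never-expiring cache are all hypothetical and are not taken from the CSI spec PR. The node plugin answers `NodeGetID` from what it already knows, while only the controller plugin calls the credentialed cloud API and caches the per-node result.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical response shapes for illustration; the real messages are
// defined in the CSI spec PR.
type NodeGetIDResponse struct {
	NodeID string
}

type ControllerGetNodeInfoResponse struct {
	MaxVolumesPerNode  int64
	AccessibleTopology map[string]string
}

// nodeServer runs on each worker node and holds no cloud API credentials;
// it only reports the node ID it already knows.
type nodeServer struct{ nodeID string }

func (n *nodeServer) NodeGetID() NodeGetIDResponse {
	return NodeGetIDResponse{NodeID: n.nodeID}
}

// cloudAPI is a hypothetical stand-in for the provider's credentialed API,
// reachable only from the controller.
type cloudAPI interface {
	DescribeInstance(nodeID string) (zone string, maxVolumes int64, err error)
}

// controllerServer owns the credentials and caches per-node results so that
// repeated lookups do not call the cloud API again. The lock is held across
// the cloud call for simplicity, and entries never expire.
type controllerServer struct {
	cloud cloudAPI
	mu    sync.Mutex
	cache map[string]ControllerGetNodeInfoResponse
}

func (c *controllerServer) ControllerGetNodeInfo(nodeID string) (ControllerGetNodeInfoResponse, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if resp, ok := c.cache[nodeID]; ok {
		return resp, nil
	}
	zone, maxVolumes, err := c.cloud.DescribeInstance(nodeID)
	if err != nil {
		return ControllerGetNodeInfoResponse{}, err
	}
	resp := ControllerGetNodeInfoResponse{
		MaxVolumesPerNode:  maxVolumes,
		AccessibleTopology: map[string]string{"topology.example.com/zone": zone},
	}
	c.cache[nodeID] = resp
	return resp, nil
}

// fakeCloud counts calls so the effect of the cache is observable.
type fakeCloud struct{ calls int }

func (f *fakeCloud) DescribeInstance(nodeID string) (string, int64, error) {
	f.calls++
	return "zone-a", 25, nil
}

func main() {
	node := &nodeServer{nodeID: "i-0123"}
	cloud := &fakeCloud{}
	ctrl := &controllerServer{cloud: cloud, cache: map[string]ControllerGetNodeInfoResponse{}}

	id := node.NodeGetID().NodeID
	info, err := ctrl.ControllerGetNodeInfo(id)
	if err != nil {
		panic(err)
	}
	_, _ = ctrl.ControllerGetNodeInfo(id) // second lookup is served from the cache
	fmt.Println(info.MaxVolumesPerNode, info.AccessibleTopology, cloud.calls)
}
```

A real controller would also need cache invalidation and per-node locking; the sketch only shows where the credentials live.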
Signed-off-by: Eddie Torres <torredil@amazon.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: huww98. Needs approval from an approver in each of these files.
#### Race Condition Mitigation
A race exists between `ControllerGetNodeInfo` and concurrent attach/detach: if an attach completes between listing `VolumeAttachment` objects and the cloud API query, the newly attached volume appears in cloud results but not in `published_volume_ids`, causing the SP to misclassify it as non-CSI.
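To make the race concrete, here is a small Go sketch of the comparison an SP would make against `published_volume_ids`, replaying the interleaving above with hypothetical volume IDs. `naiveNonCSI` is an illustrative name, not part of any proposed API.

```go
package main

import (
	"fmt"
	"sort"
)

// naiveNonCSI classifies every volume the cloud reports as attached but that
// is missing from publishedVolumeIDs as non-CSI. publishedVolumeIDs is a
// snapshot of VolumeAttachment objects taken before the cloud query.
func naiveNonCSI(publishedVolumeIDs map[string]bool, cloudAttached []string) []string {
	var nonCSI []string
	for _, id := range cloudAttached {
		if !publishedVolumeIDs[id] {
			nonCSI = append(nonCSI, id)
		}
	}
	sort.Strings(nonCSI)
	return nonCSI
}

func main() {
	// 1. The CO lists VolumeAttachments for the node: only vol-1 is known.
	published := map[string]bool{"vol-1": true}

	// 2. A CSI attach of vol-2 completes.
	// 3. The cloud query now reports vol-1, vol-2, and vol-x, where vol-x
	//    was genuinely attached outside CSI.
	cloudAttached := []string{"vol-1", "vol-2", "vol-x"}

	// vol-2 is misclassified as non-CSI alongside vol-x.
	fmt.Println(naiveNonCSI(published, cloudAttached)) // prints [vol-2 vol-x]
}
```

The snapshot in step 1 is simply stale by the time step 3 runs, so any fix has to account for attaches that land between the two reads.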
I'm considering an alternative approach. How about letting the SP return a list of `published_volume_ids` and letting the CO (external-attacher) calculate the final volume count available to CSI?
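```go
type nodeInfoProcessor struct {
	mu           sync.Mutex
	pendingNodes map[string]sets.Set[string] // nodeName -> volume IDs treated as CSI-published
}

func (p *nodeInfoProcessor) processNode(nodeName, nodeID string) {
	csiPublished := sets.New(listVolumeAttachments(nodeName)...)
	p.mu.Lock()
	p.pendingNodes[nodeName] = csiPublished
	p.mu.Unlock()
	info := ControllerGetNodeInfo(nodeID) // SP returns every volume the cloud reports as attached
	p.mu.Lock()
	delete(p.pendingNodes, nodeName)
	nonCSI := info.PublishedVolumeIDs.Difference(csiPublished) // also excludes attaches recorded meanwhile
	p.mu.Unlock()
	info.MaxVolumesPerNode -= int64(nonCSI.Len())
	updateCSINode(nodeName, info)
}

func (p *nodeInfoProcessor) recordPublish(nodeName, volumeID string) {
	p.mu.Lock()
	if published, ok := p.pendingNodes[nodeName]; ok {
		published.Insert(volumeID) // attach raced with an in-flight ControllerGetNodeInfo
	}
	p.mu.Unlock()
}

func (h *csiHandler) syncAttach(va *storagev1.VolumeAttachment, volumeID string) {
	h.nodeInfoProcessor.recordPublish(va.Spec.NodeName, volumeID)
}
```

We mitigate the race condition by recording every volume ID published while the `ControllerGetNodeInfo` call is in flight and treating those volumes as CSI-managed as well. This way, we don't need to pause attach operations.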
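For reference, a self-contained Go version of the same idea using only the standard library: plain maps stand in for `sets.Set`, and a callback stands in for `ControllerGetNodeInfo` so a concurrent attach can be simulated. `processor`, `csiVolumeLimit`, and the volume IDs are hypothetical; this is a sketch of the mitigation, not the external-attacher implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// processor tracks, per node, the volume IDs to treat as CSI-managed while a
// ControllerGetNodeInfo call for that node is in flight.
type processor struct {
	mu      sync.Mutex
	pending map[string]map[string]bool // nodeName -> set of volume IDs
}

// recordPublish is called on the attach path before a volume is published.
// It only has an effect while a lookup for nodeName is in flight.
func (p *processor) recordPublish(nodeName, volumeID string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if set, ok := p.pending[nodeName]; ok {
		set[volumeID] = true
	}
}

// csiVolumeLimit seeds the CSI set from VolumeAttachment volume IDs, calls
// getNodeInfo (a stand-in for ControllerGetNodeInfo returning the node limit
// and every volume the cloud reports as attached), and subtracts only the
// volumes found in neither the seed nor the recorded publishes.
func (p *processor) csiVolumeLimit(nodeName string, vaVolumeIDs []string,
	getNodeInfo func() (maxVolumes int, cloudAttached []string)) int {
	csi := make(map[string]bool, len(vaVolumeIDs))
	for _, id := range vaVolumeIDs {
		csi[id] = true
	}
	p.mu.Lock()
	p.pending[nodeName] = csi
	p.mu.Unlock()

	maxVolumes, cloudAttached := getNodeInfo() // no lock held: attaches proceed

	p.mu.Lock()
	defer p.mu.Unlock()
	delete(p.pending, nodeName)
	nonCSI := 0
	for _, id := range cloudAttached {
		if !csi[id] {
			nonCSI++
		}
	}
	return maxVolumes - nonCSI
}

func main() {
	p := &processor{pending: map[string]map[string]bool{}}
	limit := p.csiVolumeLimit("node-a", []string{"vol-1"}, func() (int, []string) {
		// Simulate an attach of vol-2 racing with the in-flight RPC.
		p.recordPublish("node-a", "vol-2")
		return 25, []string{"vol-1", "vol-2", "vol-x"}
	})
	fmt.Println(limit) // prints 24: only vol-x counts as non-CSI
}
```

Because the map stored in `pending` is the same one the limit calculation reads, an attach recorded mid-call drops that volume from the non-CSI count without blocking the attach path.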
Yeah, I like this a lot better. Classifying volumes as CSI or non-CSI is naturally the CO's job, since it has the VA context, and having the SP just report what the cloud says is attached is cleaner.
Updated the KEP and the CSI spec PR.