KEP-5958: Client Opt-out for managedFields in API Response (#6015)

yongruilin wants to merge 4 commits into kubernetes:master
Conversation
/sig api-machinery
> `metadata.managedFields` is used by the API server for Server-Side Apply (SSA) conflict resolution. However, the vast majority of Kubernetes clients do not actively process this data. Many core components, such as `kube-controller-manager` and `kube-scheduler`, currently use client-side transforms to drop managed fields to save memory.
> Relying on client-side transforms still incurs significant system-wide costs:
Let's be concrete with numbers. See kubernetes/kubernetes#134375 (comment) and https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/555-server-side-apply#scalability:

> Objects applied using server side apply will have their managed fields metadata populated. managedFields metadata fields can represent up to 60% of the total size of an object, increasing the size of objects.
Updated; referred to the SSA KEP.
> - **Opting out of fields in the request body of write operations** (this KEP only applies to dropping fields in API responses).

> ## Proposal
Add a short overview of the proposal.
> ### Goals
> - Provide a mechanism for clients to opt-out of receiving `metadata.managedFields` in API responses (GET, LIST, WATCH, PUT, POST, and PATCH).
> - Reduce API Server CPU usage for serialization.
By how much? Many things can reduce CPU usage; why is doing this a good decision?
Updated. I'm not sure whether to put a concrete number here; should I state it as "up to 60%", or refer to the PoC's payload size as "saving ~36%"?
> - Provide a mechanism for clients to opt-out of receiving `metadata.managedFields` in API responses (GET, LIST, WATCH, PUT, POST, and PATCH).
> - Reduce API Server CPU usage for serialization.
> - Reduce network traffic between API Server and clients.
> ### Goals
> - Provide a mechanism for clients to opt-out of receiving `metadata.managedFields` in API responses (GET, LIST, WATCH, PUT, POST, and PATCH).
This is not a goal per se. That's the proposal.
> - Provide a mechanism for clients to opt-out of receiving `metadata.managedFields` in API responses (GET, LIST, WATCH, PUT, POST, and PATCH).
> - Reduce API Server CPU usage for serialization.
> - Reduce network traffic between API Server and clients.
> - Reduce client-side memory allocations and GC overhead.
> ### Non-Goals
> - **General-purpose field selection or opting out of other fields** (though the API design is intended to be extensible to support this in the future without a redesign).
> - **Opting out of fields in the request body of write operations** (this KEP only applies to dropping fields in API responses).
The question is how this works currently. KCM and the scheduler already drop managedFields, so they have nothing to send. For a PATCH request the API server merges with local state; does that mean those clients cannot do PUT requests, or is it always merged by the API server?
Added one more sentence to describe the current status.
> 1. **API Server Encoder:** Extend the API server's encoders to support a flag for excluding `managedFields`. When this flag is set, the encoder will skip the `managedFields` field during serialization.
> 2. **Watch Cache (Cacher):** The `cachingObject` in the watch cache will be updated to include the exclusion flag in its serialization cache key. This ensures that mixed opt-in and opt-out watchers correctly receive their respective serialized forms.
Given what you are proposing, I don't think any changes are needed in the watch cache. This can be implemented as an additional serializer mode that is selected during content type negotiation and has a separate Identifier.
The watch cache's `cachingObject` will simply treat the new Identifier as a separate format. The downside is that until all clients migrate to dropping managedFields (estimated ETA: never), we will need to serialize each event twice. It would be good to estimate that overhead, but I expect it to be acceptable.
Please implement a PoC and use it to validate that the proposed implementation aligns.
The PoC has been updated: kubernetes/kubernetes#138105.
> To ensure consistent behavior and support clients running against older API servers:
> - **client-go Modification:** `client-go` will be modified to support field dropping by adding a configuration option (e.g., in `rest.Config`) to request dropping specific fields. When enabled, it will automatically add the `drop=metadata.managedFields` parameter to the `Accept` header.
Per-client config might be sufficient for our expected use case of `drop=metadata.managedFields`, but for field dropping in general, I'm not convinced per-client config offers sufficient flexibility. I can imagine the need for per-request configuration.
Let's explain our rationale on this topic in the KEP somewhere. I can see a few possible options:
- We offer per-client config and per-request config, where per-request overrides the client config
- We offer per-client config for now and we explain in this KEP of how we could offer per-request config as future work (it should be compatible)
- We explain why we think per-request config should never be needed (I find this hard to believe, but I'm willing to hear an argument for it)
Added a section to discuss per-client vs per-request.
> - **Network Overhead:** Large `managedFields` payloads consume significant network bandwidth and increase transfer time during LIST and WATCH operations. This can lead to request timeouts and prevents the API server from efficiently handling large resources.
> - **Client-side GC:** Clients must allocate structural objects (string headers, maps, and slice backing arrays) for `managedFields` before discarding them.
> ### Goals
Let's include our in-tree controller migration plan here too. Are we planning to migrate the kube-controller-manager and kube-scheduler? I think we should.
Added the section for in-tree controller migration, PTAL.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: yongruilin.
> ### Implementation Details
> 1. **API Server Serializer:** Add an `ExcludeManagedFields` option to the JSON and CBOR serializers. When this option is set, the serializer skips `metadata.managedFields` during encoding and exposes this variant as a distinct codec on `runtime.SerializerInfo` with its own `Identifier()`. The content type negotiation layer selects the appropriate serializer based on the `drop` parameter in the `Accept` header.
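The key property of this design is that the excluding variant reports a distinct `Identifier()`, so caches keyed by identifier (such as the watch cache's `cachingObject`) never confuse the two serialized forms. A stdlib-only sketch of that idea; the type name and method shapes here are illustrative, not the real `runtime.Serializer` interface:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// excludingEncoder sketches the proposed serializer variant: the same wire
// format, but metadata.managedFields is dropped during encoding, and the
// variant reports a distinct identifier so cached serializations of the two
// forms are never mixed. (Hypothetical type; not the real apimachinery API.)
type excludingEncoder struct {
	excludeManagedFields bool
}

// Identifier returns a cache key component that differs between the two modes.
func (e excludingEncoder) Identifier() string {
	if e.excludeManagedFields {
		return "json;drop=metadata.managedFields"
	}
	return "json"
}

// Encode serializes the object, skipping managedFields in exclude mode.
func (e excludingEncoder) Encode(obj map[string]interface{}) ([]byte, error) {
	if e.excludeManagedFields {
		if meta, ok := obj["metadata"].(map[string]interface{}); ok {
			delete(meta, "managedFields")
		}
	}
	return json.Marshal(obj)
}

func main() {
	newObj := func() map[string]interface{} {
		return map[string]interface{}{
			"metadata": map[string]interface{}{
				"name":          "demo",
				"managedFields": []interface{}{map[string]interface{}{"manager": "kubectl"}},
			},
		}
	}
	full := excludingEncoder{}
	lean := excludingEncoder{excludeManagedFields: true}
	a, _ := full.Encode(newObj())
	b, _ := lean.Encode(newObj())
	fmt.Println(full.Identifier(), string(a))
	fmt.Println(lean.Identifier(), string(b)) // json;drop=metadata.managedFields {"metadata":{"name":"demo"}}
}
```

As the review discussion notes, a watch cache serving a mix of opt-in and opt-out watchers would serialize each event once per distinct identifier, which is the double-serialization overhead to measure in the PoC.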
Oh, good catch. The main beneficiaries of this will be kube-controller-manager and kube-scheduler, both of which use protobuf, so this is essential.
Right, also CBOR should be out of scope.