resctrl-mon: add NRI plugin for per-pod resctrl monitoring groups#666
Open
cmcantalupo wants to merge 1 commit into
Pull request overview
Adds a new standalone NRI plugin (nri-resctrl-mon) to create per-pod resctrl monitoring groups (mon_groups) and optionally persist .begin/.end counter snapshots under a host directory for passive AET/Kepler-style consumers.
Changes:
- Introduces the nri-resctrl-mon plugin implementation (resctrl ops, lifecycle hooks, in-memory pod/container tracking, snapshot store) plus unit tests.
- Adds a Helm chart, sample configuration, and documentation under a new “monitoring” docs category.
- Wires the plugin into the repository build via the top-level Makefile.
Reviewed changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| sample-configs/nri-resctrl-mon.yaml | Adds example YAML configuration for the new plugin. |
| Makefile | Registers nri-resctrl-mon in the build plugin list. |
| docs/monitoring/resctrl-mon.md | New end-user/developer documentation for resctrl-mon behavior and snapshots. |
| docs/monitoring/index.md | Adds a new “monitoring” docs section index. |
| docs/index.md | Links the new monitoring docs section from the main docs index. |
| docs/deployment/helm/resctrl-mon.md | Includes the Helm chart README into the docs site. |
| docs/deployment/helm/index.md | Adds resctrl-mon to the Helm deployment docs index. |
| deployment/helm/resctrl-mon/values.yaml | Default Helm values for the resctrl-mon DaemonSet/chart. |
| deployment/helm/resctrl-mon/values.schema.json | Helm values schema for chart parameter validation. |
| deployment/helm/resctrl-mon/templates/daemonset.yaml | DaemonSet template for running the plugin with required mounts/capabilities. |
| deployment/helm/resctrl-mon/templates/configmap.yaml | ConfigMap template for the plugin configuration file. |
| deployment/helm/resctrl-mon/templates/_helpers.tpl | Shared Helm template helpers (labels/selectors). |
| deployment/helm/resctrl-mon/README.md | Helm chart usage and configuration documentation. |
| deployment/helm/resctrl-mon/Chart.yaml | Helm chart metadata. |
| deployment/helm/resctrl-mon/.helmignore | Helm packaging ignore rules. |
| cmd/plugins/resctrl-mon/state.go | In-memory tracking for per-pod mon_group and container membership. |
| cmd/plugins/resctrl-mon/snapshot.go | Snapshot store implementation (.begin/.end, symlinks, pruning). |
| cmd/plugins/resctrl-mon/snapshot_test.go | Unit tests for snapshot store behavior and pruning. |
| cmd/plugins/resctrl-mon/resctrl.go | Resctrl filesystem operations (create/remove mon_groups, write tasks, read mon_data, cleanup). |
| cmd/plugins/resctrl-mon/resctrl_test.go | Unit tests for resctrl operations and safety validation. |
| cmd/plugins/resctrl-mon/plugin.go | Core NRI hook implementation and configuration parsing/validation. |
| cmd/plugins/resctrl-mon/plugin_test.go | Unit tests for lifecycle behavior, filtering, and snapshot integration. |
| cmd/plugins/resctrl-mon/main.go | Plugin entrypoint, flags, and NRI stub wiring. |
| cmd/plugins/resctrl-mon/Dockerfile | Container image build for the new plugin. |
e44434b to d1171cd
Add nri-resctrl-mon, a standalone NRI plugin that creates per-pod resctrl monitoring groups (mon_groups) to support passive monitoring of Application Energy Telemetry (AET). The plugin uses the PostCreateContainer hook to assign container PIDs to mon_groups before exec/fork, eliminating the fork race that plagues userspace daemon approaches. RMID allocation is delegated to the kernel via mkdir/rmdir on the resctrl filesystem.

Includes:
- Plugin source (main.go, plugin.go, resctrl.go, state.go)
- Unit tests (plugin_test.go, resctrl_test.go)
- Dockerfile following nri-memory-qos pattern
- Helm chart (Chart.yaml, values.yaml, templates/, schema)
- Documentation (monitoring category, plugin docs, Helm docs)
- Sample configuration

Signed-off-by: Christopher M. Cantalupo <christopher.m.cantalupo@intel.com>
Signed-off-by: Jedrzej Wasiukiewicz <jedrzej.wasiukiewicz@intel.com>
Description
Add nri-resctrl-mon, a standalone NRI plugin that creates per-pod resctrl monitoring groups (mon_groups) to support passive monitoring of Application Energy Telemetry (AET) via consumers such as Kepler.

Motivation
Userspace daemon approaches to resctrl mon_group management suffer from a fork race: a container's first threads can execute before the daemon writes their PIDs into the mon_group's tasks file, causing energy attribution gaps. By using the NRI StartContainer hook, this plugin assigns the container's init PID to a mon_group before exec runs and threads fork, eliminating the race entirely.

What's included
- Plugin source (main.go, plugin.go, resctrl.go, state.go)
- Unit tests (plugin_test.go, resctrl_test.go)
- Dockerfile following nri-memory-qos pattern
- Helm chart (Chart.yaml, values.yaml, templates, JSON schema)

Design decisions
- PostCreateContainer creates the mon_group directory (assigns the RMID), and StartContainer writes the init PID while the process is paused, before any user threads fork. PostStartContainer is a fallback in case the PID is not available at StartContainer (should not happen on containerd ≥ 2.x).
- mkdir/rmdir on resctrl delegates RMID lifecycle to the kernel, avoiding userspace exhaustion bugs.
- Synchronize re-creates mon_groups for running pods on plugin restart and removes orphaned mon_groups left by a previous instance.
- SYS_ADMIN + DAC_OVERRIDE only (no privileged: true). hostPID: true is required so the plugin can write host-namespace PIDs to the resctrl tasks file.

Testing
Stats