balloons: add cpuClasses #667

Open
askervin wants to merge 6 commits into containers:main from askervin:5h1-balloons-cpuclass

@askervin (Collaborator) commented May 13, 2026

Add a new configuration section, cpuClasses, that offers a user-friendly, descriptive front end to the direct sysfs cpufreq controls in control.cpu.classes.

Reasons:

  • Improved coherence. A top-level cpuClasses section is better aligned with schedulingClasses and loadClasses, which are already available on the top level. It also uses the same class notation as these and balloonTypes: a list of objects whose "name" attribute specifies the class name, rather than using a key as the name as in control.cpu.classes.
  • Human-readable units. Allow configuring CPU frequencies in formats like 3900MHz or 3.9GHz, in addition to integers in kHz that are written directly to sysfs cpufreq files.
  • Platform-independent, runtime-resolved symbolic frequencies. Support the symbols "min" (minimum frequency), "base" (base frequency) and "turbo" (max turbo frequency).
  • Architectural support for dynamic CPU attributes with a system-wide perspective. So far, CPU class adjustments have been static and unaffected by which other CPU classes are in use. This change introduces a new "turbo allocator" layer that sits between balloons (cpusets) and the CPU controller. It knows all CPU classes in use and their symbolic configurations, which enables it to control frequencies based on, for instance, CPU priorities and platform properties.
  • Dynamic frequency adjustments. The initial version of the "CPU class turbo allocator" controls which CPU classes are allowed to use the "turbo budget" on the host at any given time. Example: if no performance-critical real-time containers are running, any CPU used by any container gets turbo frequencies. If such critical containers are running, turbo frequencies are reserved for their CPUs only, by capping the maximum frequency of other CPUs to the base frequency. Capping affects only symbolic frequencies; explicit frequency values are respected as is. The CPU classes whose containers have the highest turboPriority value share the whole turbo budget.
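The frequency notation listed above can be illustrated with a minimal Go sketch. It is not the code in frequency.go: the names `platform` and `parseFreq`, the case-insensitive matching, and the sample platform limits are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// platform holds per-host frequency limits in kHz. The type and its fields
// are hypothetical; a real implementation would read them from sysfs.
type platform struct {
	minKHz, baseKHz, turboKHz uint64
}

// parseFreq resolves a frequency spec to kHz. It accepts the symbols
// "min", "base" and "turbo", values with a kHz/MHz/GHz suffix, and bare
// integers, which are taken as kHz. Case-insensitive matching is an
// assumption of this sketch.
func parseFreq(spec string, p platform) (uint64, error) {
	s := strings.ToLower(strings.TrimSpace(spec))
	switch s {
	case "min":
		return p.minKHz, nil
	case "base":
		return p.baseKHz, nil
	case "turbo":
		return p.turboKHz, nil
	}
	units := []struct {
		suffix string
		toKHz  float64
	}{{"ghz", 1e6}, {"mhz", 1e3}, {"khz", 1}}
	for _, u := range units {
		if !strings.HasSuffix(s, u.suffix) {
			continue
		}
		num := strings.TrimSpace(strings.TrimSuffix(s, u.suffix))
		v, err := strconv.ParseFloat(num, 64)
		if err != nil || v < 0 || math.IsNaN(v) || math.IsInf(v, 0) {
			return 0, fmt.Errorf("invalid frequency %q", spec)
		}
		// Round so that floating-point scaling never truncates a value
		// such as 3.9GHz to one kHz below the intended result.
		return uint64(math.Round(v * u.toKHz)), nil
	}
	// No symbol and no unit suffix: expect an integer in kHz.
	v, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid frequency %q", spec)
	}
	return v, nil
}

func main() {
	// Sample limits, chosen only for the demonstration.
	p := platform{minKHz: 800000, baseKHz: 2400000, turboKHz: 3900000}
	for _, spec := range []string{"3900MHz", "3.9GHz", "2400000", "base", "fast"} {
		khz, err := parseFreq(spec, p)
		fmt.Println(spec, khz, err)
	}
}
```

Resolving symbols against a per-host `platform` value is what keeps a configuration portable: the same "base" or "turbo" spec yields different kHz values on different machines.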

The purpose is to lay down the framework for dynamic turbo (and possibly other feature) management, and to add only a very simple turbo allocator at this point. The allocator can be made smarter in the future, for instance, by making it aware of topology zones affected by some CPUs running on turbo frequencies, different turbo frequency levels, the number of CPUs that can hold turbo frequencies on different platforms, and heterogeneous cores (P/E/LPE).
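The "very simple" allocation rule described in this comment can be sketched in Go as below. This is an illustration under stated assumptions, not the CPUClassTurboAllocator from the PR: the `cpuClass` struct, its field names, and `effectiveMaxFreqs` are hypothetical, and only the maximum frequency is modeled.

```go
package main

import "fmt"

// cpuClass is a simplified, hypothetical view of a CPU class: only the
// maximum frequency spec and the turbo priority matter for this sketch.
type cpuClass struct {
	name          string
	maxFreq       string // symbolic ("min", "base", "turbo") or explicit, e.g. "3000000"
	turboPriority int
}

// effectiveMaxFreqs returns the max frequency spec to apply to each class
// that currently has containers. Classes with the highest turboPriority
// among the active ones keep symbolic "turbo"; all other classes that ask
// for "turbo" are capped to "base". Explicit values are never changed.
func effectiveMaxFreqs(active []cpuClass) map[string]string {
	out := make(map[string]string, len(active))
	if len(active) == 0 {
		return out
	}
	top := active[0].turboPriority
	for _, c := range active[1:] {
		if c.turboPriority > top {
			top = c.turboPriority
		}
	}
	for _, c := range active {
		if c.maxFreq == "turbo" && c.turboPriority < top {
			out[c.name] = "base"
		} else {
			out[c.name] = c.maxFreq
		}
	}
	return out
}

func main() {
	normal := cpuClass{name: "normal", maxFreq: "turbo", turboPriority: 0}
	fixed := cpuClass{name: "fixed", maxFreq: "3000000", turboPriority: 0}
	realtime := cpuClass{name: "realtime", maxFreq: "turbo", turboPriority: 10}

	// Without a high-priority class, every "turbo" class keeps turbo.
	fmt.Println(effectiveMaxFreqs([]cpuClass{normal, fixed}))
	// With "realtime" active, "normal" is capped to base; "fixed" is untouched.
	fmt.Println(effectiveMaxFreqs([]cpuClass{normal, fixed, realtime}))
}
```

Because the rule is a pure function of the set of active classes, a smarter allocator (topology zones, turbo levels, core types) could later replace it without changing how callers report which classes are in use.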

@askervin askervin force-pushed the 5h1-balloons-cpuclass branch 3 times, most recently from 2695589 to e919069 Compare May 15, 2026 11:06
@askervin askervin marked this pull request as ready for review May 15, 2026 11:31
@askervin askervin changed the title WIP: balloons: add cpuClasses balloons: add cpuClasses May 15, 2026
@askervin askervin requested a review from Copilot May 15, 2026 11:32
Copilot AI left a comment

Pull request overview

Adds a top-level cpuClasses configuration model for the balloons policy, including human-readable/symbolic frequency parsing, turbo-priority allocation, CRD/docs updates, and e2e coverage for turbo behavior and legacy CPU class syntax.

Changes:

  • Introduces CPUClass/Frequency API types, CRD schema updates, docs, and config template migration to cpuClasses.
  • Adds a balloons CPU class turbo allocator that resolves symbolic frequencies and coordinates with the CPU controller.
  • Updates CPU controller/sysfs/test support for dynamic classes, cpufreq overrides, write deduplication, and turbo-priority e2e validation.

Reviewed changes

Copilot reviewed 19 out of 21 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| cmd/plugins/balloons/policy/balloons-policy.go | Wires turbo allocator into balloons setup, reset, assignment, validation, and reconfiguration. |
| cmd/plugins/balloons/policy/cpuclass.go | Adds turbo-aware CPU class allocator and symbolic frequency resolution. |
| cmd/plugins/balloons/policy/flags.go | Adds aliases for new CPU class/frequency config types. |
| config/crd/bases/config.nri_balloonspolicies.yaml | Adds cpuClasses to generated CRD schema. |
| deployment/helm/balloons/crds/config.nri_balloonspolicies.yaml | Adds Helm-packaged CRD schema for cpuClasses. |
| docs/resource-policy/policy/balloons.md | Documents preferred cpuClasses, symbolic units, turbo priority, and legacy syntax. |
| pkg/apis/config/v1alpha1/balloons-policy.go | Injects top-level cpuClasses into common CPU controller config. |
| pkg/apis/config/v1alpha1/resmgr/policy/balloons/config.go | Adds CPUClasses to balloons policy config. |
| pkg/apis/config/v1alpha1/resmgr/policy/balloons/zz_generated.deepcopy.go | Adds deepcopy support for balloons CPUClasses. |
| pkg/apis/config/v1alpha1/resmgr/policy/cpuclass.go | Defines user-facing CPU class fields. |
| pkg/apis/config/v1alpha1/resmgr/policy/frequency.go | Adds frequency parsing, JSON marshal/unmarshal, symbolic values, and resolution helpers. |
| pkg/apis/config/v1alpha1/resmgr/policy/zz_generated.deepcopy.go | Adds deepcopy support for CPUClass. |
| pkg/resmgr/control/cpu/api.go | Adds dynamic SetClass and defers enforcement logging before controller start. |
| pkg/resmgr/control/cpu/cache.go | Downgrades missing assignment cache log on fresh startup. |
| pkg/resmgr/control/cpu/cpu.go | Adds per-CPU cpufreq write cache and merges dynamic/static CPU class definitions. |
| pkg/sysfs/system.go | Adds test-oriented cpufreq sysfs override support. |
| test/e2e/policies.test-suite/balloons/balloons-config.yaml.in | Migrates default balloons test config to top-level cpuClasses. |
| test/e2e/policies.test-suite/balloons/n4c16/test17-cstates-scheduling/balloons-cstates.cfg | Converts C-state class config to cpuClasses. |
| test/e2e/policies.test-suite/balloons/n4c16/test18-turbo-priority/balloons-turbo.cfg | Adds turbo-priority e2e config. |
| test/e2e/policies.test-suite/balloons/n4c16/test18-turbo-priority/balloons-turbo-oldsyntax.cfg | Adds legacy control.cpu.classes compatibility e2e config. |
| test/e2e/policies.test-suite/balloons/n4c16/test18-turbo-priority/code.var.sh | Adds turbo-priority and cpufreq write-minimality e2e flow. |
Files not reviewed (2)
  • pkg/apis/config/v1alpha1/resmgr/policy/balloons/zz_generated.deepcopy.go: Language not supported
  • pkg/apis/config/v1alpha1/resmgr/policy/zz_generated.deepcopy.go: Language not supported
Comments suppressed due to low confidence (3)

cmd/plugins/balloons/policy/balloons-policy.go:1445

  • When a live update changes only idleCPUClass, this branch detects a CPU-class-only change but never copies newBalloonsOptions.IdleCpuClass into p.bpoptions. The allocator is reconfigured with the old idle class and resetCpuClass() continues to apply the old value, so idle class changes are ignored until a full policy reconfiguration occurs.
			// Update CPUClasses definitions.
			p.bpoptions.CPUClasses = newBalloonsOptions.CPUClasses
			if p.turboAllocator != nil {
				if err := p.turboAllocator.Reconfigure(p.bpoptions.CPUClasses, p.bpoptions.IdleCpuClass); err != nil {
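
One way to address the point raised in this comment is to copy both CPU-class-related fields before reconfiguring the allocator. The following is a minimal, self-contained sketch with stand-in types; `options`, `allocator`, `policy` and `updateCPUClasses` are hypothetical and only mirror the shape of the excerpt above.

```go
package main

import "fmt"

// options stands in for the balloons policy options; only the two fields
// relevant to a CPU-class-only update are modeled.
type options struct {
	CPUClasses   []string
	IdleCpuClass string
}

// allocator is a stand-in for the turbo allocator.
type allocator struct {
	classes []string
	idle    string
}

func (a *allocator) Reconfigure(classes []string, idle string) error {
	a.classes, a.idle = classes, idle
	return nil
}

type policy struct {
	opts  options
	alloc *allocator
}

// updateCPUClasses applies a CPU-class-only change. Both fields are copied
// before the allocator is reconfigured, so a changed idle class is not lost.
func (p *policy) updateCPUClasses(newOpts options) error {
	p.opts.CPUClasses = newOpts.CPUClasses
	p.opts.IdleCpuClass = newOpts.IdleCpuClass
	if p.alloc == nil {
		return nil
	}
	return p.alloc.Reconfigure(p.opts.CPUClasses, p.opts.IdleCpuClass)
}

func main() {
	p := &policy{
		opts:  options{CPUClasses: []string{"normal"}, IdleCpuClass: "normal"},
		alloc: &allocator{},
	}
	err := p.updateCPUClasses(options{CPUClasses: []string{"normal", "lowpower"}, IdleCpuClass: "lowpower"})
	fmt.Println(p.opts.IdleCpuClass, p.alloc.idle, err)
}
```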

cmd/plugins/balloons/policy/balloons-policy.go:1713

  • The turbo allocator is created/reconfigured before fillBuiltinBalloonDefs() and validateConfig() run. If validation fails, this has already mutated policy/controller state via p.turboAllocator and cpucontrol.SetClass, so an invalid configuration update can leave partially applied CPU class definitions behind despite setConfig() returning an error.
	if p.turboAllocator == nil {
		ta, err := NewCPUClassTurboAllocator(
			WithSystem(p.options.System),
			WithCache(p.cch),
			WithCPUClasses(bpoptions.CPUClasses),
			WithIdleClass(bpoptions.IdleCpuClass),
		)
		if err != nil {
			return balloonsError("failed to create CPU class turbo allocator: %w", err)
		}
		p.turboAllocator = ta
	} else {
		if err := p.turboAllocator.Reconfigure(bpoptions.CPUClasses, bpoptions.IdleCpuClass); err != nil {
			return balloonsError("failed to reconfigure CPU class turbo allocator: %w", err)
		}
	}
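
The ordering concern in this comment is commonly handled with a validate-then-apply pattern: check the complete new configuration first and touch shared state only once nothing can fail anymore. The sketch below uses hypothetical stand-in types (`config`, `policy`) and a deliberately simple validation rule, not the PR's validateConfig().

```go
package main

import "fmt"

// config is a hypothetical, reduced policy configuration.
type config struct {
	cpuClasses   []string
	idleCPUClass string
}

// validateConfig checks the configuration without any side effects. The
// rule used here (the idle class must be defined) is only an example.
func validateConfig(c config) error {
	if c.idleCPUClass == "" {
		return nil
	}
	for _, name := range c.cpuClasses {
		if name == c.idleCPUClass {
			return nil
		}
	}
	return fmt.Errorf("idle CPU class %q is not defined", c.idleCPUClass)
}

type policy struct {
	active config // state visible to the rest of the policy
}

// setConfig validates first and mutates state only after validation has
// passed, so a rejected update leaves the previous configuration intact.
func (p *policy) setConfig(c config) error {
	if err := validateConfig(c); err != nil {
		return err
	}
	p.active = c
	return nil
}

func main() {
	p := &policy{}
	fmt.Println(p.setConfig(config{cpuClasses: []string{"normal"}, idleCPUClass: "normal"}))
	fmt.Println(p.setConfig(config{cpuClasses: []string{"normal"}, idleCPUClass: "missing"}))
	fmt.Println(p.active.idleCPUClass) // still the last valid configuration
}
```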

cmd/plugins/balloons/policy/cpuclass.go:199

  • Idle CPUs are assigned once but are not tracked or reassigned when the turbo winner changes. If idleCPUClass uses symbolic turbo, idle CPUs keep the effective value from the last reset/release (for example turbo from startup) even after a higher-priority active class should cap non-winners to base.
// ResetIdle assigns the given CPU set to the idle class via the CPU
// controller. Used at policy startup to bring all allowed CPUs to a
// known baseline before any container-driven UseClass call. Does not
// affect the active-class tracking.
func (a *CPUClassTurboAllocator) ResetIdle(cpus cpuset.CPUSet) error {
	if cpus.IsEmpty() {
		return nil
	}
	if err := cpucontrol.Assign(a.cch, a.idleClassName, cpus.UnsortedList()...); err != nil {
		return fmt.Errorf("failed to assign CPUs %s to idle class %q: %w", cpus, a.idleClassName, err)
	}
	return nil
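
A possible direction for the issue described in this comment is to remember which CPUs are idle and to re-resolve their effective frequency whenever the winning turbo priority changes. The sketch below is an assumption-laden illustration, not the PR's allocator: plain `int` CPU IDs replace cpuset.CPUSet, the `applied` map stands in for calls to the CPU controller, and all names are hypothetical.

```go
package main

import "fmt"

// allocator tracks idle CPUs so that they can be re-resolved later.
type allocator struct {
	idleMaxFreq  string // max frequency spec of the idle class, e.g. "turbo"
	idlePriority int    // turbo priority of the idle class
	topPriority  int    // highest turbo priority among active classes
	idleCPUs     map[int]bool
	applied      map[int]string // cpu -> effective max spec last applied
}

func newAllocator(idleMaxFreq string, idlePriority int) *allocator {
	return &allocator{
		idleMaxFreq:  idleMaxFreq,
		idlePriority: idlePriority,
		idleCPUs:     make(map[int]bool),
		applied:      make(map[int]string),
	}
}

// idleEffectiveMax caps symbolic "turbo" to "base" when a higher-priority
// class is active; other specs pass through unchanged.
func (a *allocator) idleEffectiveMax() string {
	if a.idleMaxFreq == "turbo" && a.idlePriority < a.topPriority {
		return "base"
	}
	return a.idleMaxFreq
}

// ResetIdle assigns CPUs to the idle class and records them as idle.
func (a *allocator) ResetIdle(cpus ...int) {
	for _, cpu := range cpus {
		a.idleCPUs[cpu] = true
		a.applied[cpu] = a.idleEffectiveMax()
	}
}

// Use hands CPUs over to an active class and stops tracking them as idle.
func (a *allocator) Use(maxFreq string, cpus ...int) {
	for _, cpu := range cpus {
		delete(a.idleCPUs, cpu)
		a.applied[cpu] = maxFreq
	}
}

// SetTopPriority records a new turbo winner and reapplies the idle class
// to every tracked idle CPU.
func (a *allocator) SetTopPriority(p int) {
	a.topPriority = p
	for cpu := range a.idleCPUs {
		a.applied[cpu] = a.idleEffectiveMax()
	}
}

func main() {
	a := newAllocator("turbo", 0)
	a.ResetIdle(0, 1, 2, 3)
	a.Use("turbo", 3) // CPU 3 goes to a high-priority class
	a.SetTopPriority(10)
	fmt.Println(a.applied)
}
```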

Comment thread pkg/apis/config/v1alpha1/resmgr/policy/frequency.go
Comment thread pkg/resmgr/control/cpu/cpu.go
Comment thread cmd/plugins/balloons/policy/balloons-policy.go
Comment thread cmd/plugins/balloons/policy/cpuclass.go
Comment thread cmd/plugins/balloons/policy/cpuclass.go Outdated
@askervin askervin requested review from kad and marquiz May 17, 2026 06:48
@askervin
Collaborator Author

@kad, @marquiz, do you think we could approach turbo budget sharing with this kind of architecture in the balloons policy?

I'm adding cpuClasses under resmgr, similarly to schedulingClasses, to pave the way for using them with the topology-aware policy's guaranteed containers later on, too.

@askervin askervin marked this pull request as draft May 18, 2026 09:55
@askervin
Collaborator Author

There is some technical and architectural debt that I still wish to pay off in this PR. That is, the CPU controller should not modify frequencies directly; changes should go through the cache, in the spirit of applying "pending updates".

Unfortunately, controller hooks are container-specific and may be called multiple times while handling a single NRI event, whereas all CPU properties should be written once per NRI event. I'll add yet another hook to the Controller interface to commit whatever changes a controller has stored since the previous Commit().
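
The idea of staging changes and writing them once per event could look roughly like the sketch below. It does not reproduce the real Controller interface: `cpuController`, `SetMaxFreq` and this `Commit` signature are hypothetical, and the sysfs write is replaced by bookkeeping so that the example runs anywhere.

```go
package main

import "fmt"

// cpuController stages per-CPU max frequency changes and applies them in
// one batch. Field names are illustrative only.
type cpuController struct {
	pending map[int]uint64 // cpu -> max kHz staged since the last Commit
	written map[int]uint64 // cpu -> max kHz last written (write cache)
}

func newCPUController() *cpuController {
	return &cpuController{
		pending: make(map[int]uint64),
		written: make(map[int]uint64),
	}
}

// SetMaxFreq may be called many times while handling a single event; only
// the latest value per CPU is kept.
func (c *cpuController) SetMaxFreq(cpu int, khz uint64) {
	c.pending[cpu] = khz
}

// Commit applies everything staged since the previous Commit and returns
// the number of writes performed. Values equal to the last written value
// are skipped.
func (c *cpuController) Commit() int {
	writes := 0
	for cpu, khz := range c.pending {
		if last, ok := c.written[cpu]; ok && last == khz {
			continue
		}
		// A real controller would write khz to
		// /sys/devices/system/cpu/cpu<N>/cpufreq/scaling_max_freq here.
		c.written[cpu] = khz
		writes++
	}
	c.pending = make(map[int]uint64)
	return writes
}

func main() {
	c := newCPUController()
	// Several container-specific hooks touch the same CPU within one event.
	c.SetMaxFreq(0, 3900000)
	c.SetMaxFreq(0, 2400000)
	c.SetMaxFreq(1, 2400000)
	fmt.Println("writes:", c.Commit())

	// Staging an unchanged value results in no write on the next commit.
	c.SetMaxFreq(0, 2400000)
	fmt.Println("writes:", c.Commit())
}
```

Keeping only the latest staged value per CPU gives the "written once per NRI event" property, and the write cache additionally avoids rewriting values that have not changed between events.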

@askervin askervin force-pushed the 5h1-balloons-cpuclass branch from e919069 to d61da00 Compare May 18, 2026 10:52
@kad kad requested a review from Copilot May 19, 2026 07:38

Copilot AI left a comment

Pull request overview

Copilot reviewed 22 out of 24 changed files in this pull request and generated 5 comments.

Files not reviewed (2)
  • pkg/apis/config/v1alpha1/resmgr/policy/balloons/zz_generated.deepcopy.go: Language not supported
  • pkg/apis/config/v1alpha1/resmgr/policy/zz_generated.deepcopy.go: Language not supported

Comment thread pkg/resmgr/control/cpu/cpu.go
Comment thread cmd/plugins/balloons/policy/cpuclass.go Outdated
Comment thread cmd/plugins/balloons/policy/balloons-policy.go Outdated
Comment thread cmd/plugins/balloons/policy/balloons-policy.go
Comment thread docs/resource-policy/policy/balloons.md Outdated
@askervin askervin force-pushed the 5h1-balloons-cpuclass branch 2 times, most recently from 15fdc7d to 22769db Compare May 19, 2026 11:40
Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 25 out of 28 changed files in this pull request and generated 8 comments.

Files not reviewed (2)
  • pkg/apis/config/v1alpha1/resmgr/policy/balloons/zz_generated.deepcopy.go: Language not supported
  • pkg/apis/config/v1alpha1/resmgr/policy/zz_generated.deepcopy.go: Language not supported

Comment thread pkg/resmgr/control/cpu/cpu.go
Comment thread pkg/resmgr/control/cpu/cpu.go
Comment thread pkg/resmgr/control/cpu/cpu.go
Comment thread pkg/resmgr/control/cpu/cpu.go
Comment thread cmd/plugins/balloons/policy/cpuclass.go
Comment thread cmd/plugins/balloons/policy/balloons-policy.go Outdated
Comment thread docs/resource-policy/policy/balloons.md Outdated
Comment thread cmd/plugins/balloons/policy/balloons-policy.go
@askervin askervin force-pushed the 5h1-balloons-cpuclass branch 3 times, most recently from e82098a to ce383ef Compare May 26, 2026 14:21
@askervin askervin requested a review from Copilot May 26, 2026 14:21

Copilot AI left a comment

Pull request overview

Copilot reviewed 25 out of 28 changed files in this pull request and generated 4 comments.

Files not reviewed (2)
  • pkg/apis/config/v1alpha1/resmgr/policy/balloons/zz_generated.deepcopy.go: Language not supported
  • pkg/apis/config/v1alpha1/resmgr/policy/zz_generated.deepcopy.go: Language not supported

Comment thread cmd/plugins/balloons/policy/cpuclass.go
Comment thread cmd/plugins/balloons/policy/cpuclass.go
Comment thread cmd/plugins/balloons/policy/balloons-policy.go
Comment thread pkg/apis/config/v1alpha1/balloons-policy.go
askervin added 5 commits May 27, 2026 11:32
…class definitions

Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
Introduce CPUClassTurboAllocator that owns CPU-class state and all
cpucontrol.Assign / cpucontrol.SetClass calls, keeping CPU-class
concerns out of the rest of the policy.

This change introduces a very simple allocator that is unaware of
zones (sockets, dies) or CPU core counts affected by turbo on
different platforms. A smarter allocator is future work.

Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
@askervin askervin force-pushed the 5h1-balloons-cpuclass branch from ce383ef to 210c009 Compare May 27, 2026 08:34
@askervin askervin marked this pull request as ready for review May 27, 2026 09:27
2 participants