Kronos is a PromQL-driven VM placement optimization engine for OpenStack. It evaluates Prometheus metrics per host aggregate and plans live migrations to balance (or consolidate) workloads. Open-source, Apache 2.0, targeting OpenStack umbrella acceptance.
- Engine: evaluates policies on a configurable interval. Its scope is a list of Nova host aggregates defined in `[engine] aggregates`, optionally plus the "unassigned hosts" pool (`[engine] include_unassigned_hosts`). Each aggregate is planned independently.
- Combined scoring: when multiple policies share the same scope, each contributes to a weighted combined imbalance score. Policy weights sum to 1.0. Spread and pack modes cannot be combined - deploy separate engines per mode.
- Pure planner: casts migration tasks over RPC (`oslo.messaging` `RPCClient.cast()`) to the per-aggregate executor topic. Never calls Nova live-migrate directly.
- Executor: one per aggregate. Consumes RPC casts, runs pre-flight checks, calls Nova live-migrate, polls status, publishes results as notifications.
- Aggregate boundaries: migrations stay within an aggregate. The unassigned pool is its own scope - migrations never cross between it and a named aggregate.
- AZ awareness (planned): the planner discovers each hypervisor's availability zone and surfaces it in logs and cycle reports. Cross-AZ migrations are allowed by default; an opt-in `[engine] restrict_to_az` flag makes the planner refuse moves whose source and destination AZs differ.
- Host liveness gate: every cycle the engine fetches Nova `os-services` once and installs the result on the constraint checker. Only hosts whose `nova-compute` service is `state=up` and `status=enabled` (and not `forced_down`) are accepted as destinations. The constraint checker fails closed when service state has not been installed - the engine always installs it right after `invalidate_cache()`. The executor re-checks at pre-flight to catch state changes between dispatch and execution.
- Evacuator: when `[engine] evacuate_disabled_hosts = true`, VMs on hosts whose service is `status=disabled` (but still up) are evacuated. Runs before the affinity enforcer and the imbalance planner; all three share `max_migrations_per_cycle` as one budget. Hosts with `state=down` are not evacuated - live migration cannot drain them. Storage is intentionally not validated; Nova's `block_migration='auto'` decides per-VM.
- oslo.config for all daemon configuration (`kronos.conf`)
- oslo.log for logging
- oslo.messaging:
  - Engine → executor: RPC cast on `kronos.migrations.<aggregate>` (exactly-one delivery, competing consumers)
  - Executor → engine: notifications on `kronos.results.<aggregate>` (drives cooldown and quarantine state)
- openstacksdk for Nova/Keystone API calls
- Entry points in `pyproject.toml` under `[project.scripts]`
- Config options registered in `kronos/common/config.py`
- Exceptions follow the `msg_fmt` pattern (see `kronos/common/exceptions.py`)
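A minimal sketch of the `msg_fmt` pattern (`KronosException` is the real base class per this doc; the subclass shown is illustrative):

```python
# Sketch of the msg_fmt pattern from kronos/common/exceptions.py.
class KronosException(Exception):
    msg_fmt = "An unknown exception occurred."

    def __init__(self, **kwargs):
        # Keyword arguments fill the %(placeholder)s slots in msg_fmt.
        super().__init__(self.msg_fmt % kwargs)


class PrometheusUnreachable(KronosException):  # hypothetical subclass
    msg_fmt = "Prometheus at %(url)s is unreachable: %(reason)s"


# raise PrometheusUnreachable(url="http://prom:9090", reason="timeout")
```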
- `/etc/kronos/kronos.conf` - oslo.config INI (daemon settings: intervals, URLs, auth)
- `/etc/kronos/policies.yaml` - Pydantic-validated YAML (PromQL queries, thresholds, weights)
NEVER use `import logging` from the standard library. Always use oslo.log:

```python
from oslo_log import log as logging

LOG = logging.getLogger(__name__)
```

oslo.log wraps stdlib logging but integrates with oslo.config for log level,
format, and output configuration. The entry point (`kronos/cmd/*.py`) calls
`logging.setup(CONF, 'kronos')` once at startup - individual modules just
call `logging.getLogger(__name__)`.
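The entry-point side of that convention, as a sketch (standard oslo bootstrap order; module path illustrative):

```python
# kronos/cmd/engine.py (sketch) - one-time logging setup at startup.
import sys

from oslo_config import cfg
from oslo_log import log as logging

CONF = cfg.CONF


def main():
    logging.register_options(CONF)        # register oslo.log's config options
    CONF(sys.argv[1:], project='kronos')  # parse kronos.conf / CLI args
    logging.setup(CONF, 'kronos')         # wire handlers from oslo.config
```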
- Python 3.12+, type hints on all public APIs
- Pydantic v2 for policy YAML validation only (NOT for daemon config)
- Dataclasses for internal data types (`HostScore`, `PolicyResult`, etc.) - NOT frozen, for testability
- No global mutable state - dependency injection via constructors
- Ruff for linting (`ruff check`), mypy strict mode
- Tests: pytest, mocked HTTP via the `responses` library, mocked openstacksdk
- `kronos/cmd/` - CLI entry points (one module per binary)
- `kronos/common/` - Shared utilities, exceptions, oslo.config registration, messaging helpers
- `kronos/policies/` - Pydantic models and loader for policy YAML
- `kronos/clients/` - External service clients (Prometheus, Nova)
- `kronos/engine/` - Control loop, scoring, planning, cooldown tracking
- `kronos/executor/` - Migration executor (scheduler, runner, RPC server, result notifier)
- `kronos-engine` → `kronos.cmd.engine:main` - scheduling engine daemon
- `kronos-test-config` → `kronos.cmd.test_config:main` - config validator
- `kronos-executor` → `kronos.cmd.executor:main` - migration executor daemon
- `kronos-record` → `kronos.cmd.record:main` - snapshot live cluster state
- `kronos-replay` → `kronos.cmd.replay:main` - run engine pipeline offline against a snapshot
The engine operates on a fixed set of aggregates defined at startup:
```ini
[engine]
aggregates = gpu-aggregate, hpc-aggregate
include_unassigned_hosts = false
```

Semantics:

- `aggregates` - comma-separated Nova host aggregate names
- `include_unassigned_hosts` - when true, also plan the pool of compute hypervisors that belong to no aggregate (common in small deployments without aggregate groups)
- Passing `None` to `NovaClient.get_hosts_in_aggregate()` returns the unassigned pool
- At least one of `aggregates` or `include_unassigned_hosts=true` must be set or the engine fails at startup
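A minimal sketch of that scope resolution (the helper and error message are illustrative; `None` as the unassigned-pool sentinel follows this section):

```python
def resolve_scope(conf) -> list[str | None]:
    """Build the list of planning scopes; None stands for the unassigned pool."""
    scopes: list[str | None] = list(conf.engine.aggregates)
    if conf.engine.include_unassigned_hosts:
        # Later passed to NovaClient.get_hosts_in_aggregate(None), which
        # resolves to hypervisors that belong to no aggregate.
        scopes.append(None)
    if not scopes:
        raise RuntimeError(
            "at least one of [engine] aggregates or "
            "include_unassigned_hosts=true must be set")
    return scopes
```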
```shell
pip install -e ".[dev]"
pytest tests/
ruff check kronos/ tests/
mypy kronos/
```

Pydantic v2 models for the policy DSL. Policies contain PromQL query
strings, mode-specific fields (capacity_query for pack mode), and
cross-field validation - richer than oslo.config's flat key-value model.
oslo.config handles daemon config; Pydantic handles the policy file.
Key files:

- `models.py` - `PolicyConfig`, `PoliciesConfig`
- `loader.py` - YAML file loading, returns validated `PoliciesConfig`
Policy fields:

- `name`: unique identifier (lowercase, alphanumeric + hyphens/underscores)
- `mode`: `spread` or `pack`
- `weight`: 0.0-1.0, contribution to combined aggregate imbalance
- `imbalance_query`: PromQL returning a per-host utilization ratio in [0, 1]
- `host_label`: Prometheus label identifying each host (default `host`)
- `vm_profile_query`: PromQL returning a per-VM metric for planner simulation
- `vm_profile_label` / `vm_profile_label_type`: how to map Prometheus labels to Nova instances
- `vm_profile_fallback`: `skip` / `flavor_vcpu_ratio` / `host_average`
- `threshold`: imbalance threshold (max - min) to trigger rebalancing
- `capacity_query`: PromQL for pack mode capacity check (required for pack)
- `capacity_threshold`: max host utilization ceiling for pack mode
- `max_migrations_per_cycle`: cap per evaluation cycle
- `enabled`: whether the policy is active
Aggregates are NOT in the policy. They live on the engine via
`[engine] aggregates` and `[engine] include_unassigned_hosts`. A
`policies.yaml` binds to the engine that loads it and applies to every
aggregate in that engine's scope.
`PoliciesConfig` validators (load-time invariants):

- Unique `name` across policies
- All policies share a `mode` (spread and pack cannot mix in one file)
- Enabled `weight` values sum to 1.0 exactly (1e-6 tolerance)
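A minimal sketch of these invariants in Pydantic v2 (field and class names follow this section; the model internals are assumptions, and `PolicyConfig` is trimmed to what the validator needs):

```python
from pydantic import BaseModel, Field, model_validator


class PolicyConfig(BaseModel):
    # Trimmed for illustration - the real model has all fields listed above.
    name: str
    mode: str          # "spread" | "pack"
    weight: float = Field(ge=0.0, le=1.0)
    enabled: bool = True


class PoliciesConfig(BaseModel):
    policies: list[PolicyConfig]

    @model_validator(mode="after")
    def _check_invariants(self) -> "PoliciesConfig":
        names = [p.name for p in self.policies]
        if len(names) != len(set(names)):
            raise ValueError("policy names must be unique")
        if len({p.mode for p in self.policies}) > 1:
            raise ValueError("spread and pack cannot mix in one file")
        total = sum(p.weight for p in self.policies if p.enabled)
        if abs(total - 1.0) > 1e-6:
            raise ValueError(f"enabled weights sum to {total}, expected 1.0")
        return self
```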
The [0, 1] contract: `imbalance_query` must return values in [0, 1].
Enforced at runtime by the scorer - out-of-range values cause that
policy to be skipped for the cycle with an error logged. Not enforced
at load time because the query has to run against live Prometheus.
Adding a new policy field:

- Add to `PolicyConfig` in `models.py` with `Field(...)`
- Add cross-field validation via `@model_validator` if needed
- Update the sample `policies.yaml` in `internal-documentation/`
- Add tests in `tests/unit/policies/test_models.py`
- Update this section
Wrappers for external services. Clients return data; they never make decisions. Retry logic lives in the client, not the caller.
- Uses `requests` directly (NOT `prometheus-api-client` - poorly maintained)
- Endpoints: `/api/v1/query` (instant), `/api/v1/query_range` (range), `/api/v1/status/runtimeinfo` (health)
- `PrometheusHealth` enum: `HEALTHY`, `STALE`, `PARTIAL`, `UNREACHABLE`
- `instant_query(query, label_key, expected_labels) -> QueryResult`
  - `label_key`: which Prometheus label to use as dict key (default `host`)
  - `expected_labels`: set of expected values (e.g. Nova hostnames) for partial-data detection
- Retries via `tenacity` with exponential backoff
- Auth: bearer token (string or file), optional CA cert
- Config source: oslo.config `[prometheus]` group
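A sketch of the instant-query path under these choices (`requests` + `tenacity`); the function shape is illustrative, not the real client API:

```python
import requests
from tenacity import retry, stop_after_attempt, wait_exponential


@retry(stop=stop_after_attempt(5),
       wait=wait_exponential(multiplier=0.5, max=30))
def instant_query(base_url: str, promql: str, token: str,
                  label_key: str = "host") -> dict[str, float]:
    """Run /api/v1/query and key the result vector by `label_key`."""
    resp = requests.get(
        f"{base_url}/api/v1/query",
        params={"query": promql},
        headers={"Authorization": f"Bearer {token}"},
        timeout=10,
    )
    resp.raise_for_status()
    body = resp.json()
    if body["status"] != "success":
        raise RuntimeError(f"query failed: {body}")
    # Each sample is {"metric": {...labels...}, "value": [ts, "0.42"]}.
    return {
        sample["metric"][label_key]: float(sample["value"][1])
        for sample in body["data"]["result"]
    }
```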
- `openstacksdk` with a `keystoneauth1` session loaded from oslo.config
- Auth via `[nova]` config: `auth_type`, `auth_url`, `username`, `password`, etc. `ks_loading.load_auth_from_conf_options()` + `load_session_from_conf_options()`
- Returns dataclasses (`ComputeHost`, `Instance`, `HostAggregate`, `MigrationStatus`), not raw openstacksdk objects
- Read: `list_aggregates()`, `get_aggregate()`, `list_compute_hosts()`, `list_instances_on_host()`, `list_server_groups()`, `list_compute_services()`, `get_hosts_in_aggregate(name | None)` - `None` returns hypervisors not in any aggregate (the unassigned pool). Only QEMU/KVM hypervisors are returned; ironic bare-metal nodes are filtered out.
- Write: `live_migrate()`, `get_instance_status()`, `get_instance_host()`, `get_migration_status()`
- `list_compute_services()`: returns `ComputeService` dataclasses for every `nova-compute` entry in `os-services`. Each has `host`, `binary`, `state` (up/down), `status` (enabled/disabled), `disabled_reason`, `forced_down`. The helper `is_available_destination` is True only when up + enabled and not forced down. Other binaries (conductor/scheduler) are filtered out.
- Server group compatibility: reads both legacy `policy` (singular string) and modern `policies` (list) and merges them. Older Nova deployments populate only the singular field.
- Server groups are listed across all projects (`all_projects=True`) so the constraint checker sees every group the cluster cares about.
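A sketch of the `ComputeService` shape implied above (field names from this section; exact types are assumptions):

```python
from dataclasses import dataclass


@dataclass
class ComputeService:
    host: str
    binary: str                  # always "nova-compute" after filtering
    state: str                   # "up" | "down"
    status: str                  # "enabled" | "disabled"
    disabled_reason: str | None
    forced_down: bool

    @property
    def is_available_destination(self) -> bool:
        # Destination gate: up + enabled and not forced down.
        return (self.state == "up"
                and self.status == "enabled"
                and not self.forced_down)
```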
- Prometheus: `responses` library to mock HTTP at transport level
- Nova: `unittest.mock.patch` on `openstack.connect()`
- Fixtures in `tests/fixtures/prometheus_responses/`
- Clients NEVER make decisions - return data, the engine decides
- All exceptions inherit from `KronosException`
- Exceptions use the `msg_fmt` pattern with `%(placeholder)s` formatting
- No caching in clients themselves (constraint checker caches server groups once per cycle)
Main control loop, policy evaluation, VM profiling, constraint checking, combined-scoring migration planning, and cooldown tracking. The engine is a pure planner - it evaluates policies, detects imbalance, simulates moves, produces migration plans, and casts them over RPC. It never calls Nova live-migrate directly.
- `types.py` - `HostScore`, `VmProfile`, `MigrationStep`, `MigrationPlan`, `MigrationTask`, `MigrationResult`, `MigrationPhase`, `PolicyResult`, `AggregateResult`, `CycleReport`
- `_sim.py` - Shared simulation helpers (`combined_imbalance`, `simulate_move`, `ScoreState`) used by planner and enforcer
- `scorer.py` - `PolicyScorer`: runs a policy's PromQL `imbalance_query` against a host list, normalises scores, enforces the [0, 1] contract
- `profiler.py` - `VmProfiler`: collects per-VM resource weights across all policies for an aggregate in one pass
- `planner.py` - `Planner`: combined-scoring simulation (spread greedy + pack First Fit Decreasing)
- `constraints.py` - `ConstraintChecker`: validates moves against all four Nova server group placement policies (affinity, anti-affinity, soft-affinity, soft-anti-affinity)
- `affinity_enforcer.py` - `AffinityEnforcer`: repair pass that moves VMs out of existing server-group violations using the same combined-imbalance math the planner uses
- `cooldown.py` - `CooldownTracker`: aggregate-level cooldown, instance-level cooldown, post-failure quarantine
- `result_listener.py` - `MigrationResultEndpoint`: oslo.messaging notification endpoint that consumes `migration.completed`/`migration.failed` events and drives quarantine
- `evacuator.py` - `Evacuator`: drains VMs off `status=disabled` nova-compute hosts. Runs before the affinity enforcer and the planner; uses the same combined-imbalance + threshold math
- `loop.py` - `EngineLoop`: periodic per-aggregate evaluation cycle, plan emission via RPC cast, result-listener lifecycle
The engine operates on a fixed set of aggregates defined at startup via
`[engine] aggregates` and `[engine] include_unassigned_hosts`. Each
aggregate is evaluated independently every cycle. The unassigned-hosts
pool is resolved by `NovaClient.get_hosts_in_aggregate(None)`.
```
oslo.config → EngineLoop
├── loads policies (Pydantic) - all enabled policies apply to every aggregate
├── resolves aggregate scope (aggregates list + unassigned pool)
│
├── for each aggregate in scope:
│ ├── hosts = NovaClient.get_hosts_in_aggregate(aggregate)
│ │
│ ├── for each enabled policy:
│ │ └── PolicyScorer.evaluate(policy, hosts) → PolicyResult
│ │ ├── PrometheusClient.instant_query(imbalance_query)
│ │ ├── filter to scope hosts, enforce [0, 1] range
│ │ ├── normalize, compute imbalance
│ │ └── return PolicyResult
│ │
│ ├── combined_imbalance = Σ (policy.weight × policy.imbalance)
│ │
│ └── if any policy.imbalance_detected OR enforcer.enabled:
│ ├── check cooldown (skip if aggregate is cooling)
│ │
│ ├── VmProfiler.collect(policies, hosts, host_scores_by_policy)
│ │ ├── NovaClient.list_instances_on_host(host)
│ │ ├── PrometheusClient.instant_query(vm_profile_query) - one per policy
│ │ └── VmProfile.weights[policy_name] for every policy
│ │
│ ├── AffinityEnforcer.enforce(...) - phase="affinity"
│ │
│ ├── Planner.plan(...) - phase="spread" or "pack"
│ │
│ ├── _filter_unavailable_vms(vm_profiles) - drop quarantined + cooling VMs
│ │
│ └── if not dry_run and plan has steps:
│ ├── for each step: RPCClient.cast('execute_migration', task)
│ └── CooldownTracker.record_plan_emission(aggregate, instance_uuids)
│
└── log CycleReport (aggregate results, per-policy imbalances, plans)
```
In parallel, a notification listener per aggregate consumes results:
```
kronos.results.<aggregate> → MigrationResultEndpoint.info()
├── migration.completed              - DEBUG log
├── migration.failed (retry pending) - DEBUG log
└── migration.failed (final)         - INFO log, quarantine_instance()
```
When multiple policies share an aggregate their imbalance values combine into a single weighted score:
combined_imbalance = Σ (policy.weight × policy.imbalance)
Constraints on `PoliciesConfig` (load time):

- Enabled policy `weight` values sum to 1.0
- All policies in a file share the same `mode` (spread or pack, not both)
- Each `imbalance_query` must return values in [0, 1] (runtime, scorer-enforced)
Each VM carries per-policy weights:

```python
VmProfile.weights: dict[str, float]  # {policy_name: weight}
```

Simulating a move applies the weight subtraction and addition per policy - a single move affects every policy's scores.
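A minimal sketch of the move simulation and combined score (shapes follow `_sim.py`'s description above; the exact signatures are assumptions):

```python
def simulate_move(scores: dict[str, dict[str, float]],
                  vm_weights: dict[str, float],
                  src: str, dst: str) -> dict[str, dict[str, float]]:
    """scores: {policy_name: {host: raw [0,1] score}}; vm_weights: VmProfile.weights."""
    new = {policy: dict(hosts) for policy, hosts in scores.items()}
    for policy, w in vm_weights.items():
        new[policy][src] -= w   # one move shifts every policy's scores
        new[policy][dst] += w
    return new


def combined_imbalance(scores: dict[str, dict[str, float]],
                       weights: dict[str, float]) -> float:
    # Per-policy imbalance is max - min (see `threshold` above).
    return sum(
        weights[policy] * (max(hosts.values()) - min(hosts.values()))
        for policy, hosts in scores.items()
    )
```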
Move acceptance rule: a candidate move is rejected if it would worsen any individual policy's imbalance and the policy's new imbalance is above its own threshold. Lets the planner trade dimensions (small CPU regression to fix a large memory problem) while preventing:
- pushing a previously-OK policy into violation
- making an already-violating policy even worse
Stopping criterion: greedy spread stops when every policy's
imbalance is at or below its threshold, or when
`max_migrations_per_cycle` is reached, or when no improving move
exists.
Raw per-host Prometheus values are normalised 0.0-1.0 within each aggregate using min-max for logging purposes. The planner uses raw values (already in [0, 1]) for its combined score math.
Spread (greedy combined-score simulation): each round, try every
(source, vm, dest) combination, compute the new combined imbalance,
reject any move that worsens a policy past its threshold, pick the
move with the largest combined improvement. Repeat until all policies
are happy, `max_migrations_per_cycle` is hit, or no improving move
exists.
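A condensed sketch of that greedy round, reusing `simulate_move` and `combined_imbalance` from the sketch above (all other names are illustrative):

```python
def all_within_threshold(scores, thresholds):
    return all(max(hosts.values()) - min(hosts.values()) <= thresholds[policy]
               for policy, hosts in scores.items())


def move_hurts_any_policy(old, new, thresholds):
    # Acceptance rule: reject a move that worsens a policy AND leaves
    # that policy above its own threshold.
    for policy in old:
        old_imb = max(old[policy].values()) - min(old[policy].values())
        new_imb = max(new[policy].values()) - min(new[policy].values())
        if new_imb > old_imb and new_imb > thresholds[policy]:
            return True
    return False


def plan_spread(scores, vms_by_host, profiles, thresholds, weights, max_steps):
    steps = []
    while len(steps) < max_steps and not all_within_threshold(scores, thresholds):
        best, best_gain = None, 0.0
        for src, vms in vms_by_host.items():
            for vm in vms:
                for dst in vms_by_host:
                    if dst == src:
                        continue
                    cand = simulate_move(scores, profiles[vm].weights, src, dst)
                    gain = (combined_imbalance(scores, weights)
                            - combined_imbalance(cand, weights))
                    if gain > best_gain and not move_hurts_any_policy(
                            scores, cand, thresholds):
                        best, best_gain = (vm, src, dst, cand), gain
        if best is None:
            break  # no improving legal move exists
        vm, src, dst, scores = best
        vms_by_host[src].remove(vm)
        vms_by_host[dst].append(vm)
        steps.append((vm, src, dst))
    return steps
```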
Pack (First Fit Decreasing on combined score):

- Sort hosts by combined weighted score ascending (coldest first = drain order)
- For each drain host, sort VMs by combined weighted weight descending
- For each VM, find the fullest non-draining host where every policy's projected score stays below that policy's `capacity_threshold`
- Track draining hosts so VMs are never moved to them
- Stop at `max_migrations_per_cycle`
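A sketch of the pack pass under the same assumed shapes:

```python
def plan_pack(scores, vms_by_host, profiles, weights, capacity, max_steps):
    """First Fit Decreasing on combined score (sketch).

    capacity: {policy_name: capacity_threshold}.
    """
    def combined(host):
        return sum(weights[p] * scores[p][host] for p in scores)

    draining = set()
    steps = []
    # Coldest hosts first: these are the drain candidates.
    for src in sorted(vms_by_host, key=combined):
        draining.add(src)
        # Heaviest VMs (by combined weighted weight) first.
        vms = sorted(vms_by_host[src],
                     key=lambda vm: sum(weights[p] * w
                                        for p, w in profiles[vm].weights.items()),
                     reverse=True)
        for vm in vms:
            # Fullest non-draining host whose projected scores stay below
            # every policy's capacity_threshold.
            for dst in sorted((h for h in vms_by_host if h not in draining),
                              key=combined, reverse=True):
                if all(scores[p][dst] + profiles[vm].weights.get(p, 0.0)
                       < capacity[p] for p in scores):
                    for p, w in profiles[vm].weights.items():
                        scores[p][src] -= w
                        scores[p][dst] += w
                    steps.append((vm, src, dst))
                    break
            if len(steps) >= max_steps:
                return steps
    return steps
```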
The VmProfiler runs once per aggregate and builds
`dict[instance_uuid → VmProfile]` where each profile carries per-policy
weights. For every (instance, policy) pair:

- Query Prometheus with that policy's `vm_profile_query`
- Map the label value to a Nova instance via `vm_profile_label_type`
- If no data, apply the policy's fallback:
  - `skip` - drop the VM from the whole combined profile (default, safest)
  - `flavor_vcpu_ratio` - estimate from host score and vCPU count
  - `host_average` - assume equal share of host's score

A VM is excluded from planning entirely if any policy with `skip`
fallback has no data for it.
ConstraintChecker reads Nova server groups across all projects and
treats all four placement policies as move-blocking:

- `anti-affinity` / `soft-anti-affinity`: a move is rejected if another group member already lives on the destination host
- `affinity` / `soft-affinity`: a move is rejected unless every other currently placed member of the group is already on the destination host. Members outside the current aggregate are ignored - Kronos only reasons about VMs visible in the planner's `vms_by_host` index
Soft rules are currently enforced with the same strictness as hard ones. They will later be promoted to weighted planner penalties (so a move that mildly violates a soft rule can still win if it resolves a much larger imbalance), but for now both flavours veto.
The cache is invalidated each engine cycle.
Future: NUMA, CPU feature flags, flavor extra specs, soft-rule penalties in the planner - https://docs.openstack.org/nova/latest/user/server-groups.html
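A minimal sketch of the destination check described above (the merged `group.policy` string and `group.members` set are assumed shapes):

```python
def check(vm: str, dst: str, group, vms_by_host: dict[str, list[str]]) -> bool:
    """True if moving `vm` to `dst` keeps `group`'s placement rule intact."""
    placed = {v for vms in vms_by_host.values() for v in vms}
    # Members outside the current aggregate's vms_by_host index are ignored.
    others = (set(group.members) & placed) - {vm}
    on_dst = set(vms_by_host.get(dst, [])) & others
    if group.policy in ("anti-affinity", "soft-anti-affinity"):
        return not on_dst           # no other member may be on the destination
    if group.policy in ("affinity", "soft-affinity"):
        return others == on_dst     # every other placed member already there
    return True
```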
Evacuator drains VMs off compute hosts whose nova-compute service is
administratively disabled. Enabled per engine via:

```ini
[engine]
evacuate_disabled_hosts = false  # default
```

Algorithm, per aggregate:
- Intersect the cluster-wide service map with the aggregate's host list (so disabled hosts in unrelated aggregates / AZs are ignored).
- Identify candidate VMs: every VM on a `status=disabled` and `state=up` host in scope (see the sketch after this list). Hosts with `state=down` are not evacuated - live migration cannot drain them.
- For each candidate, simulate every (vm, dest) pair and accept it only when:
  - `ConstraintChecker.check` passes (destination is up + enabled, and no server-group rule is broken).
  - The move does not push any policy past its threshold.
- Pick the pair that leaves combined imbalance lowest, apply it, record a step with phase `evacuate`, and repeat.
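A minimal sketch of the candidate identification step (`ComputeService` fields as in the Nova client section; names illustrative):

```python
def evacuation_candidates(services_by_host, scope_hosts, vms_by_host):
    """VMs on administratively disabled but still-up hosts in this aggregate."""
    candidates = []
    for host in scope_hosts:
        svc = services_by_host.get(host)
        # state=down hosts are skipped: live migration cannot drain them.
        if svc and svc.status == "disabled" and svc.state == "up":
            candidates.extend((vm, host) for vm in vms_by_host.get(host, []))
    return candidates
```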
Order: evacuator → affinity enforcer → imbalance planner. All
three share one `max_migrations_per_cycle` budget; the evacuator
consumes first.
Pre-flight (executor): every migration re-checks source +
destination service state. Refuses if either is no longer up +
enabled, except evacuation tasks (`phase=evacuate`) tolerate a
`status=disabled` source - that is precisely the condition that
made the move legitimate.
AffinityEnforcer runs before the imbalance planner and proactively
repairs server-group placements that are already in violation. The
checker guards against new violations; the enforcer fixes existing ones.
Enabled per policy type via `[engine]`:

```ini
enforce_hard_affinity = false  # affinity + anti-affinity
enforce_soft_affinity = false  # soft-affinity + soft-anti-affinity
```

Both default false. The enforcer is a no-op when neither is enabled.
Algorithm, per aggregate:

- Fetch the relevant server groups (filtered by enabled flags)
- Detect violations against current VM placement (sketched after this list):
  - `anti-affinity` / `soft-anti-affinity`: ≥2 members share a host
  - `affinity` / `soft-affinity`: members span >1 host
- Collect the set of offending VM UUIDs across all violations
- Simulate every (offending_vm, destination) pair. Accept only pairs that:
  - pass `ConstraintChecker` (no other group rule broken)
  - do not push any policy's imbalance above its threshold (`_sim.move_hurts_any_policy`)
- Pick the pair that leaves combined imbalance lowest. Apply it, record a step with phase `affinity`
- Repeat until no violations remain, budget exhausted, or no legal destination exists
The enforcer returns the simulated post-enforcement scores,
`vms_by_host`, and remaining budget. The Planner starts from those so
the imbalance pass sees the world the executor will actually observe.
Budget sharing: enforcement and imbalance planning share a single
`max_migrations_per_cycle` budget (max across active policies).
Enforcement consumes first; the imbalance planner gets the remainder.
No legal repair: if no (vm, dest) pair clears the constraint and threshold filters, the enforcer logs a warning and stops. The imbalance planner still runs with the remaining budget.
Cooldown semantics: repair moves go through the normal plan emission path, so aggregate and instance cooldowns fire exactly as they do for imbalance-driven moves.
CooldownTracker prevents oscillation and migration storms:

- Aggregate-level: after emitting a plan for an aggregate, skip planning for the same aggregate until `engine.cooldown` expires. Global across all policies - combined scoring means one plan covers all policies for an aggregate per cycle.
- Instance-level: after including a VM in a plan, skip it for `engine.instance_cooldown` seconds. Prevents a VM bouncing between hosts via different migrations.
- Quarantine: after a migration definitively fails (retries exhausted on `PreFlightError`/`MigrationFailed`/`MigrationTimeout`), skip the VM for `engine.instance_quarantine_seconds`. Use `-1` for indefinite quarantine. `NovaClientError` is treated as transient and does not quarantine; the normal instance cooldown governs re-planning.
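A minimal sketch of the tracker's state model (method names from this section; internals and the `is_*` helpers are assumptions, and the `seed_*` replay methods are omitted):

```python
import time


class CooldownTracker:
    def __init__(self, cooldown: float, instance_cooldown: float):
        self._cooldown = cooldown
        self._instance_cooldown = instance_cooldown
        self._aggregate_until: dict[str, float] = {}
        self._instance_until: dict[str, float] = {}
        self._quarantined_until: dict[str, float] = {}  # -1 → indefinite

    def record_plan_emission(self, aggregate: str, instance_uuids: list[str]) -> None:
        now = time.monotonic()
        self._aggregate_until[aggregate] = now + self._cooldown
        for uuid in instance_uuids:
            self._instance_until[uuid] = now + self._instance_cooldown

    def quarantine_instance(self, uuid: str, seconds: float) -> None:
        self._quarantined_until[uuid] = (
            -1 if seconds == -1 else time.monotonic() + seconds)

    def is_cooling(self, aggregate: str) -> bool:
        return time.monotonic() < self._aggregate_until.get(aggregate, 0.0)

    def is_excluded(self, uuid: str) -> bool:
        now = time.monotonic()
        q = self._quarantined_until.get(uuid)
        return (q == -1 or (q is not None and now < q)
                or now < self._instance_until.get(uuid, 0.0))
```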
`EngineLoop._filter_unavailable_vms` is the single chokepoint that
enforces both instance cooldown and quarantine: it runs immediately
after `VmProfiler.collect()` and drops excluded UUIDs from the
`vm_profiles` dict before the enforcer and planner see it. The scorer
still computes per-host imbalance from fresh Prometheus data - only
the candidate set for movement is filtered.
`MigrationResultEndpoint` (`result_listener.py`) subscribes to
`kronos.results.<aggregate>` and reacts per event type:

| Event | Condition | Action |
|---|---|---|
| `migration.completed` | - | DEBUG log, no state change |
| `migration.failed` | `retry_count < max_retries` (retry pending) | DEBUG log, no state change |
| `migration.failed` | final attempt, `error_type` in {`PreFlightError`, `MigrationFailed`, `MigrationTimeout`} | INFO log, `quarantine_instance(uuid, instance_quarantine_seconds)` |
| `migration.failed` | final attempt, `error_type == NovaClientError` | INFO log; no quarantine (transient infra error) |
| `migration.failed` | final attempt, unknown `error_type` | INFO log, defensive quarantine |
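A minimal sketch of the endpoint using oslo.messaging's notification endpoint signature (payload field names follow this doc; dispatch details are assumptions):

```python
VM_FAULTS = {"PreFlightError", "MigrationFailed", "MigrationTimeout"}


class MigrationResultEndpoint:
    def __init__(self, tracker, quarantine_seconds):
        self._tracker = tracker
        self._quarantine_seconds = quarantine_seconds

    # Standard oslo.messaging notification endpoint method (INFO priority).
    def info(self, ctxt, publisher_id, event_type, payload, metadata):
        if event_type != "migration.failed":
            return  # migration.completed: log only, no state change
        if payload["retry_count"] < payload["max_retries"]:
            return  # retry pending - the executor will re-cast the task
        if payload["error_type"] == "NovaClientError":
            return  # transient infra error - no quarantine
        # VM-specific failure (or unknown error_type): quarantine defensively.
        self._tracker.quarantine_instance(
            payload["instance_uuid"], self._quarantine_seconds)
```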
`EngineLoop._init_messaging` creates one notification listener per
aggregate (plus one for the unassigned pool) using
`get_notification_listener()`. Listeners are started before the loop
begins and drained in `_shutdown_messaging()` on SIGTERM/SIGINT.
Listeners are only wired when `dry_run=false`. In dry-run, no plans
are cast so no results ever arrive.
`kronos-replay` can seed the tracker from `<snapshot>/cooldowns.json`
so offline runs reproduce scenarios where specific VMs are already
cooling or quarantined. The file is optional and has three optional
sections:

```json
{
  "aggregate_cooldowns": {"gpu": 120},
  "instance_cooldowns": {"vm-abc": 300},
  "instance_quarantines": {"vm-xyz": 1800, "vm-banned": -1}
}
```

Values are seconds remaining at replay time. `-1` in
`instance_quarantines` means indefinite. `kronos-record` writes an
empty template; operators edit it to stage test scenarios.
Seeding uses `CooldownTracker.seed_aggregate_cooldown`,
`seed_instance_cooldown`, and `seed_instance_quarantine`. Aggregate
and instance cooldowns are clamped to the configured durations, so a
snapshot claiming more seconds remaining than `[engine] cooldown` /
`instance_cooldown` degrades to the configured max rather than
erroring.
- Load oslo.config + policies YAML
- Resolve aggregate scope from `[engine] aggregates` and `include_unassigned_hosts`
- Initialise Prometheus client, Nova client, scorer, profiler, constraints, evacuator, enforcer, planner, cooldown
- If `dry_run=false`: create RPC transport + per-aggregate RPC clients, and a notification transport + per-aggregate result listeners
- Enter loop. Each cycle:
  - `invalidate_cache()` on the constraint checker.
  - Fetch nova-compute services once cluster-wide; install on the constraint checker (host-availability gate) and pass to the evacuator. On failure, install an empty map - the gate fails closed and no migrations are emitted that cycle.
  - For each aggregate → score → cooldown check → profile → drop VMs whose source host is `state=down` → filter quarantined/cooling VMs → evacuate disabled hosts → enforce affinity → plan imbalance → cast → log.
- Handle SIGTERM/SIGINT for graceful shutdown (drains result listeners)
Consumes migration tasks via RPC from the engine and executes them via
the Nova live-migrate API. The executor never decides what to migrate -
it only validates, executes, and reports results.
- `worker.py` - `ExecutorWorker`: top-level wiring (RPC server, scheduler, runner, result notifier); `MigrationRPCEndpoint`: RPC endpoint exposing `execute_migration`
- `scheduler.py` - `TaskScheduler`: priority queue sorted by `not_before`, semaphore for concurrency control
- `migrate.py` - `MigrationRunner`: pre-flight check, Nova live-migrate, poll status, post-flight verify
- One executor per aggregate (or for the unassigned-hosts pool)
- Started with either `--aggregate <name>` or `--unassigned`
- RPC topic: `kronos.migrations.<aggregate>` (or `kronos.migrations._unassigned_` for the unassigned pool)
- Results topic: `kronos.results.<aggregate>`
Two oslo.messaging primitives, each for a reason:

- Engine → Executor: RPC cast. Exactly-one delivery via competing consumers.
- Executor → Engine: Notifications. The engine consumes `migration.completed`/`migration.failed` events to drive cooldown and quarantine state. On failure the payload carries `error_type` (exception class name) so the engine can distinguish VM-specific failures (`PreFlightError`, `MigrationFailed`, `MigrationTimeout`) from transient infrastructure errors (`NovaClientError`).
```
Engine.RPCClient.cast('execute_migration', task={...})
│
▼
RabbitMQ topic (kronos.migrations.<agg>)
│
▼
MigrationRPCEndpoint.execute_migration(ctxt, task)
│ ← oslo.messaging acks on handler return
▼
TaskScheduler.submit(task) ← queued by not_before timestamp
│
▼ (when not_before arrives + semaphore slot free)
│
MigrationRunner.execute(task)
├── pre-flight: ACTIVE? no task_state? still on from_host?
├── nova.live_migrate()
├── poll nova.get_migration_status() until terminal or timeout
└── post-flight: on to_host? ACTIVE?
│
▼
Notifier.info('migration.completed' | 'migration.failed', result)
│
▼
RabbitMQ topic (kronos.results.<agg>)
│
▼
Engine notification listener updates cooldown/quarantine state
```
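A minimal sketch of both wirings with oslo.messaging (the aggregate name, endpoint body, and payloads are illustrative):

```python
import oslo_messaging as messaging
from oslo_config import cfg

CONF = cfg.CONF


class MigrationRPCEndpoint:
    def execute_migration(self, ctxt, task):
        ...  # hand the task to the local TaskScheduler; ack happens on return


# Executor side: competing consumer on the per-aggregate migrations topic.
rpc_transport = messaging.get_rpc_transport(CONF)
server = messaging.get_rpc_server(
    rpc_transport,
    messaging.Target(topic="kronos.migrations.gpu-aggregate", server="executor-1"),
    endpoints=[MigrationRPCEndpoint()],
)
server.start()

# Engine side: fire-and-forget cast to the same topic.
client = messaging.RPCClient(
    rpc_transport, messaging.Target(topic="kronos.migrations.gpu-aggregate"))
client.cast({}, 'execute_migration',
            task={"instance_uuid": "...", "from_host": "...", "to_host": "..."})

# Executor → engine: notifications on the per-aggregate results topic.
notifier = messaging.Notifier(
    messaging.get_notification_transport(CONF),
    publisher_id="kronos-executor",
    driver="messaging",
    topics=["kronos.results.gpu-aggregate"],
)
notifier.info({}, "migration.completed", {"instance_uuid": "..."})
```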
- On failure, if `retry_count < max_retries`, the task is re-cast to the same RPC migrations topic with incremented `retry_count` and exponential-backoff `not_before`
- After `max_retries`, the task is logged and dropped (dead letter)
- Retries go back through RabbitMQ
- oslo.messaging acks messages when the handler returns (submission to the local scheduler), not when the migration completes
- If the executor restarts, tasks in the local scheduler queue are lost
- Tasks mid-migration: Nova continues the migration regardless; the engine re-evaluates next cycle and sees updated host scores
- Acceptable because the system is self-healing - the engine re-plans every `evaluation_interval` seconds
- Pre-flight check before every migration (the plan may be stale)
- Staggered execution via `not_before` timestamps on each task
- Concurrency cap via semaphore (`max_concurrent_migrations`)
- Hard timeout on migration polling
- Post-flight verification (host + status)
- Idempotent: pre-flight catches duplicate or stale tasks
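A minimal sketch of the pre-flight gate combining these checks (task fields and Nova client methods follow this doc; the service map argument and function shape are assumptions):

```python
class PreFlightError(Exception):
    """Raised when a task no longer matches cluster reality."""


def preflight(nova, services_by_host, task: dict) -> None:
    # The plan may be stale: re-check everything the planner assumed.
    uuid = task["instance_uuid"]
    if nova.get_instance_status(uuid) != "ACTIVE":
        raise PreFlightError("instance not ACTIVE")
    if nova.get_instance_host(uuid) != task["from_host"]:
        raise PreFlightError("instance already moved: duplicate or stale task")
    src = services_by_host[task["from_host"]]
    dst = services_by_host[task["to_host"]]
    if not dst.is_available_destination:
        raise PreFlightError("destination no longer up + enabled")
    # phase=evacuate tolerates a disabled (but up) source - that condition
    # is what made the move legitimate in the first place.
    if not src.is_available_destination and task["phase"] != "evacuate":
        raise PreFlightError("source no longer up + enabled")
```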