[SAP] Implement graceful shutdown for cinder services #314

Open: hemna wants to merge 1 commit into stable/2023.1-m3 from graceful-shutdown

hemna commented Feb 20, 2026

[SAP] Implement graceful shutdown for cinder services

Three-phase graceful shutdown that allows in-flight volume and backup operations to complete before the pod exits during Kubernetes rolling updates. Covers both cinder-volume and cinder-backup services.


The fix

Install SIG_IGN for SIGTERM/SIGINT/SIGHUP at the start of Service.stop().

cinder-volume runs as N forked child processes (one per backend) under oslo_service.ProcessLauncher. On SIGTERM, oslo_service's child handler calls SignalHandler.clear() which resets all handlers to SIG_DFL. The parent then sends a second SIGTERM via os.kill(child_pid, SIGTERM). Without SIG_IGN, the child terminates immediately — even though it's still inside pool.waitall() waiting for in-flight RPC handlers to finish.
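The effect of installing SIG_IGN can be sketched with the standard library alone. This is a minimal illustration, not the PR's code: the helper names `ignore_shutdown_signals` and `restore_signals` are hypothetical, and the real change lives inside `Service.stop()`.

```python
import os
import signal

# The three signals the description says are ignored once stop() begins.
_SHUTDOWN_SIGNALS = (signal.SIGTERM, signal.SIGINT, signal.SIGHUP)


def ignore_shutdown_signals():
    """Install SIG_IGN for the shutdown signals and return the old handlers.

    Hypothetical helper. It must run in the main thread, because
    signal.signal() raises ValueError anywhere else.
    """
    previous = {}
    for signum in _SHUTDOWN_SIGNALS:
        previous[signum] = signal.signal(signum, signal.SIG_IGN)
    return previous


def restore_signals(previous):
    """Put back the handlers saved by ignore_shutdown_signals()."""
    for signum, handler in previous.items():
        # A handler installed from C is reported as None and cannot be restored.
        if handler is not None:
            signal.signal(signum, handler)


if __name__ == "__main__":
    saved = ignore_shutdown_signals()
    # A second SIGTERM, like the one the parent launcher sends, is now a no-op.
    os.kill(os.getpid(), signal.SIGTERM)
    print("still running after SIGTERM")
    restore_signals(saved)
```

With the default handler (SIG_DFL) in place, the same `os.kill` call would terminate the process, which is the failure mode described above.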


How it works

Phase 1: Skip consumer cancel. We do NOT call rpcserver.stop() because it triggers an eventlet socket race ("simultaneous read on fileno") that disrupts outbound HTTP/RPC connections used by in-flight ops. Broker-side deregistration happens automatically when the OS closes the AMQP socket at process exit. During the gap, heartbeat stops (scheduler reroutes within service_down_time) and @reject_if_draining rejects late arrivals.

Phase 2: Block in pool.waitall() until all in-flight RPC handler greenthreads complete their operations.

Phase 3: Stop coordination, call super().stop(), and clean up the greenthread pool. The process then exits cleanly.
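The three phases can be sketched as follows. This is a simplified model under stated assumptions: `ThreadBackedPool` is a standard-library stand-in for eventlet's GreenPool (only `spawn()` and `waitall()` are modelled), `DrainingService` and `ServiceDraining` are hypothetical names, and the decorator sits on the same class as `stop()` for brevity, whereas the PR places it in the volume manager.

```python
import functools
import threading
import time


class ServiceDraining(Exception):
    """Raised when a call arrives after shutdown has begun (hypothetical)."""


def reject_if_draining(method):
    """Refuse new work once the owning object has set its draining flag."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        if self.draining.is_set():
            raise ServiceDraining(method.__name__)
        return method(self, *args, **kwargs)
    return wrapper


class ThreadBackedPool:
    """Stdlib stand-in for a green pool: spawn() plus waitall()."""

    def __init__(self):
        self._threads = []

    def spawn(self, func, *args):
        thread = threading.Thread(target=func, args=args)
        self._threads.append(thread)
        thread.start()

    def waitall(self):
        for thread in self._threads:
            thread.join()


class DrainingService:
    def __init__(self):
        self.pool = ThreadBackedPool()
        self.draining = threading.Event()
        self.completed = []
        self.stopped = False

    @reject_if_draining
    def extend_volume(self, volume_id, seconds):
        """Pretend RPC endpoint: dispatch a slow operation to the pool."""
        self.pool.spawn(self._do_extend, volume_id, seconds)

    def _do_extend(self, volume_id, seconds):
        time.sleep(seconds)
        self.completed.append(volume_id)

    def stop(self):
        # Phase 1: leave the consumer alone; only flag the service as
        # draining so late arrivals are rejected.
        self.draining.set()
        # Phase 2: block until every in-flight handler has finished.
        self.pool.waitall()
        # Phase 3: tear down the rest (coordination, parent stop, pool).
        self.stopped = True


if __name__ == "__main__":
    service = DrainingService()
    service.extend_volume("vol-1", 0.2)
    service.stop()
    print(service.completed)
```

The ordering is the point of the sketch: the draining flag is set before the wait, so nothing new can enter the pool while `waitall()` is blocking.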


Supporting mechanisms

  • Worker entry heartbeat (set_workers decorator): touches worker DB entries during operations via a spawned greenthread. Uses short interruptible sleeps (0.1s polling on the stop event) so the heartbeat exits within ~100ms when the operation completes, rather than blocking for up to 10s.
  • do_cleanup freshness check: skips worker entries updated within service_down_time and cleans up only truly stale entries. This prevents a new pod from resetting in-flight operations on a draining pod.
  • Backup restore heartbeat: touches backup.updated_at every 10s, preventing a new backup pod from triggering BackupRestoreCancel.
  • Backup _cleanup_one_backup freshness check: skips recent backups in the creating/restoring state.
  • Backup _detach_device no-reraise: if detach fails during shutdown, log and continue. Data integrity is preserved; the dangling export is cleaned up on the next startup.
  • reject_if_draining decorator: unconditionally rejects new RPC calls on a draining service so that the scheduler routes to healthy backends.
  • _add_to_threadpool / signal_shutdown / cleanup_threadpool: ThreadPoolManager uses eventlet GreenPool (cooperative greenthreads, same process) for async task dispatch. signal_shutdown() rejects new tasks; cleanup_threadpool() drains and re-creates the pool for potential service restart.
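How the heartbeat and the freshness checks fit together can be sketched as below. This is an illustration only: entries are plain dicts rather than cinder worker or backup objects, a thread stands in for the spawned greenthread, and the function names are hypothetical. The 10s and 0.1s values are the ones quoted in the list above.

```python
import threading
import time
from datetime import datetime, timedelta, timezone

POLL_INTERVAL = 0.1        # seconds between checks of the stop event
HEARTBEAT_INTERVAL = 10.0  # seconds between touches of the entry


def heartbeat(entry, stop_event, interval=HEARTBEAT_INTERVAL):
    """Touch entry['updated_at'] every `interval` seconds until stopped."""
    while not stop_event.is_set():
        entry["updated_at"] = datetime.now(timezone.utc)
        deadline = time.monotonic() + interval
        # Sleep in short slices so a stop request is noticed quickly
        # instead of after the full heartbeat interval.
        while time.monotonic() < deadline:
            if stop_event.wait(POLL_INTERVAL):
                return


def is_fresh(entry, service_down_time, now=None):
    """True if the entry was touched within service_down_time seconds."""
    now = now or datetime.now(timezone.utc)
    return now - entry["updated_at"] < timedelta(seconds=service_down_time)


def stale_entries(entries, service_down_time, now=None):
    """Return only the entries that a cleanup pass may reset."""
    return [e for e in entries if not is_fresh(e, service_down_time, now)]


if __name__ == "__main__":
    stop = threading.Event()
    active = {"updated_at": datetime.now(timezone.utc) - timedelta(hours=1)}
    worker = threading.Thread(target=heartbeat, args=(active, stop))
    worker.start()
    time.sleep(0.05)
    stop.set()      # the operation finished
    worker.join()   # returns promptly, not after 10 seconds
    abandoned = {"updated_at": datetime.now(timezone.utc) - timedelta(hours=1)}
    print(len(stale_entries([active, abandoned], service_down_time=60)))
```

An entry whose operation is still running keeps a recent timestamp, so a cleanup pass on a freshly started pod skips it; only entries that nobody is touching any more are treated as stale.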

Requires (separate changes)

  • dumb-init --single-child on cinder-volume AND cinder-backup container commands
  • terminationGracePeriodSeconds: 900 on pod spec

Test results (qa-de-1, 2026-05-22)

test_nova_cinder_interactions.py — 7/7 PASS:

| # | Test | Duration |
|---|------|----------|
| T1 | attach with ConnectorRejected baseline | 195s |
| T2 | kill OTHER-side pod during initialize_connection | 283s |
| T3 | kill VM-side pod during cross-shard FCD relocate | 329s |
| T4 | kill VM-side pod during Nova retry attach | 309s |
| T9 | operator migrate of attached volume (baseline) | 213s |
| T10 | kill SOURCE pod during operator migrate (canonical SIG_IGN reproducer) | 285s |
| T11 | kill DEST pod during operator migrate | 238s |

test_graceful_shutdown.py — single-op tests 6/6 PASS on KVM (premium volume type):

| Test | Duration |
|------|----------|
| volume create (from image) | 20s |
| volume delete | 74s |
| volume extend (16→32 GB) | 68s |
| volume clone | 67s |
| snapshot create | 66s |
| snapshot delete | 87s |

test_graceful_shutdown.py — concurrency suite 5/6 PASS on VMware:

| Test | Duration | Ops on killed pod |
|------|----------|-------------------|
| sameop_create_x3 | 215s | 3/3 |
| sameop_extend_x3 | 202s | 3/3 |
| sameop_clone_x3 | 524s | 3/3 |
| mixed_create_extend_clone | 392s | 3/3 |
| mixed_create_extend_snapshot_delete | 336s | 3/3 |
| mixed_create3_extend1 | 222s | 3/4 (1 vCenter InvalidArgument flake) |

oslo.messaging companion patch — NOT required

Connection.stop_consuming() is never called in cinder's shutdown path. Verified empirically by adding [GS-AMQP] instrumentation — zero hits across all pod-kill logs. Full sweep passes identically with the patch enabled or disabled. The patch (review.opendev.org/986243) is orthogonal to this PR.


Files changed

| File | Purpose |
|------|---------|
| `cinder/service.py` | Three-phase shutdown, SIG_IGN install, _drain_pool() with pool.waitall() |
| `cinder/volume/manager.py` | reject_if_draining decorator, signal_shutdown(), direct flow execution |
| `cinder/manager.py` | ThreadPoolManager (GreenPool + shutdown coordination), do_cleanup freshness check |
| `cinder/objects/cleanable.py` | Worker heartbeat greenthread in set_workers decorator (interruptible sleep) |
| `cinder/backup/manager.py` | Backup restore heartbeat + freshness check + no-reraise on detach |
| `cinder/volume/flows/manager/create_volume.py` | CreateVolumeOnFinishTask unconditional write |
| `cinder/opts.py` | graceful_shutdown_timeout registration |
| `cinder/tests/unit/test_manager.py` | ThreadPoolManager unit tests |
| `cinder/tests/unit/test_service.py` | Service shutdown unit tests + signal flag reset |
| `cinder/tests/unit/test.py` | _common_cleanup deregisters fake RPC endpoints |
| `cinder/tests/unit/volume/__init__.py` | BaseVolumeTestCase GreenPool drain in addCleanup |
| `cinder/tests/unit/backup/test_backup.py` | BaseBackupTest GreenPool drain in addCleanup |
| `cinder/tests/unit/test_volume_cleanup.py` | Sync threadpool + service_down_time=0 for test determinism |
| `cinder/tests/unit/api/contrib/test_admin_actions.py` | Explicit fake RPC server teardown |

hemna force-pushed the graceful-shutdown branch 3 times, most recently from 5a72074 to 1f69e00, on February 23, 2026 14:24
hemna force-pushed the graceful-shutdown branch 3 times, most recently from 4c183ec to d331087, on April 30, 2026 12:37
hemna force-pushed the graceful-shutdown branch 2 times, most recently from 1dd055a to d2fddd7, on May 14, 2026 22:25
hemna changed the title from "[SAP] Try graceful shutdown" to "[SAP] Implement graceful shutdown for cinder services" on May 14, 2026
hemna force-pushed the graceful-shutdown branch 4 times, most recently from a235a58 to b77438d, on May 14, 2026 22:41
Scsabiii previously approved these changes on May 15, 2026
hemna force-pushed the graceful-shutdown branch 7 times, most recently from d972062 to ea28218, on May 22, 2026 20:29
hemna requested a review from Scsabiii on May 22, 2026 20:33
hemna force-pushed the graceful-shutdown branch 7 times, most recently from db7847d to 8e7da2c, on May 28, 2026 13:53

Three-phase graceful shutdown that allows in-flight volume and backup
operations to complete before the pod exits during Kubernetes rolling
updates. Covers both cinder-volume and cinder-backup services.

The fix: install SIG_IGN for SIGTERM/SIGINT/SIGHUP at the start of
Service.stop(). cinder-volume runs as N forked child processes (one per
backend) under oslo_service.ProcessLauncher. On SIGTERM, oslo_service's
child handler calls SignalHandler.clear() which resets all handlers to
SIG_DFL. The parent then sends a second SIGTERM via os.kill(child_pid,
SIGTERM). Without SIG_IGN, the child terminates immediately — even
though it's still inside pool.waitall() waiting for in-flight RPC
handlers to finish.

How it works:

Phase 1: Skip consumer cancel. We do NOT call rpcserver.stop() because
  it triggers an eventlet socket race ("simultaneous read on fileno")
  that disrupts outbound HTTP/RPC connections used by in-flight ops.
  Broker-side deregistration happens automatically at process exit.
  During the gap, heartbeat stops (scheduler reroutes within
  service_down_time) and @reject_if_draining rejects late arrivals.

Phase 2: Block in pool.waitall() until all in-flight RPC handler
  greenthreads complete their operations.

Phase 3: Stop coordination, call super().stop(), cleanup threadpool
  executor. Process exits cleanly.

Supporting mechanisms:
- Worker entry heartbeat (set_workers decorator): touches worker DB
  entries every 10s during operations, preventing new pod init_host
  do_cleanup from resetting in-flight volumes to error. Uses short
  interruptible sleeps (0.1s polling) for fast shutdown response.
- do_cleanup freshness check: skips worker entries updated within
  service_down_time, only cleans up truly stale entries.
- Backup restore heartbeat: touches backup.updated_at every 10s,
  preventing new backup pod from triggering BackupRestoreCancel.
- Backup _detach_device no-reraise: if detach fails during shutdown,
  log and continue. Data integrity preserved.
- reject_if_draining decorator: unconditionally rejects new RPC calls
  on a draining service so scheduler routes to healthy backends.

Test adaptations:
- Base test classes (BaseVolumeTestCase, BaseBackupTest) drain the
  GreenPool in addCleanup to prevent greenthread leaks.
- test_volume_cleanup sets service_down_time=0 and runs threadpool
  tasks synchronously (mock_object) to avoid race conditions.
- test_admin_actions tearDown explicitly stops fake RPC servers to
  deregister stale endpoints from oslo.messaging's fake transport.
- test_service assertions updated to reflect intentional skip of
  rpcserver.stop() in graceful shutdown, and signal flag reset.
- test_backup_messages updated for _detach_device no-reraise behavior.

Requires (separate changes):
- dumb-init --single-child on container commands
- terminationGracePeriodSeconds: 900 on pod spec

Change-Id: Icdd28affc73fd34491b656a68410dce8e46264d4
hemna force-pushed the graceful-shutdown branch from db7847d to e5a3ed9 on June 4, 2026 12:25