F #7662: preserve rollback path for recover success #7682

Closed
ktfh-claw wants to merge 1 commit into OpenNebula:master from ktfh-claw:issue-7662-transactional-migration

@ktfh-claw

Summary

  • stop treating recover --success on PROLOG_MIGRATE_*_FAILURE states as a direct boot/deploy success path
  • route those failure states through the existing prolog-failure recovery path instead
  • preserve explicit retry behavior while avoiding a synthetic success transition on the destination side

Rationale

PROLOG_MIGRATE_*_FAILURE already means the transfer-manager prolog migration failed.
Calling recover --success from those states should not skip straight into boot/deploy as if the destination-side prolog had succeeded. The existing trigger_prolog_failure() path already performs the rollback/history handling for these migration failure states, so this change reuses that logic.
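The routing change can be pictured with a toy model. This is a minimal sketch in Python, not OpenNebula's actual C++ code: `trigger_prolog_failure` is named after the function mentioned above, while `recover_success`, `existing_success_path`, and the returned strings are hypothetical stand-ins used only for illustration.

```python
def is_prolog_migrate_failure(lcm_state: str) -> bool:
    """True for PROLOG_MIGRATE_FAILURE and the PROLOG_MIGRATE_*_FAILURE variants."""
    return lcm_state.startswith("PROLOG_MIGRATE") and lcm_state.endswith("_FAILURE")


def trigger_prolog_failure(lcm_state: str) -> str:
    # Stand-in for the existing rollback/history handling (assumption:
    # the real function does far more than return a label).
    return "rollback-to-source"


def existing_success_path(lcm_state: str) -> str:
    # Hypothetical placeholder for whatever recover --success already did
    # for states this patch does not touch.
    return "unchanged-success-path"


def recover_success(lcm_state: str) -> str:
    """Dispatch recover --success the way the summary describes."""
    if is_prolog_migrate_failure(lcm_state):
        # Patched behavior: never jump straight to boot/deploy from here.
        return trigger_prolog_failure(lcm_state)
    return existing_success_path(lcm_state)


print(recover_success("PROLOG_MIGRATE_FAILURE"))
```

The point of the sketch is only the branch order: the migration-failure check runs before any success transition is considered.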

Validation

  • git diff --check
  • scons mysql=yes grpc=yes systemd=yes rubygems=yes -j32 builds successfully on an Ubuntu build VM

Notes

I did not replace the running disposable 7.2.0 lab with this branch because the patch was built from current master (7.3.80.pre), and forcing a master build over that package-based lab would create unnecessary version skew during validation.

ktfh-claw closed this May 15, 2026
@ktfh-claw
Author

Fresh clean-lab validation result for this patch:

I reran the migration/recovery test in a new isolated two-hypervisor disposable runtime instead of relying only on the earlier lab.

Environment:

  • frontend/runtime: 10.1.1.130, ONE_LOCATION=/home/claw/one-issue7662-fresh130, OpenNebula 7.3.80
  • host 0: 10.1.1.132
  • host 1: 10.1.1.133
  • test VM: ID 0, vm7662-run
  • base image: CirrOS qcow2 imported as image cirros7662

Controlled repro in the fresh lab:

  1. Booted the VM on host 0.
  2. Forced a target-side TM failure on host 1 with:
    • chmod 000 /home/claw/one-issue7662-fresh130/var/datastores/0
  3. Ran:
    • onevm migrate 0 1 --poff-hard
  4. Reproduced the expected migration failure in var/vms/0/vm.log:
    • Error executing image transfer script: Error copying disk directory to target host
    • rm: cannot remove '/home/claw/one-issue7662-fresh130/var/datastores/0/0': Permission denied
    • tar: /home/claw/one-issue7662-fresh130/var/datastores/0: Cannot open: Permission denied
    • tar: Error is not recoverable: exiting now
  5. VM entered PROLOG_MIGRATE_FAILURE on host 1.
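Step 4 of the repro can be checked mechanically rather than by eye. A minimal sketch, assuming the VM log is available as plain text; the helper name `has_tm_migration_failure` is hypothetical, and the marker string is copied from the log excerpt above.

```python
# Marker copied from the vm.log excerpt in step 4.
TM_FAILURE_MARKER = "Error executing image transfer script"


def has_tm_migration_failure(log_text: str) -> bool:
    """Return True if any log line reports a failed image transfer script."""
    return any(TM_FAILURE_MARKER in line for line in log_text.splitlines())


sample = (
    "Error executing image transfer script: "
    "Error copying disk directory to target host\n"
    "tar: Error is not recoverable: exiting now\n"
)
print(has_tm_migration_failure(sample))
```

In the lab above this would be run against the contents of var/vms/0/vm.log after the failed migrate.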

Patched behavior check:

  1. Restored target permissions:
    • chmod 775 /home/claw/one-issue7662-fresh130/var/datastores/0
  2. Ran:
    • onevm recover 0 --success
  3. Observed the important behavior again:
    • active host rewound back to source host 10.1.1.132
    • history became:
      • SEQ 0 -> 10.1.1.132
      • SEQ 1 -> 10.1.1.133
      • SEQ 2 -> 10.1.1.132
    • recovery path transitioned through PROLOG_MIGRATE -> BOOT_MIGRATE on the source side, instead of treating the failed destination-side prolog migration as a direct success path
  4. The restore then failed in this disposable lab because there was no checkpoint file, leaving:
    • BOOT_MIGRATE_FAILURE
    • RESTORE: error: failed to get domain 'one-0'
    • Failed to open file '/home/claw/one-issue7662-fresh130/var/datastores/0/0/checkpoint': No such file or directory

So the important assertion held again in a fresh clean two-host runtime: recover --success from PROLOG_MIGRATE_FAILURE rewinds to the source-host rollback path instead of pretending the destination-side prolog migration succeeded.
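The history assertion can be stated as a small check. This is a sketch under assumptions: the history is represented as `(SEQ, host)` pairs typed in by hand from the observation above, and `rolled_back_to_source` is a hypothetical helper, not an OpenNebula API.

```python
from typing import List, Tuple


def rolled_back_to_source(
    history: List[Tuple[int, str]], source: str, target: str
) -> bool:
    """True if the last three records read source -> target -> source."""
    hosts = [host for _, host in sorted(history)]
    return hosts[-3:] == [source, target, source]


# History observed after onevm recover 0 --success in the fresh lab.
observed = [(0, "10.1.1.132"), (1, "10.1.1.133"), (2, "10.1.1.132")]
print(rolled_back_to_source(observed, "10.1.1.132", "10.1.1.133"))
```

A history that stops at SEQ 1 on the target host, which is what a synthetic success on the destination side would leave behind, fails this check.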
