fix updater: replace running executable atomically on darwin/linux #2685

Open
Rain-Swift wants to merge 2 commits into MetaCubeX:Alpha from Rain-Swift:codex/updater-atomic-replace

@Rain-Swift

Summary

This changes the core updater on darwin/linux to use a staged file + atomic rename instead of copying the new binary directly onto the currently running executable.

It also makes retries tolerant of stale meta-update directories left behind by interrupted updates.

Problem

The current update flow is:

  1. download the archive into meta-update
  2. unpack the new core
  3. back up currentExePath
  4. copy updateExePath directly to currentExePath

On darwin/linux, step 4 currently opens currentExePath with os.O_TRUNC before io.Copy.

If the process is interrupted after truncate and before the copy completes, the running executable can be left as a 0-byte file.

On macOS this is especially unsafe because the updater is modifying its own running executable in place. In practice, the kernel's code-signing page validation can kill the process once the on-disk executable changes, which leaves the replacement half-done.

There is also a retry issue: download() uses os.Mkdir(updateDir), so a stale meta-update directory from a previous interrupted run makes the next retry fail with file exists.

Validation

1. Reproduction process on macOS

The following is one continuous failure window from a real Clash Verge Rev 2.4.7 installation updating the mihomo kernel from v1.19.21 to v1.19.23.

The first retry fails because meta-update already exists:

[2026-04-08 19:26:18.206] level=info msg="start update"
[2026-04-08 19:26:20.727] level=info msg="current version v1.19.21, latest version v1.19.23"
[2026-04-08 19:26:20.728] level=info msg="updater: updating using url: https://github.com/MetaCubeX/mihomo/releases/latest/download/mihomo-darwin-arm64-v1.19.23.gz"
[2026-04-08 19:26:20.728] level=info msg="updateExeName: mihomo-darwin-arm64"
[2026-04-08 19:26:23.134] level=error msg="updater: failed: downloading: mkdir error: mkdir /Applications/Clash Verge.app/Contents/MacOS/meta-update: file exists"
[2026-04-08 19:26:23.134] level=warning msg="downloading: mkdir error: mkdir /Applications/Clash Verge.app/Contents/MacOS/meta-update: file exists"

The second retry in the same repro window then proceeds into the dangerous replacement path and stops after backup:

[2026-04-08 19:26:29.896] level=info msg="start update"
[2026-04-08 19:26:32.070] level=info msg="current version v1.19.21, latest version v1.19.23"
[2026-04-08 19:26:32.070] level=info msg="updater: updating using url: https://github.com/MetaCubeX/mihomo/releases/latest/download/mihomo-darwin-arm64-v1.19.23.gz"
[2026-04-08 19:26:32.070] level=info msg="updateExeName: mihomo-darwin-arm64"
[2026-04-08 19:26:43.498] level=info msg="updater: unpacking package"
[2026-04-08 19:26:43.700] level=info msg="updater: backing up current ExecFile:/Applications/Clash Verge.app/Contents/MacOS/verge-mihomo to /Applications/Clash Verge.app/Contents/MacOS/meta-backup/verge-mihomo"
[2026-04-08 19:26:43.761] level=warning msg="codesign failed: exit status 1"
[2026-04-08 19:26:43.762] level=info msg="updater: copy: /Applications/Clash Verge.app/Contents/MacOS/verge-mihomo to /Applications/Clash Verge.app/Contents/MacOS/meta-backup/verge-mihomo"

In the same time window, macOS kernel logs show code-signing page validation rejecting the running executable:

2026-04-08 19:26:43.762 kernel CODE SIGNING: process 75086[verge-mihomo]: rejecting invalid page at address 0x100c80000 from offset 0x8000 in file "/Applications/Clash Verge.app/Contents/MacOS/verge-mihomo" (cs_mtime:1775647420.563043230 != mtime:1775647603.764889006) (signed:0 validated:0 tainted:0 nx:0 wpmapped:0 dirty:1 depth:1)
2026-04-08 19:26:43.771 kernel verge-mihomo[75086] Corpse allowed 1 of 5

After that, Clash Verge can no longer talk to the core:

[2026-04-08 19:26:55.513] WARN [Core] Failed to apply configuration by mihomo api, error msg: Connection failed, I/O error: Connection refused (os error 61)
[2026-04-08 19:27:06.597] WARN [Core] Failed to apply configuration by mihomo api, error msg: Connection failed, I/O error: No such file or directory (os error 2)

This gives one aligned failure chain:

  1. retry hits stale meta-update
  2. next retry reaches unpack + backup
  3. updater is still modifying the running executable in place
  4. macOS rejects the mutated running image
  5. process dies before replacement completes
  6. the app loses the core

2. Single patched reproduction window on macOS

I then applied this patch locally, rebuilt Mihomo, and reran the self-update flow in an isolated directory.

The patched build returned a normal HTTP response:

{"status":"ok"}

HTTP_STATUS:200

The patched updater log completed the full update and restart flow in one continuous window:

time="2026-04-08T20:23:56.433558000+08:00" level=info msg="start update"
time="2026-04-08T20:23:58.909983000+08:00" level=info msg="current version test-fix, latest version v1.19.23"
time="2026-04-08T20:23:58.910005000+08:00" level=info msg="updater: updating using url: https://github.com/MetaCubeX/mihomo/releases/latest/download/mihomo-darwin-arm64-v1.19.23.gz"
time="2026-04-08T20:23:58.910024000+08:00" level=info msg="updateExeName: mihomo-darwin-arm64"
time="2026-04-08T20:24:01.333662000+08:00" level=debug msg="updater: saving package to file /tmp/mihomo-patched-prlog2.qI9QSE/meta-update/mihomo-darwin-arm64-v1.19.23.gz"
time="2026-04-08T20:24:07.305561000+08:00" level=debug msg="updater: downloaded package to file /tmp/mihomo-patched-prlog2.qI9QSE/meta-update/mihomo-darwin-arm64-v1.19.23.gz"
time="2026-04-08T20:24:07.307846000+08:00" level=info msg="updater: unpacking package"
time="2026-04-08T20:24:07.540095000+08:00" level=info msg="updater: backing up current ExecFile:/tmp/mihomo-patched-prlog2.qI9QSE/verge-mihomo to /tmp/mihomo-patched-prlog2.qI9QSE/meta-backup/verge-mihomo"
time="2026-04-08T20:24:07.626590000+08:00" level=info msg="updater: copy: /tmp/mihomo-patched-prlog2.qI9QSE/verge-mihomo to /tmp/mihomo-patched-prlog2.qI9QSE/meta-backup/verge-mihomo"
time="2026-04-08T20:24:07.792971000+08:00" level=info msg="updater: replace: /tmp/mihomo-patched-prlog2.qI9QSE/meta-update/mihomo-darwin-arm64 to /tmp/mihomo-patched-prlog2.qI9QSE/verge-mihomo via /tmp/mihomo-patched-prlog2.qI9QSE/.verge-mihomo.tmp-3046210258"
time="2026-04-08T20:24:07.812660000+08:00" level=info msg="updater: finished"
time="2026-04-08T20:24:07.813034000+08:00" level=info msg="restarting: \"/tmp/mihomo-patched-prlog2.qI9QSE/verge-mihomo\" [\"-d\" \"/tmp/mihomo-patched-prlog2.qI9QSE/data\" \"-f\" \"/tmp/mihomo-patched-prlog2.qI9QSE/config.yaml\" \"-ext-ctl-unix\" \"/tmp/mihomo-patched-prlog2.qI9QSE/verge.sock\"]"
time="2026-04-08T20:24:08.736323000+08:00" level=info msg="RESTful API unix listening at: /tmp/mihomo-patched-prlog2.qI9QSE/verge.sock"

The updated binary remained intact and reported the new version:

-rwxr-xr-x  1 ... 30M /tmp/mihomo-patched-prlog2.qI9QSE/meta-backup/verge-mihomo
-rwxr-xr-x  1 ... 30M /tmp/mihomo-patched-prlog2.qI9QSE/verge-mihomo
Mihomo Meta v1.19.23 darwin arm64 with go1.26.2 Wed Apr  8 01:07:41 UTC 2026
Use tags: with_gvisor

During the patched update window (2026-04-08 20:23:56 to 2026-04-08 20:24:09), querying the unified kernel logs for rejecting invalid page, cs_mtime, or Corpse allowed returned no matches.

This gives one aligned success chain:

  1. update request returns 200
  2. package downloads and unpacks
  3. current executable is backed up
  4. new executable is staged and atomically renamed into place
  5. updater logs finished
  6. process restarts successfully
  7. updated binary remains intact and runnable

Root cause

There are two related issues:

  1. The updater performs in-place self-overwrite of the currently running executable on darwin/linux.
  2. The replacement path truncates currentExePath before the new binary is fully written.

On macOS, modifying a running executable on disk can break later code-signing page validation. Apple defines errSecCSStaticCodeChanged as “The code on disk has been modified after the code started running,” and XNU defines KERN_CODESIGN_ERROR as a page-fault rejection caused by a signature check. This is why in-place replacement of a live binary can terminate the process before the copy completes. [1, 2]

That leaves currentExePath truncated or empty.

Changes

  • recreate or clear the staging update directory instead of failing on stale meta-update
  • on darwin/linux, copy the new core to a temporary sibling file in the destination directory
  • on macOS, apply ad-hoc signing only to the staged replacement file, never to the live executable
  • atomically rename the staged file over currentExePath
  • keep the existing Windows behavior unchanged

Why this fixes the issue

Atomic rename never truncates the currently running executable in place.

Until the final rename succeeds, the old binary remains intact. If the process exits before that point, currentExePath is still usable and retries can continue safely.

Tests

This patch adds coverage for:

  • clearing stale update directories before a retry
  • replacing the destination via the staged atomic path
  • leaving the destination untouched when staged replacement fails before rename

Verification

  • go test ./component/updater
  • isolated self-update repro on macOS now returns 200 {"status":"ok"}
  • updater logs updater: replace ... and updater: finished
  • the target executable remains intact and updates successfully to v1.19.23
  • no rejecting invalid page message was observed during the patched update run

@wwqgtxx force-pushed the Alpha branch 3 times, most recently from 839bb39 to 6c407f0 on April 14, 2026 23:57

@xishang0128
Contributor

os.Rename replaces the destination with a new inode, so the original file attributes are lost.

@Rain-Swift
Author

os.Rename replaces the destination with a new inode, so the original file attributes are lost.

Thanks, this is a good point.

Part of the reason I moved away from the old in-place overwrite path is that updating the live executable can already run into OS protection mechanisms. On Linux, open(2) documents ETXTBSY when write access is requested to an executable image that is currently being executed, and on macOS the in-place write does hit the Hardened Runtime in my case.

You're right that a plain os.Rename would otherwise lose the destination inode's existing attributes. I've updated the staged replacement path to copy the destination's metadata onto the staged file before the final rename: the existing mode, plus POSIX attrs, xattrs, and ownership on linux/darwin on a best-effort basis, with Darwin st_flags handled separately. I also added tests for mode/xattr preservation, symlink-target replacement, and Darwin flag preservation.
