fix updater: replace running executable atomically on darwin/linux #2685
Rain-Swift wants to merge 2 commits into MetaCubeX:Alpha from …
Force-pushed from 839bb39 to 6c407f0.
os.Rename replaces the destination with a new inode, so the original file's attributes are lost.
Thanks, this is a good point. Part of the reason I moved away from the old in-place overwrite path is that updating the live executable can already run into OS protection mechanisms. On Linux, … You're right that a plain …
Summary
This changes the core updater on darwin/linux to use a staged file + atomic rename instead of copying the new binary directly onto the currently running executable.
It also makes retries tolerant of stale meta-update directories left behind by interrupted updates.
Problem
The current update flow is:
… meta-update … currentExePath … updateExePath directly to currentExePath …
On darwin/linux, step 4 currently opens currentExePath with os.O_TRUNC before io.Copy. If the process is interrupted after the truncate and before the copy completes, the running executable can be left as a 0-byte file.
On macOS this is especially unsafe because the updater is modifying its own running executable in place. In practice, the process can be interrupted by code-signing / page validation after the on-disk executable changes, which leaves the replacement half-done.
There is also a retry issue:
download() uses os.Mkdir(updateDir), so a stale meta-update directory from a previous interrupted run makes the next retry fail with "file exists".
Validation
1. Reproduction process on macOS
The following is one continuous failure window from a real Clash Verge Rev 2.4.7 installation running the mihomo v1.19.23 kernel.
The first retry fails because meta-update already exists:
The second retry in the same repro window then proceeds into the dangerous replacement path and stops after backup:
Immediately in the same time window, macOS kernel logs show code-signing/page validation rejecting the running executable:
After that, Clash Verge can no longer talk to the core:
This gives one aligned failure chain: … meta-update …
2. Single patched reproduction window on macOS
I then applied this patch locally, rebuilt Mihomo, and reran the self-update flow in an isolated directory.
The patched build returned a normal HTTP response:
The patched updater log completed the full update and restart flow in one continuous window:
The updated binary remained intact and reported the new version:
During the patched update window (2026-04-08 20:23:56 to 2026-04-08 20:24:09), querying unified kernel logs for "rejecting invalid page", "cs_mtime", or "Corpse allowed" returned no match.
This gives one aligned success chain:
… 200 … finished …
Root cause
There are two related issues:
… currentExePath before the new binary is fully written.
On macOS, modifying a running executable on disk can break later code-signing page validation. Apple defines errSecCSStaticCodeChanged as “The code on disk has been modified after the code started running,” and XNU defines KERN_CODESIGN_ERROR as a page-fault rejection caused by a signature check. This is why in-place replacement of a live binary can terminate the process before the copy completes [1], [2]. That leaves currentExePath truncated or empty.
Changes
… meta-update … currentExePath …
Why this fixes the issue
Atomic rename never truncates the currently running executable in place.
Until the final rename succeeds, the old binary remains intact. If the process exits before that point,
currentExePath is still usable and retries can continue safely.
Tests
This patch adds coverage for:
Verification
- go test ./component/updater
- 200 {"status":"ok"}
- updater: replace ... and updater: finished
- v1.19.23
- No "rejecting invalid page" message was observed during the patched update run