V3: On startup, take over ports held by our own stale instance (updates work right away) #160
Merged
…tes)
Every version update failed to launch because the old backend still held the agent/hub ports. On startup, reclaim a port held by THIS app's own stale backend (same role): terminate it, wait for release, then bind. Strictly self-only: it identifies the holder as our taskpaw-backend/backend_main.py of the matching role; a foreign service is left untouched and claim_port still fails loudly.
- core/net.py: reclaim_port_from_stale_instance + _listener_pids/_is_our_backend (module-level optional psutil, mockable).
- agent launcher: reclaim network + control ports (role=agent) before claim_port.
- hub run_hub: reclaim API port (role=hub) before claim_port.
- test_net_reclaim.py: fake-psutil tests incl. "never kill a foreign process".
Design: docs/specs/2026-07-02-port-takeover-design.md
Closes 159
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
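The "wait for release, then bind" step can be sketched with the standard library alone. This is a minimal illustration, assuming "released" means a fresh TCP socket can bind to the same (host, port). `port_available` and `wait_for_port_release` here are hypothetical helpers: later commits refer to a port_available check, but its code is not shown in this PR page.

```python
import socket
import time


def port_available(host: str, port: int) -> bool:
    """Return True if a new TCP socket could bind to (host, port) right now."""
    family = socket.AF_INET6 if ":" in host else socket.AF_INET
    with socket.socket(family, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False  # still held (e.g. EADDRINUSE)
    return True


def wait_for_port_release(
    host: str, port: int, timeout: float = 5.0, interval: float = 0.1
) -> bool:
    """Poll until the port is bindable or the timeout expires; never raises."""
    deadline = time.monotonic() + timeout
    while True:
        if port_available(host, port):
            return True
        if time.monotonic() >= deadline:
            return False  # caller proceeds; claim_port will then fail loudly
        time.sleep(interval)
```

The helper returns a bool instead of raising. That matches the PR's rule that the reclaim path only ever degrades to a no-op and leaves the loud failure to claim_port.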
…out (Codex outer gate); ad-hoc sign the macOS bundle
- _is_our_backend matches taskpaw-backend* by prefix (incl. the target-triple sidecar the Tauri shell may launch), not an exact-name set.
- reclaim's outer handler also catches psutil.TimeoutExpired, so a stuck process that won't die after kill() can't abort startup (claim_port still fails loudly).
- tauri.conf bundle.macOS.signingIdentity="-" → Tauri produces a consistent ad-hoc signature, so locally built DMGs aren't rejected as "damaged" on Apple Silicon (spctl was failing: "no resources but signature indicates they must be present").
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
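The terminate → wait → kill fallback, with the timeout swallowed so startup is never aborted, might look like the following minimal sketch. It is written against any process-like object: psutil.Process and subprocess.Popen both provide terminate(), kill() and wait(timeout). The timeout exception types are passed in (psutil.TimeoutExpired in the real code, per the commit), which keeps the sketch free of a psutil import. `terminate_backend`'s signature and the 3-second grace period are assumptions.

```python
from __future__ import annotations

from typing import Protocol


class ProcLike(Protocol):
    """Anything with psutil.Process-style terminate()/kill()/wait(timeout)."""

    def terminate(self) -> None: ...
    def kill(self) -> None: ...
    def wait(self, timeout: float | None = None) -> object: ...


def terminate_backend(
    proc: ProcLike,
    timeout_errors: tuple[type[BaseException], ...],
    grace: float = 3.0,
) -> bool:
    """terminate() -> wait(grace) -> kill() -> wait(grace).

    Returns True once the process has exited. A timeout from either wait() is
    swallowed and reported as False, so startup carries on and claim_port can
    still fail loudly if the port is held.
    """
    try:
        proc.terminate()
        try:
            proc.wait(timeout=grace)
        except timeout_errors:
            proc.kill()  # still alive after the grace period: escalate
            proc.wait(timeout=grace)
        return True
    except timeout_errors:
        return False  # won't die even after kill(); give up quietly
```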
…t takeover semantics (Codex outer gate r2)
- [P1] Revert the hardcoded tauri.conf signingIdentity="-" (it would override APPLE_SIGNING_IDENTITY and break Developer ID signing + notarization in release.yml). Instead, build.py adds ad-hoc signing to the --config override ONLY when no APPLE_SIGNING_IDENTITY is set → local/unsigned builds get a consistent ad-hoc signature (no more "damaged" on Apple Silicon), and signed releases are untouched. Verified: codesign --verify --deep --strict → "valid on disk / satisfies its Designated Requirement".
- [P2] Document reclaim as an intentional "last-launch-wins supersede" for a box with a single agent/hub per machine. Preventing an accidental double launch of the same version is the Tauri shell's single-instance job (follow-up), not this port logic's.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
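The conditional override in build.py can be sketched as below. This assumes the override is a dict later serialized to JSON for Tauri's `--config`. The function name `tauri_config_override` and the otherwise-empty override are hypothetical; only the `bundle.macOS.signingIdentity = "-"` key and the APPLE_SIGNING_IDENTITY condition come from the commits.

```python
from typing import Mapping


def tauri_config_override(env: Mapping[str, str]) -> dict:
    """Build the --config override; add ad-hoc signing only for unsigned builds."""
    override: dict = {}
    if not env.get("APPLE_SIGNING_IDENTITY"):
        # "-" means an ad-hoc signature (no Developer ID). Signed release builds
        # skip this branch, so their identity and notarization are untouched.
        override.setdefault("bundle", {}).setdefault("macOS", {})["signingIdentity"] = "-"
    return override
```

Deciding this in the build script rather than in tauri.conf keeps a single config file valid for both local and release builds.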
backend_main defaults to agent when launched with no role arg, so an agent must also reclaim a role-less taskpaw-backend; hub still requires an explicit "hub". Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
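A minimal sketch of that role rule, combined with the exact-token parsing a later commit introduces (`_explicit_role`). The helper names below are hypothetical stand-ins, and the sketch covers backend_main-style argv only. Headless `python -m taskpaw_v3.hub` launches are identified by module name in a separate commit.

```python
from typing import Optional, Sequence

ROLES = ("agent", "hub")


def explicit_role(argv: Sequence[str]) -> Optional[str]:
    """First argv token (after the program) that is EXACTLY 'agent' or 'hub'."""
    for token in argv[1:]:
        if token in ROLES:  # exact match, so /Users/hubert/... is not "hub"
            return token
    return None


def role_matches(target: str, argv: Sequence[str]) -> bool:
    """backend_main defaults to agent, so a role-less backend counts as an agent."""
    role = explicit_role(argv)
    if role is None:
        return target == "agent"
    return role == target
```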
… outer gate r4)
The agent needs BOTH its network and control ports. Reclaiming them one by one could kill our own stale agent for one port and then still fail claim_port on the other if a FOREIGN service holds it, leaving no agent running. Add reclaim_ports_from_stale_instance(): inspect every required port first and abort the whole reclaim if any holder is foreign, so the old agent is only superseded when all its ports are free or ours. Factor out _terminate_backend (shared with the single-port hub path).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
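The all-or-nothing rule separates inspection from action: classify every required port first, and terminate only if no holder is foreign. Below is a minimal sketch with injected callbacks. `Holder`, `reclaim_ports_all_or_nothing`, `classify` and `terminate_ours` are hypothetical names, not the PR's actual signature of reclaim_ports_from_stale_instance().

```python
from enum import Enum
from typing import Callable, Iterable


class Holder(Enum):
    FREE = "free"
    OURS = "ours"
    FOREIGN = "foreign"


def reclaim_ports_all_or_nothing(
    ports: Iterable[int],
    classify: Callable[[int], Holder],
    terminate_ours: Callable[[int], None],
) -> bool:
    """Inspect every port first; only supersede our holders if none is foreign."""
    plan = [(port, classify(port)) for port in ports]
    if any(holder is Holder.FOREIGN for _, holder in plan):
        return False  # touch nothing: the old agent keeps running, claim_port fails loudly
    for port, holder in plan:
        if holder is Holder.OURS:
            terminate_ours(port)
    return True
```

This sketch works per port. Because one stale agent usually holds both ports, a real version would collect holder PIDs first and terminate each process once.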
…odex outer gate r5)
_listener_pids matched by port number alone, so a foreign 127.0.0.1:P listener would (falsely) block an agent configured for 192.168.x.y:P and abort the reclaim. Filter by address conflict (_addr_conflicts): same address, or a wildcard on either side, and only within the same IP family.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
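The r5 conflict rule (same family, then same address or a wildcard on either side) maps directly onto Python's `ipaddress` module. This minimal sketch handles numeric addresses only. `addr_conflicts_numeric` is a hypothetical name. Hostnames such as `localhost` raise ValueError here, and the following commits add loopback handling for them.

```python
import ipaddress


def addr_conflicts_numeric(a: str, b: str) -> bool:
    """Would listeners on a:P and b:P collide? (numeric IPs only)"""
    ia, ib = ipaddress.ip_address(a), ipaddress.ip_address(b)
    if ia.version != ib.version:
        return False  # r5 only compares within one IP family
    # 0.0.0.0 / :: (is_unspecified) overlaps every address of its family.
    return ia == ib or ia.is_unspecified or ib.is_unspecified
```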
…odex outer gate r6)
- _addr_conflicts: a `localhost` bind (allowed by the Hub guard) is reported by psutil as a numeric 127.x, so treat any two loopback addresses as conflicting; a stale localhost-bound backend is now reclaimed.
- _is_our_backend: match the full package path taskpaw_v3/packaging/backend_main.py (or the -m module) instead of a bare backend_main.py, so an unrelated project's script on the port is never mistaken for ours.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
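A minimal sketch of the packaging-entrypoint check on a process's argv, assuming cmdline is the list psutil's `Process.cmdline()` returns. It also folds in two later refinements: the `-m` module counts only as an actual `-m <module>` pair, and the script path is anchored at a `/` boundary. `is_packaging_entry` is a hypothetical name. Normalizing backslashes for Windows paths is an assumption.

```python
from typing import Sequence

SCRIPT = "taskpaw_v3/packaging/backend_main.py"
MODULE = "taskpaw_v3.packaging.backend_main"


def is_packaging_entry(cmdline: Sequence[str]) -> bool:
    """True if argv runs our backend_main as a script path or as `-m <module>`."""
    for i, arg in enumerate(cmdline):
        norm = arg.replace("\\", "/")
        # Exact relative path, or the full package path after a "/" boundary,
        # so .../clonetaskpaw_v3/packaging/backend_main.py does not match.
        if norm == SCRIPT or norm.endswith("/" + SCRIPT):
            return True
        if arg == "-m" and i + 1 < len(cmdline) and cmdline[i + 1] == MODULE:
            return True
    return False
```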
A stale V3 process launched with the documented headless commands (deployment.md: python -m taskpaw_v3.agent / python -m taskpaw_v3.hub run) has process name `python` and no backend_main in its argv, so it was treated as foreign and the port wasn't reclaimed. Add _backend_role/_role_from_module: derive the role from the module name (agent|hub), covering the -m module string and its resolved path form, alongside the existing sidecar + packaging entrypoints. Doc updated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
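Deriving the role from the module can be sketched as below. The sketch already applies the later anchoring fixes: the module must be exactly `taskpaw_v3.agent|hub` or a submodule, and a script path must contain the components `taskpaw_v3/<agent|hub>/` rather than a substring. All function names are hypothetical stand-ins for `_role_from_module`/`_backend_role`.

```python
from typing import Optional, Sequence

ROLES = ("agent", "hub")


def role_from_module_arg(module: str) -> Optional[str]:
    """'taskpaw_v3.agent' or 'taskpaw_v3.agent.x' -> 'agent'; anything else -> None."""
    parts = module.split(".")
    if len(parts) >= 2 and parts[0] == "taskpaw_v3" and parts[1] in ROLES:
        return parts[1]
    return None


def role_from_script_path(path: str) -> Optional[str]:
    """Match path COMPONENTS .../taskpaw_v3/<agent|hub>/<file>, never substrings."""
    parts = [p for p in path.replace("\\", "/").split("/") if p]
    for i in range(len(parts) - 1):
        if parts[i] == "taskpaw_v3" and parts[i + 1] in ROLES:
            return parts[i + 1]
    return None


def role_from_cmdline(cmdline: Sequence[str]) -> Optional[str]:
    """Only a `-m` value or a .py script path counts; a bare argument never does."""
    for i, arg in enumerate(cmdline):
        if arg == "-m" and i + 1 < len(cmdline):
            return role_from_module_arg(cmdline[i + 1])
        if arg.endswith(".py"):
            return role_from_script_path(arg)
    return None
```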
CRITICAL: system-wide psutil.net_connections() needs root on macOS, so on the logged-in-user desktop app it raised AccessDenied and the whole takeover silently no-op'd — the exact update/restart failure this feature fixes. Flip the approach: enumerate OUR OWN processes (process_iter) and read each one's own sockets (_proc_listen_conns), which works for a same-user process without root. Foreign holders are no longer directly visible, so the agent's all-or-nothing multi-port reclaim classifies a port as foreign when it's occupied (not port_available) yet not ours. Handles the psutil 6 Process.connections→net_connections rename. Doc updated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
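The reversed approach reads sockets only from processes we already know are ours, and infers "foreign" from an occupied port that none of them holds. A minimal sketch follows, assuming `our_procs` is the list already filtered from `psutil.process_iter()` by the identity checks and `port_available` is an injected bind test. Both helper names are hypothetical, and address filtering is omitted (port only). The `net_connections`/`connections` fallback mirrors psutil 6's rename.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple

LISTEN = "LISTEN"  # value of psutil.CONN_LISTEN


def proc_listen_conns(proc: Any) -> List[Any]:
    """Listening inet sockets of ONE process we own (no root needed for own PIDs).

    psutil 6 renamed Process.connections() to net_connections(); prefer the new name.
    """
    getter = getattr(proc, "net_connections", None) or proc.connections
    return [c for c in getter(kind="inet") if c.status == LISTEN]


def classify_port(
    port: int,
    our_procs: Iterable[Any],
    port_available: Callable[[int], bool],
) -> Tuple[str, Optional[Any]]:
    """Return ("ours", proc), ("free", None) or ("foreign", None) for one port."""
    for proc in our_procs:
        if any(c.laddr.port == port for c in proc_listen_conns(proc)):
            return ("ours", proc)
    # Foreign listeners are invisible without root, so infer them:
    # occupied, but not held by any of our processes, means foreign.
    if port_available(port):
        return ("free", None)
    return ("foreign", None)
```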
- _backend_role: parse the role as an EXACT argv token ('agent'/'hub') via
_explicit_role instead of `"hub" in cmd`, so a path/flag that merely contains the
word (e.g. /Users/hubert/…) can't misclassify an agent backend as a hub and leave
the stale agent running.
- Tighten the sidecar name check from a loose startswith to a regex matching only the
base name or a base+target-triple (underscores allowed, e.g. x86_64), so a foreign
helper like taskpaw-backend-logger isn't treated as ours.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
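The PR does not show the exact sidecar regex. The following is a minimal sketch under one assumption: a Tauri-style target-triple suffix has 3 or 4 dash-separated parts (e.g. `x86_64-apple-darwin`, `x86_64-pc-windows-msvc`), with underscores allowed. `is_sidecar_name` is a hypothetical name.

```python
import re

# Assumed shape: base name, optionally "-<target triple>" with 3 or 4
# dash-separated parts, optionally ".exe". Matched against the WHOLE name.
_SIDECAR_RE = re.compile(
    r"taskpaw-backend(?:-[a-z0-9_]+(?:-[a-z0-9_]+){2,3})?(?:\.exe)?",
    re.IGNORECASE,
)


def is_sidecar_name(name: str) -> bool:
    """Whole-name match only, so taskpaw-backend-logger is rejected."""
    return _SIDECAR_RE.fullmatch(name) is not None
```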
…x (Kimi final review r2)
- Catch the full psutil.Error hierarchy (incl. ZombieProcess) in the identity/enumerate/terminate paths: a zombie's name()/cmdline()/wait() no longer crashes startup, and the reclaim degrades to the documented no-op.
- Stronger positive ID: the sidecar must match the name regex AND (when available) the real proc.exe() basename, so a foreign process can't pass by spoofing proc.name(). Match the packaging module only as an actual `-m <module>` pair, and the headless taskpaw_v3.agent|hub module only as a `-m` value or a package .py script path, not a bare arg. (Install-dir containment is intentionally avoided: onefile runs from a _MEI temp path, which would miss our own stale backend.)
- _addr_conflicts: check loopback equivalence BEFORE the IPv4/IPv6 split, so a stale ::1 backend is reclaimed for a `localhost` start.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
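The "name AND exe() basename" check, wrapped so that any psutil error degrades to "not ours", could look like this minimal sketch. The error types are injected (pass `(psutil.Error,)` in real use, since ZombieProcess, NoSuchProcess and AccessDenied all derive from it). `name_ok` stands in for the sidecar regex. Falling back to the name check alone when exe() is unreadable is an assumed reading of "when available".

```python
import os
from typing import Any, Callable, Tuple, Type


def is_our_sidecar(
    proc: Any,
    name_ok: Callable[[str], bool],
    errors: Tuple[Type[BaseException], ...],
) -> bool:
    """Positive ID: name() matches AND, if exe() is readable, its basename too."""
    try:
        if not name_ok(proc.name()):
            return False
        try:
            exe = proc.exe()
        except errors:
            exe = ""  # exe() unavailable: rely on the name check alone
        return not exe or name_ok(os.path.basename(exe))
    except errors:
        return False  # zombie / vanished / access denied: treat as not ours (no-op)
```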
…able (Codex outer gate)
If control_port == bind_port on the same/overlapping host (accepted by AgentConfig, editable in the UI), the all-or-nothing reclaim would kill our old working agent and then fail startup on the self-colliding second socket. Detect required ports that are not mutually bindable up front and reclaim nothing; claim_port fails loudly and the old agent keeps running.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
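The up-front check amounts to a pairwise test over the required (host, port) binds. Below is a minimal sketch with the "same bind target" predicate injected, since a later commit replaces it with the stricter `_same_bind_target`. `mutually_bindable` is a hypothetical name. The caller would skip the reclaim entirely when it returns False.

```python
from itertools import combinations
from typing import Callable, Sequence, Tuple

Bind = Tuple[str, int]  # (host, port)


def mutually_bindable(
    binds: Sequence[Bind], same_target: Callable[[str, str], bool]
) -> bool:
    """False if any two required sockets would collide with each other."""
    for (h1, p1), (h2, p2) in combinations(binds, 2):
        if p1 == p2 and same_target(h1, h2):
            return False  # e.g. control_port == bind_port on the same host
    return True
```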
…(Kimi final review r2)
- _role_from_module: anchor the `-m` module to exactly taskpaw_v3.agent|hub or a
submodule prefix (and the script path to `/taskpaw_v3/{agent,hub}/`), so a foreign
`python -m my.taskpaw_v3.agent` on the port is no longer misidentified as ours.
- Duplicate required-port check: use a stricter _same_bind_target (wildcard / same
literal address / localhost↔canonical-loopback) instead of _addr_conflicts, so a
valid two-loopback config (127.0.0.1 + 127.0.0.2) isn't wrongly judged non-bindable
and skipped.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
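The stricter duplicate-port predicate can be sketched directly from the three cases the commit lists: a wildcard on either side, the same literal address, or `localhost` against a canonical loopback. Under this rule two different numeric loopbacks stay bindable side by side. `same_bind_target` is a hypothetical stand-in for `_same_bind_target`, and the wildcard set is an assumption.

```python
CANONICAL_LOOPBACK = {"127.0.0.1", "::1"}
WILDCARDS = {"0.0.0.0", "::"}


def same_bind_target(a: str, b: str) -> bool:
    """Would two of OUR required sockets on a:P and b:P target the same place?"""
    a, b = a.lower(), b.lower()
    if a in WILDCARDS or b in WILDCARDS:
        return True
    if a == b:
        return True
    if "localhost" in (a, b):
        other = b if a == "localhost" else a
        return other in CANONICAL_LOOPBACK
    return False  # e.g. 127.0.0.1 vs 127.0.0.2: a valid two-loopback config
```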
…loopback (Kimi final review)
- Anchor the from-source backend_main match to a `/` boundary (or the exact relative path) so `.../clonetaskpaw_v3/packaging/backend_main.py` isn't taken as ours.
- _role_from_module: match path COMPONENTS (taskpaw_v3/agent|hub), not a substring, so `.../mytaskpaw_v3/agent/…` can't match.
- _addr_conflicts: replace the blanket "both loopback" rule with _loopback_equiv: localhost ≡ any loopback, 127.0.0.1 ≡ ::1, and distinct numeric loopbacks (127.0.0.1 vs .2) do NOT collide.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
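The three loopback rules translate into a small predicate. In `_addr_conflicts` it would run before the IPv4/IPv6 family comparison, per the r2 review. Below is a minimal sketch using `ipaddress.is_loopback`. `loopback_equiv` is a hypothetical stand-in for `_loopback_equiv`. It answers only the loopback question, so two identical non-loopback addresses are left to the separate same-address rule.

```python
import ipaddress
from typing import Optional, Union

IP = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]
CANONICAL = {ipaddress.ip_address("127.0.0.1"), ipaddress.ip_address("::1")}


def _loopback(addr: str) -> Optional[IP]:
    """Parsed address if it is a numeric loopback, else None."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return None
    return ip if ip.is_loopback else None


def loopback_equiv(a: str, b: str) -> bool:
    """localhost ~ any loopback; 127.0.0.1 ~ ::1; 127.0.0.1 !~ 127.0.0.2."""
    a, b = a.lower(), b.lower()
    if "localhost" in (a, b):
        other = b if a == "localhost" else a
        return other == "localhost" or _loopback(other) is not None
    la, lb = _loopback(a), _loopback(b)
    if la is None or lb is None:
        return False
    return la == lb or (la in CANONICAL and lb in CANONICAL)
```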
Closes #159
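Problem
Every launch after installing a new version failed ("backend did not start within 30s / port in use"): the previous version's backend was still running and holding the agent/hub ports.
Approach (the "clean up by itself on startup" behavior you asked for)
On startup, if a target port is held by this app's own stale backend, automatically terminate that stale instance, wait for the port to be released, then bind. The agent takes over its network + control ports; the hub takes over its API port.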
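Safety (critical)
Never kill an external process on the port. The holder is terminated only when it is positively identified as this app's own backend of the same role (process named taskpaw-backend[.exe], or the from-source backend_main.py, with an agent/hub role in its argv). External processes and other roles are always left alone, and claim_port still fails loudly on a real conflict (constitution §3). Termination is graceful (terminate → wait → kill as a fallback), and the port is then polled until it is released. Missing psutil, insufficient permissions, or any exception degrades to "do nothing" and never crashes; every step is logged.
Location
core/net.py: reclaim_port_from_stale_instance + _listener_pids/_is_our_backend (module-level optional psutil, easy to mock). agent/server/launcher.py and hub/server/app.py call it before claim_port.
Tests (test_net_reclaim.py, fake psutil, never kills real processes)
Reclaims a same-role stale backend; leaves an external process (nginx) alone; leaves a different role alone; matches a from-source backend_main.py; no-op without psutil; _is_our_backend name + role matching. uv run pytest: 465 passed; ruff/mypy all green.
🤖 Generated with Claude Code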