⚠️ ARCHIVED — 2026-05-30. ThepoolCLI is now sidecar-owned. The source (bin/pool.hexa) was relocated faithfully intodancinlab/sidecaratskills/pool/bin/pool.hexa; sidecar'sinstall.hexabuilds thepoolbinary (~/.hx/bin/pool.bin) and installs the shim. The canonical entry is nowsidecar pool <args>; the barepoolon PATH is a kept-working deprecation shim. The roster SSOT (~/.pool/pool.json) and thepool-routehook are unchanged.Do NOT
hx install poolfrom this repo anymore — install/sync via sidecar (sidecar sync). This repo is retained read-only for history; no further releases ship from here.
Minimal host roster + remote exec + one-shot bootstrap. Single hexa file, zero deps, state at ~/.pool/pool.json.
# 1. Install hexa-lang (gives you `hexa` + `hx` package manager)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/dancinlab/hexa-lang/main/install.sh)"
# 2. Install pool
hx install poolhx wires the pool shim into ~/.hx/bin/ (must be on PATH).
pool add <name> <user@host> [--sudo] # register a host (ssh-probes `uname -s` for OS)
pool rm <name> # remove from roster
pool list [--json] [--fast] # roster + live cpu/ram/gpu/disk per host (--fast skips probe; --json embeds usage)
pool on <name> <cmd...> # ssh + exec on host
pool on all <cmd...> [--jobs N] [--timeout S] # parallel fanout (every host except pi5-akida; generic, zero domain knowledge)
pool status # live reachability probe (each host)
pool refresh # re-probe os for every host
pool init # bootstrap every host with all features (skips pi5-akida)
pool clean <name>|all [--dry-run] [--deep] # run disk-cleanup now — safe tier; --deep adds model caches + docker
pool desc <name> [text...] # set a host description (shown in `pool list`)pool init runs every bootstrap feature on every host in order (skipping pi5-akida — anima-only, @D s8). Adding a feature is one labels.push(...) / scripts.push(...) pair in cmd_init inside bin/pool.hexa.
| feature | linux | macos |
|---|---|---|
| tailscale (headless) | curl tailscale.com/install.sh | sudo sh |
brew install tailscale + brew services start tailscale (CLI, not the GUI cask — avoids menubar conflicts) |
/etc/cron.daily/disk-cleanup |
the two-tier pool clean script, logged to /var/log/pool-cleanup.log. Safe tier daily (journalctl vacuum · apt autoremove+clean · snap disabled-revs · /var/{log,crash,tmp} · /tmp · ~/.cache/{uv,pip,puppeteer,mathlib,thumbnails}); deep tier when disk ≥ 85% (~/.cache/{huggingface,torch,npm,…} · docker system prune). Uniform 3-day mtime retention. |
skipped (newsyslog + systemd-tmpfiles equivalents builtin) |
| no-sleep | skipped (Linux servers don't display-sleep) | pmset -a displaysleep=0 sleep=0 disksleep=0 + screensaver idleTime=0 — FileVault Macs sever sshd-reachable interfaces on display lock, breaking long-running headless builds |
| persistent-journal | mkdir /var/log/journal + restart journald (post-mortem logs survive reboot) |
skipped (macOS uses unified log, persistent by default) |
| panic-recovery | kernel.panic=10 + kernel.panic_on_oom=0 via /etc/sysctl.d/60-pool-recovery.conf — kernel auto-reboots 10s after panic instead of hanging |
skipped (Darwin kernel auto-restarts on panic) |
| earlyoom | apt install earlyoom + enable — userspace OOM-killer fires at <10% free RAM+swap, converting kernel hangs into clean per-process SIGKILL |
skipped (macOS has Jetsam built in) |
| swapfile | 32 GB /swapfile-pool + /etc/fstab entry + vm.swappiness=10 — only if existing swap < 30 GB |
skipped (macOS manages dynamic swap automatically) |
| watchdog | softdog + systemd RuntimeWatchdogSec=30s (auto-reset on kernel hang) |
skipped (no equivalent userspace API) |
tailscale up still needs interactive auth — pool init only installs.
The OOM-resilience block (persistent-journal · panic-recovery · earlyoom · swapfile · watchdog) was added after a host-OOM-on-concurrent-heavy-workload incident: a 17 GB ckpt eval + a 17 GB HF upload dispatched to the same 30 GB host in parallel saturated RAM, the kernel OOM-killer hung, and the host needed a physical reboot. Each feature is one independent line of defense — together they convert that failure mode into "earlyoom kills the lower-priority job; if it still escalates, swap absorbs the spike; if the kernel still hangs, the watchdog auto-resets within 30 s; persistent journal preserves the evidence; panic-recovery reboots in 10 s instead of forever." The structural dispatcher fix (a pool reservation/commitment ledger) is a separate roadmap item.
pool clean <name>|all runs disk-cleanup immediately — no waiting for the daily cron — and reports freed space (df before/after):
| tier | when | targets |
|---|---|---|
| safe | always | journalctl vacuum · apt-get autoremove/clean · snap disabled revisions · /var/{log,crash,tmp} · /tmp · ~/.cache/{uv,pip,puppeteer,mathlib,thumbnails} |
| deep | --deep, or disk ≥ 85% |
model caches ~/.cache/{huggingface,torch,torch_extensions,npm,yarn,go-build} · docker system prune |
--dry-run counts what each tier would remove and deletes nothing. Uniform 3-day mtime retention — atime is unreliable on noatime / relatime mounts.
pool clean all skips pi5-akida automatically (anima-only · @D s8). Explicit pool clean pi5-akida still runs against it.
~/.pool/pool.json:
{
"hosts": [
{"name": "ubu-1", "ssh": "ubu-1", "sudo": true, "os": "linux"},
{"name": "ubu-2", "ssh": "ubu-2", "sudo": true, "os": "linux"},
{"name": "mini", "ssh": "mini", "sudo": true, "os": "macos"},
{"name": "pi5-akida", "ssh": "ubuntu@192.168.50.155", "sudo": true, "os": "linux",
"description": "anima — dedicated Akida neuromorphic host"}
]
}Override path with POOL_STATE=<path>. Optional per-host keys:
| key | default | effect |
|---|---|---|
description |
— | free text · shown in pool list |
pi5-akida is hard-excluded from every fan-out / route / health / init / clean-all path by name-substring match (anima-only · @D s8). Explicit pool on pi5-akida <cmd> / pool clean pi5-akida still work.
macOS pool hosts (os: macos · e.g. mini) only receive auto-dispatched macOS-only commands (swift / xcodebuild / codesign / …) per @D s12 — they never join the OS-agnostic general-heavy round-robin (Mac stays the workstation). Explicit pool on mini <cmd> still works for any command.
pool list probes every host concurrently (8-wide async fan-out, ~8s probe timeout) and renders load · ram · gpu · disk inline:
load ram gpu disk sudo desc
mini mini macos 0.42 8/16G - 100/512G sudo
ubu-1 ubu-1 linux 0.01 3/30G 0% 546/915G sudo
ubu-2 ubu-2 linux 4.02 4/30G 0% 721/915G sudo shared CI runner
pi5-akida ubuntu@192… - - - - sudo anima — dedicated Akida host
--fast skips the probe for a quick roster-only view (no ssh). pool list --json embeds the probe result as a per-host usage object:
{"name": "ubu-1", "ssh": "ubu-1", "os": "linux",
"usage": {"load": "0.01", "ram": "3/30G", "gpu": "0%", "disk": "546/915G"}}Locale-independent (LC_ALL=C forced on remote). Linux free -g parsed by row/column position. macOS uses vm_stat + sysctl hw.memsize. GPU = nvidia-smi utilization when present, - otherwise.
See TODO.md (dispatcher-side reservation ledger · future verbs).
MIT.