Skip to content

dancinlab/pool

Repository files navigation

pool

⚠️ ARCHIVED — 2026-05-30. The pool CLI is now sidecar-owned. The source (bin/pool.hexa) was relocated faithfully into dancinlab/sidecar at skills/pool/bin/pool.hexa; sidecar's install.hexa builds the pool binary (~/.hx/bin/pool.bin) and installs the shim. The canonical entry is now sidecar pool <args>; the bare pool on PATH is a kept-working deprecation shim. The roster SSOT (~/.pool/pool.json) and the pool-route hook are unchanged.

Do NOT hx install pool from this repo anymore — install/sync via sidecar (sidecar sync). This repo is retained read-only for history; no further releases ship from here.

Minimal host roster + remote exec + one-shot bootstrap. Single hexa file, zero deps, state at ~/.pool/pool.json.

Install

# 1. Install hexa-lang (gives you `hexa` + `hx` package manager)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/dancinlab/hexa-lang/main/install.sh)"

# 2. Install pool
hx install pool

hx wires the pool shim into ~/.hx/bin/ (must be on PATH).

Verbs

pool add <name> <user@host> [--sudo]   # register a host (ssh-probes `uname -s` for OS)
pool rm  <name>                         # remove from roster
pool list [--json] [--fast]             # roster + live cpu/ram/gpu/disk per host (--fast skips probe; --json embeds usage)
pool on  <name> <cmd...>                # ssh + exec on host
pool on  all <cmd...> [--jobs N] [--timeout S]  # parallel fanout (every host except pi5-akida; generic, zero domain knowledge)
pool status                             # live reachability probe (each host)
pool refresh                            # re-probe os for every host
pool init                               # bootstrap every host with all features (skips pi5-akida)
pool clean <name>|all [--dry-run] [--deep]  # run disk-cleanup now — safe tier; --deep adds model caches + docker
pool desc <name> [text...]              # set a host description (shown in `pool list`)

pool init features

pool init runs every bootstrap feature on every host in order (skipping pi5-akida — anima-only, @D s8). Adding a feature is one labels.push(...) / scripts.push(...) pair in cmd_init inside bin/pool.hexa.

feature linux macos
tailscale (headless) curl tailscale.com/install.sh | sudo sh brew install tailscale + brew services start tailscale (CLI, not the GUI cask — avoids menubar conflicts)
/etc/cron.daily/disk-cleanup the two-tier pool clean script, logged to /var/log/pool-cleanup.log. Safe tier daily (journalctl vacuum · apt autoremove+clean · snap disabled-revs · /var/{log,crash,tmp} · /tmp · ~/.cache/{uv,pip,puppeteer,mathlib,thumbnails}); deep tier when disk ≥ 85% (~/.cache/{huggingface,torch,npm,…} · docker system prune). Uniform 3-day mtime retention. skipped (newsyslog + systemd-tmpfiles equivalents builtin)
no-sleep skipped (Linux servers don't display-sleep) pmset -a displaysleep=0 sleep=0 disksleep=0 + screensaver idleTime=0 — FileVault Macs sever sshd-reachable interfaces on display lock, breaking long-running headless builds
persistent-journal mkdir /var/log/journal + restart journald (post-mortem logs survive reboot) skipped (macOS uses unified log, persistent by default)
panic-recovery kernel.panic=10 + kernel.panic_on_oom=0 via /etc/sysctl.d/60-pool-recovery.conf — kernel auto-reboots 10s after panic instead of hanging skipped (Darwin kernel auto-restarts on panic)
earlyoom apt install earlyoom + enable — userspace OOM-killer fires at <10% free RAM+swap, converting kernel hangs into clean per-process SIGKILL skipped (macOS has Jetsam built in)
swapfile 32 GB /swapfile-pool + /etc/fstab entry + vm.swappiness=10 — only if existing swap < 30 GB skipped (macOS manages dynamic swap automatically)
watchdog softdog + systemd RuntimeWatchdogSec=30s (auto-reset on kernel hang) skipped (no equivalent userspace API)

tailscale up still needs interactive auth — pool init only installs.

The OOM-resilience block (persistent-journal · panic-recovery · earlyoom · swapfile · watchdog) was added after a host-OOM-on-concurrent-heavy-workload incident: a 17 GB ckpt eval + a 17 GB HF upload dispatched to the same 30 GB host in parallel saturated RAM, the kernel OOM-killer hung, and the host needed a physical reboot. Each feature is one independent line of defense — together they convert that failure mode into "earlyoom kills the lower-priority job; if it still escalates, swap absorbs the spike; if the kernel still hangs, the watchdog auto-resets within 30 s; persistent journal preserves the evidence; panic-recovery reboots in 10 s instead of forever." The structural dispatcher fix (a pool reservation/commitment ledger) is a separate roadmap item.

pool clean

pool clean <name>|all runs disk-cleanup immediately — no waiting for the daily cron — and reports freed space (df before/after):

tier when targets
safe always journalctl vacuum · apt-get autoremove/clean · snap disabled revisions · /var/{log,crash,tmp} · /tmp · ~/.cache/{uv,pip,puppeteer,mathlib,thumbnails}
deep --deep, or disk ≥ 85% model caches ~/.cache/{huggingface,torch,torch_extensions,npm,yarn,go-build} · docker system prune

--dry-run counts what each tier would remove and deletes nothing. Uniform 3-day mtime retention — atime is unreliable on noatime / relatime mounts.

pool clean all skips pi5-akida automatically (anima-only · @D s8). Explicit pool clean pi5-akida still runs against it.

State

~/.pool/pool.json:

{
  "hosts": [
    {"name": "ubu-1",     "ssh": "ubu-1",                   "sudo": true, "os": "linux"},
    {"name": "ubu-2",     "ssh": "ubu-2",                   "sudo": true, "os": "linux"},
    {"name": "mini",      "ssh": "mini",                    "sudo": true, "os": "macos"},
    {"name": "pi5-akida", "ssh": "ubuntu@192.168.50.155",   "sudo": true, "os": "linux",
     "description": "anima — dedicated Akida neuromorphic host"}
  ]
}

Override path with POOL_STATE=<path>. Optional per-host keys:

key default effect
description free text · shown in pool list

pi5-akida is hard-excluded from every fan-out / route / health / init / clean-all path by name-substring match (anima-only · @D s8). Explicit pool on pi5-akida <cmd> / pool clean pi5-akida still work.

macOS pool hosts (os: macos · e.g. mini) only receive auto-dispatched macOS-only commands (swift / xcodebuild / codesign / …) per @D s12 — they never join the OS-agnostic general-heavy round-robin (Mac stays the workstation). Explicit pool on mini <cmd> still works for any command.

pool list

pool list probes every host concurrently (8-wide async fan-out, ~8s probe timeout) and renders load · ram · gpu · disk inline:

                            load  ram    gpu  disk      sudo  desc
mini       mini    macos    0.42  8/16G  -    100/512G  sudo
ubu-1      ubu-1   linux    0.01  3/30G  0%   546/915G  sudo
ubu-2      ubu-2   linux    4.02  4/30G  0%   721/915G  sudo  shared CI runner
pi5-akida  ubuntu@192…      -     -      -    -         sudo  anima — dedicated Akida host

--fast skips the probe for a quick roster-only view (no ssh). pool list --json embeds the probe result as a per-host usage object:

{"name": "ubu-1", "ssh": "ubu-1", "os": "linux",
 "usage": {"load": "0.01", "ram": "3/30G", "gpu": "0%", "disk": "546/915G"}}

Locale-independent (LC_ALL=C forced on remote). Linux free -g parsed by row/column position. macOS uses vm_stat + sysctl hw.memsize. GPU = nvidia-smi utilization when present, - otherwise.

Roadmap

See TODO.md (dispatcher-side reservation ledger · future verbs).

License

MIT.

About

minimal host roster + remote exec (6 verbs · hx install pool)

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors