Merged
31 changes: 24 additions & 7 deletions docs/STRATEGY.md
@@ -32,23 +32,40 @@ Linux deserves a biometric authentication layer that is **reliable, secure, and

---

## Where We Are: v0.3.0
## Where We Are: v0.3.3

**Shipped 2026-02-23. All 6 implementation steps complete. End-to-end tested on Ubuntu 24.04.4 LTS.**
**Shipped 2026-05-28. Bug-fix release wave (v0.3.2 → v0.3.3) on top of the v0.3.0 foundation.**

v0.3.0 (2026-02-23) shipped all 6 implementation steps end-to-end on Ubuntu 24.04.4 LTS.
The v0.3.x point releases since then addressed two silent ship-time bugs and added
broader hardware + packaging coverage:

- **v0.3.2 (2026-05-28)**: fixed the PAM control keyword (`success=end` → `success=done`).
libpam had been silently treating the unknown keyword as `ignore` since v0.1.0, so face
auth was a silent no-op on the documented setup paths. Closed Issue #26: `visaged` now
handles SIGTERM correctly, cutting the worst-case post-hibernate `systemctl restart`
hang from ~90s to ~10s.
- **v0.3.3 (2026-05-28)**: Lenovo X1 Carbon Gen 9 IR camera quirk (second
Tier-1-verified hardware target after ASUS Zenbook 14 UM3406HA); AUR `!lto !debug`
fix so `makepkg -si` succeeds on stock Arch; devshell parity with CI;
7 dependency bumps.

| Component | What it delivers |
|-----------|-----------------|
| `visage-hw` | V4L2 capture, GREY/YUYV/Y16 format detection, CLAHE preprocessing, dark frame rejection |
| `visage-hw` | V4L2 capture, GREY/YUYV/Y16 format detection, CLAHE preprocessing, dark frame rejection. Quirks DB covers ASUS Zenbook 14 + Lenovo X1 Carbon Gen 9 |
| `visage-core` | SCRFD face detection + ArcFace recognition via ONNX Runtime — CPU-capable, no CUDA required |
| `visaged` | Persistent daemon — holds camera and model weights across auth requests, D-Bus IPC, SQLite WAL |
| `pam-visage` | Thin PAM module — `PAM_IGNORE` fallback, never blocks, system bus |
| IR emitter | UVC extension unit control, hardware quirks database, ASUS Zenbook 14 UM3406HA confirmed |
| Packaging | `.deb` with `pam-auth-update`, systemd hardening, AES-256-GCM embeddings at rest |
| `visaged` | Persistent daemon — holds camera and model weights across auth requests, D-Bus IPC, SQLite WAL. SIGINT + SIGTERM shutdown handlers; `TimeoutStopSec=10s` defense in depth |
| `pam-visage` | Thin PAM module — `PAM_IGNORE` fallback, never blocks, system bus. `[success=done default=ignore]` control flow (corrected v0.3.2) |
| IR emitter | UVC extension unit control, hardware quirks database |
| Packaging | `.deb` with `pam-auth-update`, AUR `!lto !debug` PKGBUILD with verified `sha256sums`, NixOS module, systemd hardening, AES-256-GCM embeddings at rest |

Visage authenticates in ~1.4s on CPU with a USB webcam. Howdy's Python subprocess cold-start
is 2-3s. Visage is already faster — without IR camera or GPU — because model weights are
loaded once at daemon start, not per attempt. That is the architectural advantage.
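
The load-once idea can be illustrated with a minimal, std-only sketch. This is not `visaged`'s actual code: `load_model`, `model`, and the `LOADS` counter are hypothetical names, and the `Vec<f32>` merely stands in for real ONNX weights. It only shows why repeated auth requests pay the load cost a single time when the process stays resident.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how many times the (hypothetical) expensive load actually ran.
static LOADS: AtomicUsize = AtomicUsize::new(0);

/// Process-wide slot for the weights; filled on first use, then reused.
static MODEL: OnceLock<Vec<f32>> = OnceLock::new();

/// Stand-in for reading model weights from disk (assumed to be the slow step).
fn load_model() -> Vec<f32> {
    LOADS.fetch_add(1, Ordering::SeqCst);
    vec![0.0; 4]
}

/// Returns the shared weights, loading them only on the first call.
fn model() -> &'static Vec<f32> {
    MODEL.get_or_init(load_model)
}

fn main() {
    // Three simulated auth requests share one load.
    for _ in 0..3 {
        let _weights = model();
    }
    println!("loads performed: {}", LOADS.load(Ordering::SeqCst));
}
```

A per-attempt subprocess design would run the equivalent of `load_model` on every request, which is the cost the persistent daemon avoids.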

See [ADR 012](decisions/012-post-launch-stabilization-v0.3.2-v0.3.3.md) for the full
v0.3.x stabilization context, the rationale behind each fix, and the trade-offs
accepted.

---

## Ecosystem Position
169 changes: 169 additions & 0 deletions docs/decisions/012-post-launch-stabilization-v0.3.2-v0.3.3.md
@@ -0,0 +1,169 @@
# ADR 012 — Post-Launch Stabilization: v0.3.2 + v0.3.3 Bug-Fix Wave + Community PR Intake

**Date:** 2026-05-28
**Status:** Implemented (v0.3.2 + v0.3.3 shipped)
**Scope:** `visaged`, `pam-visage`, `packaging/aur`, `packaging/debian`, `packaging/nix`, `flake.nix`, docs, CHANGELOG

---

## Context

Three months after v0.3.0 shipped (2026-02-23), the visage PR/issue/discussion queue had accumulated:

- **8 open dependabot PRs** dating back to late March (3 GitHub Action major bumps + 4 Rust dep bumps + 1 failing `ort` rc bump)
- **3 community PRs** from external contributors: @themariusus's Lenovo X1 Carbon Gen 9 IR camera quirk (#29), @SelfRef's Arch README docs update (#27), @SomeCodecat's AUR `!lto !debug` fix (#25)
- **1 open issue** from @SomeCodecat (#26) — `visaged` blocks for 90s on `systemctl restart` after hibernate due to stale camera fd
- **1 open discussion** from @Alex52Github (#28) — when is Intel IPU6 / MIPI / libcamera support expected, and is Fedora supported?

Investigation while preparing to respond to PR #27 surfaced a **fleet-wide PAM bug shipped since v0.1.0**: `/etc/pam.d/*` files generated by the repo (README, NixOS module, Debian pam-auth-update profile, etc.) used the keyword `success=end` in 9 places. `pam.conf(5)` documents only `ignore | bad | die | ok | done | reset | N` as valid value-keywords. libpam logs a warning and silently treats the unknown keyword as `ignore`, dropping `pam_visage.so`'s `PAM_SUCCESS`. Result: every Visage install on the documented setup paths has had face auth as a no-op since v0.1.0 — face match succeeded, libpam ignored the result, stack fell through to `pam_unix.so` → password prompt.
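
A check of this kind could have caught the keyword before it shipped. The following is a minimal sketch, not part of the repo: `invalid_actions` is a hypothetical helper that takes a bracketed control field and reports any `value=action` pair whose action is neither one of the keywords listed above nor an integer jump count. It deliberately ignores the rest of the PAM line syntax.

```rust
/// Action keywords that `pam.conf(5)` documents for `[value=action ...]` controls.
const VALID_ACTIONS: [&str; 6] = ["ignore", "bad", "die", "ok", "done", "reset"];

/// Returns every `value=action` pair in a bracketed PAM control field whose
/// action is neither a documented keyword nor an unsigned jump count.
fn invalid_actions(control: &str) -> Vec<String> {
    control
        .trim()
        .trim_start_matches('[')
        .trim_end_matches(']')
        .split_whitespace()
        .filter_map(|pair| pair.split_once('='))
        .filter(|(_, action)| {
            !VALID_ACTIONS.contains(action) && action.parse::<u32>().is_err()
        })
        .map(|(value, action)| format!("{value}={action}"))
        .collect()
}

fn main() {
    for control in ["[success=end default=ignore]", "[success=done default=ignore]"] {
        let bad = invalid_actions(control);
        if bad.is_empty() {
            println!("{control}: ok");
        } else {
            println!("{control}: unknown action(s) {bad:?}");
        }
    }
}
```

Run against the generated `/etc/pam.d/*` snippets in CI, a check like this would turn a silent runtime `ignore` into a build-time failure.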

Investigation of @SomeCodecat's hibernate-hang report (#26) also surfaced a second silent ship-time bug: `crates/visaged/src/main.rs` used `tokio::signal::ctrl_c().await?` for shutdown. On Unix, `tokio::signal::ctrl_c()` is SIGINT-only — it does NOT catch SIGTERM. systemd's `systemctl stop|restart` sends SIGTERM, which `visaged` was ignoring, so systemd waited the default `TimeoutStopSec=90s` and SIGKILL'd. Manifested as the 90s hang reported in #26 whenever `visage-resume.service` fired after hibernate.

## Decision

### 1. Two prioritized release cuts: v0.3.2 (bug fixes) → v0.3.3 (community + deps)

Cut **v0.3.2 first** to ship the two real bug fixes (PAM keyword + SIGTERM handler) to users on v0.3.0 as fast as possible. Cut **v0.3.3 second** to bundle the X1 Carbon hardware quirk, the AUR LTO fix, the devshell parity improvement, and the dependabot cohort. **Skip v0.3.1 entirely** — its number is permanently unused.

| Release | Tag | Date | Asset | Contents |
|---|---|---|---|---|
| Bug-fix release | `v0.3.2` | 2026-05-28 | `visage_0.3.2-1_amd64.deb` (9.46 MB) | PAM `end → done` fleet sweep + visaged SIGTERM handler + `TimeoutStopSec=10s` |
| Deps + community | `v0.3.3` | 2026-05-28 | `visage_0.3.3-1_amd64.deb` (9.45 MB) | X1 Carbon quirk + AUR `!lto !debug` + devshell parity + 7 dep bumps |

### 2. PAM keyword fleet sweep: `success=end → success=done` across 9 sites

Swept in PR #31 (squash-merged into v0.3.2). Affected files:

- `README.md` (Arch install instructions, 1 site)
- `docs/operations-guide.md` (setup + verification, 2 sites)
- `docs/architecture.md` (PAM stack integration narrative, 1 site)
- `docs/research/architecture-review-and-roadmap.md` (roadmap context, 2 sites)
- `docs/research/domain-audit.md` (implementation plan, 1 site)
- `docs/research/howdy-analysis-and-visage-design.md` (design comparison, 1 site)
- `packaging/debian/pam-auth-update` (Ubuntu profile — live config, 1 site)
- `packaging/nix/module.nix` (NixOS module — `sudo` + `login` rules, 2 sites)

**Reported by** @SelfRef in PR #27 (their PR caught 1 of 9 sites); commit `daa9903` credits them via `Reported-by:`. Their original PR (#27) is held open pending amend on cosmetic items unrelated to the keyword fix.

**Existing-user upgrade path:**
- **Debian/Ubuntu** — `postinst` runs `pam-auth-update --package visage` on every install, which regenerates `/etc/pam.d/common-auth` from our corrected profile. Auto-recovery on next `.deb` upgrade.
- **NixOS** — corrected `security.pam.services.{sudo,login}.rules.auth.visage.control` value picked up on next `nixos-rebuild switch`.
- **Arch (manual)** — operators who copied the prior README's example into `/etc/pam.d/system-auth` must manually swap `success=end` for `success=done`. CHANGELOG calls this out.
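
For the manual Arch path, the swap can be sketched as a small text transform. This is an illustration under assumptions, not a shipped tool: `fix_pam_config` is a hypothetical function, and the sample rule in `main` is an assumed layout rather than the exact line from the README. It only touches uncommented lines that load `pam_visage.so`.

```rust
/// Rewrites `success=end` to `success=done`, but only on uncommented lines
/// that load `pam_visage.so`, so unrelated rules are left untouched.
fn fix_pam_config(contents: &str) -> String {
    let mut fixed = contents
        .lines()
        .map(|line| {
            let is_comment = line.trim_start().starts_with('#');
            if line.contains("pam_visage.so") && !is_comment {
                line.replace("success=end", "success=done")
            } else {
                line.to_string()
            }
        })
        .collect::<Vec<_>>()
        .join("\n");
    // `lines()` drops the final newline; restore it if the input had one.
    if contents.ends_with('\n') {
        fixed.push('\n');
    }
    fixed
}

fn main() {
    // Assumed example stack; real files will contain more rules.
    let before = "auth [success=end default=ignore] pam_visage.so\nauth required pam_unix.so\n";
    print!("{}", fix_pam_config(before));
}
```

Anyone applying such a transform to a live `/etc/pam.d/system-auth` should keep a root shell open and a backup of the file, since a broken PAM stack can lock out logins.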

### 3. `visaged` SIGTERM handler + `TimeoutStopSec=10s` unit override

Swept in PR #30 (squash-merged into v0.3.2). Replaced the SIGINT-only `tokio::signal::ctrl_c().await?` with a dual-signal handler matching the pattern at `esver-capture/crates/esver-capture-cli/src/daemon.rs::wait_for_shutdown_signal`:

```rust
use tokio::signal::unix::{signal, SignalKind};

let mut sigterm =
    signal(SignalKind::terminate()).context("failed to install SIGTERM handler")?;
let mut sigint =
    signal(SignalKind::interrupt()).context("failed to install SIGINT handler")?;
tokio::select! {
    _ = sigterm.recv() => tracing::info!(signal = "SIGTERM", "received shutdown signal"),
    _ = sigint.recv() => tracing::info!(signal = "SIGINT", "received shutdown signal"),
}
```

Added `TimeoutStopSec=10s` to `packaging/systemd/visaged.service` as defense in depth — covers the edge case where a `v4l2 VIDIOC_DQBUF` is mid-flight on shutdown (e.g. a stale camera fd after hibernate resume that isn't promptly interruptible). Worst-case `systemctl restart` drops from ~90s to ~10s.

**Closes Issue #26.**

### 4. Devshell parity (`rustfmt` + `clippy` + `libclang`) in `flake.nix`

Swept in PR #32 (squash-merged into v0.3.3). The `nix develop` shell brought the package's build inputs via `inputsFrom = [ visage ]` but didn't include the cargo subcommands CI runs (`cargo fmt --check`, `cargo clippy --workspace -- -D warnings`) or `libclang.so` (transitively needed by `v4l2-sys-mit`'s `bindgen`). Devshell now declares:

```nix
packages = with pkgs; [
  rustfmt
  clippy
  llvmPackages.libclang
  rust-analyzer
  cargo-deb
  cargo-watch
];
LIBCLANG_PATH = "${pkgs.llvmPackages.libclang.lib}/lib";
```

Verified: `cargo fmt --all -- --check`, `cargo clippy --workspace -- -D warnings`, and `cargo build -p visaged` all run inside `nix develop` without further env tweaking.

### 5. Lenovo ThinkPad X1 Carbon Gen 9 IR camera quirk (`174f:2454`)

Merge-committed via PR #29 (per CONTRIBUTING.md "Hardware quirks: Merge commit"). Quirk file at `contrib/hw/174f-2454.toml`. Verified on hardware by @themariusus. Now embedded at compile time alongside the existing ASUS Zenbook 14 UM3406HA quirk.

### 6. AUR `PKGBUILD: options=(!lto !debug)`

Squash-merged via PR #25. Fixes the link-time `undefined symbol: ring_core_0_17_14__LIMBS_window5_split_window` (and many more from `ring` + `libsqlite3-sys`) failure on Arch's stock `makepkg.conf` (which defaults to `OPTIONS=(... lto ...)`). Root cause: LTO operates on LLVM IR, but `ring` ships hand-written assembly via `cc` and `libsqlite3-sys` (rusqlite's `bundled` feature) compiles `sqlite3.c` via `cc` — neither produces LTO-compatible IR. Reported and fixed by @SomeCodecat.

### 7. Close the v0.1.0-era `sha256sums=('SKIP')` TODO in `packaging/aur/PKGBUILD`

This ADR ships alongside the `SKIP` → real-hash fix. The PKGBUILD now declares the SHA-256 of the v0.3.3 source tarball at `github.com/sovren-software/visage/archive/refs/tags/v0.3.3.tar.gz`. `makepkg` will reject any tampered or corrupted download.

A comment block explains the bump procedure for future maintainers (compute via `curl ... | sha256sum`; must re-compute on every `pkgver` bump).
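
Because the hash must be bumped by hand, a release-time lint could guard against regressing to `SKIP`. The sketch below is an assumption about how such a check might look, not something in the repo: `sha256_entry` and `is_pinned_digest` are hypothetical helpers, and they only confirm that the first `sha256sums` entry is a well-formed 64-character hex digest. They do not verify the digest against the tarball; that remains `makepkg`'s job.

```rust
/// Extracts the first quoted entry of a `sha256sums=(...)` line, if present.
fn sha256_entry(pkgbuild: &str) -> Option<&str> {
    pkgbuild
        .lines()
        .map(str::trim)
        .find_map(|line| line.strip_prefix("sha256sums=("))
        .and_then(|rest| rest.split('\'').nth(1))
}

/// True when the entry is a 64-character hex digest rather than `SKIP`.
fn is_pinned_digest(entry: &str) -> bool {
    entry.len() == 64 && entry.chars().all(|c| c.is_ascii_hexdigit())
}

fn main() {
    // Inline sample; a real check would read packaging/aur/PKGBUILD from disk.
    let pkgbuild = "pkgver=0.3.3\nsha256sums=('SKIP')\n";
    match sha256_entry(pkgbuild) {
        Some(entry) if is_pinned_digest(entry) => println!("pinned: {entry}"),
        Some(entry) => println!("not pinned: {entry}"),
        None => println!("no sha256sums line found"),
    }
}
```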

### 8. Dependency cohort

Merged: `tokio` 1.49→1.50 (#17), `nix` 0.31.1→0.31.2 (#18), `uuid` 1.21→1.23 (#19), `image` 0.25.9→0.25.10 (#23), `actions/checkout` 4→6 (#15), `actions/upload-artifact` 4→7 (#16), `actions/download-artifact` 4→8 (#14). All bumps passed CI on the visage workspace.

Closed: `ort` 2.0.0-rc.11 → 2.0.0-rc.12 (#20). CI failed on rc.12 — likely API drift in the `ort` 2.0.0-rc series. Will reattempt at rc.13+ or 2.0.0 final.

### 9. Documentation updates

- `README.md` Status line bumped `v0.3.0` → `v0.3.3` with a brief summary of the intervening fixes + dual-hardware support.
- `docs/STATUS.md` last-updated bumped to **2026-05-28**; build-state rewritten to reflect v0.3.3, post-v0.3.0 bug-fix wave, and quirks DB now covering ASUS Zenbook 14 UM3406HA + Lenovo X1 Carbon Gen 9 20XW00FPUS.
- `CHANGELOG.md` entries dated and structured under Keep-a-Changelog format (`[Unreleased]` rolled over each release cut).

## Trade-offs

### v0.3.1 numerical skip (D1)

**Trade-off accepted:** Anyone reading the release list sees a missing v0.3.1. Once v0.3.2 ships, that number is permanently dead — no path back.

**Benefit:** Users on v0.3.0 with face-auth silently broken get the fix in v0.3.2 within hours. The dep cohort + community PRs land in a clean v0.3.3 without commingling.

### PAM sweep separate from PR #27 (D2)

**Trade-off accepted:** @SelfRef's contribution looks "partial" until they respond to the amend request — their PR currently sits at "Changes requested" while the actual fleet-sweep fix shipped in #31. Mitigated by:
1. Crediting `Reported-by: @SelfRef` in commit `daa9903`.
2. Posting a follow-up clarification comment on PR #27 explicitly acknowledging the catch was theirs and that #31 carries the full fix.
3. Offering @SelfRef inclusion in `CODEOWNERS` for `packaging/aur/` (they already maintain `visage` / `visage-git` / `visage-bin` on AUR — they're our de facto AUR maintainer).

**Benefit:** The PAM bug is fixed atomically across all 9 sites — no Debian/NixOS user is missed.

### `TimeoutStopSec=10s` as defense in depth (not primary fix) (D3)

**Trade-off accepted:** The SIGTERM handler is the primary fix; the unit timeout is a backstop. If a `v4l2 VIDIOC_DQBUF` is genuinely stuck (e.g. driver bug on stale fd), the synchronous capture inside `tokio::task::spawn_blocking` still needs the full 10s before systemd escalates to SIGKILL.

**Benefit:** Worst-case operational `systemctl restart` is bounded — 90s is no longer a possibility. 10s gives operational headroom; could be tightened to 5s if measurement supports it.

### `sha256sums=` real hash without changing source URL (this ADR)

**Trade-off accepted:** Source URL is still `https://github.com/sovren-software/visage/archive/refs/tags/v$pkgver.tar.gz` (GitHub's git-archive endpoint). GitHub has historically (in 2023) changed git-archive compression behavior, breaking many projects' AUR PKGBUILDs that had pinned hashes. If that happens again, our hash will mismatch and AUR users will see a `makepkg` integrity error.

**Benefit:** Closes the v0.1.0-era TODO without introducing a release-asset tarball generator. AUR integrity check is now active for the v0.3.3 tarball. The bump-procedure comment in the PKGBUILD documents how to re-verify on future bumps.

**Alternative not chosen:** Add a tarball generation step to `.github/workflows/ci.yml`'s `release` job (produce a deterministic `.tar.gz` asset on each release tag, point the PKGBUILD source URL at that asset). More work; defer to v0.4 packaging arc if GitHub changes git-archive again.

## Drawbacks / Known Limitations

1. **`sha256sums` requires manual bump every release.** The comment in the PKGBUILD documents the procedure. If a future release cut forgets to update the hash, `makepkg` will fail with an integrity mismatch — operationally noisy but not silent (which is preferable to the prior `SKIP` state).
2. **PR #27 still open.** @SelfRef's PR carries non-PAM improvements (visage-resume enable, `visage verify` step, AUR variant documentation) that have not yet landed. We will wait for an amend until 2026-06-04; if none arrives, we close the PR and redo it as our own with `Reported-by:` credit.
3. **Discussion #28 answer drafted but not posted.** Operator constraint (fleet PAT lacks `discussions:write`). Drafted answer to @Alex52Github covers IPU6 path (v0.5 arc, depends on a libcamera backend behind `visage-hw`'s `Camera` trait) and Fedora packaging gap (no fundamental blocker — needs RPM `.spec` and pam-auth-update equivalent).
4. **Issue #33 dependabot security alerts (14 open, severity 6h/3m/5l) not triaged.** Operator constraint (PAT lacks `security_events`). Tracked at `https://github.com/sovren-software/visage/security/dependabot` for triage via the browser UI. Some may be advisory-database flags that don't reach our code path; others may require dependency pins. Gate v0.3.4 if any severity-high are reachable in `visaged` or `pam-visage`.
5. **Bypass-merge precedent on release PRs.** Single-maintainer repos with branch protection requiring 1 approval need the bypass for release PRs (operator is the only writer AND the PR author; GitHub explicitly forbids self-approval). Scope discipline: bypass applies to release PRs only, never to code-change PRs. The bypass is recorded in commit messages; CONTRIBUTING.md remains unchanged.

## Companion documents

- Engram session ADR: `~/cDesign/dendrite/Projects/Visage/Decisions/SESSION-2026-05-28-VISAGE-V0.3.2-V0.3.3-BUG-FIX-WAVE-AND-COMMUNITY-PR-INTAKE-ADR.md` — full per-decision rationale + remaining-work tracking for the org-internal audience.
- Engram dev-log: `~/cDesign/dendrite/Projects/Visage/dev-log.md` — session entry summarizing the cohort + key discoveries.

## Remaining work for the v0.3.x post-launch stabilization arc

| Item | Gate | Owner |
|---|---|---|
| PR #27 amend or close-and-redo | by 2026-06-04 | @SelfRef → maintainer fallback |
| Discussion #28 answer posted | operator UI paste OR PAT `discussions:write` | Operator |
| Issue #33 dependabot security alerts triaged | operator UI access OR PAT `security_events` | Operator |
| CODEOWNERS for `packaging/aur/` | @SelfRef accepts offer in PR #27 thread | @SelfRef |
| v0.4.0 packaging arc | scope decision | Maintainer |
7 changes: 5 additions & 2 deletions packaging/aur/PKGBUILD
@@ -18,8 +18,11 @@ install="$pkgname.install"
# Preserve user data and face database across upgrades
backup=('var/lib/visage/faces.db')
source=("$pkgname-$pkgver.tar.gz::https://github.com/sovren-software/visage/archive/refs/tags/v$pkgver.tar.gz")
# TODO: compute real sha256sum at release time: sha256sum visage-0.3.3.tar.gz
sha256sums=('SKIP')
# sha256 of the v$pkgver tarball at github.com/sovren-software/visage/archive/refs/tags/v$pkgver.tar.gz
# Compute via:
# curl -fsSL https://github.com/sovren-software/visage/archive/refs/tags/v$pkgver.tar.gz | sha256sum
# Must be re-computed on every pkgver bump.
sha256sums=('e018fcc08dbb3aba381306424fc1fd94eaddc0a5da0d47437f17487f29b76f99')

build() {
  cd "$pkgname-$pkgver"