fix(engine): scale Chrome memory budget and frame cache to system RAM#1073
Conversation
On low-memory systems (<4GB free), Chrome's --force-gpu-mem-available-mb=4096
causes the renderer to allocate more GPU texture memory than the system can
provide, leading to OOM crashes during frame capture ("Target closed").
Changes:
- Scale --force-gpu-mem-available-mb to match available system RAM
(512MB when <2GB free, 1024MB when <4GB, 4096MB otherwise)
- Add --js-flags=--max-old-space-size=N on low-memory systems to cap
Chrome's V8 heap (256MB when <2GB free, 512MB when <4GB)
- Scale frame data URI cache defaults: 32 entries/128MB when <2GB free,
64 entries/256MB when <4GB, unchanged otherwise
Closes #1072
jrusso1020
left a comment
There was a problem hiding this comment.
Verdict: COMMENT (would-be APPROVE pending one parity consideration)
Reasonable root-cause fix — Chrome's hardcoded --force-gpu-mem-available-mb=4096 over-promises GPU budget on memory-constrained systems, causing OOM kills surfacing as "Target closed." The tiered scaling shape is sound; high-memory path is preserved. Holding for James's stamp greenlight after the freemem() vs totalmem() parity question is resolved.
Scaling formulas — edge cases verified
| Free RAM | GPU budget | V8 heap cap | Data URI cache |
|---|---|---|---|
| < 2 GB (e.g. 1.8 GB reporter) | 512 MB | 256 MB | 32 entries / 128 MB |
| < 4 GB | 1024 MB | 512 MB | 64 entries / 256 MB |
| >= 4 GB (CI/render-pod) | 4096 MB (unchanged) | none | 256 / 1500 MB (unchanged) |
- Reporter's case (1.8 GB free): GPU=512 + V8=256 + cache=128 = ~896 MB Chrome ceiling. Comfortably fits in 1.8 GB without OOM. ✓
- Very-low-memory (512 MB): still falls in
<2GBtier; gets the same 512/256/128 budgets. Total ~896 MB, exceeds 512 MB available — render would still OOM. But that's "below the fix's effective range," not a regression — no amount of scaling helps a 512 MB system. The fix doesn't claim to support that. ✓ - High-memory happy path (>= 4 GB):
getGpuMemBudgetMbreturns 4096 (identical to old hardcoded value),getLowMemoryFlagsreturns[](no new flags),memoryAdaptiveCacheLimit/BytesMbreturn the existing DEFAULT_CONFIG values. Zero behavior change for the CI/render-pod case. ✓
Magi's "data URI cache is insurance" claim verified
Looking at videoFrameInjector.ts:18-21: VideoFrameInjectorOptions includes an optional frameSrcResolver: (framePath: string) => string | null callback. When provided (via createCompiledFrameSrcResolver from the producer), it returns HTTP URLs and the data URI cache is bypassed. When absent, the cache is the fallback path.
So frameDataUriCacheLimit / frameDataUriCacheBytesLimitMb only constrain the fallback path. Reducing their defaults on low-memory systems is genuinely insurance, not silently functional. ✓
Tests — no pinned-value regressions
Grepped packages/engine/src/ for tests pinning 4096, force-gpu-mem-available, frameDataUriCacheLimit, frameDataUriCacheBytesLimitMb. Existing tests:
config.test.ts:135-139tests theMath.max(32, ...)clamp onframeDataUriCacheLimit— clamp behavior unchanged, test still passes.videoFrameInjector.test.ts:131usesDEFAULT_CONFIG.frameDataUriCacheBytesLimitMb * 1024 * 1024— DEFAULT_CONFIG values are unchanged at module-level (only theenvNumfallback is now memory-adaptive), so this test continues to use the original 1500 MB. ✓
No tests need updating. ✓
Substantive concern — freemem() vs totalmem() parity with existing codebase
The PR uses freemem() for memory-based decisions in two helpers (getSystemFreeMb() in config.ts:213, getFreeMb() in browserManager.ts:480). But packages/engine/src/services/parallelCoordinator.ts:125-128 has an explicit, documented preference for totalmem():
// Use total memory instead of free memory — macOS reports misleadingly low
// freemem() because it aggressively caches files in "inactive" memory that
// is immediately reclaimable.
const totalMemoryMB = Math.round(totalmem() / (1024 * 1024));This PR re-introduces the exact gotcha that comment was written to prevent. A user with a 16 GB Mac, 14 GB used by file caches (instantly reclaimable on demand), would have freemem() reporting ~800 MB free. Per this PR's thresholds, they'd trip the <2GB tier and get GPU=512 / V8=256 / cache=32/128 — a significant regression for high-spec macOS systems where the OOM bug doesn't actually apply.
Three resolution options:
- Use
totalmem()with a fraction (matchescalculateOptimalWorkers's pattern):const memBasis = Math.floor(totalmem() / (1024*1024) * 0.5)— treat half of total as the "available" basis. Matches the parallelCoordinator'stotalMemoryMB * 0.5shape. - Use
freemem() + bufferCache: on Linux,/proc/meminfo'sMemAvailableis the right metric (Node doesn't expose it directly butfreemem() + cachedcan be approximated). More accurate thanfreemem()alone, but platform-specific. - Keep
freemem()+ raise the thresholds to account for the underreport — but this is a brittle calibration that varies per OS.
Option 1 (use totalmem) is the cleanest fix and aligns with the existing convention. The reporter's case (1.8 GB free / 8 GB total) would still trip the <4GB tier with totalmem * 0.5 = 4 GB, which gives GPU=1024 / V8=512 / cache=64/256 — still enough headroom for their OOM-prone setup, but no false low-memory tier on high-RAM Macs.
Worth at minimum a code comment in the new helpers explaining why freemem() is chosen here when the rest of the codebase uses totalmem(). If the choice is deliberate (e.g., "Linux reporter's bug doesn't manifest on macOS where caches are reclaimable, so freemem is more accurate for the actual bug"), that's a defensible justification that should be documented inline.
Non-blocking observations
-
getSystemFreeMb()in config.ts usesrequire("os")inside try/catch;getFreeMb()in browserManager.ts usesfreememfrom the static import. Inconsistent. The dynamicrequireis presumably to handle non-Node contexts (browser bundle), but the rest ofconfig.tsuses other Node APIs (envNumcallsprocess.env) which would also fail there. Either both should be dynamic or both static. -
No unit tests for the scaling decision. Mocking
os.freemem()(oros.totalmem()if you go with option 1) is straightforward — assertions over the three tiers would lock in the formula. Optional given the scope, but amemoryAdaptiveCacheLimit() === 32 when freemem < 2GBtest is a 4-line addition. -
Scaling decision made once at startup. If the user frees up memory mid-render (or another process exits), the budgets don't recalculate. Probably fine — renders are short-lived — but worth a one-line code comment for the next reader.
Sanity checks
- CI: 11 success / 13 in_progress / 0 failures so far.
- No regression on >= 4 GB systems: confirmed by the three
getXxx() >= 4GBbranches all returning the existing default values. - PR description correctly scopes the fix: closes #1072, single external reporter, real-world but bounded impact. ✓
Summary
Ship-safe for the load-bearing case (Linux/8GB/1.8GB-free reporter). One important consideration: the freemem() vs totalmem() inconsistency with parallelCoordinator.ts could regress high-spec macOS systems. Either switch to totalmem() * 0.5 for consistency, OR document why freemem() is the right metric for THIS decision specifically.
— Rames Jusso
Addresses review feedback: freemem() is misleading on macOS where aggressive file caching reports low free memory even on high-spec machines. Switched to totalmem()-based thresholds consistent with calculateOptimalWorkers in parallelCoordinator.ts. Thresholds now based on total RAM: - <4GB total: GPU=512MB, V8=256MB, cache=32/128MB - <8GB total: GPU=1024MB, V8=512MB, cache=64/256MB - >=8GB total: unchanged (4096MB GPU, no V8 cap, 256/1500MB cache)
jrusso1020
left a comment
There was a problem hiding this comment.
Re-approved at 02372697 — totalmem() parity addressed
Verified all four helpers switched from freemem() to totalmem():
getSystemTotalMb()(config.ts) ✓memoryAdaptiveCacheLimit()/memoryAdaptiveCacheBytesMb()✓getTotalMemMb()(browserManager.ts) ✓getGpuMemBudgetMb()/getLowMemoryFlags()✓
Thresholds doubled (2GB→4GB and 4GB→8GB) which is the right approximation of the freemem ≈ totalmem / 2 rule of thumb. The fallback default also updated from 8192 → 16384 MB to stay consistent with the new tier semantics (default tier = "safely high-memory").
Tier walk-through
| Total RAM | GPU | V8 cap | Cache | Notes |
|---|---|---|---|---|
| 4 GB Raspberry Pi | 512 MB | 256 MB | 32 / 128 MB | Low tier — exactly the OOM-prone case |
| 8 GB reporter (≈7,800 MB reported) | 1024 MB | 512 MB | 64 / 256 MB | Mid tier — meaningful improvement vs old hardcoded 4096 GPU |
| 16 GB workstation | 4096 MB | no cap | 256 / 1500 MB | Default tier — unchanged from pre-PR |
| 16 GB Mac, 14 GB in caches | 4096 MB | no cap | 256 / 1500 MB | Default tier — the regression case my prior review flagged, now correctly resolved ✓ |
Reporter's 8GB system goes from "OOM-prone with 4GB GPU budget" to "mid tier with 1GB GPU budget + 512MB V8 cap." Should comfortably fit even with 1.8GB free at render-time. ✓
High-spec macOS regression I was concerned about (free-RAM-underreport tripping low-mem tiers): resolved by switching to totalmem() which is unaffected by file caches. ✓
CI
10 success / 13 in_progress / 0 failures. Clean.
Ship-ready on merit.
— Rames Jusso
vanceingalls
left a comment
There was a problem hiding this comment.
Good fix for the right root cause — scaling GPU budget to actual available RAM is exactly the right approach. The browserManager.ts side is clean and the step-function thresholds (512/1024/4096 MB at the 2/4 GB RAM boundary) are sensible for the typical system profiles. The data URI cache scaling is a reasonable follow-up defense.
One blocker prevents this from working as intended in the primary consumer path, plus a few important gaps.
Blockers
config.ts:213 — require("os") silently fails in ESM, always returning the 8192 fallback
packages/engine is an ESM package ("type": "module", tsconfig.module: "ESNext"). In native ESM, require is undefined. The try/catch in getSystemFreeMb() swallows the ReferenceError and returns 8192 on every call — meaning both memoryAdaptiveCacheLimit() and memoryAdaptiveCacheBytesMb() always return the high-memory defaults regardless of actual free RAM. Half the fix is dead code on every machine.
// packages/engine/src/config.ts:210-216 — BROKEN in ESM
function getSystemFreeMb(): number {
try {
const { freemem } = require("os") as typeof import("os"); // ReferenceError → catch
return Math.floor(freemem() / (1024 * 1024));
} catch {
return 8192; // always returned
}
}Compare with browserManager.ts which correctly uses a top-level import:
import { freemem, homedir } from "os";The CLI bundle happens to inject a createRequire shim via tsup.config.ts banner, which means the require() call does work in the bundled dist/cli.js output. But anyone consuming @hyperframes/engine directly (the engine is marked noExternal and merged into the CLI bundle, but the engine package itself builds with plain tsc) hits this bug. The inconsistency is also a maintenance hazard — the next person who reads this module won't know the fix only works via the CLI bundle and not in the published package.
Fix: hoist freemem as a named import like browserManager.ts does:
import { freemem } from "os";
function getSystemFreeMb(): number {
return Math.floor(freemem() / (1024 * 1024));
}The catch-with-fallback pattern is defensive but unnecessary for a Node built-in — os will never throw. If you want a fallback for exotic environments, do it explicitly rather than via a silently-swallowed ReferenceError.
Important
No tests for the new scaling functions
getGpuMemBudgetMb(), getLowMemoryFlags(), memoryAdaptiveCacheLimit(), and memoryAdaptiveCacheBytesMb() are all pure-ish functions that are trivially testable with vi.mock("os", () => ({ freemem: () => N * 1024 * 1024 })). The threshold boundaries (2048/4096 MB) are exactly the kind of place where off-by-one or wrong-tier bugs hide. The existing test files (browserManager.test.ts, config.test.ts) are the natural homes.
Test cases worth pinning: freemem = 1.5 GB → GPU budget 512, heap 256, cache 32/128, freemem = 3 GB → GPU budget 1024, heap 512, cache 64/256, freemem = 8 GB → GPU budget 4096, no heap cap, cache defaults.
os.freemem() is host memory, not cgroup memory
On Docker/k8s containers with a memory limit set, os.freemem() reports the host's free memory, not the container's. A container with a 2 GB memory limit running on a 64 GB host reports ~62 GB free. The GPU budget and V8 cap will be set to the high-memory defaults and the OOM behavior in constrained containers is unchanged.
This is in-scope for the issue being fixed (bare-metal Lubuntu), but the PR description says "scales Chrome's GPU budget to available system RAM" which is misleading for container deployments. Either add a note to the PR description / code comment, or check /proc/self/cgroup + /sys/fs/cgroup/memory.limit_in_bytes on Linux to pick up the cgroup limit when it's lower than host free RAM.
freemem() is sampled once per buildChromeArgs() call, which is called per browser launch
This is actually fine for the use case — Chrome flags are set at launch time and can't be changed mid-run — but freemem() is point-in-time and can fluctuate. If multiple workers launch in quick succession, each samples memory independently. On a heavily loaded system this is benign (each sees the actual pressure), but it's worth a code comment noting the sampling is intentionally per-launch, not cached at module init.
Nits
getSystemFreeMb() vs getFreeMb() naming inconsistency — two files, two functions doing the same thing with different names. Consider either exporting a shared utility, or at minimum naming them consistently. nit
PR description says "data URI cache is a fallback path" — clarify the threshold asymmetry — the code comment in config.ts describes frameDataUriCacheLimit as a "hard upper bound" and notes the byte budget normally fires first. Since frames are HTTP-served in the primary path, the cache scaling is conservative defense-in-depth. Fine, just worth stating explicitly in the description so reviewers don't think this is on the hot path. nit
CI
The Format check failure is a network flake — @oxlint/binding-linux-x64-gnu tarball download failed during bun install with no formatting check executed. Known recurring issue on hyperframes CI (unrelated to this PR). Re-run to clear.
Verdict: REQUEST CHANGES
Reasoning: The require("os") call in config.ts is broken in native ESM — getSystemFreeMb() always returns 8192, making the cache-size scaling dead code except when consumed through the CLI bundle's createRequire shim. Fix is a one-line change to a top-level import. The browserManager.ts side is solid and can merge once config.ts is corrected.
— Vai
jrusso1020
left a comment
There was a problem hiding this comment.
Vai's blocker verified at current PR head (02372697). Confirmed:
packages/engine/package.jsonhas"type": "module"→ engine is native ESMconfig.ts:213usesrequire("os")inside try/catch →ReferenceErrorswallowed →getSystemTotalMb()always returns 16384 MB fallback- Both
memoryAdaptiveCacheLimit()andmemoryAdaptiveCacheBytesMb()therefore always return high-memory defaults - Cache-scaling half of the fix is dead code in native ESM runtime (the CLI bundle works only because tsup injects a
createRequireshim)
I had this as a non-blocking "inconsistency between dynamic require + static import" observation; should have graded as blocker once I noticed the "type": "module" declaration. One-line fix: hoist import { totalmem } from "os" to the top of config.ts (matches browserManager.ts pattern). My would-be APPROVE shifts to defer to Vai's CHANGES_REQUESTED.
Vai's two importants — cgroup memory limits aren't reflected in os.totalmem() (containers won't get the fix), and no tests for the four scaling functions — both align with what I had as non-blocking but should also be addressed.
— Rames Jusso
Replace dynamic require("os") with static import — require is undefined
in native ESM, causing the try/catch to silently return the 16GB
fallback on every machine. The cache scaling was dead code.
jrusso1020
left a comment
There was a problem hiding this comment.
Verified at e2fbd0de — require("os") replaced with static import { totalmem } from "os", dead-code try/catch removed. Cache scaling now works in native ESM. Matches the browserManager.ts shape.
Re-approved.
— Rames Jusso
vanceingalls
left a comment
There was a problem hiding this comment.
Updated review — blocker resolved
The ESM blocker from the last pass is fixed. import { totalmem } from "os" at the top of config.ts is correct static ESM syntax, and browserManager.ts correctly extends the existing homedir import to { homedir, totalmem }. The function was also renamed to getSystemTotalMb() which matches what it actually returns — good.
totalmem vs freemem — correct choice
Using totalmem() (not freemem()) is the right call here. Total RAM is a stable, boot-time constant; free RAM fluctuates second-to-second and would produce non-deterministic config at startup (different limits each run, harder to debug, impossible to reproduce). Total RAM correctly captures "what class of machine is this" which is what these thresholds need.
Remaining importants (carried from prior review, neither is a blocker for this fix)
totalmem() ignores cgroup memory limits — in Docker/k8s with a memory cap (e.g., --memory=4g on a 32GB host), totalmem() returns 32GB and the node gets the 4096MB GPU budget it can't actually back. freemem() has the same problem. Fixing this requires reading /sys/fs/cgroup/memory.limit_in_bytes (v1) or /sys/fs/cgroup/memory.max (v2) and falling back to totalmem(). Probably out of scope here — just worth noting if these are deployed in containers.
No unit tests for the four scaling functions — memoryAdaptiveCacheLimit, memoryAdaptiveCacheBytesMb, getGpuMemBudgetMb, getLowMemoryFlags are untested. The tier boundaries (4096 / 8192 MB) are the core of this fix and worth a few snapshot-style tests to prevent silent regression.
Minor nit
The PR description and the inline comment (// Memory — scale GPU budget to available system RAM) say "available RAM" but the implementation uses total RAM. Worth updating the description to say "total installed RAM" to avoid confusion when someone reads the commit log later.
Code is correct, the crash fix is sound. Approving.
— Vai
Scale GPU budget to half of total RAM (capped at 16GB) instead of hardcoding 4096MB. A 32GB machine now gets 16GB GPU budget; a 64GB machine gets 16GB (Chrome's practical limit). Low-memory tiers unchanged.
…heuristic On NVIDIA systems, spawns nvidia-smi once (cached) to read actual GPU memory. Uses real VRAM for the Chrome GPU budget instead of guessing from total system RAM. Falls back to total/2 on non-NVIDIA systems or when nvidia-smi is unavailable. No other headless Chrome renderer probes GPU memory — Remotion, Puppeteer, and Playwright all ignore --force-gpu-mem-available-mb entirely.
jrusso1020
left a comment
There was a problem hiding this comment.
Two new commits since last approval — one regression + scope expansion
🚩 ESM regression in probeNvidiaVramMb
New code at browserManager.ts:484-499 (commit e53894003):
function probeNvidiaVramMb(): number | null {
if (_cachedVramMb !== null) return _cachedVramMb;
try {
const { execSync } = require("child_process") as typeof import("child_process");
...
} catch { ... }
return null;
}This is the exact pattern Vai just flagged as a blocker on the previous iteration — require() is undefined in native ESM, the try/catch silently swallows the ReferenceError, the function always returns null. In the CLI bundle it works via tsup's createRequire shim; in native ESM consumers, nvidia-smi probing is dead code that always falls through to the RAM-based heuristic.
Functional impact is bounded (the fallback IS the documented behavior when no NVIDIA GPU is present), but the intent of the probe is broken in native ESM. Same one-line fix as before: hoist to a top-level import { execSync } from "node:child_process".
Scope expansion observation
Two changes landed in this iteration that go beyond the PR's stated scope:
-
fix(engine): remove 4096MB GPU budget ceiling for high-RAM systems—getGpuMemBudgetMbfor>=8GBwas4096; nowMath.min(Math.floor(total / 2), 16384). A 32GB system jumps from 4GB → 16GB GPU budget. Doubles or quadruples the budget on common high-RAM workstations. Not in the PR title or description; arguably belongs in a separate PR with its own justification. -
nvidia-smi probe — adds a 3-second shell-out to determine actual VRAM. On non-NVIDIA systems where
nvidia-smiexists but is slow to respond (rare), Chrome startup is delayed by up to 3s. The 3s timeout + try/catch make this bounded, but it's a new failure mode on a critical path.
Both changes are reasonable on their own merits, but they materially change what this PR ships. The PR title and root-cause section still focus on the OOM fix for low-RAM systems — they don't mention the high-RAM budget increase or VRAM probing. Worth either splitting them out or expanding the PR description to capture the full scope.
Verdict shift
My prior APPROVE was on e2fbd0de (the clean ESM fix). This new SHA introduces a regression of that same fix in a different file. Reverting to COMMENT pending:
- Static import for
child_process(one-line) - PR description update OR split-out of the high-RAM budget + nvidia-smi changes
CI is green (13 success / 12 in_progress / 0 failures) — neither change broke tests.
— Rames Jusso
vanceingalls
left a comment
There was a problem hiding this comment.
Re-review at `e5389400` — new commits after last approval
Two commits landed after `e2fbd0de`:
- `2a673348` — removes 4096 MB ceiling, scales GPU budget to `totalmem() / 2` (capped at 16 GB) for high-RAM systems
- `e5389400` — adds `probeNvidiaVramMb()`: spawns `nvidia-smi` once (cached) to get actual VRAM; uses real VRAM when available, falls back to `total/2` heuristic otherwise
The VRAM-probing idea is the right instinct. But there's a structural bug in the new code that's a structural repeat of the blocker I raised last round.
Blocker
`probeNvidiaVramMb()` uses `require("child_process")` — dead code in native ESM
// browserManager.ts:497-498 — BROKEN in ESM consumers
const { execSync } = require("child_process") as typeof import("child_process");`packages/engine` is `"type": "module"` (same as before). The `require` call throws a `ReferenceError` in native ESM, which the `catch {}` swallows — so `probeNvidiaVramMb()` silently returns `null` on every call and the VRAM probe is dead code for any consumer that imports the engine directly.
This is not just theoretical. `packages/producer` has `"@hyperframes/engine": "workspace:^"` in its dependencies and imports from the engine's source directly. When the producer runs outside the CLI bundle, the probe never fires.
The same file already has static ESM imports at the top:
import { homedir, totalmem } from "os";The fix is one line — hoist the `child_process` import:
import { execSync } from "child_process";And remove the dynamic `require` inside the function body. The try/catch for nvidia-smi unavailability is correct and should stay — `execSync` will throw when nvidia-smi isn't found, and the catch returns `null` as intended.
Important
`execSync` with a 3 s timeout blocks the Node event loop on first browser launch
The result is cached after the first call, so it only fires once per process. But on a system where `nvidia-smi` is installed but the GPU is slow to respond (or the binary hangs), every first render hangs the event loop for up to 3 seconds. `spawnSync` with the same timeout is equivalent and has the same problem — there's no truly async-safe way to cache this without a bigger refactor.
Acceptable for now, but worth a code comment: `// Blocks event loop once per process; cached afterward.`
GPU budget cap inconsistency between the heuristic and the VRAM probe
- VRAM probe path: `Math.min(vram, 65536)` — cap at 64 GB
- Heuristic path (`total/2`): `Math.min(Math.floor(total / 2), 16384)` — cap at 16 GB
The commit message for `2a673348` says "16GB is Chrome's practical limit," which would make the 64 GB cap on the VRAM path wrong. Or the 16 GB heuristic cap is too conservative. Pick one ceiling and document why — right now these disagree silently.
Prior importants still open from last review
- No unit tests for `getGpuMemBudgetMb()`, `getLowMemoryFlags()`, `memoryAdaptiveCacheLimit()`, `memoryAdaptiveCacheBytesMb()`. The new `probeNvidiaVramMb()` adds more untested branches (nvidia path, fallback path, cached-result path). These are trivially mockable with `vi.mock("child_process", ...)` / `vi.mock("os", ...)`.
- `totalmem()` still reports host RAM on Docker/k8s with a cgroup memory limit — containers won't get the fix. A code comment scoping the limitation would suffice.
- PR description / inline comment still say "available system RAM" but use `totalmem()`.
What changed is otherwise sound
The `totalmem() / 2` heuristic for high-RAM systems (replacing the hardcoded 4096) is an improvement — a 32 GB render box no longer gets an artificially capped budget. The VRAM probe design (cache, timeout, silent fallback) is the right shape. These just need the static import fix before they work correctly in native ESM.
— Vai
- Replace require("child_process") with static import (same ESM fix
as config.ts — require is undefined in native ESM)
- Unify cap: both VRAM probe and heuristic paths now cap at 16GB
- Add comment noting the one-time blocking execSync is cached
jrusso1020
left a comment
There was a problem hiding this comment.
Verified at 632ecf60:
require("child_process")→ staticimport { execSync }at top of file. ✓- Unified 16GB cap on both VRAM-probe and heuristic paths (was 64GB vs 16GB asymmetric). ✓
- Comment added on execSync ("Synchronous, runs once per process (cached). ~50ms on typical systems.") ✓
Re-approved.
— Rames Jusso
vanceingalls
left a comment
There was a problem hiding this comment.
Re-review at 632ecf6. All three blockers from the previous round are resolved.
Blockers resolved
import { execSync } from "child_process"is a static ESM top-level import at line 9 ofbrowserManager.ts— verified at HEAD. No CJSrequireanywhere in the diff.- VRAM cap unified:
probeNvidiaVramMbpath caps atMath.min(vram, 16384), heuristic fallback caps atMath.min(Math.floor(total / 2), 16384). Both 16 384 MB. Mismatch gone. execSyncblocking behavior documented with the inline comment:// Synchronous, runs once per process (cached). ~50ms on typical systems.
Logic looks correct
- Module-level
_cachedVramMbensures thenvidia-smisubprocess fires at most once per process lifetime. Safe. timeout: 3000+stdio: ["pipe","pipe","pipe"]keeps the probe from hanging or polluting stdout. Good defensive hygiene.parseInt(out.split("\n")[0] ?? "", 10)+Number.isFinite(mb) && mb > 0guard handles multi-GPU output and malformed responses without throwing.getLowMemoryFlagsonly activates below 8 GB — high-memory (production render) machines get zero additional flags. Clean.config.tsmemory-adaptive cache limits follow the sametotalmem()approach already used inparallelCoordinator.ts, which is the right precedent.
CI — Lint, Format, Test: runtime contract, player-perf all green. Build and CLI smoke still running at review time; no failures observed. Known ONNX ETIMEDOUT flake noted but not triggered.
LGTM. Ship it.
— Vai
Summary
Fixes Chrome OOM crashes on low-memory systems (e.g., 8GB total RAM) during video frame capture. The root cause:
--force-gpu-mem-available-mb=4096tells Chrome it has 4GB of GPU memory regardless of actual system RAM, causing over-allocation and OOM kills.Closes #1072
Root cause
The reporter's system has 8GB total RAM (1.8GB free at render time). Chrome's
--force-gpu-mem-available-mb=4096flag (hardcoded) makes the GPU texture allocator think it has 4GB to work with. When capturing frames of a 1080x1920 video, Chrome aggressively allocates decoded bitmaps, exceeds actual capacity, and the OS OOM-killer terminates the renderer — surfacing as"Protocol error (Runtime.callFunctionOn): Target closed".Uses
totalmem()(notfreemem()) consistent withcalculateOptimalWorkersin parallelCoordinator.ts — macOS reports misleadingly low freemem() due to aggressive file caching.Changes
packages/engine/src/services/browserManager.ts:--force-gpu-mem-available-mbscales with total system RAM (512MB/<4GB, 1024MB/<8GB, 4096MB/>=8GB)--js-flags=--max-old-space-size=Non low-RAM systems to cap Chrome's V8 heappackages/engine/src/config.ts:import { totalmem }— fixes ESM compatibility (previousrequire("os")was dead code)Test plan