Dual CactusLM instances with the same model slug cause 'Model already exists' + no CPU RAM guard before vision inference → OOM crash on iPhone

## Environment

- cactus-react-native: 1.13.1
- Device: iPhone 16e (A18, 8 GB RAM)
- Model: `gemma-4-e2b-it`, `quantization: 'int4'`, `pro: false`

## Two separate bugs

### Bug 1 — Dual instance same slug: `'Model already exists', 'gemma-4-e2b-it-int4'`

When `useCactusLM({ model: 'gemma-4-e2b-it', options: { quantization: 'int4' } })` is mounted in a navigation component (to check `isDownloaded`) **and** a separate `new CactusLM({ model: 'gemma-4-e2b-it', ... })` singleton exists in a non-React module (e.g. `scan.ts`), the native layer logs `'Model already exists', 'gemma-4-e2b-it-int4'` when the second instance calls `init()`.

The hook creates its own `CactusLM` instance on mount. If a class-based singleton for the same slug is already initialized elsewhere, the second `init()` either fails silently or only logs this warning. There is no deduplication, no shared registry, and no clear error thrown to the caller.

**Expected:** Either throw a clear error, return the existing instance, or document that only one instance per slug is supported at a time.
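As an app-side illustration of the "return the existing instance" option, here is a minimal sketch of a registry keyed by slug. Everything in it is hypothetical (`ModelKey`, `slugFor`, `InstanceRegistry`); none of it is part of cactus-react-native, and the `<model>-<quantization>` slug format is an assumption inferred from the log line above.

```typescript
// Hypothetical sketch: one shared instance per model slug.
// None of these names come from cactus-react-native.

interface ModelKey {
  model: string;
  quantization: string;
}

// Assumption: the slug is `<model>-<quantization>`, as the log line suggests.
function slugFor(key: ModelKey): string {
  return `${key.model}-${key.quantization}`;
}

class InstanceRegistry<T> {
  private readonly instances = new Map<string, T>();

  // Return the instance already registered for this slug,
  // or build one with `create` and remember it.
  getOrCreate(key: ModelKey, create: () => T): T {
    const slug = slugFor(key);
    const existing = this.instances.get(slug);
    if (existing !== undefined) {
      return existing;
    }
    const created = create();
    this.instances.set(slug, created);
    return created;
  }

  // Forget an instance, e.g. after the caller has destroyed it.
  release(key: ModelKey): boolean {
    return this.instances.delete(slugFor(key));
  }
}
```

In an app, `create` would be something like `() => new CactusLM({ model: 'gemma-4-e2b-it', ... })`. This only deduplicates instances the app constructs itself; it cannot stop the hook from creating its own, which is why a fix or documented constraint in the library is still needed.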

### Bug 2 — No CPU RAM guard before vision inference → `std::bad_alloc` → OOM crash

With `pro: false` (no ANE bundle), vision inference for Gemma 4 E2B runs on the CPU and allocates ~2–3 GB of intermediate activation buffers. On iPhone 16e (8 GB total, ~2–3 GB available to the app), this triggers `std::bad_alloc` inside `complete()`. The failure then cascades: the system enters severe memory pressure, and a subsequent `UIGraphicsBeginImageContext` call for a 390×260 camera preview frame cannot allocate 3,678,528 bytes (about 3.5 MiB), which raises an uncaught `NSInternalInconsistencyException` and terminates the app.

```
[WARN] [npu] [gemma4-vision] vision_encoder.mlpackage not found; using CPU vision encoder
[ERROR] [complete] Exception: std::bad_alloc
CGBitmapContextInfoCreate: unable to allocate 3678528 bytes for bitmap data
*** Terminating app due to uncaught exception 'NSInternalInconsistencyException'
```

**Expected:** Before running CPU vision inference, check available RAM (or model's estimated activation memory) and either warn the caller or throw a typed error (`InsufficientMemoryError`) instead of a C++ `std::bad_alloc` that crashes the process.
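A minimal sketch of the kind of guard meant here, under stated assumptions: `assertMemoryFor` and its headroom default are hypothetical, and the available-memory figure has to come from native code (on iOS, `os_proc_available_memory()` is one source), since plain JavaScript cannot read it. This is not the library's API, only an illustration of failing early with a typed error.

```typescript
// Hypothetical sketch of a pre-inference memory guard.
// Not part of cactus-react-native; names and defaults are illustrative.

const MIB = 1024 * 1024;
const GIB = 1024 * MIB;

class InsufficientMemoryError extends Error {
  readonly requiredBytes: number;
  readonly availableBytes: number;

  constructor(requiredBytes: number, availableBytes: number) {
    super(
      `Need ~${Math.ceil(requiredBytes / MIB)} MiB, ` +
        `only ${Math.floor(availableBytes / MIB)} MiB available`,
    );
    this.name = 'InsufficientMemoryError';
    this.requiredBytes = requiredBytes;
    this.availableBytes = availableBytes;
    // Keep `instanceof` working when compiled to older JS targets.
    Object.setPrototypeOf(this, InsufficientMemoryError.prototype);
  }
}

// Throw before inference if the estimate does not fit.
// `headroomBytes` is memory left for the rest of the app (assumed default).
function assertMemoryFor(
  requiredBytes: number,
  availableBytes: number,
  headroomBytes: number = 256 * MIB,
): void {
  if (availableBytes - headroomBytes < requiredBytes) {
    throw new InsufficientMemoryError(requiredBytes, availableBytes);
  }
}
```

With the ~3 GB upper estimate from this report, a call such as `assertMemoryFor(3 * GIB, availableBytes)` before `complete()` would let the caller catch `InsufficientMemoryError` and fall back (smaller image, text-only, or a user-facing message) instead of the process dying on `std::bad_alloc`.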

## Notes

- `pro: true` would use ANE and avoid the OOM, but the new CQ4-apple bundle is 5.56 GB — impractical for a first-run download. CQ3-apple (3.82 GB) is more reasonable but still large.
- The `int4` GGUF bundle (~1.5 GB) that existed before the CQ format migration is no longer on Hugging Face. Apps that downloaded the model in the old format now need a full re-download.
