Environment
- cactus-react-native: 1.13.1
- Device: iPhone 16e (A18, 8 GB RAM)
- Model: gemma-4-e2b-it, quantization: 'int4', pro: false
Two separate bugs
Bug 1 — Dual instance same slug: 'Model already exists', 'gemma-4-e2b-it-int4'
When useCactusLM({ model: 'gemma-4-e2b-it', options: { quantization: 'int4' } }) is mounted in a navigation component (to check isDownloaded) and a separate new CactusLM({ model: 'gemma-4-e2b-it', ... }) singleton exists in a non-React module (e.g. scan.ts), the native layer logs 'Model already exists', 'gemma-4-e2b-it-int4' when the second instance calls init().
The hook creates its own CactusLM instance on mount. If a class-based singleton for the same slug is already initialized elsewhere, the second init() silently fails or logs this warning. There is no deduplication, no shared registry, and no clear error thrown.
Expected: Either throw a clear error, return the existing instance, or document that only one instance per slug is supported at a time.
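Until the library settles on one of these behaviours, an app-side workaround could look like the sketch below: a small registry that hands back the existing instance for a slug instead of constructing a second one. `getShared`, `slugFor` and `ModelKey` are hypothetical names for illustration, not part of cactus-react-native, and the slug format is only inferred from the `'gemma-4-e2b-it-int4'` string in the log above.

```typescript
// Hypothetical app-side registry: at most one instance per model slug.
// None of these names come from cactus-react-native.
type ModelKey = { model: string; quantization: string };

// Assumption: the slug is "<model>-<quantization>", as the native log suggests.
function slugFor(key: ModelKey): string {
  return `${key.model}-${key.quantization}`;
}

const registry = new Map<string, unknown>();

// Returns the instance already registered for this slug, or creates one.
// The caller supplies `create`, so the registry does not depend on any
// particular constructor signature.
function getShared<T>(key: ModelKey, create: () => T): T {
  const slug = slugFor(key);
  if (!registry.has(slug)) {
    registry.set(slug, create());
  }
  return registry.get(slug) as T;
}
```

In the app, `create` would wrap the `new CactusLM({ model: 'gemma-4-e2b-it', ... })` call from scan.ts. This only covers class-based instances: since the hook creates its own instance on mount, the navigation component would have to read its state from the shared instance instead of mounting useCactusLM.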
Bug 2 — No CPU RAM guard before vision inference → std::bad_alloc → OOM crash
With pro: false (no ANE bundle), vision inference for Gemma 4 E2B runs on CPU and allocates ~2–3 GB of intermediate activation buffers. On iPhone 16e (8 GB total, ~2–3 GB available to app), this triggers std::bad_alloc inside complete(). The crash cascades: system enters severe memory pressure, and a subsequent UIGraphicsBeginImageContext call for a 390×260 camera preview frame fails to allocate 3.5 MB → unrecoverable NSInternalInconsistencyException.
[WARN] [npu] [gemma4-vision] vision_encoder.mlpackage not found; using CPU vision encoder
[ERROR] [complete] Exception: std::bad_alloc
CGBitmapContextInfoCreate: unable to allocate 3678528 bytes for bitmap data
*** Terminating app due to uncaught exception 'NSInternalInconsistencyException'
Expected: Before running CPU vision inference, check available RAM (or model's estimated activation memory) and either warn the caller or throw a typed error (InsufficientMemoryError) instead of a C++ std::bad_alloc that crashes the process.
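A rough sketch of the kind of guard meant here, written in TypeScript for illustration. `InsufficientMemoryError` is the name suggested above; `assertEnoughMemory`, the headroom default, and the way the two byte counts are obtained are all assumptions. On iOS the available figure could plausibly come from `os_proc_available_memory()` on the native side, and the estimate from the ~2–3 GB activation size observed in this report.

```typescript
// Hypothetical typed error so callers can catch OOM risk before inference.
class InsufficientMemoryError extends Error {
  requiredBytes: number;
  availableBytes: number;

  constructor(requiredBytes: number, availableBytes: number) {
    super(
      `Vision inference needs about ${requiredBytes} bytes, ` +
        `but only ${availableBytes} bytes are available`,
    );
    this.name = "InsufficientMemoryError";
    this.requiredBytes = requiredBytes;
    this.availableBytes = availableBytes;
    // Keeps instanceof working when compiled to older JS targets.
    Object.setPrototypeOf(this, InsufficientMemoryError.prototype);
  }
}

const GB = 1024 ** 3;

// Throws if the estimated activation memory plus some headroom does not fit.
// The 256 MB default headroom is an arbitrary illustrative value.
function assertEnoughMemory(
  availableBytes: number,
  estimatedBytes: number,
  headroomBytes: number = 0.25 * GB,
): void {
  const required = estimatedBytes + headroomBytes;
  if (availableBytes < required) {
    throw new InsufficientMemoryError(required, availableBytes);
  }
}
```

With a check like this in front of the CPU vision path, the caller could catch the error and fall back to text-only input or show a message, rather than the process dying inside complete().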
Notes
- pro: true would use ANE and avoid the OOM, but the new CQ4-apple bundle is 5.56 GB, which is impractical for a first-run download. CQ3-apple (3.82 GB) is more reasonable but still large.
- The int4 GGUF bundle (~1.5 GB) that existed before the CQ format migration is no longer on Hugging Face. Apps that downloaded the old format now need a full re-download.