Dual CactusLM instances with same model slug causes 'Model already exists' + no CPU RAM guard before vision inference → OOM crash on iPhone #30

@anouar-bm

Environment

  • cactus-react-native: 1.13.1
  • Device: iPhone 16e (A18, 8 GB RAM)
  • Model: gemma-4-e2b-it, quantization: 'int4', pro: false

Two separate bugs

Bug 1 — Dual instance same slug: 'Model already exists', 'gemma-4-e2b-it-int4'

The problem appears when two instances target the same slug: useCactusLM({ model: 'gemma-4-e2b-it', options: { quantization: 'int4' } }) is mounted in a navigation component (to check isDownloaded), and a separate new CactusLM({ model: 'gemma-4-e2b-it', ... }) singleton lives in a non-React module (e.g. scan.ts). When the second instance calls init(), the native layer logs 'Model already exists', 'gemma-4-e2b-it-int4'.

The hook creates its own CactusLM instance on mount. If a class-based singleton for the same slug is already initialized elsewhere, the second init() silently fails or logs this warning. There is no deduplication, no shared registry, and no clear error thrown.

Expected: Either throw a clear error, return the existing instance, or document that only one instance per slug is supported at a time.
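To illustrate the "return the existing instance" option, here is a minimal sketch of a per-slug registry that both the hook and the singleton could go through. Everything in it is hypothetical: ModelLike, slugFor, and ModelRegistry are names made up for this example and are not part of cactus-react-native, and the slug format is only inferred from the 'gemma-4-e2b-it-int4' string in the log.

```typescript
// Hypothetical minimal shape of a model instance; NOT the cactus-react-native API.
interface ModelLike {
  init(): Promise<void>;
}

// Assumption: the slug is "<model>-<quantization>", inferred from the log line.
function slugFor(model: string, quantization: string): string {
  return `${model}-${quantization}`;
}

interface RegistryEntry<T> {
  instance: T;
  ready: Promise<void> | null; // shared init promise, null until init is requested
}

class ModelRegistry<T extends ModelLike> {
  private readonly entries = new Map<string, RegistryEntry<T>>();

  // Returns the instance already registered for this slug, or creates one.
  getOrCreate(slug: string, create: () => T): T {
    let entry = this.entries.get(slug);
    if (entry === undefined) {
      entry = { instance: create(), ready: null };
      this.entries.set(slug, entry);
    }
    return entry.instance;
  }

  // Calls init() at most once per slug; concurrent callers share one promise.
  initOnce(slug: string): Promise<void> {
    const entry = this.entries.get(slug);
    if (entry === undefined) {
      throw new Error(`No instance registered for slug '${slug}'`);
    }
    if (entry.ready === null) {
      entry.ready = entry.instance.init().catch((err: unknown) => {
        entry.ready = null; // allow a retry after a failed init
        throw err;
      });
    }
    return entry.ready;
  }
}
```

With something like this, the navigation component and scan.ts would both call getOrCreate(slugFor('gemma-4-e2b-it', 'int4'), ...) and receive the same object, so init() would only reach the native layer once.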

Bug 2 — No CPU RAM guard before vision inference → std::bad_alloc → OOM crash

With pro: false (no ANE bundle), vision inference for Gemma 4 E2B runs on the CPU and allocates roughly 2–3 GB of intermediate activation buffers. On an iPhone 16e (8 GB total, roughly 2–3 GB available to the app), this triggers std::bad_alloc inside complete(). The failure then cascades: the system enters severe memory pressure, and a subsequent UIGraphicsBeginImageContext call for a 390×260 camera preview frame cannot allocate 3.5 MB, which raises an unrecoverable NSInternalInconsistencyException.

[WARN] [npu] [gemma4-vision] vision_encoder.mlpackage not found; using CPU vision encoder
[ERROR] [complete] Exception: std::bad_alloc
CGBitmapContextInfoCreate: unable to allocate 3678528 bytes for bitmap data
*** Terminating app due to uncaught exception 'NSInternalInconsistencyException'

Expected: Before running CPU vision inference, check available RAM (or model's estimated activation memory) and either warn the caller or throw a typed error (InsufficientMemoryError) instead of a C++ std::bad_alloc that crashes the process.
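A rough sketch of what such a guard could look like on the JS side, assuming the caller can obtain both numbers. InsufficientMemoryError and assertEnoughMemory are hypothetical names, not existing library exports; the 256 MiB safety margin is an arbitrary placeholder; and how availableBytes is obtained is left open (on iOS, one possible source is os_proc_available_memory() exposed through a native module).

```typescript
// Hypothetical typed error; not an existing cactus-react-native export.
class InsufficientMemoryError extends Error {
  readonly requiredBytes: number;
  readonly availableBytes: number;

  constructor(requiredBytes: number, availableBytes: number) {
    super(
      `Insufficient memory: need ~${requiredBytes} bytes, ` +
        `only ${availableBytes} bytes available`
    );
    this.name = 'InsufficientMemoryError';
    this.requiredBytes = requiredBytes;
    this.availableBytes = availableBytes;
    // Keeps instanceof working when compiled to older JS targets.
    Object.setPrototypeOf(this, InsufficientMemoryError.prototype);
  }
}

const MIB = 1024 * 1024;

// Throws before inference starts if the estimate does not fit.
// safetyMarginBytes is an assumed headroom value, not a measured one.
function assertEnoughMemory(
  requiredBytes: number,
  availableBytes: number,
  safetyMarginBytes: number = 256 * MIB
): void {
  if (requiredBytes + safetyMarginBytes > availableBytes) {
    throw new InsufficientMemoryError(requiredBytes, availableBytes);
  }
}
```

For the case reported here, a call such as assertEnoughMemory(3 * 1024 * MIB, available) placed before complete() would throw a catchable error on a device with about 2 GB free, instead of letting the native layer hit std::bad_alloc.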

Notes

  • pro: true would use ANE and avoid the OOM, but the new CQ4-apple bundle is 5.56 GB — impractical for a first-run download. CQ3-apple (3.82 GB) is more reasonable but still large.
  • The int4 GGUF bundle (~1.5 GB) that existed before the CQ format migration is no longer on HuggingFace. Apps that downloaded with the old format now need a full re-download.
