chore(ios): update MLX backend binary with compiled-execution path #1235

Merged
NorbertKlockiewicz merged 1 commit into main from @nk/mlx-ios-compiled-backend on Jun 16, 2026

@NorbertKlockiewicz

Contributor

Description

Updates the prebuilt iOS MLX backend binary (libbackend_mlx_ios.a and the ExecutorchLib.xcframework that embeds it) with the latest ExecuTorch MLX backend.

New backend capabilities baked into the binary:

  • RMSNorm fusion — fuses the Gemma-style fp32-upcast RMSNorm cluster into a single fused op.
  • mx::compile compiled-execution path — traces the interpreter once per input signature and replays the fused graph, cutting per-token Metal kernel encoding.
  • rope_t custom op — tensor-offset RoPE so the decode graph has no .item() (required for mx::compile).
  • Tensor-based index_copy — KV-cache update without an eval-during-trace sync.
  • Copy-mode constant loading + residency guard — materializes weights into MLX-owned memory and fails loudly if the copy is elided.
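To illustrate the "trace once per input signature, then replay" idea behind the compiled-execution path, here is a minimal Python sketch. It is not the backend's implementation: `FakeTensor`, `CompiledRunner`, and `interpret` are hypothetical names, and keying the cache on each input's (shape, dtype) is an assumption about what an input signature means here.

```python
from typing import Callable, Dict, Tuple

# Assumption: a "signature" is the (shape, dtype) of every input, in order.
Signature = Tuple[Tuple[Tuple[int, ...], str], ...]


class FakeTensor:
    """Hypothetical stand-in for a tensor; only shape and dtype are modelled."""

    def __init__(self, shape: Tuple[int, ...], dtype: str = "float16") -> None:
        self.shape = shape
        self.dtype = dtype


class CompiledRunner:
    """Calls `interpret` once per input signature and replays its result after that."""

    def __init__(self, interpret: Callable[[Signature], Callable]) -> None:
        self._interpret = interpret
        self._cache: Dict[Signature, Callable] = {}
        self.trace_count = 0

    @staticmethod
    def _signature(inputs: Tuple[FakeTensor, ...]) -> Signature:
        return tuple((t.shape, t.dtype) for t in inputs)

    def run(self, *inputs: FakeTensor):
        sig = self._signature(inputs)
        replay = self._cache.get(sig)
        if replay is None:
            # Slow path: walk the graph once and keep the traced callable.
            replay = self._interpret(sig)
            self._cache[sig] = replay
            self.trace_count += 1
        # Fast path: reuse the traced callable without re-interpreting.
        return replay(*inputs)


def interpret(sig: Signature) -> Callable:
    """Hypothetical trace step: returns a closure standing in for the fused graph."""

    def replay(*inputs: FakeTensor):
        return [t.shape for t in inputs]

    return replay


runner = CompiledRunner(interpret)
decode_token = FakeTensor((1, 1))
for _ in range(3):
    runner.run(decode_token)      # same signature: traced once, replayed twice
runner.run(FakeTensor((1, 8)))    # different shape: traced a second time
```

In this sketch, three decode steps with the same signature trigger a single trace, and only a new input shape causes a second one. This is also why the list above stresses removing `.item()` and eval-during-trace syncs: a trace can only be replayed if producing it does not depend on reading tensor values back mid-graph.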

This binary is required to run the compile-friendly Gemma 4 E2B MLX .pte exports; the previous binary would not execute them correctly.

Introduces a breaking change?

  • Yes
  • No

Type of change

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Documentation update (improves or adds clarity to existing documentation)
  • Other (chores, tests, code style improvements etc.)

Tested on

  • iOS
  • Android

Testing instructions

  1. Build and run the LLM example app on an iOS device.
  2. Load a compile-friendly Gemma 4 E2B MLX model and generate text.
  3. Confirm that generation is coherent and that decode runs faster than with the previous binary (the runtime logs "compiled execution: enabled" on first inference).
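For step 3, a small script can check captured device logs for the compiled-execution message. This is only a sketch: the marker string is taken from the description above, but the surrounding log format, the sample lines, and the helper name `compiled_execution_enabled` are assumptions.

```python
MARKER = "compiled execution: enabled"  # message quoted in the testing steps


def compiled_execution_enabled(log_text: str) -> bool:
    """Return True if any line of the captured log contains the marker."""
    return any(MARKER in line for line in log_text.splitlines())


# Hypothetical log excerpt; real log lines will likely carry other prefixes.
sample_log = "\n".join(
    [
        "loading model",
        "compiled execution: enabled",
        "first token generated",
    ]
)

print(compiled_execution_enabled(sample_log))
```

A substring match is used rather than an exact line match so that timestamps or tag prefixes added by the logging system do not cause a false negative.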

Screenshots

Related issues

Checklist

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have updated the documentation accordingly
  • My changes generate no new warnings

Additional notes

This PR contains only the rebuilt binary artifacts. The corresponding backend source lives in the ExecuTorch fork; this updates the prebuilt that the iOS app links against. The simulator variant (libbackend_mlx_simulator.a) is unchanged in this PR.

This PR was authored with Claude.

Rebuilds libbackend_mlx_ios.a (and the ExecutorchLib xcframework that
embeds it) from the updated ExecuTorch MLX backend: RMSNorm fusion,
mx::compile compiled-execution path, rope_t custom op, tensor-based
index_copy, and copy-mode constant loading with a residency guard.
Required to run the compile-friendly Gemma 4 E2B MLX .pte exports.

Authored with Claude.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@NorbertKlockiewicz NorbertKlockiewicz merged commit 28d8af6 into main Jun 16, 2026
4 checks passed
@NorbertKlockiewicz NorbertKlockiewicz deleted the @nk/mlx-ios-compiled-backend branch June 16, 2026 08:59