release: v0.9.2 #1259

Merged: chmjkb merged 5 commits into release/0.9 from @chmjkb/patch-0.9 on Jun 17, 2026
chmjkb (Collaborator) commented Jun 17, 2026

Description

Adds a patch release with RF-DETR keypoint preview support.

Introduces a breaking change?

  • Yes
  • No

Type of change

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Documentation update (improves or adds clarity to existing documentation)
  • Other (chores, tests, code style improvements etc.)

Tested on

  • iOS
  • Android

Testing instructions

Screenshots

Related issues

Checklist

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have updated the documentation accordingly
  • My changes generate no new warnings

Additional notes

Register the RF-DETR keypoint preview pose model with xnnpack, coreml
and mlx backends (all fp32). This is a beta preview export and may be
re-exported under a different constant once a stable version ships.

- modelUrls/modelRegistry: add the three backend URLs and variant map
- PoseEstimationModule/types: register the model config
(single-`forward` export, no inputSize axis) and extend
PoseEstimationModelSources
- demo: load it via usePoseEstimation in the pose estimation screen
- docs: list it in the model registry and usePoseEstimation supported
models

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@chmjkb chmjkb marked this pull request as ready for review June 17, 2026 14:25
@chmjkb chmjkb changed the title feat: add RF-DETR keypoint preview model (#1257) release: v0.9.0 Jun 17, 2026
@chmjkb chmjkb changed the title release: v0.9.0 release: v0.9.2 Jun 17, 2026
@barhanc barhanc self-requested a review June 17, 2026 14:36
msluszniak (Member) commented Jun 17, 2026

We should definitely include the fix for the vision encoder in this patch. Please check whether there are other applicable additions.

chmjkb and others added 3 commits June 17, 2026 16:43
## Description

In any multimodal conversation with more than one image, the model on
later turns describes earlier images as if they were the most recently
sent one.

`VisionEncoder::encode` caches the `EValue` returned by
`vision_encoder.execute()` per image path. That tensor aliases the
method's reusable output buffer, so the next `execute()` (the second
image, or any later encode) overwrites the bytes behind every cached
entry. On re-prefilled turns the prefiller then splices the latest
image's embeddings into every image slot. The audio path already
snapshots its encoder output for exactly this reason (see the
`AudioSlot` comment in `multimodal_prefiller.cpp`); vision never got the
same treatment.

The fix copies the encoder output into bytes owned by the cache entry
immediately after `execute()` and serves cache hits from a tensor
wrapping those owned bytes (`unordered_map` nodes are pointer-stable, so
the blob stays valid).

The bug is backend-independent (the cache sits above the delegate), so
XNNPACK/Vulkan multimodal models are affected the same way.
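
The aliasing described above can be reproduced in miniature. What follows is a simplified Python model of the failure and the fix, not the C++ code in `VisionEncoder`. `FakeEncoder`, `AliasingCache`, `OwningCache`, and `embeddings_after_two_images` are hypothetical names introduced only for illustration. A `bytearray` stands in for the method's reusable output buffer, and a `memoryview` stands in for the cached `EValue` that aliases it.

```python
class FakeEncoder:
    """Hypothetical stand-in for an encoder that reuses one output buffer."""

    def __init__(self, size):
        self._out = bytearray(size)  # the reusable output buffer

    def execute(self, image):
        # Overwrite the shared buffer in place, as a reused output tensor would.
        for i in range(len(self._out)):
            self._out[i] = image[i % len(image)]
        return memoryview(self._out)  # a view over the buffer, not a copy


class AliasingCache:
    """Buggy variant: caches the view, so every entry tracks the last execute()."""

    def __init__(self, encoder):
        self._encoder = encoder
        self._entries = {}

    def encode(self, path, image):
        if path not in self._entries:
            self._entries[path] = self._encoder.execute(image)
        return self._entries[path]


class OwningCache(AliasingCache):
    """Fixed variant: snapshots the output into bytes owned by the cache entry."""

    def encode(self, path, image):
        if path not in self._entries:
            self._entries[path] = bytes(self._encoder.execute(image))
        return self._entries[path]


def embeddings_after_two_images(cache_cls):
    cache = cache_cls(FakeEncoder(4))
    cache.encode("a.png", b"A")
    cache.encode("b.png", b"B")
    # Re-prefilled turn: both image slots are served from the cache.
    return bytes(cache.encode("a.png", b"A")), bytes(cache.encode("b.png", b"B"))
```

With `AliasingCache`, both slots come back holding image B's bytes. With `OwningCache`, each slot keeps its own image's bytes, which mirrors the copy-after-`execute()` fix described above.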

### Introduces a breaking change?

- [ ] Yes
- [x] No

### Type of change

- [x] Bug fix (change which fixes an issue)
- [ ] New feature (change which adds functionality)
- [ ] Documentation update (improves or adds clarity to existing
documentation)
- [ ] Other (chores, tests, code style improvements etc.)

### Tested on

- [x] iOS
- [ ] Android

### Testing instructions

1. Run the example LLM app with a multimodal model (e.g. Gemma 4 E2B
multimodal) on the Multimodal LLM screen.
2. Send image A with "What's in this picture?" — answer is correct.
3. Send image B (different content) with the same question — answer is
correct.
4. Ask "What was in the FIRST picture I sent?".

Before this fix, step 4 describes image B's content (both image slots
receive B's embeddings on the re-prefilled turn). After the fix, the
model correctly recalls image A.

### Screenshots

N/A

### Related issues

N/A

### Checklist

- [x] I have performed a self-review of my code
- [x] I have commented my code, particularly in hard-to-understand areas
- [ ] I have updated the documentation accordingly
- [x] My changes generate no new warnings

### Additional notes

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Description

Optimizes token sampling for large-vocabulary models (e.g. Gemma 4 E2B,
262k vocab), where the previous full-vocabulary sort in top-p dominated
per-token latency.

Two changes in `sampler.cpp`:

- **`mask_topp`**: replaces the `O(n log n)` sort over all logits with a
logit-space histogram (`kBins=2048`) that locates the nucleus threshold
in two `O(n)` passes — no sort, no per-token vocab-sized allocation.
Binning in logit space (rather than probability space) keeps uniform
resolution for both peaked and flat distributions.
- **`softmax`**: skips `exp()` on logits already masked to `lowest()` by
top-k/top-p. The result underflows to zero anyway, and the call is slow
on device.
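
The `softmax` change in the second bullet can be sketched as follows. This is a minimal Python illustration, not the code in `sampler.cpp`. The name `softmax_skip_masked` is hypothetical, and `MASKED` assumes that `lowest()` refers to the most negative finite float.

```python
import math
import sys

# Assumed analogue of the C++ `lowest()` sentinel written by top-k/top-p.
MASKED = -sys.float_info.max


def softmax_skip_masked(logits):
    """Softmax that never calls exp() on entries already masked out."""
    live = [x for x in logits if x != MASKED]
    if not live:
        raise ValueError("all logits are masked")
    peak = max(live)  # subtract the max for numerical stability
    # Masked entries get probability 0 directly; exp() would underflow to 0 anyway.
    exps = [0.0 if x == MASKED else math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

The output matches a plain softmax because `exp()` of a masked logit underflows to zero. The saving is that the number of `exp()` calls scales with the surviving nucleus rather than with the full vocabulary.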

On an iPhone 17 Pro with Gemma 4 E2B (int4), per-token sampling drops
from ~45 ms to ~10 ms. The histogram approximates the exact sort-based
nucleus; the resulting sampled distribution is statistically equivalent
(verified: the kept-mass fraction stays within 1% of the exact nucleus
across peaked, flat, and sharp distributions).
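
The histogram approach can be sketched next to the sort-based one it replaces. This is a Python illustration under stated assumptions, not the code in `sampler.cpp`. It assumes bins are uniform over the `K_RANGE` logits below the maximum and that anything further below is masked outright. The pass structure here (a max scan, a binning pass, a masking pass) may differ from the real code. All function names are hypothetical; `K_BINS` and `K_RANGE` take the values quoted in this PR.

```python
import math
import sys

K_BINS = 2048    # value quoted in the PR text
K_RANGE = 40.0   # logit span covered by the histogram (PR text)
MASKED = -sys.float_info.max  # assumed analogue of the C++ `lowest()`


def _bin_of(logit, peak):
    """Bin 0 holds logits closest to the peak; K_BINS means 'out of range'."""
    gap = peak - logit
    if gap >= K_RANGE:
        return K_BINS
    return min(int(gap / K_RANGE * K_BINS), K_BINS - 1)


def mask_topp_histogram(logits, top_p):
    """Approximate nucleus masking with linear passes and no sort."""
    peak = max(logits)
    hist = [0.0] * K_BINS
    total = 0.0
    for x in logits:  # accumulate unnormalized probability mass per bin
        b = _bin_of(x, peak)
        if b < K_BINS:
            w = math.exp(x - peak)
            hist[b] += w
            total += w
    # Walk bins from the peak downwards until the kept mass reaches top_p.
    cum, cutoff = 0.0, K_BINS - 1
    for b, w in enumerate(hist):
        cum += w
        if cum >= top_p * total:
            cutoff = b
            break
    # Mask every logit that falls in a bin below the cutoff bin.
    return [x if _bin_of(x, peak) <= cutoff else MASKED for x in logits]


def mask_topp_exact(logits, top_p):
    """Reference: the O(n log n) sort-based nucleus."""
    peak = max(logits)
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    total = sum(math.exp(x - peak) for x in logits)
    keep, cum = set(), 0.0
    for i in order:
        keep.add(i)
        cum += math.exp(logits[i] - peak)
        if cum >= top_p * total:
            break
    return [x if i in keep else MASKED for i, x in enumerate(logits)]
```

Because the cutoff always lands on a bin boundary, the histogram version keeps every token that shares a bin with the threshold token. Its nucleus can therefore be slightly larger than the exact one, which is the approximation the description refers to.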

### Introduces a breaking change?

- [ ] Yes
- [x] No

### Type of change

- [ ] Bug fix (change which fixes an issue)
- [ ] New feature (change which adds functionality)
- [ ] Documentation update (improves or adds clarity to existing
documentation)
- [x] Other (chores, tests, code style improvements etc.)

### Tested on

- [x] iOS
- [ ] Android

### Testing instructions

1. Run an LLM with a large vocabulary and a non-zero temperature with
`topP` set (e.g. Gemma 4 E2B with `temperature: 0.8`, `topP: 0.9`).
2. Generate a long response and observe tokens/sec.
3. Confirm output remains coherent and sampling is unchanged in
character (still stochastic, not greedy).

Greedy decoding (`temperature: 0`) is unaffected — it bypasses this path
entirely.

### Screenshots

<!-- N/A -->

### Related issues

<!-- N/A -->

### Checklist

- [x] I have performed a self-review of my code
- [x] I have commented my code, particularly in hard-to-understand areas
- [ ] I have updated the documentation accordingly
- [x] My changes generate no new warnings

### Additional notes

The histogram is an approximation bounded by bin granularity
(`kBins=2048` over a `kRange=40` logit span). This is intentional: exact
top-p over a 262k vocab where the nucleus can exceed 100k tokens is
inherently expensive, and the sampling outcome is statistically
indistinguishable from the exact version.
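
To put a number on the bin granularity, here is a short worked calculation. It assumes the bins are uniform across the `kRange` span, which the text above does not state explicitly. The symbol $\Delta$ for the bin width is introduced here.

$$
\Delta = \frac{\texttt{kRange}}{\texttt{kBins}} = \frac{40}{2048} \approx 0.0195
$$

$$
\frac{p_{\text{top of bin}}}{p_{\text{bottom of bin}}} \le e^{\Delta} \approx 1.0197
$$

Under that assumption, two tokens that share a bin differ in probability by at most about 2%. A logit at the far edge of the span carries a weight of $e^{-40} \approx 4.2 \times 10^{-18}$ relative to the maximum, so mass outside the span is negligible.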

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@chmjkb chmjkb merged commit 77d176d into release/0.9 Jun 17, 2026
2 checks passed
@chmjkb chmjkb deleted the @chmjkb/patch-0.9 branch June 17, 2026 15:03