fix(qwen3_vl): require truthy value in untyped image_part fallback by eligotts · Pull Request #46 · PrimeIntellect-ai/renderers

eligotts · 2026-05-17T05:52:41Z

Summary

_is_image_part / _is_video_part in renderers/qwen3_vl.py use a
permissive fallback (return "image_url" in item) that misfires after
a Dataset.from_list round-trip. Arrow schema unification adds an
image_url: None zombie key to every text part when the same content
list mixes text and image parts, so the fallback returns True for the
text part, the caller dispatches emit_image, and _load_pil_image
raises TypeError: Unsupported image source 'NoneType'.

Every rollout in an env that embeds a multimodal user prompt directly
in its dataset column (e.g. tic-tac-toe, where visual_prompt = [ {"role": "user", "content": [text, image_url]}] is part of every
dataset row) fails before reaching the model.

Repro (no renderers state needed)

from datasets import Dataset
row = {"prompt": [{"role": "user", "content": [
    {"type": "text", "text": "hi"},
    {"type": "image_url", "image_url": {"url": "data:image/png;base64,XXX"}},
]}]}
print(Dataset.from_list([row])[0]["prompt"][0]["content"][0])
# => {'image_url': None, 'text': 'hi', 'type': 'text'}
#               ^^^^^^^^^^^^^ Arrow unified the schema across the list

from renderers.qwen3_vl import _is_image_part
_is_image_part({"type": "text", "text": "hi", "image_url": None})
# => True   (misclassified — should be False)

Fix

Make the type field authoritative when present:

def _is_image_part(item):
    if not isinstance(item, dict): return False
    t = item.get("type")
    if t in ("image", "image_url"): return True
    if t is not None: return False                                    # ← new
    return bool(item.get("image")) or bool(item.get("image_url"))     # ← truthy, not key-presence

Same fix applied to _is_video_part symmetrically.

These helpers live in renderers/qwen3_vl.py but are imported by every
multimodal renderer in the package — renderers/qwen35.py (the
Qwen3.5 / Qwen3.6 family) and renderers/kimi_k25.py (Kimi K2.5) both
from renderers.qwen3_vl import _is_image_part, _is_video_part. So the
fix covers all three renderer families without touching their files.

Test plan

Adds test_is_image_part_treats_type_field_as_authoritative in
tests/test_multimodal.py. Exercises both the schema-unified text
case (the actual regression) and the untyped fallback's truthy-value
requirement. Synthesizes the schema-unified content shape directly so
the test doesn't depend on datasets as a test dep.
Test passes locally.
Verified end-to-end on Qwen/Qwen3.5-2B + the tic-tac-toe env via
prime-rl: with the fix, the renderer correctly classifies tic-tac-toe's
text+image initial prompt and the trainer runs through 8 RL steps with
mismatch_kl/max=0.00165.

Fixes #40.

`_is_image_part` / `_is_video_part` use a permissive fallback — `return "image_url" in item` — that misfires after `Dataset.from_list` round-trip. Arrow schema unification adds an `image_url: None` zombie key to every text part when the same content list mixes text and image parts, so the fallback returns True, the caller dispatches `emit_image`, and `_load_pil_image` raises `TypeError: 'NoneType'`. Every rollout in an env whose dataset embeds a multimodal user prompt in the initial prompt column (e.g. tic-tac-toe) fails before reaching the model. Make the `type` field authoritative when present (`type="text"` always classifies as not image / video), and require a truthy value — not mere key presence — in the untyped fallback. Same fix applied to `_is_video_part` symmetrically. These helpers live in `qwen3_vl.py` but are imported and used by every multimodal renderer in the package — `qwen35.py` (the Qwen3.5 / Qwen3.6 family) and `kimi_k25.py` (Kimi K2.5) both `from renderers.qwen3_vl import _is_image_part, _is_video_part`. So the fix covers all three renderer families without touching their files. Adds a focused regression test exercising both the schema-unified text case and the untyped fallback's truthy-value requirement. The test synthesizes the schema-unified shape directly rather than depending on `datasets` as a test dep. Fixes #40.

eligotts marked this pull request as ready for review May 17, 2026 05:55

eligotts requested a review from hallerite May 17, 2026 06:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(qwen3_vl): require truthy value in untyped image_part fallback#46

fix(qwen3_vl): require truthy value in untyped image_part fallback#46
eligotts wants to merge 1 commit into
mainfrom
fix/is-image-part-truthy

eligotts commented May 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

eligotts commented May 17, 2026

Summary

Repro (no renderers state needed)

Fix

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant