Feat(model support): ideogram4 support by Pfannkuchensack · Pull Request #9303 · invoke-ai/InvokeAI

Pfannkuchensack · 2026-06-25T05:15:39Z

Summary

Adds first-class Ideogram 4 (text-to-image) support to InvokeAI. Ideogram 4 is a new open-weight 9.3B-parameter single-stream DiT with a Qwen3-VL-8B text encoder and a flow-matching sampler.

The model's defining trait is that it was trained on a structured JSON prompt that describes the scene as a list of regions, each with a bounding box ([y_min, x_min, y_max, x_max], normalized to 0–1000, origin top-left) and a text description. Plain-text prompts work, but quality is markedly lower.
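A minimal sketch of the coordinate conversion, assuming a region is a canvas-pixel rectangle (x, y, width, height) with non-negative size on a canvas of known size. `rect_to_bbox` is a hypothetical helper, not the function in buildIdeogram4Prompt.ts; it scales to 0–1000, rounds, clamps, and emits the [y_min, x_min, y_max, x_max] order described above.

```python
def rect_to_bbox(
    x: float, y: float, width: float, height: float,
    canvas_width: float, canvas_height: float,
) -> list[int]:
    """Convert a pixel rect (origin top-left) to an Ideogram-style 0-1000 bbox."""
    if canvas_width <= 0 or canvas_height <= 0:
        raise ValueError("canvas dimensions must be positive")

    def norm(value: float, extent: float) -> int:
        # Scale to the 0-1000 range, round to an integer, then clamp so
        # rects that hang off the canvas still produce valid coordinates.
        scaled = round(value / extent * 1000)
        return max(0, min(1000, scaled))

    y_min = norm(y, canvas_height)
    x_min = norm(x, canvas_width)
    y_max = norm(y + height, canvas_height)
    x_max = norm(x + width, canvas_width)
    # Note the order: y before x, min corner before max corner.
    return [y_min, x_min, y_max, x_max]
```

For example, a 512×512 box starting at x = 256 on a 1024×1024 canvas maps to [0, 250, 500, 750]. Python's `round()` rounds halves to even, whereas JavaScript's `Math.round` rounds halves up, so a TypeScript port can differ by one unit on exact .5 boundaries.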

The headline feature here is that this JSON is auto-assembled on the frontend from the existing Canvas Regional Guidance layers: the global prompt becomes the overall description, and each enabled region contributes one element (its drawn rect → bbox, its prompt → description). Users can also paste raw JSON to drive the model directly.
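The assembly rules (raw-JSON passthrough, plain-text fallback, one element per enabled region, stable key order) can be sketched as below. This is an illustrative Python port, not the TypeScript code in the PR: the `elements`, `bbox` and `desc` keys come from this PR's description, while the overall-description key name (`OVERALL_KEY`), the `Region` type, and skipping regions with empty prompts are assumptions.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical key for the overall (global) description; the real key
# name is not stated in this PR text.
OVERALL_KEY = "description"


@dataclass
class Region:
    bbox: list[int]  # [y_min, x_min, y_max, x_max], already normalized to 0-1000
    prompt: str
    enabled: bool = True


def parse_json_object(text: str) -> Optional[dict[str, Any]]:
    """Return the parsed value if `text` is a JSON object, else None."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        return None
    return value if isinstance(value, dict) else None


def build_prompt(global_prompt: str, regions: list[Region]) -> str:
    # Raw-JSON passthrough: a hand-written JSON object is sent unchanged.
    if parse_json_object(global_prompt) is not None:
        return global_prompt
    # Assumption: disabled regions and regions with blank prompts are skipped.
    active = [r for r in regions if r.enabled and r.prompt.strip()]
    # Plain-text fallback: with no usable regions, send the prompt as-is.
    if not active:
        return global_prompt
    # Python dicts keep insertion order, so the emitted key order is stable.
    caption = {
        OVERALL_KEY: global_prompt.strip(),
        "elements": [{"bbox": list(r.bbox), "desc": r.prompt.strip()} for r in active],
    }
    return json.dumps(caption, ensure_ascii=False)
```

Because the backend receives only this one string, all region handling stays in this pure function, which is also what makes it straightforward to unit-test.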

Why this is purely frontend string assembly: unlike FLUX/SDXL/Z-Image regional guidance, Ideogram 4 does not use spatial attention masks for regions. The region boxes are encoded as text inside the single JSON string fed to Qwen3-VL, so the backend only ever sees one prompt string and no mask-conditioning code is touched.

How

Backend (invokeai/backend/ideogram4/, vendored from the Apache-2.0 reference, copyright headers retained):

DiT (modeling_ideogram4.py), FLUX2-style KL VAE (autoencoder.py + latent_norm.py), logit-normal flow-match scheduler + presets (scheduler.py, sampler_configs.py), nf4/fp8 quantized loading, and InvokeAI-side denoise.py / text_encoding.py / sampling_utils.py.
Dual-branch asymmetric CFG: the positive branch runs the conditional transformer over [text]+[image] tokens, and the negative branch runs the unconditional transformer over image-only tokens with zeroed LLM features; the two predictions are combined as v = gw·pos + (1−gw)·neg. Because the negative branch never sees text, there is no negative prompt. A transformer_pair.py wrapper keeps both transformers co-resident for the whole loop so the model cache doesn't swap them on every step (nf4: ≈10 GB resident during denoise, which fits in 24 GB).
Model-manager registration: new BaseModelType.Ideogram4, a Qwen3-VL text-encoder type, config detector + diffusers config, and a loader mirroring Z-Image.
Four invocations: ideogram4_model_loader, ideogram4_text_encoder, ideogram4_denoise, ideogram4_latents_to_image; new Ideogram4ConditioningInfo + field/output; ideogram4_txt2img generation mode.
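The dual-branch combination rule above, plugged into a plain Euler flow-matching loop, can be sketched in pure Python on flat lists. `pos_fn` and `neg_fn` are hypothetical stand-ins for the conditional and unconditional transformers; the time direction and step convention (x += (t_next − t)·v over a caller-supplied timestep list) are assumptions, not taken from the vendored sampler.

```python
from typing import Callable, Sequence

# A branch maps (latents, t) to a velocity prediction of the same length.
BranchFn = Callable[[list[float], float], Sequence[float]]


def combine_dual_branch(pos: Sequence[float], neg: Sequence[float], gw: float) -> list[float]:
    """v = gw*pos + (1 - gw)*neg, equivalently neg + gw*(pos - neg)."""
    if len(pos) != len(neg):
        raise ValueError("branch outputs must have the same length")
    return [gw * p + (1.0 - gw) * n for p, n in zip(pos, neg)]


def euler_denoise(
    x: Sequence[float],
    timesteps: Sequence[float],
    pos_fn: BranchFn,
    neg_fn: BranchFn,
    gw: float,
) -> list[float]:
    latents = list(x)
    for t, t_next in zip(timesteps[:-1], timesteps[1:]):
        pos = pos_fn(latents, t)  # conditional branch: text + image tokens
        neg = neg_fn(latents, t)  # unconditional branch: image tokens only, no text
        v = combine_dual_branch(pos, neg, gw)
        # Explicit Euler step along the predicted velocity field.
        latents = [xi + (t_next - t) * vi for xi, vi in zip(latents, v)]
    return latents
```

With gw = 1 the negative branch drops out entirely; gw > 1 extrapolates away from the text-free prediction, which is the usual CFG behavior. Both branches are evaluated every step, which is why keeping both transformers resident matters.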

Frontend:

buildIdeogram4Prompt.ts — Regions→JSON assembly (raw-JSON passthrough; stable key order; bbox clamped/rounded to 0–1000) with unit tests.
buildIdeogram4Graph.ts — text2img-only graph builder + enqueue wiring. Uses a decoy string node for the assembled JSON so the linear-UI batch injector doesn't clobber it, while plain-text prompts still flow through the real prompt node (so dynamic prompts / batching keep working).
Params + UI: a Sampler Preset combobox (Quality 48 / Default 20 / Turbo 12) as the primary control, plus Advanced overrides that actually apply to this model — Steps, Guidance Scale, Schedule Shift (mu) and a Color Palette picker. The irrelevant Advanced controls (VAE, CLIP Skip, CFG Rescale, Seamless, Color Compensation) are hidden for Ideogram 4.
Metadata recall for all of the above.
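The "null = use preset" override rule can be sketched as follows. Only the step counts per preset (and the preset IDs from the frontend commit) come from this PR; the helper names are hypothetical, and because the presets' guidance-scale and mu values are not listed here, the sketch resolves those generically via `resolve`.

```python
from typing import Optional, TypeVar

T = TypeVar("T")

# Step counts per sampler preset, as listed in this PR.
PRESET_STEPS = {
    "V4_QUALITY_48": 48,
    "V4_DEFAULT_20": 20,
    "V4_TURBO_12": 12,
}


def resolve(override: Optional[T], preset_value: T) -> T:
    # Compare against None rather than truthiness, so an explicit 0 / 0.0
    # override is honored instead of silently falling back to the preset.
    return preset_value if override is None else override


def effective_steps(preset: str, steps_override: Optional[int] = None) -> int:
    return resolve(steps_override, PRESET_STEPS[preset])
```

For example, `effective_steps("V4_TURBO_12")` gives 12, and `effective_steps("V4_QUALITY_48", 30)` gives 30; resetting an override in the UI corresponds to setting it back to `None`.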

Dependencies: bumps transformers to >=5.5,<5.6 (Qwen3-VL landed in 4.57; the encoder needs it) and compel to >=2.4.0,<3, with the necessary adaptations to the FLUX / Z-Image loaders, the safety checker, the HF metadata fetcher and model_util.

Out of scope (v1): img2img / inpaint / outpaint, ControlNet / IP-Adapter / LoRA, and the optional local "Magic Prompt" plain-text→JSON expander (parked — see Merge Plan).

Related Issues / Discussions

Overlaps the transformers 5.x bump tracked in PR #9248 (feat - Migrate to Transformers 5.5.4); this branch carries the same bump (>=5.5,<5.6) plus the cross-model adaptations it requires.

QA Instructions

Requires the gated weights (ideogram-ai/ideogram-4-nf4; nf4 is the 24 GB path and is CUDA/bitsandbytes only) plus the Qwen3-VL encoder and VAE sub-dependencies.

Install & select an Ideogram 4 model; open the Canvas/Generate tab. Confirm the model shows under its own group, dimensions default to 1024×1024 (multiples of 16), and the Generation settings show the Sampler Preset control instead of Scheduler/CFG.
Regions → JSON: type an overall description, add 1–2 Regional Guidance layers each with a prompt + a drawn box, and Invoke. In the result's metadata, confirm the assembled JSON has the correct key order and elements[*].bbox (0–1000, [y_min, x_min, y_max, x_max]) matching where you drew the boxes, and that element placement in the image roughly matches.
Raw-JSON passthrough: paste a hand-written JSON object into the prompt box → it is sent unchanged (no region wrapping).
Plain text: with no regions and a plain prompt, confirm normal text2img still works and that dynamic prompts / batching still expand (decoy-node behavior).
Sampler presets: Quality / Default / Turbo produce the expected step counts (48 / 20 / 12).
Advanced overrides: Steps / Guidance Scale / mu show the active preset's value as the "auto" default and can be overridden + reset; Color Palette swatches inject style_description.color_palette (auto-build mode only — ignored for raw JSON).
Metadata recall: load an Ideogram 4 image and recall — preset, overrides and palette restore (and are guarded to only recall onto an Ideogram 4 model).
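For the Regions → JSON check, a small validator can speed up inspecting the caption stored in image metadata. `validate_caption` is a hypothetical QA helper, not part of the PR; it assumes the caption is a JSON object whose `elements` each carry a `bbox` and a `desc` string, and checks the stated bbox constraints (four integers in 0–1000) plus the natural min ≤ max ordering of [y_min, x_min, y_max, x_max].

```python
import json


def validate_caption(caption_json: str) -> list[str]:
    """Return human-readable problems; an empty list means the caption passed."""
    caption = json.loads(caption_json)
    if not isinstance(caption, dict):
        return ["caption must be a JSON object"]
    elements = caption.get("elements", [])
    if not isinstance(elements, list):
        return ["elements must be a list"]

    problems: list[str] = []
    for i, el in enumerate(elements):
        if not isinstance(el, dict):
            problems.append(f"elements[{i}] must be an object")
            continue
        if not isinstance(el.get("desc"), str):
            problems.append(f"elements[{i}].desc must be a string")
        bbox = el.get("bbox")
        # type(v) is int rejects bools, which isinstance(v, int) would accept.
        if not (isinstance(bbox, list) and len(bbox) == 4 and all(type(v) is int for v in bbox)):
            problems.append(f"elements[{i}].bbox must be a list of 4 integers")
            continue
        y_min, x_min, y_max, x_max = bbox
        if not all(0 <= v <= 1000 for v in bbox):
            problems.append(f"elements[{i}].bbox values must lie in 0-1000")
        if y_min > y_max or x_min > x_max:
            problems.append(f"elements[{i}].bbox must be ordered [y_min, x_min, y_max, x_max]")
    return problems
```

It does not check key order or whether placement matches the drawn boxes; those still need the visual comparison described above.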

Frontend gates (run from invokeai/frontend/web/): pnpm lint and pnpm test:no-watch (includes buildIdeogram4Prompt.test.ts); tsc is clean.

Merge Plan

Large PR + dependency bump. This raises transformers to >=5.5,<5.6 and compel to >=2.4.0,<3, which touches many models (FLUX, Z-Image, safety checker, HF metadata fetch). Coordinate with / sequence after PR #9248 (feat - Migrate to Transformers 5.5.4, the transformers 5.x bump) to avoid a double-bump conflict, and time it so it doesn't collide with a pending release. Broad regression QA across existing model types is warranted, not just Ideogram 4.
Follow-ups (not in this PR): GGUF loader for the custom DiT (more VRAM headroom), starter_models.py entries, and the optional local Magic Prompt node (blocked on the upstream system-prompts PR feat: add System Prompts library for Expand Prompt button #9152, whose migration_32 collides with main and must be renumbered to 33 first).

Checklist

The PR has a short but descriptive title, suitable for a changelog
Tests added / updated (if applicable) — frontend unit tests for the Regions→JSON assembly
❗Changes to a redux slice have a corresponding migration — new params fields use zod defaults; confirm if a migration is needed
Documentation added / updated (if applicable)
Updated What's New copy (if doing a release after this PR)

Switches compel from PyPI 2.1.1 to invoke-ai/compel@main fork which supports transformers 5.x. Bumps transformers floor to 5.9.0. Removes the transformers>=5.1.0 uv override that was only needed to bypass compel 2.1.1's <5.0 constraint. NOTE: compel fork pulls notebook dep (full Jupyter stack); flag to maintainer for cleanup. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…s 5.x transformers 5.x no longer exposes rope_theta as a top-level attribute on Qwen3Config; the value is stored in the rope_parameters (and rope_scaling) dict instead. Read it from there with a getattr fallback so the inv_freq buffer is computed from the configured base (1e6 / 256) instead of raising AttributeError. Applies to both the safetensors and GGUF Qwen3 encoder paths. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
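A minimal sketch of that fallback, assuming the base is stored under a `rope_theta` key inside the `rope_parameters` or `rope_scaling` dict. `get_rope_theta`, `rope_inv_freq` and the `SimpleNamespace` stand-in config are illustrative, not the encoder code from this commit; the inverse-frequency formula is the standard RoPE one, 1 / theta^(2i/dim).

```python
from types import SimpleNamespace
from typing import Any


def get_rope_theta(config: Any, default: float) -> float:
    # Older configs expose rope_theta as a top-level attribute.
    theta = getattr(config, "rope_theta", None)
    if theta is not None:
        return float(theta)
    # Newer configs keep it inside a dict instead.
    for attr in ("rope_parameters", "rope_scaling"):
        params = getattr(config, attr, None)
        if isinstance(params, dict) and params.get("rope_theta") is not None:
            return float(params["rope_theta"])
    return default


def rope_inv_freq(theta: float, dim: int) -> list[float]:
    # Standard RoPE inverse frequencies for even indices 0, 2, ..., dim - 2.
    return [1.0 / (theta ** (i / dim)) for i in range(0, dim, 2)]


# Usage with a stand-in object shaped like a dict-based config.
cfg_v5 = SimpleNamespace(rope_parameters={"rope_type": "default", "rope_theta": 1_000_000.0})
inv_freq = rope_inv_freq(get_rope_theta(cfg_v5, default=10_000.0), dim=128)
```

Checking the top-level attribute first keeps older configs working unchanged, and the explicit `default` avoids an `AttributeError` when neither location is present.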

…whoami huggingface_hub 1.x removed get_token_permission(). HFTokenHelper.get_status() now validates the token via whoami(), which returns user info for a valid token and raises HfHubHTTPError for an invalid one. Preserves the original three-way status: VALID on success, INVALID on HfHubHTTPError (e.g. 401), UNKNOWN on any other error (e.g. network failure). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
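The three-way mapping can be sketched with the lookup and error type injected, which keeps the logic testable offline. `TokenStatus` and `get_token_status` here are hypothetical stand-ins for the HFTokenHelper code; in real use the callables would be `huggingface_hub.whoami` and its `HfHubHTTPError` class.

```python
from enum import Enum
from typing import Any, Callable


class TokenStatus(Enum):
    VALID = "valid"
    INVALID = "invalid"
    UNKNOWN = "unknown"


def get_token_status(
    token: str,
    whoami: Callable[..., Any],
    http_error: type[Exception],
) -> TokenStatus:
    try:
        # A successful call returns user info, so the token is valid.
        whoami(token=token)
    except http_error:
        # The hub answered with an HTTP error (e.g. 401): the token is rejected.
        return TokenStatus.INVALID
    except Exception:
        # Anything else (network failure, timeout): validity can't be determined.
        return TokenStatus.UNKNOWN
    return TokenStatus.VALID
```

The order of the `except` clauses matters: the HTTP-error branch must come first, otherwise the generic handler would swallow a 401 and report UNKNOWN.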

….9-compel-fork

# Conflicts:
# invokeai/app/api/routers/model_manager.py
# invokeai/app/invocations/sd3_text_encoder.py
# invokeai/backend/model_manager/metadata/fetch/huggingface.py
# pyproject.toml
# uv.lock

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The upstream merge left an unresolved conflict marker in _t5_encode and reintroduced T5TokenizerFast. Keep our v5 assertion (T5Tokenizer only) plus upstream's new t5_device logic, and drop the now-dead T5TokenizerFast monkeypatch in the test (the name no longer exists in the module). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

- flux_text_encoder.py: drop unused typing.Union (F401) left by v5 import merge
- huggingface.py: ruff format (wrap append(SimpleNamespace(...)))

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

transformers 5.6 flattened CLIPTextModel (removed the self.text_model wrapper, hoisted embeddings/encoder/final_layer_norm to the top level). diffusers' single-file checkpoint loader (create_diffusers_clip_model_from_ldm) still assumes the nested layout, so loading SD1.5 .safetensors checkpoints fails on 5.6+ with 'CLIPTextModel object has no attribute text_model' and, once that read is shimmed, 'Cannot copy out of meta tensor' (weights never populate the flattened model). Pin to >=5.5,<5.6 (last pre-flattening release) which keeps both the single-file and from_pretrained paths working. The invoke-ai/compel fork accepts any 5.x. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

chore(deps): replace compel fork with official compel 2.4.0

compel 2.4.0 (released 2026-05-30) merges the transformers-5 support that the invoke-ai fork carried (both descend from upstream PR invoke-ai#129), plus the maintainer-reviewed padding rework and added diffusers/T5 smoke coverage. Switch from the git fork to the PyPI release.

- pyproject: compel git+main -> compel>=2.4.0,<3
- uv.lock: compel 2.3.1 (git 8f404b45) -> 2.4.0 (pypi)
- transformers stays 5.5.4 (satisfies compel >=5,<6 and our <5.6 pin)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Vendor the Apache-2.0 Ideogram 4 reference model (DiT, FLUX2-style VAE, logit-normal flow-match scheduler, nf4/fp8 quant loading) into invokeai/backend/ideogram4/, plus InvokeAI glue (Qwen3-VL text encoding, packed-input build, dual-branch Euler denoise loop). Register the model: BaseModelType.Ideogram4, Main_Diffusers_Ideogram4_Config (detected via the Ideogram4Pipeline class name in model_index.json), and the Ideogram4DiffusersModel loader that loads both transformers as one Ideogram4TransformerPair submodel plus the Qwen3-VL encoder and VAE. Text-to-image only.
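Detection by pipeline class name can be sketched as below, assuming the standard diffusers `model_index.json` layout, where the pipeline class is stored under `_class_name`. `is_ideogram4_diffusers_dir` is a hypothetical helper, not the config detector in this commit.

```python
import json
from pathlib import Path

IDEOGRAM4_PIPELINE_CLASS = "Ideogram4Pipeline"


def is_ideogram4_diffusers_dir(model_dir: Path) -> bool:
    """True if model_dir looks like a diffusers-format Ideogram 4 model."""
    index_path = model_dir / "model_index.json"
    if not index_path.is_file():
        return False
    try:
        index = json.loads(index_path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        # Unreadable or malformed index: treat as "not this model" rather than crash.
        return False
    return isinstance(index, dict) and index.get("_class_name") == IDEOGRAM4_PIPELINE_CLASS
```

Keying on the pipeline class rather than on tensor shapes keeps detection cheap: only a small JSON file is read, never the weights.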

… loading End-to-end text-to-image backend for Ideogram 4, validated through the real session runner. Vendors the Apache-2.0 reference model (DiT, FLUX2-style VAE, logit-normal flow-match scheduler) into invokeai/backend/ideogram4/ with InvokeAI glue. Registers BaseModelType.Ideogram4, Main_Diffusers_Ideogram4_Config, and the Ideogram4DiffusersModel loader (two transformers as one Ideogram4TransformerPair; Qwen3-VL encoder + VAE). Both transformers and the encoder load via InvokeLinearNF4 so they work with the partial-load cache. Adds Ideogram4ConditioningInfo/Field/Output and the model_loader/text_encoder/denoise/l2i invocations. Text-to-image only.

Wires Ideogram 4 into the canvas/generate UI. buildIdeogram4Prompt assembles the structured JSON caption from the global prompt + Canvas Regional Guidance layers (each region → an obj element with a 0–1000 bbox + desc), with raw-JSON passthrough and a plain-text fallback when there are no regions. Adds buildIdeogram4Graph (text-to-image only, no negative prompt) and the enqueue switch. Structured captions use a static string node + a decoy positive-prompt node so the linear batch can't clobber the assembled JSON; plain text uses the real node so dynamic prompts/batching still work. Registers the 'ideogram-4' base (enums, color, names, model picker, grid size 16), a sampler-preset param (V4_QUALITY_48/V4_DEFAULT_20/V4_TURBO_12) replacing the steps/CFG controls, ParamIdeogram4SamplerPreset, and metadata recall. Regenerates schema.ts.

Advanced accordion now shows only Ideogram 4-relevant controls. Adds optional overrides of the sampler preset — steps, guidance scale (overrides the main gw, preserves the preset's polish tail), and schedule shift (mu) — plus a color palette editor that injects style_description.color_palette into the auto-built JSON caption (uppercase #RRGGBB, max 16, ignored for raw-JSON prompts). All are nullable (null = use preset), recallable from metadata, and the irrelevant controls (VAE, CLIP skip, CFG rescale, seamless, color compensation) are hidden for Ideogram 4. Backend denoise gains steps/guidance_scale/mu fields; schema.ts regenerated.
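The palette rules (uppercase #RRGGBB, at most 16 colors, written into `style_description.color_palette` of the auto-built caption) can be sketched as follows. The helper names are hypothetical; whether the UI truncates or rejects a 17th color is not stated, and this sketch truncates. Raw-JSON prompts would simply skip `inject_palette`.

```python
import re
from typing import Any

# "#RRGGBB" with hex digits in either case.
_HEX_COLOR = re.compile(r"^#([0-9a-fA-F]{6})$")
MAX_PALETTE_COLORS = 16


def normalize_palette(colors: list[str]) -> list[str]:
    normalized = []
    for color in colors:
        match = _HEX_COLOR.match(color.strip())
        if match is None:
            raise ValueError(f"not a #RRGGBB color: {color!r}")
        normalized.append("#" + match.group(1).upper())
    # Assumption: colors beyond the limit are dropped rather than rejected.
    return normalized[:MAX_PALETTE_COLORS]


def inject_palette(caption: dict[str, Any], colors: list[str]) -> dict[str, Any]:
    """Return a copy of an auto-built caption with style_description.color_palette set."""
    if not colors:
        return caption  # no palette: leave the caption untouched
    # Shallow-copy any existing style_description so other style fields survive.
    style = dict(caption.get("style_description") or {})
    style["color_palette"] = normalize_palette(colors)
    return {**caption, "style_description": style}
```

Returning a new dict instead of mutating the input keeps the caption builder side-effect free.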

Your Name and others added 20 commits February 6, 2026 19:58

Update to Transformers 5.1.0

df20012

remove extra stuff

f69f22c

Merge branch 'main' into main

57e91b4

Merge remote-tracking branch 'upstream/main' into feat/transformers-5…

ffccf94


chore(deps): regenerate uv.lock after upstream merge

9fa5792


style: ruff fixes on merge-resolved files

6c7aedb


Merge branch 'main' into feat/transformers-5.9-compel-fork

b4773b9

Merge branch 'main' into feat/transformers-5.9-compel-fork

83fc562

Use existing keys + fix select size

443e0da

Update Readme

c2ca937

Pfannkuchensack requested review from JPPhoto, blessedcoolant, dunkeroni and lstein as code owners June 25, 2026 05:15

github-actions Bot added api python PRs that change python files Root invocations PRs that change invocations backend PRs that change backend files frontend PRs that change frontend files labels Jun 25, 2026

github-actions Bot added python-tests PRs that change python tests python-deps PRs that change python dependencies labels Jun 25, 2026

Pfannkuchensack mentioned this pull request Jun 25, 2026

[enhancement]: Ideogram Model support #9265 (Open, 1 task)


Pfannkuchensack wants to merge 20 commits into invoke-ai:main from Pfannkuchensack:feat/ideogram4-support

Pfannkuchensack commented Jun 25, 2026


2 participants
