
Implement Unicode composition/decomposition codepoint closure. #228

Merged
garretrieger merged 5 commits into w3c:main from garretrieger:unicode_comp_decomp on May 29, 2026


@garretrieger
Contributor

This adds Unicode composition and decomposition to both the dependency graph and the glyph closure computed during segmentation.

Text shapers may apply Unicode composition and decomposition to text prior to shaping. As a result, the glyph dependencies computed by the segmenter must account for any composition and decomposition substitutions that may occur. For context, the problem is explained in more detail here:
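As a minimal illustration of the problem, the sketch below uses Python's standard `unicodedata.normalize` as a stand-in for a shaper's pre-shaping normalization step (an assumption; real shapers apply their own composition logic). Input containing `e` (U+0065) followed by COMBINING ACUTE ACCENT (U+0301) can be composed to `é` (U+00E9), so the glyph actually needed may belong to a codepoint that never appeared in the input. The helper `codepoints` is hypothetical.

```python
import unicodedata


def codepoints(text: str) -> set[int]:
    """Return the set of codepoints in `text` (hypothetical helper)."""
    return {ord(ch) for ch in text}


# Decomposed input: LATIN SMALL LETTER E + COMBINING ACUTE ACCENT.
decomposed = "e\u0301"
# A shaper that composes first (NFC-style) sees one precomposed codepoint.
composed = unicodedata.normalize("NFC", decomposed)
# The reverse direction: NFD splits the precomposed form back apart.
round_trip = unicodedata.normalize("NFD", composed)

print(sorted(hex(cp) for cp in codepoints(decomposed)))  # ['0x301', '0x65']
print(sorted(hex(cp) for cp in codepoints(composed)))    # ['0xe9']
print(round_trip == decomposed)                          # True
```

A segment whose codepoint set is only {U+0065, U+0301} could therefore still trigger a lookup of the glyph for U+00E9, and vice versa.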

// WARNING: This is currently disabled by default since there are some known issues with this approach.

Unicode Character Database (UCD) data files are used to construct a Unicode-to-Unicode dependency graph for both composition and decomposition. This graph is added to DependencyGraph, which then incorporates it into condition analysis. Likewise, the glyph closure computation is modified to add Unicode closure via the dependency graph as a pre-step before handing off to HarfBuzz.
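The following is a minimal sketch of the idea, not the PR's implementation: it builds a codepoint-to-codepoint graph from canonical decomposition mappings and then expands a codepoint set to a fixpoint before glyph closure would run. Assumptions: it reads mappings through Python's `unicodedata` module (derived from `UnicodeData.txt`) instead of parsing UCD files directly; `canonical_decomposition`, `build_graph`, and `unicode_closure` are hypothetical names, not the DependencyGraph API; compatibility mappings and algorithmic Hangul composition are not handled.

```python
import sys
import unicodedata
from collections import defaultdict


def canonical_decomposition(cp: int) -> list[int]:
    """One-level canonical decomposition of cp, or [] if none.

    unicodedata.decomposition() returns the raw UnicodeData.txt field;
    compatibility mappings start with '<tag>' and are skipped here.
    """
    field = unicodedata.decomposition(chr(cp))
    if not field or field.startswith("<"):
        return []
    return [int(part, 16) for part in field.split()]


def build_graph():
    """Hypothetical Unicode-to-Unicode dependency graph.

    decomposes[cp] -> codepoints cp may be decomposed into.
    composes[cp]   -> (composed, parts) pairs that cp is a part of.
    """
    decomposes = {}
    composes = defaultdict(list)
    for cp in range(sys.maxunicode + 1):
        parts = canonical_decomposition(cp)
        if not parts:
            continue
        decomposes[cp] = parts
        # Record a composition edge only if NFC would recompose the parts;
        # this drops singleton mappings and composition exclusions.
        if unicodedata.normalize("NFC", "".join(map(chr, parts))) == chr(cp):
            for part in parts:
                composes[part].append((cp, parts))
    return decomposes, composes


def unicode_closure(codepoints, decomposes, composes):
    """Expand a codepoint set until no decomposition/composition adds more."""
    result = set(codepoints)
    pending = list(result)
    while pending:
        cp = pending.pop()
        # Decomposition is unconditional: cp may be split into its parts.
        candidates = list(decomposes.get(cp, []))
        for composed, parts in composes.get(cp, []):
            # Composition is conjunctive: every part must already be present.
            if all(p in result for p in parts):
                candidates.append(composed)
        for new_cp in candidates:
            if new_cp not in result:
                result.add(new_cp)
                pending.append(new_cp)
    return result
```

The conjunctive check is why composition cannot be modeled as a simple edge: U+00E9 is only reachable once both U+0065 and U+0301 are in the set, which is the kind of condition that DependencyGraph's condition analysis would need to represent.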

This should unblock the use of patch merging, which is currently disabled. That will be explored in a follow-up PR.

@garretrieger garretrieger merged commit c34cf1a into w3c:main May 29, 2026
3 checks passed
@garretrieger garretrieger deleted the unicode_comp_decomp branch May 29, 2026 17:59
