Skip to content

Add UTF-16 Strings#34

Open
JoltedJon wants to merge 3 commits into
Redot-Engine:masterfrom
JoltedJon:strings
Open

Add UTF-16 Strings#34
JoltedJon wants to merge 3 commits into
Redot-Engine:masterfrom
JoltedJon:strings

Conversation

@JoltedJon

@JoltedJon JoltedJon commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

Add UTF-16 Strings

Summary by CodeRabbit

  • New Features
    • Added a new core containers module featuring an immutable UTF-16 string type with iterator support.
    • Included string utilities (comparison, substring, prefix/suffix checks, and UTF-16 searches).
    • Added a UTF-16 StringBuilder with capacity management and APIs to write UTF-16 text, runes, and formatted integers.
  • Enhancements
    • Expanded core text aliases (utf8, utf16, utf32) and aligned rune for consistent Unicode handling.
    • Added min/max helpers for arithmetic types.

Add UTF-16 Strings
@coderabbitai

coderabbitai Bot commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

Adds Unicode type aliases (utf8, utf16, utf32) and changes rune to char32_t in definitions; introduces a draco::integral concept; adds min/max to math. Introduces a new core.containers.string C++20 module with immutable String, StringIterator (UTF-16→UTF-32 decoding), and StringBuilder (allocator-backed writes, surrogate-pair encoding, WriteInt), wired into CMake as the containers library.

Changes

UTF-16 String Module and Prerequisites

Layer / File(s) Summary
Unicode type aliases, integral concept, and min/max utilities
engine/native/core/definitions/stdtypes.cppm, engine/native/core/definitions/definitions.cppm, engine/native/core/math/functions.cppm
Adds utf8/utf16/utf32 aliases and changes rune to char32_t; introduces draco::integral concept; adds constrained min/max templates to draco::math and removes unused <numbers> include.
String, StringIterator, and StringBuilder struct declarations
engine/native/core/containers/string.cppm
Declares the exported core.containers.string module with the immutable String struct (data pointer, size, comparison, substring, search), StringIterator (UTF-16→UTF-32 state and decode), StringBuilder (capacity management, Write/WriteInt API), and the EMPTY_STRING constant.
String comparison, relational operators, and query implementations
engine/native/core/containers/string.cppm
Implements StringLength (alignment-aware 64-bit chunk scan), StringCompare (consteval/runtime memcmp paths), six relational operators, substring/affix/search helpers (Substr, StartsWith, EndsWith, FindFirst, FindLast), StringIterator::Advance (BMP + surrogate-pair decoding with malformed-pair rejection), and StringBuilder::WriteInt (base 2/10/16 digit build with reversal and sign handling).
StringBuilder capacity management and write operations
engine/native/core/containers/string.cpp
Implements Reserve (allocate/resize UTF-16 buffer with memcpy), GrowCapacity (growth factor computation), and Write overloads (single UTF-16 unit, rune with surrogate encoding, null-terminated strings, String objects), all returning memory::Error.
CMake build registration and module umbrella
engine/native/core/CMakeLists.txt, engine/native/core/containers/containers.cppm
Registers containers library via add_modules_library linked against definitions, memory, and math; adds the core.containers interface unit that re-exports core.containers.string.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • Redot-Engine/DraconicEngine#23: Both PRs modify engine/native/core/definitions/stdtypes.cppm and definitions.cppm to update the shared public type system (rune alias and type traits).
  • Redot-Engine/DraconicEngine#18: Both PRs extend engine/native/core/math/functions.cppm to add numeric utility templates (min/max) to draco::math.

Suggested reviewers

  • mcdubhghlas
  • VictorSohier
  • Arctis-Fireblight

Poem

🐇 Hopping through UTF-16 land,
Where surrogates pair, just as planned,
StringBuilder grows with each byte,
StringCompare keeps the order right,
A rune is now char32_t indeed —
The rabbit types fast, at full speed! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely captures the main objective of the pull request: adding UTF-16 string support across multiple new files and modules.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@engine/native/core/containers/string.cppm`:
- Around line 503-507: The WriteInt function has two issues: (1) negating
minimum signed integer values like INT_MIN causes undefined behavior, and (2)
GrowCapacity return values are not checked for failures, potentially leading to
out-of-bounds writes. In the signed value negation block (checking if val < 0),
use a safer approach such as converting to an unsigned type, using absolute
value safely, or checking for the minimum value before negating. Additionally,
wherever GrowCapacity is called within WriteInt, verify its return value
indicates success before continuing with the write operation; if allocation
fails, return an appropriate error code or handle the failure gracefully to
prevent buffer overruns.
- Around line 351-368: The `StringBuilder::Create` method has a stack-allocated
`fixed_allocator` variable that is used to initialize the `allocator` member,
but the allocator stores a pointer to this local variable which becomes invalid
after the function returns. To fix this, the `fixed_allocator` must be stored as
a member of the `StringBuilder` struct itself to ensure it persists for the
lifetime of the object, allowing subsequent growth and free operations to safely
access it without dereferencing dangling memory.
- Around line 437-446: The StringBuilder::Write methods call GrowCapacity with
only the incoming string length, not accounting for the existing buffer size.
Change the GrowCapacity calls from GrowCapacity(length) to GrowCapacity(size +
length) in both Write overloads (the utf16 version and the other overload
mentioned as also applying to lines 449-457) to ensure the total required
capacity is correctly calculated when appending to a non-empty builder.
- Around line 228-233: The Substr method incorrectly rejects valid substring
requests by returning EMPTY_STRING whenever the requested range touches or
exceeds the string boundary. Instead of returning EMPTY_STRING when pos + len >=
size, check only if pos >= size (completely out of bounds). When pos is valid
but the requested length extends beyond the string end, truncate len to size -
pos so the method returns the available substring. This truncation contract fix
will also resolve the false negatives in StartsWith and EndsWith methods that
depend on Substr.
- Around line 320-324: The `StringBuilder::Create` method has a unit mismatch
where the `capacity` parameter represents UTF-16 elements but is passed directly
to `allocator.alloc` as a byte size without conversion. Multiply `capacity` by
the size of a single UTF-16 element (sizeof(utf16)) when calling
`allocator.alloc` to correctly convert from element count to byte count. This
ensures the allocated storage is large enough to hold the requested number of
UTF-16 characters without under-allocating and risking buffer overflow.
- Around line 371-385: The Reserve method in StringBuilder has two issues: the
memcpy call copies size elements unconditionally even when newCapacity is
smaller, risking buffer overflow, and the oldDst.size field is set as an element
count rather than bytes, causing the allocator.free call to receive incorrect
size units. Fix the memcpy to copy only the minimum of size and newCapacity
elements (to avoid overflow), and set oldDst.size to capacity * sizeof(utf16)
instead of just capacity to ensure the allocator.free call receives the size in
bytes as expected.
- Around line 182-205: The StringCompare method has two critical bugs: the
min_size variable is initialized using itself as arguments to draco::math::min
instead of using the actual size members of both strings, which causes undefined
behavior; additionally, the null-pointer checks don't handle the case where both
string texts are null, causing two empty strings to incorrectly compare as
unequal. Fix the min_size initialization by comparing the size member of the
current string with the size member of the parameter string s, and add a check
at the beginning of the method to return 0 (equal) when both text pointers are
null, ensuring that EMPTY_STRING == EMPTY_STRING evaluates correctly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f9a636d9-6f20-4bfc-a1a3-f770f0cfaa40

📥 Commits

Reviewing files that changed from the base of the PR and between 28e62ca and 1d6442e.

📒 Files selected for processing (6)
  • engine/native/core/CMakeLists.txt
  • engine/native/core/containers/containers.cppm
  • engine/native/core/containers/string.cppm
  • engine/native/core/definitions/definitions.cppm
  • engine/native/core/definitions/stdtypes.cppm
  • engine/native/core/math/functions.cppm

Comment thread engine/native/core/containers/string.cppm
Comment thread engine/native/core/containers/string.cppm Outdated
Comment thread engine/native/core/containers/string.cppm Outdated
Comment thread engine/native/core/containers/string.cppm Outdated
Comment thread engine/native/core/containers/string.cppm Outdated
Comment thread engine/native/core/containers/string.cppm Outdated
Comment thread engine/native/core/containers/string.cppm
@VictorSohier

Copy link
Copy Markdown
Contributor

The one structural thing that was made obvious to me when I was making my PR for allocators was that @OldDev78 wants as much implementation to be in .cpp files and there are some functions that are amenable to having their implementation in .cpp files.

@JoltedJon

Copy link
Copy Markdown
Contributor Author

The one structural thing that was made obvious to me when I was making my PR for allocators was that @OldDev78 wants as much implementation to be in .cpp files and there are some functions that are amenable to having their implementation in .cpp files.

The constexpr items and templates need to be in the module, I'll see what @OldDev78 says about the .cpp implementation to see if it would make sense to put the very few StringBuilder functions into a .cpp file

@OldDev78

OldDev78 commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

The one structural thing that was made obvious to me when I was making my PR for allocators was that @OldDev78 wants as much implementation to be in .cpp files and there are some functions that are amenable to having their implementation in .cpp files.

The constexpr items and templates need to be in the module, I'll see what @OldDev78 says about the .cpp implementation to see if it would make sense to put the very few StringBuilder functions into a .cpp file

If a function serves best as template and/or can be effectively constrexpr (can be constant evaluated), then it's okay to be implemented in the .cppm. Otherwise, the implementation should just be separated into the .cpp. Take note that my biggest concern is to expose dependency on headers as little as possible (so, when I can, also move the includes to the .cpp).

@OldDev78

Copy link
Copy Markdown
Contributor

The integral concept is a merge conflict for me :/

return result;
}

bool constexpr String::operator==(String s) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Especially when the function is this simple, I´d rather it be inlined in the declaration above.

Comment on lines +86 to +97
/// Initialize with preallocated buffer with initial capacity
///
/// @param allocator Initalized allocator that allows for string builder to grow
/// @param buffer Preallocated buffer that was created with allocator
/// @param capacity The initial capacity that buffer has before it needs to be reallocated
memory::Error Create(Allocator allocator, utf16 *buffer, isize capacity);

/// Initialize with preallocated constant sized buffer, size in number of elements
///
/// @param buffer Preallocated buffer with no access to allocations
/// @param bufferMaxCapacity Max Capacity of preallocated buffer
memory::Error Create(utf16 *buffer, isize bufferMaxCapacity);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These constructor functions aren't needed. A fixed allocator already provides this functionality by wrapping a fixed block of memory and can be abstracted to an Allocator type. It just returns the whole block of memory and aligns it for you. The abstraction is there to avoid duplicates like this.

Comment on lines +319 to +323
// Initializes with initial capacity
memory::Error StringBuilder::Create(Allocator allocator, isize capacity) {
// TODO(Jon) Assert capacity is greater than 0
Slice dst;
memory::Error err = allocator.alloc(&dst, capacity, alignof(utf16));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this initial allocation be defered to the first write? It would make this function error free.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I can change the design to work with that. I was initially concerned over capacity if it starts out at 0 but that should be fine as all of the write functions ensure it adds a minimum amount, which I didn't know at the time as I wrote this function first


memory::Error StringBuilder::Write(rune r) {
if (r > 0x10FFFF || (r >= 0xD800 && r <= 0xDFFF)) {
return Error::Other;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe another error type unioned with the allocator error type would be a better approach. It would allow for more granularity in this interface.

Comment on lines +480 to +497
static constexpr utf16 hexadecimalNibbles[] = {
u'0',
u'1',
u'2',
u'3',
u'4',
u'5',
u'6',
u'7',
u'8',
u'9',
u'A',
u'B',
u'C',
u'D',
u'E',
u'F',
};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a base64 table and reused for all bases less than 64 (I wouldn't worry about padding equals signs here). Unfortunately, the ascii table isn't laid out so conveniently. It'd also give the last case the utility of all bases supported by the table.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is probably the least important note here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this can be a later addition if we need it

Comment thread engine/native/core/containers/string.cppm Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
engine/native/core/containers/string.cppm (2)

349-354: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Infinite loop: base-2 loop condition checks val instead of unsignedVal.

The loop condition while (val != 0) checks the original (unchanged) parameter, but the loop body only modifies unsignedVal. Since val is never updated, this creates an infinite loop for any non-zero input when using base 2.

The base-16 (line 356) and base-10 (line 361) cases correctly use unsignedVal != 0.

Suggested fix
 	} else if (base == 2) {
-		while (val != 0) {
+		while (unsignedVal != 0) {
 			std::memcpy(buff + index, binaryNibbles[unsignedVal & 0xF], sizeof(binaryNibbles[0]));
 			index += 4;
 			unsignedVal >>= 4;
 		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@engine/native/core/containers/string.cppm` around lines 349 - 354, The base-2
conversion loop condition checks the parameter `val` instead of `unsignedVal`,
causing an infinite loop since `val` is never modified while only `unsignedVal`
is updated in the loop body. Change the `while (val != 0)` condition in the
`else if (base == 2)` block to `while (unsignedVal != 0)` to match the correct
behavior used in the base-16 and base-10 cases.

194-197: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Potential integer overflow when sizes differ significantly.

The cast (int)(size - s.size) can overflow when the size difference exceeds INT_MAX (e.g., comparing a 3-billion-element string against an empty one). The result would be incorrect comparison semantics.

Suggested fix
 	if (result == 0) {
-		return (int)(size - s.size);
+		if (size == s.size) return 0;
+		return size < s.size ? -1 : 1;
 	}
-	return result;
+	return result < 0 ? -1 : 1;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@engine/native/core/containers/string.cppm` around lines 194 - 197, The cast
(int)(size - s.size) in the return statement can overflow when the size
difference exceeds INT_MAX, causing incorrect comparison results for strings
with significantly different lengths. Fix this by safely comparing the sizes
without casting a potentially large size_t difference to int. Instead, check if
size is greater than s.size and return 1 if true, otherwise return -1 when sizes
differ, to maintain proper comparison semantics without overflow risk.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@engine/native/core/containers/string.cppm`:
- Around line 32-49: The comparison operators (operator==, operator!=,
operator>, operator>=, operator<, operator<=) do not modify the String object
and should be marked as const member functions. Add the const keyword after the
parameter list for each of these six operators to indicate they don't modify the
object state and to allow them to be called on const String instances.
Additionally, update the StringCompare method declaration (referenced on line
31) to also be const since it is called by these operators and should not
require a mutable String instance.

---

Outside diff comments:
In `@engine/native/core/containers/string.cppm`:
- Around line 349-354: The base-2 conversion loop condition checks the parameter
`val` instead of `unsignedVal`, causing an infinite loop since `val` is never
modified while only `unsignedVal` is updated in the loop body. Change the `while
(val != 0)` condition in the `else if (base == 2)` block to `while (unsignedVal
!= 0)` to match the correct behavior used in the base-16 and base-10 cases.
- Around line 194-197: The cast (int)(size - s.size) in the return statement can
overflow when the size difference exceeds INT_MAX, causing incorrect comparison
results for strings with significantly different lengths. Fix this by safely
comparing the sizes without casting a potentially large size_t difference to
int. Instead, check if size is greater than s.size and return 1 if true,
otherwise return -1 when sizes differ, to maintain proper comparison semantics
without overflow risk.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 99d10637-fa46-476e-b1ad-5f8fdfb9ad32

📥 Commits

Reviewing files that changed from the base of the PR and between 7026321 and 7a0c807.

📒 Files selected for processing (2)
  • engine/native/core/containers/string.cpp
  • engine/native/core/containers/string.cppm

Comment on lines +32 to +49
bool constexpr operator==(String s) {
return StringCompare(s) == 0;
}
bool constexpr operator!=(String s) {
return StringCompare(s) != 0;
}
bool constexpr operator>(String s) {
return StringCompare(s) > 0;
}
bool constexpr operator>=(String s) {
return StringCompare(s) >= 0;
}
bool constexpr operator<(String s) {
return StringCompare(s) < 0;
}
bool constexpr operator<=(String s) {
return StringCompare(s) <= 0;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Comparison operators should be const member functions.

These operators don't modify the String object, so they should be marked const to allow calling them on const String instances and to communicate immutability intent.

Suggested fix
-	bool constexpr operator==(String s) {
+	bool constexpr operator==(String s) const {
 		return StringCompare(s) == 0;
 	}
-	bool constexpr operator!=(String s) {
+	bool constexpr operator!=(String s) const {
 		return StringCompare(s) != 0;
 	}
-	bool constexpr operator>(String s) {
+	bool constexpr operator>(String s) const {
 		return StringCompare(s) > 0;
 	}
-	bool constexpr operator>=(String s) {
+	bool constexpr operator>=(String s) const {
 		return StringCompare(s) >= 0;
 	}
-	bool constexpr operator<(String s) {
+	bool constexpr operator<(String s) const {
 		return StringCompare(s) < 0;
 	}
-	bool constexpr operator<=(String s) {
+	bool constexpr operator<=(String s) const {
 		return StringCompare(s) <= 0;
 	}

Also update StringCompare declaration on line 31:

-	int constexpr StringCompare(String s);
+	int constexpr StringCompare(String s) const;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@engine/native/core/containers/string.cppm` around lines 32 - 49, The
comparison operators (operator==, operator!=, operator>, operator>=, operator<,
operator<=) do not modify the String object and should be marked as const member
functions. Add the const keyword after the parameter list for each of these six
operators to indicate they don't modify the object state and to allow them to be
called on const String instances. Additionally, update the StringCompare method
declaration (referenced on line 31) to also be const since it is called by these
operators and should not require a mutable String instance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants