Add UTF-16 Strings#34
Conversation
Add UTF-16 Strings
📝 WalkthroughWalkthroughAdds Unicode type aliases ( ChangesUTF-16 String Module and Prerequisites
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@engine/native/core/containers/string.cppm`:
- Around line 503-507: The WriteInt function has two issues: (1) negating
minimum signed integer values like INT_MIN causes undefined behavior, and (2)
GrowCapacity return values are not checked for failures, potentially leading to
out-of-bounds writes. In the signed value negation block (checking if val < 0),
use a safer approach such as converting to an unsigned type, using absolute
value safely, or checking for the minimum value before negating. Additionally,
wherever GrowCapacity is called within WriteInt, verify its return value
indicates success before continuing with the write operation; if allocation
fails, return an appropriate error code or handle the failure gracefully to
prevent buffer overruns.
- Around line 351-368: The `StringBuilder::Create` method has a stack-allocated
`fixed_allocator` variable that is used to initialize the `allocator` member,
but the allocator stores a pointer to this local variable which becomes invalid
after the function returns. To fix this, the `fixed_allocator` must be stored as
a member of the `StringBuilder` struct itself to ensure it persists for the
lifetime of the object, allowing subsequent growth and free operations to safely
access it without dereferencing dangling memory.
- Around line 437-446: The StringBuilder::Write methods call GrowCapacity with
only the incoming string length, not accounting for the existing buffer size.
Change the GrowCapacity calls from GrowCapacity(length) to GrowCapacity(size +
length) in both Write overloads (the utf16 version and the other overload
mentioned as also applying to lines 449-457) to ensure the total required
capacity is correctly calculated when appending to a non-empty builder.
- Around line 228-233: The Substr method incorrectly rejects valid substring
requests by returning EMPTY_STRING whenever the requested range touches or
exceeds the string boundary. Instead of returning EMPTY_STRING when pos + len >=
size, check only if pos >= size (completely out of bounds). When pos is valid
but the requested length extends beyond the string end, truncate len to size -
pos so the method returns the available substring. This truncation contract fix
will also resolve the false negatives in StartsWith and EndsWith methods that
depend on Substr.
- Around line 320-324: The `StringBuilder::Create` method has a unit mismatch
where the `capacity` parameter represents UTF-16 elements but is passed directly
to `allocator.alloc` as a byte size without conversion. Multiply `capacity` by
the size of a single UTF-16 element (sizeof(utf16)) when calling
`allocator.alloc` to correctly convert from element count to byte count. This
ensures the allocated storage is large enough to hold the requested number of
UTF-16 characters without under-allocating and risking buffer overflow.
- Around line 371-385: The Reserve method in StringBuilder has two issues: the
memcpy call copies size elements unconditionally even when newCapacity is
smaller, risking buffer overflow, and the oldDst.size field is set as an element
count rather than bytes, causing the allocator.free call to receive incorrect
size units. Fix the memcpy to copy only the minimum of size and newCapacity
elements (to avoid overflow), and set oldDst.size to capacity * sizeof(utf16)
instead of just capacity to ensure the allocator.free call receives the size in
bytes as expected.
- Around line 182-205: The StringCompare method has two critical bugs: the
min_size variable is initialized using itself as arguments to draco::math::min
instead of using the actual size members of both strings, which causes undefined
behavior; additionally, the null-pointer checks don't handle the case where both
string texts are null, causing two empty strings to incorrectly compare as
unequal. Fix the min_size initialization by comparing the size member of the
current string with the size member of the parameter string s, and add a check
at the beginning of the method to return 0 (equal) when both text pointers are
null, ensuring that EMPTY_STRING == EMPTY_STRING evaluates correctly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f9a636d9-6f20-4bfc-a1a3-f770f0cfaa40
📒 Files selected for processing (6)
engine/native/core/CMakeLists.txtengine/native/core/containers/containers.cppmengine/native/core/containers/string.cppmengine/native/core/definitions/definitions.cppmengine/native/core/definitions/stdtypes.cppmengine/native/core/math/functions.cppm
|
The one structural thing that was made obvious to me when I was making my PR for allocators was that @OldDev78 wants as much implementation to be in .cpp files and there are some functions that are amenable to having their implementation in .cpp files. |
The constexpr items and templates need to be in the module, I'll see what @OldDev78 says about the .cpp implementation to see if it would make sense to put the very few StringBuilder functions into a .cpp file |
If a function serves best as template and/or can be effectively constrexpr (can be constant evaluated), then it's okay to be implemented in the .cppm. Otherwise, the implementation should just be separated into the .cpp. Take note that my biggest concern is to expose dependency on headers as little as possible (so, when I can, also move the includes to the .cpp). |
|
The integral concept is a merge conflict for me :/ |
| return result; | ||
| } | ||
|
|
||
| bool constexpr String::operator==(String s) { |
There was a problem hiding this comment.
Especially when the function is this simple, I´d rather it be inlined in the declaration above.
| /// Initialize with preallocated buffer with initial capacity | ||
| /// | ||
| /// @param allocator Initalized allocator that allows for string builder to grow | ||
| /// @param buffer Preallocated buffer that was created with allocator | ||
| /// @param capacity The initial capacity that buffer has before it needs to be reallocated | ||
| memory::Error Create(Allocator allocator, utf16 *buffer, isize capacity); | ||
|
|
||
| /// Initialize with preallocated constant sized buffer, size in number of elements | ||
| /// | ||
| /// @param buffer Preallocated buffer with no access to allocations | ||
| /// @param bufferMaxCapacity Max Capacity of preallocated buffer | ||
| memory::Error Create(utf16 *buffer, isize bufferMaxCapacity); |
There was a problem hiding this comment.
These constructor functions aren't needed. A fixed allocator already provides this functionality by wrapping a fixed block of memory and can be abstracted to an Allocator type. It just returns the whole block of memory and aligns it for you. The abstraction is there to avoid duplicates like this.
| // Initializes with initial capacity | ||
| memory::Error StringBuilder::Create(Allocator allocator, isize capacity) { | ||
| // TODO(Jon) Assert capacity is greater than 0 | ||
| Slice dst; | ||
| memory::Error err = allocator.alloc(&dst, capacity, alignof(utf16)); |
There was a problem hiding this comment.
Could this initial allocation be defered to the first write? It would make this function error free.
There was a problem hiding this comment.
Yeah I can change the design to work with that. I was initially concerned over capacity if it starts out at 0 but that should be fine as all of the write functions ensure it adds a minimum amount, which I didn't know at the time as I wrote this function first
|
|
||
| memory::Error StringBuilder::Write(rune r) { | ||
| if (r > 0x10FFFF || (r >= 0xD800 && r <= 0xDFFF)) { | ||
| return Error::Other; |
There was a problem hiding this comment.
Maybe another error type unioned with the allocator error type would be a better approach. It would allow for more granularity in this interface.
| static constexpr utf16 hexadecimalNibbles[] = { | ||
| u'0', | ||
| u'1', | ||
| u'2', | ||
| u'3', | ||
| u'4', | ||
| u'5', | ||
| u'6', | ||
| u'7', | ||
| u'8', | ||
| u'9', | ||
| u'A', | ||
| u'B', | ||
| u'C', | ||
| u'D', | ||
| u'E', | ||
| u'F', | ||
| }; |
There was a problem hiding this comment.
This could be a base64 table and reused for all bases less than 64 (I wouldn't worry about padding equals signs here). Unfortunately, the ascii table isn't laid out so conveniently. It'd also give the last case the utility of all bases supported by the table.
There was a problem hiding this comment.
This one is probably the least important note here.
There was a problem hiding this comment.
Yeah this can be a later addition if we need it
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
engine/native/core/containers/string.cppm (2)
349-354:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winInfinite loop: base-2 loop condition checks
valinstead ofunsignedVal.The loop condition
while (val != 0)checks the original (unchanged) parameter, but the loop body only modifiesunsignedVal. Sincevalis never updated, this creates an infinite loop for any non-zero input when using base 2.The base-16 (line 356) and base-10 (line 361) cases correctly use
unsignedVal != 0.Suggested fix
} else if (base == 2) { - while (val != 0) { + while (unsignedVal != 0) { std::memcpy(buff + index, binaryNibbles[unsignedVal & 0xF], sizeof(binaryNibbles[0])); index += 4; unsignedVal >>= 4; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@engine/native/core/containers/string.cppm` around lines 349 - 354, The base-2 conversion loop condition checks the parameter `val` instead of `unsignedVal`, causing an infinite loop since `val` is never modified while only `unsignedVal` is updated in the loop body. Change the `while (val != 0)` condition in the `else if (base == 2)` block to `while (unsignedVal != 0)` to match the correct behavior used in the base-16 and base-10 cases.
194-197:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winPotential integer overflow when sizes differ significantly.
The cast
(int)(size - s.size)can overflow when the size difference exceedsINT_MAX(e.g., comparing a 3-billion-element string against an empty one). The result would be incorrect comparison semantics.Suggested fix
if (result == 0) { - return (int)(size - s.size); + if (size == s.size) return 0; + return size < s.size ? -1 : 1; } - return result; + return result < 0 ? -1 : 1;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@engine/native/core/containers/string.cppm` around lines 194 - 197, The cast (int)(size - s.size) in the return statement can overflow when the size difference exceeds INT_MAX, causing incorrect comparison results for strings with significantly different lengths. Fix this by safely comparing the sizes without casting a potentially large size_t difference to int. Instead, check if size is greater than s.size and return 1 if true, otherwise return -1 when sizes differ, to maintain proper comparison semantics without overflow risk.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@engine/native/core/containers/string.cppm`:
- Around line 32-49: The comparison operators (operator==, operator!=,
operator>, operator>=, operator<, operator<=) do not modify the String object
and should be marked as const member functions. Add the const keyword after the
parameter list for each of these six operators to indicate they don't modify the
object state and to allow them to be called on const String instances.
Additionally, update the StringCompare method declaration (referenced on line
31) to also be const since it is called by these operators and should not
require a mutable String instance.
---
Outside diff comments:
In `@engine/native/core/containers/string.cppm`:
- Around line 349-354: The base-2 conversion loop condition checks the parameter
`val` instead of `unsignedVal`, causing an infinite loop since `val` is never
modified while only `unsignedVal` is updated in the loop body. Change the `while
(val != 0)` condition in the `else if (base == 2)` block to `while (unsignedVal
!= 0)` to match the correct behavior used in the base-16 and base-10 cases.
- Around line 194-197: The cast (int)(size - s.size) in the return statement can
overflow when the size difference exceeds INT_MAX, causing incorrect comparison
results for strings with significantly different lengths. Fix this by safely
comparing the sizes without casting a potentially large size_t difference to
int. Instead, check if size is greater than s.size and return 1 if true,
otherwise return -1 when sizes differ, to maintain proper comparison semantics
without overflow risk.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 99d10637-fa46-476e-b1ad-5f8fdfb9ad32
📒 Files selected for processing (2)
engine/native/core/containers/string.cppengine/native/core/containers/string.cppm
| bool constexpr operator==(String s) { | ||
| return StringCompare(s) == 0; | ||
| } | ||
| bool constexpr operator!=(String s) { | ||
| return StringCompare(s) != 0; | ||
| } | ||
| bool constexpr operator>(String s) { | ||
| return StringCompare(s) > 0; | ||
| } | ||
| bool constexpr operator>=(String s) { | ||
| return StringCompare(s) >= 0; | ||
| } | ||
| bool constexpr operator<(String s) { | ||
| return StringCompare(s) < 0; | ||
| } | ||
| bool constexpr operator<=(String s) { | ||
| return StringCompare(s) <= 0; | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Comparison operators should be const member functions.
These operators don't modify the String object, so they should be marked const to allow calling them on const String instances and to communicate immutability intent.
Suggested fix
- bool constexpr operator==(String s) {
+ bool constexpr operator==(String s) const {
return StringCompare(s) == 0;
}
- bool constexpr operator!=(String s) {
+ bool constexpr operator!=(String s) const {
return StringCompare(s) != 0;
}
- bool constexpr operator>(String s) {
+ bool constexpr operator>(String s) const {
return StringCompare(s) > 0;
}
- bool constexpr operator>=(String s) {
+ bool constexpr operator>=(String s) const {
return StringCompare(s) >= 0;
}
- bool constexpr operator<(String s) {
+ bool constexpr operator<(String s) const {
return StringCompare(s) < 0;
}
- bool constexpr operator<=(String s) {
+ bool constexpr operator<=(String s) const {
return StringCompare(s) <= 0;
}Also update StringCompare declaration on line 31:
- int constexpr StringCompare(String s);
+ int constexpr StringCompare(String s) const;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@engine/native/core/containers/string.cppm` around lines 32 - 49, The
comparison operators (operator==, operator!=, operator>, operator>=, operator<,
operator<=) do not modify the String object and should be marked as const member
functions. Add the const keyword after the parameter list for each of these six
operators to indicate they don't modify the object state and to allow them to be
called on const String instances. Additionally, update the StringCompare method
declaration (referenced on line 31) to also be const since it is called by these
operators and should not require a mutable String instance.
Add UTF-16 Strings
Summary by CodeRabbit
StringBuilderwith capacity management and APIs to write UTF-16 text, runes, and formatted integers.utf8,utf16,utf32) and alignedrunefor consistent Unicode handling.min/maxhelpers for arithmetic types.