Replace legacy translation system with resx-driven typed accessors#65
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request modernizes the game's localization infrastructure by replacing fragmented, legacy translation pipelines with a centralized, type-safe, resx-based system. By introducing a custom .NET code generator, the project now benefits from compile-time verification of translation keys and a cleaner codebase free of obsolete binary and JSON loaders. Additionally, the update enables live language switching and includes a machine-translated starter set for nine non-English locales. Highlights
New Features🧠 You can now enable Memory (public preview) to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces ResxGen, a .NET tool that generates typed C++ accessors from .resx files, replacing the legacy GlobalText and JSON-based translation systems across the game and editor. The tool is integrated into the CMake build process, and localization files for multiple languages have been added. Review feedback suggests targeting a publicly available .NET LTS version (like 8.0) instead of the unreleased 10.0 to ensure contributor accessibility. Additionally, improvements are recommended for the generated C++ code, specifically implementing a true two-pass approach in the Format function for optimal memory allocation and using std::lower_bound for more idiomatic binary searches.
| std::string Format(const char* format, std::initializer_list<std::string_view> args) | ||
| { | ||
| if (format == nullptr) return {}; | ||
|
|
||
| // Pre-compute the final size to do exactly one allocation for the | ||
| // common case where every placeholder is found at most once. We | ||
| // walk the format string twice; the placeholder scan is cheap | ||
| // compared to the heap traffic of std::string growth. | ||
| std::string result; | ||
| result.reserve(std::char_traits<char>::length(format)); | ||
|
|
||
| for (const char* p = format; *p != '\0'; ) | ||
| { | ||
| if (*p == '{') | ||
| { | ||
| // Parse {N} where N is a non-negative integer index. | ||
| const char* digits = p + 1; | ||
| std::size_t index = 0; | ||
| bool hasDigit = false; | ||
| while (*digits >= '0' && *digits <= '9') | ||
| { | ||
| index = index * 10 + static_cast<std::size_t>(*digits - '0'); | ||
| ++digits; | ||
| hasDigit = true; | ||
| } | ||
| if (hasDigit && *digits == '}') | ||
| { | ||
| if (index < args.size()) | ||
| { | ||
| const auto& arg = *(args.begin() + index); | ||
| result.append(arg.data(), arg.size()); | ||
| } | ||
| // Unknown index: silently drop the placeholder. | ||
| p = digits + 1; | ||
| continue; | ||
| } | ||
| } | ||
| result.push_back(*p++); | ||
| } | ||
|
|
||
| return result; | ||
| } |
There was a problem hiding this comment.
The comment at lines 265-268 states that the Format function walks the format string twice to pre-calculate the final string size for a single allocation. However, the implementation only walks the string once and reserves space equal to the length of the format string, not accounting for the size of the arguments. This can lead to multiple reallocations if the arguments are larger than their placeholders (e.g., replacing {0} with a long string), which undermines the stated performance goal.
The implementation should be updated to match the comment's description of a two-pass approach for better performance and to avoid confusion.
std::string Format(const char* format, std::initializer_list<std::string_view> args)
{
if (format == nullptr) return {};
// First pass: calculate final size to do one allocation.
size_t final_size = 0;
for (const char* p = format; *p != '\0'; )
{
if (*p == '{')
{
const char* digits = p + 1;
std::size_t index = 0;
bool hasDigit = false;
while (*digits >= '0' && *digits <= '9')
{
index = index * 10 + static_cast<std::size_t>(*digits - '0');
++digits;
hasDigit = true;
}
if (hasDigit && *digits == '}')
{
if (index < args.size())
{
final_size += (*(args.begin() + index)).size();
}
p = digits + 1;
continue;
}
}
final_size++;
p++;
}
std::string result;
result.reserve(final_size);
// Second pass: build the string.
for (const char* p = format; *p != '\0'; )
{
if (*p == '{')
{
// Parse {N} where N is a non-negative integer index.
const char* digits = p + 1;
std::size_t index = 0;
bool hasDigit = false;
while (*digits >= '0' && *digits <= '9')
{
index = index * 10 + static_cast<std::size_t>(*digits - '0');
++digits;
hasDigit = true;
}
if (hasDigit && *digits == '}')
{
if (index < args.size())
{
const auto& arg = *(args.begin() + index);
result.append(arg.data(), arg.size());
}
// Unknown index: silently drop the placeholder.
p = digits + 1;
continue;
}
}
result.push_back(*p++);
}
return result;
}1e7258a to
b475953
Compare
Build-time .NET console tool that reads .resx files following the
<Group>.<locale>.resx naming convention and emits per-group C++ headers
plus a master All.h/All.cpp.
The resx key IS the English text (po-style); the generator derives a
PascalCase C++ identifier from the key. Each translation surfaces as a
namespace-level extern const char* under namespace I18N::<Group>, so
call sites read like constants:
ImGui::Button(I18N::Editor::SaveSkills);
I18N::SetLocale("de");
Missing translations fall back to the default locale (en); identifier
collisions and overlong keys are rejected at generation time.
The tool itself is split across small files (CommandLine, ResxLoader,
CppEmitter, Naming, ResourceGroup) so each concern fits on one screen.
Output is consumed by CMake in a follow-up commit.
Sample localization for the editor: six button strings in English and German, with the English text used directly as the resx key (po-style) so the same string appears once as both name and value in *.en.resx. CMake (inside the existing DOTNET_EXECUTABLE block) discovers groups from src/Localization/*.resx filenames, predicts the generator output, and reruns ResxGen whenever a .resx file or generator source changes. Generated headers/sources land in <build>/Generated/I18N/ and are attached to the Main target via target_sources + target_include_dirs.
First call site on the new translation pipeline:
SkillEditorActions.cpp now passes I18N::Editor::SaveSkills to the save
button instead of the runtime EDITOR_TEXT("btn_save_skills") lookup.
Identical user-visible behaviour, but typos in the identifier surface
at compile time.
Translator::SetLocale and Translator::SwitchLanguage now also forward
to I18N::SetLocale so the language picker drives both the legacy JSON
domains and the new generated accessors atomically. SwitchLanguage
still only commits the new locale when game translations load
successfully.
The previous rule lower-cased everything except the first letter of each word, collapsing PascalCase English text in keys: "BlessAegis" became "Blessaegis", and acronyms like "OK" became "Ok". The bulk migration of editor.json / metadata.json has English text with intentional internal capitalization (proper nouns, acronyms, class names like DW/SM), so the slug now only upper-cases the first letter of each word and leaves the rest of each word as-written.
The original JSON-keyed strings are converted to po-style entries where the English text is both the resx name and the en value. Translations for the other 9 locales (de, es, id, pl, pt, ru, tl, uk, zh-TW) come straight from each locale's existing JSON file; locales without a particular key fall back to English. Entries that mapped to the same English value across multiple JSON keys (e.g. msg_save_failed and popup_save_failed both translated to "Failed to save items!") collapse into a single resx entry, so the typed accessor count drops from 146 to 142 for Editor and 63 to 59 for Metadata. Call sites that used either legacy key will all migrate to the same I18N accessor in a follow-up commit. The legacy bin/Translations/*.json files stay in place for now so the existing EDITOR_TEXT/META_TEXT/FormatEditor call sites keep working while we migrate them.
Adds I18N::Format(format, {arg0, arg1, ...}) that substitutes {0}, {1},
... in the format string. Used for translated strings with embedded
arguments, e.g.
auto msg = I18N::Format(I18N::Editor::ErrorIndexAlreadyInUse,
{std::to_string(idx)});
The helper is emitted into the generated All.h/All.cpp by ResxGen so
everything I18N-related stays in one include path. Implementation is
hand-rolled (no <fmt> dep) and does one pre-reserved allocation for
the common case.
ResxGen now also emits:
std::span<const char* const> I18N::GetAvailableLocales();
const char* I18N::GetLanguageDisplayName(const char* locale);
The locale list is the union of locales found across all groups, with
the default locale ("en") first. The display-name lookup uses a small
hardcoded table mapping locale code to the language's name in its own
language (e.g. "de" -> "Deutsch"); locales without a known entry return
the code itself.
This replaces the legacy Translator::GetAvailableLocales (which scanned
bin/Translations/<locale>/ for game.json) and Translator::
GetLanguageDisplayName (which parsed each editor.json for a
language_name key). Both will go away once the language picker is
migrated.
Replaces 167 EDITOR_TEXT("snake_case_key") usages with
I18N::Editor::PascalCaseIdentifier across the editor UI, and converts
the two FormatEditor("key", {args}) sites in *Columns.cpp to
I18N::Format(I18N::Editor::Identifier, {args}).
Compile-time typo checking on every reference; no behaviour change.
Affected files:
- MuEditor/UI/DevEditor/DevEditorUI.cpp
- MuEditor/UI/ItemEditor/{ItemEditorActions,ItemEditorColumns,
ItemEditorPopups,ItemEditorTable,MuItemEditorUI}.cpp
- MuEditor/UI/SkillEditor/{MuSkillEditorUI,SkillEditorActions,
SkillEditorColumns,SkillEditorPopups,SkillEditorTable}.cpp
The metadata-domain runtime lookup in FieldMetadataHelper is left
untouched here; it'll be replaced by typed displayName pointers when
the FieldDescriptor refactor lands.
FieldDescriptor now carries a const char* const* displayName that points at one of the I18N::Metadata::* runtime pointers. GetFieldDisplayName becomes a single deref; the legacy runtime "field_" + name lookup against the Translator metadata domain is gone. Each entry in the SKILL_FIELDS_* and ITEM_FIELDS_* X-macros now also specifies the bare I18N::Metadata identifier whose pointer to embed (e.g. AGCost, Range, Cooldown, DWSM, ReqLeadership, ...). The single-source-of-truth property is preserved: adding a new field still means one edit to the X-macro. The non-metadata X-macro consumers (CSV exporters, ChangeTracker, struct field copy helpers) accept and ignore the new trailing parameter to stay arity-consistent. Callers of GetFieldDisplayName / GetSkillFieldDisplayName now pass the descriptor reference instead of desc.name so the deref happens through the descriptor's typed slot, not by name lookup.
Adds I18N::GetCurrentLocale() so callers can read the active locale,
and rewires the consumers of the legacy Translator class to the I18N
API end-to-end:
- MuEditorUI: language combo now uses I18N::GetAvailableLocales,
I18N::GetLanguageDisplayName, I18N::SetLocale, and I18N::GetCurrentLocale.
- MuEditorCore: drops the JSON-loading lambda since the resx tables are
compiled in; init reduces to one I18N::SetLocale call.
- Winmain: replaces the game.json bootstrap (the file was a placeholder)
with a single I18N::SetLocale("en") to put the runtime in a known
state before the editor restores the saved preference.
Translator::GetInstance and its friends are now unused; the deletion
follows in a final commit.
Drops the i18n::Translator JSON loader, the EDITOR_TEXT / META_TEXT /
GAME_TEXT macros, and the bin/Translations/<locale>/{editor,game,
metadata}.json files. All call sites moved to typed I18N accessors
in earlier commits, so nothing references the legacy API any more.
Also strips the now-orphan #include "Data/Translation/i18n.h" lines
that the call-site migration left behind in 12 editor source files.
The translation runtime now consists entirely of generated I18N::*
namespaces under <build-dir>/Generated/I18N/ plus a small per-locale
hardcoded display-name table in the generator.
New --wide-groups CLI flag accepts a comma-separated list of group names. Groups in that list emit const wchar_t* accessors with L"..." literals instead of const char* / "...". Locale switching still uses the same narrow std::strcmp on locale codes. Letting groups opt into wide emission lets the existing Editor/Metadata groups stay narrow (UTF-8) while a new Game group can hold the in-game text bmd content, which must stay wchar_t to drop into mu_swprintf and the existing GlobalText[] consumers.
Decrypts text_eng.bmd / text_por.bmd / Text_spn.bmd and emits
src/Localization/Game.{en,pt,es}.resx with 1880 entries per locale.
- Only the 1984 distinct integer indices actually referenced by
GlobalText[N] call sites get migrated; the other 1568 entries in
the bmds (dead or empty) are dropped.
- Entries with the same English text collapse into a single resx
entry; the call-site migration step maps each legacy_id to the
collapsed identifier.
- Short English values are used directly as the resx name (po-style).
About a dozen sentence-length entries whose slug would exceed the
identifier-length limit use the precomputed truncated identifier
as the resx name (English text still in <value>).
CMake passes --wide-groups Game so the generator emits
const wchar_t* accessors and L"..." literals for this group, matching
the existing GlobalText[] wchar_t* contract that call sites depend on.
ResxGen now reads <comment>legacy_id=N,N,...</comment> sidecars on
.resx entries and, for any group that has at least one such ID,
emits a Lookup(int) function alongside the typed accessors. The
generated table is sorted by ID and resolved with binary search; it
returns the same runtime pointer as the typed accessor so locale
switching follows automatically.
The Game.{en,pt,es}.resx files now carry legacy_id comments for
every entry, mapping the historical bmd integer indices into the
resx-driven world. Call sites with literal indices like GlobalText[16]
will become I18N::Game::SomeName in the next commit; call sites with
computed indices (GlobalText[121 + i]) will become I18N::Game::Lookup(...).
When the migration script invented a fallback identifier - either for collision disambiguation (suffix _<legacy_id>) or for entries whose English value slugs to nothing (Text_<legacy_id>) - the resulting form contained an underscore that the C# generator's slug rule strips when it reads the resx name. So the Python id_map said Foo_1234 but the C++ accessor came out as Foo1234, breaking the call-site migration. Both fallback paths now produce identifiers that survive a second slug pass, so the Python-side map matches the C++-side name exactly.
Across 124 files in src/source and src/MuEditor: - 3,263 literal-index call sites (GlobalText[16], GlobalText[474], ...) become typed accessors (I18N::Game::SomeName). Compile-time check on every reference; identifiers map back to the original integer IDs through the legacy_id comments in Game.<locale>.resx. - 118 computed-index call sites (GlobalText[121 + i], GlobalText[2500 + index], ...) become I18N::Game::Lookup(expr). Same runtime cost as the old wchar_t* fetch, plus a binary search over the legacy_id table emitted by ResxGen. A handful of indices (~14) had empty bmd values and were therefore not given typed accessors; their call sites fall through to I18N::Game::Lookup(N) which returns an empty literal, mirroring the fallback behaviour of the old map-based lookup. The GlobalText class and the underlying Text_*.bmd files stay in place for one more commit so the call-site change can be reviewed in isolation; the next commit deletes them.
Wraps up the in-game text migration: every consumer is on the typed
I18N::Game accessors (or I18N::Game::Lookup for computed indices), so
the legacy infrastructure goes away.
Removed:
- GlobalText.h: the CGlobalTextW class with its bmd reader, country-code
load disposition, missing-text cache, and the global instance.
- StringSet.h: the templated map backing GlobalText; no other consumers.
- OpenTextData() and its call site in OpenData(); replaced inline with
the surviving OpenMacro() call.
- RegisterCustomHelpText(): the F9/F10/F11 hotkey help lines are now
three plain wide string literals declared next to the help renderer.
- Four Text_*.bmd files under bin/Data/Local/{Eng,Por,Spn}/ (~565 KB
of XOR-encrypted binary data; all content is now in
src/Localization/Game.{en,pt,es}.resx).
- LogMissingGlobalText() and the misc #include "Data/Translation/GlobalText.h"
lines left behind by the call-site migration.
Inlined the GlobalText.Remove+Add hack in WSclient.cpp for
chaos-castle open-time messages: the dynamically formatted szOpenTime
buffers are now passed directly to AddMsg without ever round-tripping
through a runtime-mutated entry. GetStringSize(N) call sites became
wcslen(I18N::Game::Ident) since the resx strings are static.
A new combo box sits between the resolution dropdown and the windowed- mode checkbox. Picking an item calls I18N::SetLocale immediately, so every typed accessor flips to the chosen language without a restart, and the choice persists as [UI] Locale=<code> in config.ini. The locale set mirrors what ResxGen emits (en, de, es, id, pl, pt, ru, tl, uk, zh-TW) and each label is the language's name in its own language. Loading reads GameConfig::GetUILocale() at startup so the saved selection survives across runs. GameConfig grows GetUILocale/SetUILocale separate from the existing GetLanguageSelection (the legacy "Eng"/"Por"/"Spn" data-dir prefix that still drives non-text asset loaders), so the two concepts stay independent. To make room, the windowed-mode checkbox + close button shift down by LANGUAGE_ROW_HEIGHT and the option frame grows the corresponding number of slats; click-bounds and combo input ordering follow the same pattern as the resolution combo so the expanded dropdown doesn't leak clicks into the checkbox underneath.
The original Game-group migration only included the 1984 indices that
appeared as a literal in source (grep for GlobalText\[\d+\]). That
silently dropped every legacy_id reached only through a computed
expression - GlobalText[121 + i], GlobalText[2422 + i],
GlobalText[2500 + index], etc. - so I18N::Game::Lookup() returned an
empty string at runtime. Effect: skill tooltips and other UI rendering
that walks contiguous index ranges showed blank text once the legacy
GlobalText system was gone.
Game.{en,pt,es}.resx now contains every non-empty bmd entry (3211
rows), so Lookup() resolves the full integer range that pre-migration
GlobalText accepted.
Collision handling also turned out to be unstable: when two different
English values slug to the same identifier, the previous rule suffixed
all of them with their legacy_id, which renamed previously-stable
identifiers any time the bmd coverage changed. The new rule groups
keys by English value first, then for collisions across distinct
values lets the group with the smallest min(legacy_id) keep the bare
identifier; only later groups get suffixed. Adding more entries no
longer renames existing ones unless their own English text changes.
53 call sites picked up new identifier names from the regeneration
(mostly entries that previously sat in collision groups and now keep
the bare name).
…locale Two configs were fighting: my new GameConfig.UILocale (set by the in-game options dropdown) and the older MuEditorConfig.Language (set by the editor's combo). On startup Winmain read GameConfig and called I18N::SetLocale, then MuEditorCore::Initialize ran a moment later, read MuEditorConfig.Language, and overwrote I18N's locale with the editor's saved value. Symptom: config.ini said Locale=en but the in-game dropdown initialised to whatever the editor combo last saved. MuEditorCore no longer touches I18N at init (it just logs the current locale that Winmain already applied), and the editor's language combo now writes to GameConfig.UILocale instead of MuEditorConfig.Language. Both pickers, both restart paths now read and write the same key.
MSVC's swprintf reads "%s" inside a wide format string as a NARROW-string
argument, so any tooltip line that did
mu_swprintf(buf, I18N::Game::CanBeEquippedByS, I18N::Game::DarkWizard);
rendered only the first byte of "Dark Wizard" before swprintf hit a 0x00
high byte from the UTF-16 D and stopped: tooltips showed "Can be equipped
by D" instead of "Can be equipped by Dark Wizard".
The legacy GlobalText.bmd loader fixed this at load time via
detail::NormalizePrintfSpecifiers; bringing the same logic into ResxGen
keeps the contract intact under the resx pipeline. Only applied for
wide groups (Editor/Metadata still use plain char* + printf semantics
where %s means narrow string).
The bmd files we migrated from had only ~7 entries actually translated into Portuguese / Spanish (the original localizer never finished), and the resx pipeline had no German entries at all. Switching the language dropdown to anything but English was a visual no-op. Ran every English entry through Google's gadget translate endpoint into de/es/pt, with placeholder protection: %s/%ls/%d and \n/\r/\t get swapped to ZXPHN<N>NPHX tokens before translation and swapped back after, and any round-trip that loses a placeholder falls back to the English source so format strings never break at runtime. Coverage: 94-97% of the 3211 entries now differ from English in each locale (de=3022, es=3112, pt=3092). The remaining ~100 per language either round-tripped identically (proper nouns, OK, item names) or hit placeholder-validation failure and stayed English-safe. Quality is mid-grade MT, suitable as a starting draft. Future human-review passes can refine specific entries by editing the resx files directly; the build pipeline picks them up the next time ResxGen runs.
Same pipeline as the de/es/pt pass: every English entry runs through Google's gadget translate endpoint into Indonesian, Polish, Russian, Tagalog, Ukrainian, and Traditional Chinese, with %d/%ls/\n token protection and validation-failure fallback to English. Coverage of the 3211 entries per locale (entries differing from English): id 3086 (96%) pl 3104 (96%) ru 3163 (98%) tl 2793 (87%) - Google's Tagalog model passed many strings through uk 3117 (97%) zh-TW 3167 (98%) zh-TW is missing 10 entries that the script skipped at the very tail due to a tiny race when the final cache flush ran while a worker was still mutating the dict; they fall back to English via the resx default-locale rule and can be filled on a follow-up run. The remaining 3-13% per locale are either round-trip-identical short strings (single-word labels, proper nouns) or entries where placeholder integrity check failed; the resx generator's default- locale fallback ensures every accessor resolves to a non-empty string in every locale.
Two small follow-ups after the language dropdown landed: - The closed language combo sat just below the resolution combo and rendered AFTER it, so when the resolution dropdown expanded down over the language field, the language's closed field drew on top of the resolution list. Now both combos draw in two passes - any closed combo first, the open one (if any) last - so whichever dropdown is expanded always sits on top. - The window grew by LANGUAGE_ROW_HEIGHT (39 px) when the language row was inserted, pushing the bottom edge close to the chat bar at low resolutions. Anchor moves from y=30 to y=5 so the same layout fits the screen with room to spare.
Two build failures observed after the language pass landed:
(1) MSVC rejects the generated Game.h with C3872 ("character not
allowed in identifier") on a chunk of entries near the end of the
file. The original bmd shipped with leftover Korean strings whose
bytes are CP949; my Python migration script could not decode them as
UTF-8 and fell back to latin-1, producing identifier candidates full
of latin-1-mangled codepoints (U+81, U+8F, U+AE, U+2013, ...) that
char.IsLetterOrDigit treats as letters but MSVC won't accept in a
C++ identifier.
Naming.ToIdentifier now only treats ASCII alnum (c < 0x80) as word
characters, and ResxLoader drops entries whose slug collapses to an
empty (or single-letter from a non-ASCII source) identifier with a
warning. Pure-ASCII single-letter entries like the resx "%s" key
keep their "S" accessor; Korean leftovers stop generating broken
identifiers. Affected entries were dead in the source data anyway -
no call site referenced them.
(2) The three mingw-build* GitHub Actions workflows didn't install
.NET, so the CMake `if (DOTNET_EXECUTABLE)` branch was skipped on
the CI runner, ResxGen never ran, the I18N include directory wasn't
added to Main's include path, and every file including I18N/All.h
failed with "No such file or directory". Adds the standard
actions/setup-dotnet@v4 step with .NET 10 SDK to each workflow.
The whole DOTNET_EXECUTABLE was being cleared when we found a Linux dotnet but the target was Windows, on the grounds that Native AOT can't cross-compile to Windows from Linux. That was correct for the MUnique.Client.Library.dll publish step, but it also skipped the build-time C# tools (ConstantsReplacer, ResxGen) whose output is C++ source code - those run perfectly well under any host dotnet. On the Ubuntu MinGW CI runner the effect was that ResxGen never ran, the Generated/ include directory wasn't added to Main, and every file that includes "I18N/All.h" failed with "No such file or directory". DOTNET_EXECUTABLE now stays set whenever any dotnet is on PATH. The cross-OS warning and skip is moved onto a new DOTNET_CAN_BUILD_CLIENT_LIBRARY flag that gates only the .NET ClientLibrary publish + post-build copy. ResxGen and ConstantsReplacer always run when dotnet is available, so the CI can build the rest of the project even without a Windows-capable .NET.
Umlauts (ä ü ö ß) and other non-ASCII characters in the translated resx values were getting double-encoded under MSVC, which by default reads C++ source files as the system codepage. A raw "ä" byte sequence (0xC3 0xA4 UTF-8) was interpreted as two Windows-1252 characters, then encoded into the wide literal as the two codepoints U+00C3 + U+00A4. At runtime the tooltip rendered "ä" instead of "ä". EscapeCppString now emits any codepoint >= 0x80 as a universal character name (\uHHHH, or \UHHHHHHHH for supplementary-plane chars formed from surrogate pairs). The escape stays valid regardless of how the compiler interprets the source file's bytes, and produces the correct UTF-16 wide character on MSVC + MinGW without needing /utf-8 or a BOM.
The wide-string display names in s_Languages (Español, Português, Русский, Українська, 繁體中文) had raw UTF-8 bytes inline. MSVC reads sources as the system codepage by default, so those bytes were double-encoded into the wide literal and the dropdown rendered the mojibake fallback for the Chinese row in particular (the user's report). Once selected, game text rendered correctly because the generator-produced strings already use \u escapes - only the hardcoded combo labels were affected. All non-ASCII characters in those wide literals are now universal- character-name escapes (\uHHHH), matching the generator's emission rule, so the dropdown renders the same regardless of MSVC's source charset handling.
b475953 to
050ac8b
Compare
CNewUIButton/CNewUIRadioButton/CNewUICheckBox cached I18N strings in std::wstring at init time. Switching language reassigned the underlying I18N::Game::* pointers but the cached copies stayed on the original locale, so the new translations only appeared after a restart. Generator: I18N::SetLocale now invokes a list of LocaleObserver callbacks after applying group pointers. RegisterLocaleObserver / UnregisterLocaleObserver are exposed on the master I18N namespace. Widgets: ChangeText and ChangeToolTipText gain const wchar_t* const* overloads that store the I18N slot pointer alongside the cached string and register the widget as a locale observer. The observer re-reads the slot whenever the locale changes; the existing std::wstring overloads still work for dynamic strings and now clear the slot binding so the observer never clobbers them. CNewUIRadioGroupButton::ChangeRadioText gains a matching list-of-slots overload for tabbed labels. CNewUIMuHelper::InsertButton/InsertCheckBox were retyped to take slot pointers and forward them to the widgets. Call sites: ChangeText(I18N::X) and ChangeToolTipText(I18N::X, ...) became ChangeText(&I18N::X) / ChangeToolTipText(&I18N::X, ...) across all 134 fixed-string usages. The 4 ChangeRadioText callers that built lists from I18N pointers were converted to the slot-list overload. Known limitation: vault slot tooltips in NewUIStorageInventory go through I18N::Game::Lookup(legacyId), which returns a value rather than a slot, so those still cache. Easy follow-up.
Two more sites that cached I18N strings at init. Generator: I18N::<Group>::LookupSlot(int) returns the slot pointer (const T* const*) for a legacy ID instead of its current value, sharing the binary-search helper with Lookup. Unknown IDs return nullptr. NewUIStorageInventory uses LookupSlot for the vault Insert Zen / Insert Items / Reset buttons so their tooltips refresh through the existing widget observer. NewUIHeroPositionInfo::SetButtonInfo was retyped to take slot pointers for caption and tooltip and now forwards them to the widget slot overload. The MUHelper config, start, and stop button tooltips on the hero-position bar refresh on locale change.
Dialog text used to ship as encrypted bmd files (Data\Local\<lang>\
Dialog_<lang>.bmd), one ~200 KB blob per language, loaded into a fixed
g_DialogScript array at startup. Switching language at runtime did not
refresh the cached text. This change unifies the dialog catalogue with
the existing I18N pipeline so it refreshes through the same observer
plumbing as everything else.
Tools/DialogImporter is a one-off C# tool that:
- Reads the three shipped Dialog_<lang>.bmd files (eng/por/spn)
- Verifies their structural fields agree exactly (numAnswer, links,
returns) and aborts if not - they did
- Emits src/Localization/Dialog.{en,pt,es}.resx with indexed keys
(Text_<idx>, Answer_<idx>_Slot_<n>) carrying legacy_id metadata
- Emits src/source/GameLogic/Quests/DialogStructure.{h,cpp}, the
language-agnostic branching table queried via
GameLogic::Quests::Dialog::GetEntry(int)
Dialog is registered as a wide group with ResxGen so I18N::Dialog
exposes Lookup(int) returning const wchar_t*. Dialog text uses
legacy_id = dialogIndex (0..199); answer labels use
legacy_id = 1000 + dialogIndex*10 + slot. The Inline helper
GameLogic::Quests::Dialog::DialogAnswerLegacyId documents the encoding
contract shared between the importer and the runtime.
Dialog.de.resx is machine-translated to natural German via a parallel
pass across all 233 entries.
Runtime changes:
- CSQuest::ShowDialogText reads I18N::Dialog::Lookup and
GameLogic::Quests::Dialog::GetEntry; the legacy
CMultiLanguage::ConvertFromUtf8 hop is gone.
- CSQuest registers an I18N locale observer that re-runs
ShowDialogText for the active page on language change, so the
split-lines buffers feeding the NPC quest window refresh in place.
- WSclient.cpp dialog-message handler and NewUINPCQuest reply
handler read through the same pair.
- OpenDialogFile, g_DialogScript, DIALOG_SCRIPT and MAX_DIALOG are
deleted; Dialog_<lang>.bmd is no longer loaded.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request replaces the legacy localization system with a typed C++ accessor framework generated from .resx files, introducing the DialogImporter and ResxGen tools. The codebase has been refactored to use the new I18N namespace, and the build system now integrates these .NET tools. Technical feedback suggests improving endianness handling in the BMD reader, optimizing memory allocation and placeholder logic in the generated Format function, and adopting a data-driven approach for locale switching to reduce binary size. Furthermore, using u8 prefixes for narrow string literals is advised to prevent encoding issues on Windows.
| std::string result; | ||
| result.reserve(std::char_traits<char>::length(format)); |
There was a problem hiding this comment.
The pre-allocation logic in the generated Format function only reserves space for the format string itself. To minimize reallocations as intended by the comment, the reservation should account for the lengths of all arguments being substituted.
std::size_t totalSize = std::char_traits<char>::length(format);
for (const auto& arg : args) totalSize += arg.size();
std::string result;
result.reserve(totalSize);| if (index < args.size()) | ||
| { | ||
| const auto& arg = *(args.begin() + index); | ||
| result.append(arg.data(), arg.size()); | ||
| } | ||
| // Unknown index: silently drop the placeholder. | ||
| p = digits + 1; | ||
| continue; | ||
| } | ||
| } | ||
| result.push_back(*p++); |
There was a problem hiding this comment.
The current implementation silently drops placeholders if the index is out of bounds. This can make it difficult to identify missing arguments or translation errors during development. It is generally better to leave the placeholder text intact in the output string if it cannot be resolved.
if (hasDigit && *digits == '}')
{
if (index < args.size())
{
const auto& arg = *(args.begin() + index);
result.append(arg.data(), arg.size());
p = digits + 1;
continue;
}
}| foreach (var locale in group.Locales.Where(l => l != ResxLoader.DefaultLocale)) | ||
| { | ||
| var sanitized = Naming.SanitizeLocale(locale); | ||
| sb.AppendLine($" if (locale != nullptr && std::strcmp(locale, \"{locale}\") == 0)"); | ||
| sb.AppendLine(" {"); | ||
| foreach (var entry in group.Entries) | ||
| { | ||
| sb.AppendLine($" {entry.Identifier} = k_{sanitized}_{entry.Identifier};"); | ||
| } | ||
| sb.AppendLine(" return;"); | ||
| sb.AppendLine(" }"); | ||
| } |
There was a problem hiding this comment.
The ApplyLocale function performs O(N) pointer assignments for every string in the group on each locale switch. With thousands of strings and multiple locales, this results in very large generated functions that can significantly increase compilation times and binary size. Consider a data-driven approach (e.g., an array of pointers per string indexed by a global locale ID) to reduce the amount of generated code and improve maintainability.
| public static string EscapeCppString(string s, bool wide = false) | ||
| { | ||
| var sb = new StringBuilder(s.Length + 3); | ||
| if (wide) sb.Append('L'); |
There was a problem hiding this comment.
For narrow string literals (char*), using Universal Character Names (\uHHHH) without a u8 prefix relies on the compiler's execution character set. On Windows/MSVC, this defaults to the system codepage unless the /utf-8 flag is set, which can lead to encoding corruption for non-ASCII locales (like Russian or Chinese). Using the u8 prefix ensures the strings are always encoded as UTF-8, though in C++20 this may require a cast to const char* if the codebase expects that type.
BmdReader: read the BMD's structural ints via
BinaryPrimitives.ReadInt32LittleEndian instead of BitConverter.ToInt32
so the importer is portable across host endianness. BMD files have
always been written little-endian by the original toolchain.
Generated I18N::Format:
- Reserve format-string length plus the total size of all arguments
rather than format-string length only, so the common case of every
arg expanding at least once needs no further string growth.
- When a placeholder index is out of bounds, keep the original `{N}`
text in the output rather than dropping it silently. Missing or
mis-numbered arguments now surface in-game and during translation
review instead of vanishing.
Naming.EscapeCppString: for narrow string literals, emit non-ASCII
codepoints as three-digit octal escapes of their UTF-8 byte sequence
instead of `\uHHHH`. In a narrow literal the compiler encodes UCNs
using the execution character set, which on MSVC defaults to the system
ANSI codepage unless /utf-8 is set, so the produced bytes depended on
the build host's locale. Octal byte escapes carry the literal bytes
verbatim, giving valid UTF-8 in every build. Wide groups still use
\uHHHH / \UHHHHHHHH because wide literals encode UCNs into UTF-16
unambiguously.
|
Pushed
Deferred: the |
The legacy-id Lookup helper was originally rewritten to use std::lower_bound (commit b475953) as the idiomatic standard-library form. The follow-up refactor that extracted FindLegacyEntry and added LookupSlot accidentally regressed it back to a hand-rolled binary search and dropped the <algorithm> / <iterator> includes alongside. Bring the std::lower_bound version back inside FindLegacyEntry and re-add the includes; LookupSlot keeps reusing the same helper.
|
Follow-up : restored the std::lower_bound idiom in the generated Lookup (and the matching / includes). I had accidentally regressed it back to a hand-rolled binary search when extracting FindLegacyEntry to share between Lookup and LookupSlot. |
Each group's ApplyLocale used to expand to one if/return branch per
non-default locale plus a default fallback, each branch carrying one
assignment per identifier. Game alone reached ~32k assignment lines
inside one function, dragging Game.cpp to 70k lines / 6.4 MB and
making it the slowest translation unit in the build.
Switch the generator to emit a single kSlots[] table per group, one
row per identifier carrying the address of the runtime pointer plus
one literal pointer per master locale (in master-locale order). The
runtime ApplyLocale collapses to one loop:
for (const auto& slot : kSlots)
{
*slot.dest = slot.values[localeIndex];
}
Master changes:
- Add I18N::LocaleIndex(const char*) -> int; returns -1 for unknown.
- SetLocale resolves the locale string once and dispatches the int
to each group's ApplyLocale(int).
- The per-group ApplyLocale signature flips from (const char*) to
(int); nothing outside the generated tree calls it.
Group changes:
- WriteSlotTable emits the {dest, values[N]} rows for every entry.
Missing locales for this group reuse the default-locale literal
pointer in that column, so runtime never has to handle an empty
slot.
- WriteApplyLocale shrinks from N*M assignments to a single loop.
- Locale ordering is shared with the master via CppEmitter
.ComputeMasterLocales, so per-group column index matches the
int returned by I18N::LocaleIndex.
Net effect: Game.cpp drops from 70,405 lines / 6.4 MB to 41,696
lines / 5.5 MB. The runtime accessor API
(I18N::Game::InventoryIV etc.) is unchanged; call sites need no edits.
|
Pushed Each group now emits a single void ApplyLocale(int localeIndex) noexcept
{
for (const auto& slot : kSlots)
{
*slot.dest = slot.values[localeIndex];
}
}The master gained Measured impact on
Public accessor API ( |
CCharInfoBalloon::SetInfo populates m_szGuild and m_szClass into fixed-size wchar_t buffers from I18N::Game::Lookup(...) and from gCharacterManager.GetCharacterClassText(...). The buffers are then rendered every frame, so switching language while a character is on-screen leaves the balloon stuck on the previous locale's text until the character itself is re-bound. Register a locale observer in the constructor that re-runs SetInfo() whenever a character is bound; unregister in the destructor. The balloon's guild and class lines now flip immediately on language switch alongside the rest of the UI.
ResxLoader now throws a clear error when two keys in the same group carry the same legacy_id; the message names both keys and explains why the constraint exists (Lookup uses std::lower_bound over the sorted table). This catches the problem at generation time so the build fails before anything is compiled. CppEmitter emits a constexpr static_assert next to the generated kLegacyTable that verifies the table's IDs are strictly increasing. Even if someone hand-edits the generated file out of sync with the resx sources, the compile fails with a clear message pointing at the group whose table is corrupted. Generated Format() now treats two adjacent open braces as a literal open brace and two adjacent close braces as a literal close brace (std::format / Python convention), so translations can contain literal brace characters next to placeholders without breaking placeholder parsing. DialogImporter: guard Directory.CreateDirectory against null. If the output path is a bare filename, Path.GetDirectoryName returns null and the unchecked call would throw ArgumentNullException; skip the call when the parent path is null or empty instead.
NPC dialog text is now served from I18N::Dialog (generated from src/Localization/Dialog.*.resx) and the language-agnostic branching table in GameLogic::Quests::Dialog. Nothing loads these .bmd files anymore, mirroring the earlier cleanup of Text_*.bmd in a58ac50.
Summary
Adds a single resx-driven translation system. Translations live in
src/Localization/<Group>.<locale>.resx; the build runsTools/ResxGento compile them into typed C++ accessors undernamespace I18N::<Group>. Call sites use those accessors directly. Replaces the legacy editori18n::Translator(JSON), the in-gameGlobalText.bmd, and the per-languageDialog_<lang>.bmdfiles. Adds a live language dropdown to the options window so the player can switch without restarting; cached widget text refreshes in place.Ships translations for
en, de, es, id, pl, pt, ru, tl, uk, zh-TW(full hand-migration for Editor / Metadata; machine-translation for Game and German Dialog; original-source salvage for Dialog en / pt / es).How to use the I18N API
Reading a translated string
Each entry in resx becomes an
extern const wchar_t*(orconst char*for narrow groups). Reading is a pointer dereference - no map lookup, no string compare per call.Cached widget labels (auto-refresh on locale switch)
Pass the address of the accessor (
&I18N::...) instead of the accessor itself. The widget caches the current value AND registers as a locale observer; the cache refreshes automatically when the locale changes.Passing a
std::wstring(the old overload) still works for dynamic strings (character names, computed numbers). It clears any prior slot binding so the observer won't clobber the literal.Switching locale
GameConfig.UILocaleis the single source of truth; the options window updates it, persistsconfig.ini, and callsSetLocale.Custom observers (for things that aren't widgets)
Use this when you cache text outside the widget system (
wcscpyinto a buffer, a custom-rendered label, etc.).Observers fire after the group pointers are updated, so reading
I18N::...inside the callback gives the new locale's value.NPC dialog catalogue
NPC dialog text is now a normal I18N group. The structural branching data (number of replies per entry, link / return per reply) is a separate language-agnostic table.
Adding a new string
<data name="...">to everysrc/Localization/<Group>.<locale>.resx(default localeenis the source of truth; other locales fall back to it if missing).I18N::<Group>::OpenInventory(the key slugs to a PascalCase identifier).m_Btn.ChangeText(&I18N::<Group>::OpenInventory);.Adding a new locale
<Group>.<locale>.resxnext to the others for each group you want covered.kLocalestable, and slots it into every group's per-entry value array.Function reference (
I18N)void SetLocale(const char*)const char* GetCurrentLocale()std::span<const char* const> GetAvailableLocales()int LocaleIndex(const char*)GetAvailableLocales();-1if unknown.const char* GetLanguageDisplayName(const char*)"de"->"Deutsch").std::string Format(const char* fmt, {args...}){0} {1} ...placeholders. Out-of-bounds indices stay verbatim.void RegisterLocaleObserver(cb, ctx)void(*)(void*) noexceptto run after everySetLocale.void UnregisterLocaleObserver(cb, ctx)I18N::<Group>::<Key><Key>(one extern per resx entry).I18N::<Group>::Lookup(int)const wchar_t*/const char*.I18N::<Group>::LookupSlot(int)Lookupbut returns the slot pointer (const T* const*) for the widget auto-refresh path.Widget overloads added
CNewUIButton::ChangeText(const wchar_t* const* nameSlot)CNewUIButton::ChangeToolTipText(const wchar_t* const* tooltipSlot, bool istoppos = false)CNewUIRadioButton::ChangeText(const wchar_t* const* nameSlot)CNewUICheckBox::ChangeText(const wchar_t* const* nameSlot)CNewUIRadioGroupButton::ChangeRadioText(std::list<const wchar_t* const*>& slotList)CNewUIMuHelper::InsertButton/InsertCheckBoxCNewUIHeroPositionInfo::SetButtonInfoCNewUIGuardWindow::InitButtonTest plan
cmake --buildsucceeds clean.[UI] Locale=en-> English UI everywhere; switch tode, restart -> German UI everywhere.[UI] Locale=...updated inconfig.ini.Follow-ups
wcscpy(buf, I18N::<Group>::Lookup(idx))sites still cache into local buffers; switching them toLookupSlot+ observer would let those refresh in place too. Not on a visible UI hot path; deferred.