Skip to content

[Rust] Improve Object API codegen: idiomatic types, Eq/Hash derives, String shadowing fix#8931

Open
pittengermdp wants to merge 13 commits into
google:masterfrom
pittengermdp:rust-object-api-improvements
Open

[Rust] Improve Object API codegen: idiomatic types, Eq/Hash derives, String shadowing fix#8931
pittengermdp wants to merge 13 commits into
google:masterfrom
pittengermdp:rust-object-api-improvements

Conversation

@pittengermdp

Copy link
Copy Markdown

Summary

  • Use idiomatic short type names (String, Vec, Box) instead of fully-qualified alloc:: paths in generated Rust Object API code, with use alloc::* imports added for no_std compatibility
  • Derive Eq and Hash on Object API types (tables, structs, unions, bitflags) when no floating-point fields are present in the type hierarchy
  • Fix String type shadowing: when a schema defines a table named String, generated code now emits std::string::String for stdlib string fields (wires up existing unused NamespaceShadowsString/ObjectStringType helpers)
  • Add #[allow(unused_imports)] to suppress warnings on generated imports
  • Fix serde Deserialize impl for enums to use fully-qualified std::string::String

Test plan

  • C++ tests pass (flattests)
  • Rust tests pass (usage, no_std, serde, reflection, outdir, clippy)
  • Go tests pass (TestAll, TestTextParsing, benchmarks)
  • Python tests pass (multi-file, one-file, flexbuffers)

@google-cla

google-cla Bot commented Feb 19, 2026

Copy link
Copy Markdown

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@github-actions github-actions Bot added python c++ rust codegen Involving generating code from schema labels Feb 19, 2026
@csmulhern

Copy link
Copy Markdown
Contributor

The use of fully qualified names for Vec, String, etc, was done intentionally in #8564. The point is to deal with the situation you cover in bullet 3. If you were to name a struct / table Vec or String without using fully-qualified path names in the generated code when referencing the stdlib types, then your code generation would be broken. The use of fully-qualified paths for the stdlib types is the most robust and precise approach.

This can't be done for alloc because of how alloc works, but all current uses should be using alloc for no_std compatibility.

@jtdavis777 jtdavis777 added the waiting-for-update This PR is waiting for a change from the author or contributors before it is ready for merge label Mar 5, 2026
@pittengermdp pittengermdp force-pushed the rust-object-api-improvements branch from 819b32e to 65e4333 Compare March 18, 2026 05:26
…2.19

Rebuilds the fork cleanly on google/flatbuffers master (81edeb1) instead of the

broken 2016-rooted lineage. Replays all hand-written source (generators, runtimes,

schemas, hand-written tests) via 3-way merge; regenerable artifacts will be rebuilt

in a follow-up commit via scripts/generate_code.py.

Source features preserved: Rust Object API (Eq/Hash, OrderedFloat, serde, bare Box,

rustdoc/#[must_use]), --bfbs-gen-embed, flatc native map<K,V>/set<V> syntax + JSON/FBS

round-trip, Go/TS/Rust reflection + verifier runtimes, and map/set + cross-ns tests.

Conflicts resolved (4): idl_gen_rust.cpp (54 hunks, upstream reindent + fork features),

idl_gen_fbs.cpp (map/set re-emit + upstream string-escaping), builder.rs (upstream ? operator),

test.cpp (kept both upstream CrossNamespacePackTest and fork map/set tests).

Verified: flatc builds and links cleanly (v25.12.19).
…t Object-API

Regenerates all generated code (Rust/Go/TS/C++/samples/grpc) with the merged flatc

(v25.12.19), and completes the fork's OrderedFloat-in-Object-API feature so the full

Rust suite compiles and passes.

Generator fixes (src/idl_gen_rust.cpp):

- vector-of-float unpack: OrderedFloat(*v) -> OrderedFloat(v) (.iter() yields owned f64)

- optional scalar float: accessor .map(OrderedFloat); pack .map(|v| v.into_inner())

- struct array-of-float: array_init(|i| OrderedFloat(field.get(i))) (Array isn't indexable)

- struct array float Default: [OrderedFloat(0.0); N]

- AllFieldsHaveNaturalDefault: string fields with a schema default now force a manual

  Default impl (was silently emitting empty string)

Rust tests adapted to the OrderedFloat API (no assertions weakened): optional_scalars_test,

arrays_test, integration_test.

Verified: flatc builds; Rust suite green (276 integration + 21 arrays + 13 optional + samples);

Go runtime builds clean; TS runtime typechecks clean.

Note: js/mjs compiled output follows upstream (gitignored); re-add force-commit + .npmignore

if the frontend needs GitHub npm-install artifacts.
… behind flag (google#5)

google#3 (idl_gen_cpp.cpp): generated C++ static_assert now pins only MAJOR.MINOR, not

REVISION, so patch-version bumps stop churning every generated header.

google#5 (idl.h, flatc.cpp, idl_gen_rust.cpp): OrderedFloat-in-Object-API is now opt-in via

--rust-object-api-hashable-floats (default OFF). Default codegen emits plain f32/f64

(upstream-compatible ergonomics, no .into_inner() tax); the flag re-enables OrderedFloat

wrapping for Eq/Hash on float-bearing *T types. TypeIsHashable made transitive (nested

structs/unions/vectors) so Eq/Hash derives are sound in both modes.

Restored upstream's optional_scalars/arrays/integration Rust tests (plain-float API).

Verified: cargo test green (276+21+13+2+samples); default corpus has zero OrderedFloat;

flag ON re-emits OrderedFloat (12 hits on optional_scalars).
generate_code.py now runs each language's canonical formatter over the generated

corpus after codegen: clang-format (C++), rustfmt (Rust), gofmt (Go), prettier (TS),

and black (Python, if installed). Each is optional and skipped with a note when the

tool is absent, so the script still runs in minimal environments.

This makes regenerations idempotent and eliminates whitespace-only churn — the root

cause of the 54-conflict generator merge in this branch's history. Applied once here:

clang-format 27, rustfmt 90, gofmt 55, prettier 167 files.

Verified: Rust suite still green after reformat (formatters are semantics-preserving).
…google#1)

Upstream already runs a regen-diff gate (scripts/check_generate_code.py in the

'Check Generated Code' job), which reruns generate_code.py and fails on any diff.

Since generate_code.py now formats generated code (google#2), that gate is only

deterministic when the same formatters are present. Changes:

- build.yml 'Check Generated Code' job now sets up Go (gofmt), Rust (rustfmt), and

  Node + pnpm@10 + pnpm i (prettier) before the check, so regen output matches.

- Pinned prettier to 3.8.4 (package.json + pnpm-lock.yaml) for cross-env determinism.

- Dropped clang-format from the format step: its output varies sharply across major

  versions (local 22 vs CI 18), which would make the gate flaky. flatc already emits

  clean C++, so generated headers revert to flatc's deterministic output.

Verified: two consecutive regens produce byte-identical output (0 unstaged diff),

so the gate passes. gofmt/rustfmt confirmed stable (stable==nightly rustfmt output).
Scheduled GitHub Actions workflow (1st of each month + manual dispatch) that fetches

google/flatbuffers master, merges it into a dated upstream-sync/ branch, and opens a

PR (draft if conflicts) so CI runs and a human approves. Keeps the fork from ever

drifting to a stale merge-base again — the root cause of this branch's painful rebuild.
@pittengermdp pittengermdp force-pushed the rust-object-api-improvements branch from 41f3b73 to 8c84ab9 Compare June 25, 2026 22:22
Five verified correctness fixes across the generators and runtime:

- rust runtime: Verifier::reset() now clears apparent_size instead of
  assigning num_tables twice, so a reused Verifier no longer inherits the
  previous buffer's apparent size.
- rust serde: enums serialize unknown discriminants as their numeric value
  instead of variant_name().unwrap() (forward-compat); union serialize skips
  unknown variants instead of unimplemented!().
- go: nested struct fields are nil-guarded before Pack flattens them into
  CreateX(...), preventing a nil dereference on hand-built *T objects.
- ts: map/set pack sorts keys with a relational comparator, fixing bigint
  key precision loss above 2^53 that corrupted the sorted-vector order.
- rust: map/set float keys sort via total_cmp (total order including NaN).

Also repairs rust_serialize_test, which did not compile:
- serde dependency enables the `derive` feature (the generated code uses
  #[derive(serde::Serialize, serde::Deserialize)]).
- enum serialize uses `self.0 as u32` so signed/64-bit underlying types
  (byte/long) compile.
- bit-flags enums emit a serde Deserialize impl and serialize as u64, so
  *T structs containing bit-flags fields can derive Deserialize.
- builder test asserts the preallocated vecs don't reallocate (capacity
  stability) rather than zero global allocations, which is unsatisfiable
  now that strings_pool is a HashMap<String, _> with owned String keys.

Regenerated Go and Rust serde corpus accordingly.
…able-floats

- build-generator-windows now installs Go/Rust/Node+pnpm before generating, so
  format_generated_code() actually runs. Previously the Windows job regenerated
  unformatted output and diffed it against the formatted (Linux) committed
  corpus, making the regen-diff gate silently non-deterministic on Windows.
- Add compile + behaviour coverage for --rust-object-api-hashable-floats:
  generate_code.py now emits an OrderedFloat-wrapped optional_scalars variant,
  and optional_scalars_ordered_floats_test.rs proves the *T struct derives
  Eq/Hash and round-trips. The flag was previously exercised by no test, so it
  could silently break on an upstream merge.
- Fix a pre-existing no_std build regression: the fork's flatbuffers::HashMap /
  HashSet aliases used std::collections unconditionally under #![no_std]. Gate
  them behind the `std` feature (map/set Object API fields require std).
- generate_code.py: drop the dead `import glob` (shadowed by the local glob()
  helper) and the redundant inner `import shutil`.
64-bit (int64/uint64) field defaults and accessor fallbacks now use native
bigint literals (e.g. 0n, 42n) instead of BigInt('0') / BigInt(0) calls. The
values are identical at runtime; the literal form is a native expression rather
than a runtime constructor call and reads cleaner. Runtime BigInt(expr)
conversions (operand is a variable) are left unchanged. All consuming tsconfigs
target ES2020, which supports bigint literals (verified via tsc --noEmit).

Part of the non-breaking TypeScript improvements. Two other candidates were
intentionally not applied:
- readonly bb/bb_pos: not feasible — __init() assigns them outside the
  constructor, which TypeScript's readonly forbids; forcing it would require a
  breaking constructor-signature change.
- field-name/type-name clash (idl_gen_ts.cpp FIXME): latent only — no schema in
  the test corpus triggers it, so a fix would be unverifiable defensive code.

The breaking AnyT discriminated-union *T field retyping was deferred (per
review) to keep the generated Object API stable for downstream consumers.
…constants

- KeyCompare and LookupByKey now declare their temporary accessors as
  `var obj T` instead of `obj := &T{}`. No pointer escapes either function, so
  Go keeps them on the stack — removing two heap allocations per comparator
  call during a sort, and one per step of every O(log n) keyed binary search.
- Generate exported `<Type>Field<Field>` string constants (value = the schema
  field name) and point UnpackFields' doc at them. Callers now select fields
  with compile-checked constants (e.g. MonsterFieldHp) instead of guessing raw
  schema-name strings; a misspelled name fails to compile rather than silently
  populating nothing. Non-breaking: existing string callers still work because
  each constant's value is the schema name UnpackFields already matches on.

Regenerated the Go corpus; Go tests pass.
Run rustfmt over the fork's additions to the flatbuffers runtime crate so it is
`cargo fmt`-clean: the shared-string pool binary search in builder.rs and the
map/set HashMap/HashSet type aliases in lib.rs were authored with manual line
breaks that rustfmt collapses. No behavioural change. Keeps the first-party
library crates clean so a per-crate `cargo fmt --check` (pre-push) passes.
The 25.12.19 replay lost the root go.mod (module github.com/google/flatbuffers,
go 1.22). Without it Go compiles the runtime at go1.16, so the reflect_*.go files'
use of the predeclared 'any' identifier fails to build for downstream Go consumers.
Restore it at go 1.24 (also satisfies testing.Loop in verifier_bench_test.go).
Validated: go build ./go/ and go vet ./go/ both pass.
The upstream-25.12.19 replay dropped the committed js/mjs build output that the
fork ships for direct github-npm consumption (v0.3.0 shipped it; v0.4.0 did not),
so package.json main/module pointed at nonexistent files and liftNet_frontend's
'import flatbuffers' failed to resolve. Rebuild via npm run compile and force-add
js/ + mjs/ (both are .gitignored, matching the fork's prior practice).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

c# c++ CI Continuous Integration codegen Involving generating code from schema dart documentation Documentation golang grpc java javascript json kotlin lobster lua nim php python rust swift typescript waiting-for-update This PR is waiting for a change from the author or contributors before it is ready for merge

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants