Skip to content

Introduce Le<T> and Ne<T> newtypes for little- and native-endian integers#13193

Open
fitzgen wants to merge 1 commit intobytecodealliance:mainfrom
fitzgen:endian-newtypes
Open

Introduce Le<T> and Ne<T> newtypes for little- and native-endian integers#13193
fitzgen wants to merge 1 commit intobytecodealliance:mainfrom
fitzgen:endian-newtypes

Conversation

@fitzgen
Copy link
Copy Markdown
Member

@fitzgen fitzgen commented Apr 24, 2026

Split out from #13107 while debugging s390x issues.

@fitzgen fitzgen requested review from a team as code owners April 24, 2026 19:54
@fitzgen fitzgen requested review from alexcrichton and removed request for a team April 24, 2026 19:54
Copy link
Copy Markdown
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! The overall goal seems pretty reasonable here -- some thoughts below.

Comment thread crates/core/src/endian.rs

impl<T> $name<T> {
#[inline]
const fn is_little() -> bool {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW the is_little name was a bit confusing for me at first because is_little = false sounds (naively) like "big endian"; it's not, it's "native endian", which could be actually big or little depending on host.

Maybe is_explicit_little?

Copy link
Copy Markdown
Member

@cfallin cfallin Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(After writing this comment, see more general thoughts below on the confusion)

Comment thread crates/core/src/endian.rs

use core::{fmt, num::NonZero};

macro_rules! define_endian_wrapper_types {
Copy link
Copy Markdown
Member

@cfallin cfallin Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the newtype-correctness efforts here!

I'm finding the naming/conceptual definition a little confusing, to be honest, because (in my head) the integer types themselves have an abstract domain (0..2^width) and their byte representations have endianness. This aligns with e.g. the builtin uNN::to_le_bytes() -- the LE representation is a sequence of bytes, not a uNN.

So to that end, what does it mean when we have a Le(1234)? Does that mean the "actual value" is 1234 but the value is stored in memory as either 1234 (little-endian host) or bswap(1234) (big-endian host)? Or vice-versa, the value is stored in memory however the host does and we interpret it as LE ("1234-native-endian, interpreted as LE")? (EDIT: I guess these two views are actually equivalent in results, because on a LE host, Le and Ne both mean "identity", and on a BE host, Le means "byteswapped"; but that still leaves a confusing definitional question unanswered)

I think I might have an easier time if we rename things a bit -- or failing that (since these types are very pervasive), document better. Suggestions:

  • ExplicitLittleEndian(n) and HostOrdered(n)
  • store [u8; N] in the newtypes and make these traits wrappers around {from,to}_{le,ne}_bytes
  • something else I can't think of right now

I kind of favor the second, assuming that LLVM can see through it and not pessimize -- Godbolt example to demonstrate that I think it should: link

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

store [u8; N] in the newtypes and make these traits wrappers around {from,to}_{le,ne}_bytes

This sounds fine to me

Comment thread crates/wasmtime/src/runtime/externals/global.rs
#[inline]
fn needs_gc_before_next_growth(&self) -> bool {
false
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to endianness?

// any growth failures here.
let _ = store.grow_gc_heap(limiter.as_mut(), bytes_needed).await;
let _ = store
.grow_gc_heap(limiter.as_mut(), bytes_needed, asyncness)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

asyncness change unrelated to endianness?


(call $gc)
(drop (array.new $arr (i32.const 0) (i32.const 5)))
;; (drop (array.new $arr (i32.const 0) (i32.const 5)))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of commented-out lines in testsuite lines here and below -- revert?

@github-actions github-actions Bot added wasmtime:api Related to the API of the `wasmtime` crate itself wasmtime:ref-types Issues related to reference types and GC in Wasmtime labels Apr 24, 2026
@github-actions
Copy link
Copy Markdown

Subscribe to Label Action

cc @fitzgen

Details This issue or pull request has been labeled: "wasmtime:api", "wasmtime:ref-types"

Thus the following users have been cc'd because of the following labels:

  • fitzgen: wasmtime:ref-types

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

Copy link
Copy Markdown
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I don't doubt that we could do better internally abuot endianness and conversions I'm personally relatively skeptical of the approach taken here. I share @cfallin's concerns as well with code like let exnref = Le::from_ne(exnref); I find pretty confusing. Additionally I'm not sure I understand the ValRaw changes, for example, because it should be the case that by definition everything in ValRaw is little-endian and the accessors shouldn't need endianness annotated on them.

Was this able to help track down the issue you're running into? If so could you expand on that a bit? I'm curious if that can help calibrate my thinking about things here.

@fitzgen
Copy link
Copy Markdown
Member Author

fitzgen commented Apr 27, 2026

Additionally I'm not sure I understand the ValRaw changes, for example, because it should be the case that by definition everything in ValRaw is little-endian and the accessors shouldn't need endianness annotated on them.

Yes, and this doesn't change any of that, it just adds pub(crate) fn foo_le(&self) -> Le<u32> getters to ValRaw to get the value wrapped in the Le newtype, formalizing the invariant that ValType things are stored as little-endian in the type system.

Was this able to help track down the issue you're running into? If so could you expand on that a bit? I'm curious if that can help calibrate my thinking about things here.

The goal was to formalize in the type system where we have guaranteed-little-endian-regardless-of-native-endian integers and where we have native-endian integers and make it a compilation error to use the wrong endian in the wrong place. This limits the scope of where endianness bugs can pop up: only in JIT code or in the initial construction of an Le/Ne (where you are effectively promising the given raw integer is in the claimed endianness).

This helped me find a couple bugs inside the copying collector branch.

@fitzgen
Copy link
Copy Markdown
Member Author

fitzgen commented Apr 27, 2026

I will make ValRaw internally have [u8; 4] for GC refs as well.

@alexcrichton
Copy link
Copy Markdown
Member

alexcrichton commented Apr 27, 2026

Personally even a small adjustment on this patch is not something I'd like to see landed, so I'd like to game this out more first and better understand what's happening here. For example I don't think we should use [u8; N] throughout Wasmtime as that seems like it's highly likely to fall of LLVM's happy path and become pretty heavily deoptimized in some cases. I also don't personally see much value in Ne since the default, in theory, is "native endian everywhere unless otherwise marked". Personally I also don't see much value in pervasively using Le throughout the runtime as parameters/arguments/etc. I only think that this would make sense within type definitions to clarify "at rest this is little-endian unconditionally". As params/results/etc this seems like pure noise to me.

The goal was to formalize in the type system where we have guaranteed-little-endian-regardless-of-native-endian integers

I think this is a reasonable goal, but I'm not sure why we would need these wrappers outside of type definitions to achieve this. I also think it's reasonable to have everywhere be native-endian-unless-documented-otherwise where today documentation is // foo and tomorrow it could be Le<T> which would be nicer to have.

fitzgen added a commit to fitzgen/wasmtime that referenced this pull request Apr 29, 2026
Passive element segments were registered as GC roots by converting a `*mut
ValRaw` to a `*mut VMGcRef`. Because we always store GC refs as little endian
inside `ValRaw`, regardless of the target architecture's endianness, this cast
is only valid on little endian systems. This bug was not exposed until the
introduction of the copying collector in
bytecodealliance#13107

The fix that this commit makes is to add another kind of GC root for `ValRaw`,
where we can get and set the GC root using `ValRaw` internally, to ensure that
the endianness (and incidentally also the GC ref offsets within the `ValRaw`)
are matched up correctly between the GC root's definition and the collector's
use of it. This brings us to three kinds of GC roots: Wasm stack roots,
`VMGcRef` roots, and `ValRaw` roots.

FWIW, I initially tried to make `VMGcRef` also always store its data as little
endian, but this was a larger, more-invasive change and with feedback like
bytecodealliance#13193 (comment)
suggesting the use of `[u8; 4]` instead of `u32` to make the byte ordering
explicit, we break `rustc`'s niche type optimizations (since `VMGcRef` is
non-zero right now). I also investigated making `PassiveElementSegment` an
`enum` or either funcrefs or externrefs, similar to what we do for
`wasmtime::runtime::vm::Table`. This also led to an outsized amount of code
churn and didn't feel like it was paying for itself. Ultimately, I abandoned
these approaches, preferring the one taken in this commit instead.
pull Bot pushed a commit to langyo/wasmtime that referenced this pull request Apr 29, 2026
)

* Fix passive element segment GC roots' endianness

Passive element segments were registered as GC roots by converting a `*mut
ValRaw` to a `*mut VMGcRef`. Because we always store GC refs as little endian
inside `ValRaw`, regardless of the target architecture's endianness, this cast
is only valid on little endian systems. This bug was not exposed until the
introduction of the copying collector in
bytecodealliance#13107

The fix that this commit makes is to add another kind of GC root for `ValRaw`,
where we can get and set the GC root using `ValRaw` internally, to ensure that
the endianness (and incidentally also the GC ref offsets within the `ValRaw`)
are matched up correctly between the GC root's definition and the collector's
use of it. This brings us to three kinds of GC roots: Wasm stack roots,
`VMGcRef` roots, and `ValRaw` roots.

FWIW, I initially tried to make `VMGcRef` also always store its data as little
endian, but this was a larger, more-invasive change and with feedback like
bytecodealliance#13193 (comment)
suggesting the use of `[u8; 4]` instead of `u32` to make the byte ordering
explicit, we break `rustc`'s niche type optimizations (since `VMGcRef` is
non-zero right now). I also investigated making `PassiveElementSegment` an
`enum` or either funcrefs or externrefs, similar to what we do for
`wasmtime::runtime::vm::Table`. This also led to an outsized amount of code
churn and didn't feel like it was paying for itself. Ultimately, I abandoned
these approaches, preferring the one taken in this commit instead.

* fix compilation of no-gc builds

* review feedback
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

wasmtime:api Related to the API of the `wasmtime` crate itself wasmtime:ref-types Issues related to reference types and GC in Wasmtime

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants