Introduce Le<T> and Ne<T> newtypes for little- and native-endian integers#13193
Introduce Le<T> and Ne<T> newtypes for little- and native-endian integers#13193fitzgen wants to merge 1 commit intobytecodealliance:mainfrom
Le<T> and Ne<T> newtypes for little- and native-endian integers#13193Conversation
cfallin
left a comment
There was a problem hiding this comment.
Thanks! The overall goal seems pretty reasonable here -- some thoughts below.
|
|
||
| impl<T> $name<T> { | ||
| #[inline] | ||
| const fn is_little() -> bool { |
There was a problem hiding this comment.
FWIW the is_little name was a bit confusing for me at first because is_little = false sounds (naively) like "big endian"; it's not, it's "native endian", which could be actually big or little depending on host.
Maybe is_explicit_little?
There was a problem hiding this comment.
(After writing this comment, see more general thoughts below on the confusion)
|
|
||
| use core::{fmt, num::NonZero}; | ||
|
|
||
| macro_rules! define_endian_wrapper_types { |
There was a problem hiding this comment.
Thanks for the newtype-correctness efforts here!
I'm finding the naming/conceptual definition a little confusing, to be honest, because (in my head) the integer types themselves have an abstract domain (0..2^width) and their byte representations have endianness. This aligns with e.g. the builtin uNN::to_le_bytes() -- the LE representation is a sequence of bytes, not a uNN.
So to that end, what does it mean when we have a Le(1234)? Does that mean the "actual value" is 1234 but the value is stored in memory as either 1234 (little-endian host) or bswap(1234) (big-endian host)? Or vice-versa, the value is stored in memory however the host does and we interpret it as LE ("1234-native-endian, interpreted as LE")? (EDIT: I guess these two views are actually equivalent in results, because on a LE host, Le and Ne both mean "identity", and on a BE host, Le means "byteswapped"; but that still leaves a confusing definitional question unanswered)
I think I might have an easier time if we rename things a bit -- or failing that (since these types are very pervasive), document better. Suggestions:
ExplicitLittleEndian(n)andHostOrdered(n)- store
[u8; N]in the newtypes and make these traits wrappers around{from,to}_{le,ne}_bytes - something else I can't think of right now
I kind of favor the second, assuming that LLVM can see through it and not pessimize -- Godbolt example to demonstrate that I think it should: link
There was a problem hiding this comment.
store
[u8; N]in the newtypes and make these traits wrappers around{from,to}_{le,ne}_bytes
This sounds fine to me
| #[inline] | ||
| fn needs_gc_before_next_growth(&self) -> bool { | ||
| false | ||
| } |
| // any growth failures here. | ||
| let _ = store.grow_gc_heap(limiter.as_mut(), bytes_needed).await; | ||
| let _ = store | ||
| .grow_gc_heap(limiter.as_mut(), bytes_needed, asyncness) |
There was a problem hiding this comment.
asyncness change unrelated to endianness?
|
|
||
| (call $gc) | ||
| (drop (array.new $arr (i32.const 0) (i32.const 5))) | ||
| ;; (drop (array.new $arr (i32.const 0) (i32.const 5))) |
There was a problem hiding this comment.
Lots of commented-out lines in testsuite lines here and below -- revert?
Subscribe to Label Actioncc @fitzgen DetailsThis issue or pull request has been labeled: "wasmtime:api", "wasmtime:ref-types"Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
alexcrichton
left a comment
There was a problem hiding this comment.
While I don't doubt that we could do better internally abuot endianness and conversions I'm personally relatively skeptical of the approach taken here. I share @cfallin's concerns as well with code like let exnref = Le::from_ne(exnref); I find pretty confusing. Additionally I'm not sure I understand the ValRaw changes, for example, because it should be the case that by definition everything in ValRaw is little-endian and the accessors shouldn't need endianness annotated on them.
Was this able to help track down the issue you're running into? If so could you expand on that a bit? I'm curious if that can help calibrate my thinking about things here.
Yes, and this doesn't change any of that, it just adds
The goal was to formalize in the type system where we have guaranteed-little-endian-regardless-of-native-endian integers and where we have native-endian integers and make it a compilation error to use the wrong endian in the wrong place. This limits the scope of where endianness bugs can pop up: only in JIT code or in the initial construction of an This helped me find a couple bugs inside the copying collector branch. |
|
I will make |
|
Personally even a small adjustment on this patch is not something I'd like to see landed, so I'd like to game this out more first and better understand what's happening here. For example I don't think we should use
I think this is a reasonable goal, but I'm not sure why we would need these wrappers outside of type definitions to achieve this. I also think it's reasonable to have everywhere be native-endian-unless-documented-otherwise where today documentation is |
Passive element segments were registered as GC roots by converting a `*mut ValRaw` to a `*mut VMGcRef`. Because we always store GC refs as little endian inside `ValRaw`, regardless of the target architecture's endianness, this cast is only valid on little endian systems. This bug was not exposed until the introduction of the copying collector in bytecodealliance#13107 The fix that this commit makes is to add another kind of GC root for `ValRaw`, where we can get and set the GC root using `ValRaw` internally, to ensure that the endianness (and incidentally also the GC ref offsets within the `ValRaw`) are matched up correctly between the GC root's definition and the collector's use of it. This brings us to three kinds of GC roots: Wasm stack roots, `VMGcRef` roots, and `ValRaw` roots. FWIW, I initially tried to make `VMGcRef` also always store its data as little endian, but this was a larger, more-invasive change and with feedback like bytecodealliance#13193 (comment) suggesting the use of `[u8; 4]` instead of `u32` to make the byte ordering explicit, we break `rustc`'s niche type optimizations (since `VMGcRef` is non-zero right now). I also investigated making `PassiveElementSegment` an `enum` or either funcrefs or externrefs, similar to what we do for `wasmtime::runtime::vm::Table`. This also led to an outsized amount of code churn and didn't feel like it was paying for itself. Ultimately, I abandoned these approaches, preferring the one taken in this commit instead.
) * Fix passive element segment GC roots' endianness Passive element segments were registered as GC roots by converting a `*mut ValRaw` to a `*mut VMGcRef`. Because we always store GC refs as little endian inside `ValRaw`, regardless of the target architecture's endianness, this cast is only valid on little endian systems. This bug was not exposed until the introduction of the copying collector in bytecodealliance#13107 The fix that this commit makes is to add another kind of GC root for `ValRaw`, where we can get and set the GC root using `ValRaw` internally, to ensure that the endianness (and incidentally also the GC ref offsets within the `ValRaw`) are matched up correctly between the GC root's definition and the collector's use of it. This brings us to three kinds of GC roots: Wasm stack roots, `VMGcRef` roots, and `ValRaw` roots. FWIW, I initially tried to make `VMGcRef` also always store its data as little endian, but this was a larger, more-invasive change and with feedback like bytecodealliance#13193 (comment) suggesting the use of `[u8; 4]` instead of `u32` to make the byte ordering explicit, we break `rustc`'s niche type optimizations (since `VMGcRef` is non-zero right now). I also investigated making `PassiveElementSegment` an `enum` or either funcrefs or externrefs, similar to what we do for `wasmtime::runtime::vm::Table`. This also led to an outsized amount of code churn and didn't feel like it was paying for itself. Ultimately, I abandoned these approaches, preferring the one taken in this commit instead. * fix compilation of no-gc builds * review feedback
Split out from #13107 while debugging s390x issues.