From 55e38c069336aa7ecffaa6c42bd17f0a75cba0b2 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Thu, 14 May 2026 17:44:36 -0500 Subject: [PATCH 1/2] Clarify and regularize text format index parsing rules Resolves #648 --- design/mvp/Binary.md | 6 +- design/mvp/Explainer.md | 377 +++++++++--------- .../SharedEverythingDynamicLinking.md | 10 +- test/syntax/indices.wast | 217 ++++++++++ 4 files changed, 414 insertions(+), 196 deletions(-) create mode 100644 test/syntax/indices.wast diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index 982930a5..00db5594 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -321,8 +321,8 @@ canon ::= 0x00 0x00 f: opts: ft: => (canon lift | 0x1d opts: => (canon error-context.debug-message opts (core func)) 📝 | 0x1e => (canon error-context.drop (core func)) 📝 | 0x1f => (canon waitable-set.new (core func)) 🔀 - | 0x20 cancel?: m: => (canon waitable-set.wait cancel? (memory m) (core func)) 🔀 - | 0x21 cancel?: m: => (canon waitable-set.poll cancel? (memory m) (core func)) 🔀 + | 0x20 cancel?: m: => (canon waitable-set.wait cancel? (memory m) (core func)) 🔀 + | 0x21 cancel?: m: => (canon waitable-set.poll cancel?
(memory m) (core func)) 🔀 | 0x22 => (canon waitable-set.drop (core func)) 🔀 | 0x23 => (canon waitable.join (core func)) 🔀 | 0x26 => (canon thread.index (core func)) 🧵 @@ -344,7 +344,7 @@ opts ::= opt*:vec() => opt* canonopt ::= 0x00 => string-encoding=utf8 | 0x01 => string-encoding=utf16 | 0x02 => string-encoding=latin1+utf16 - | 0x03 m: => (memory m) + | 0x03 m: => (memory m) | 0x04 f: => (realloc f) | 0x05 f: => (post-return f) | 0x06 => async 🔀 diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index a79d089a..a2112621 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -7,8 +7,8 @@ more user-focused explanation, take a look at the * [Gated features](#gated-features) * [Grammar](#grammar) + * [Index spaces](#index-spaces) * [Component definitions](#component-definitions) - * [Index spaces](#index-spaces) * [Instance definitions](#instance-definitions) * [Alias definitions](#alias-definitions) * [Type definitions](#type-definitions) @@ -76,17 +76,80 @@ succinctness, with just enough detail to write examples and define a [binary format](Binary.md) in the style of the [Binary Format Section], deferring full precision to the [formal specification](../../spec/). -The main way the grammar hand-waves is regarding definition uses, where indices -referring to `X` definitions (written ``) should, in the real text -format, explicitly allow identifiers (``), checking at parse time that the -identifier resolves to an `X` definition and then embedding the resolved index -into the AST. - Additionally, standard [abbreviations] defined by the Core WebAssembly text format (e.g., inline export definitions) are assumed but not explicitly defined below. +### Index Spaces + +As with Core WebAssembly, the Component Model appends each definition to an +[index space], allowing earlier definitions to be referred to by later +definitions in the text and binary format via unsigned integer index.
+ +There are 5 component-level index spaces (components, component instances, +functions, types and values 🪙), 6 core index spaces already in the Core +WebAssembly specification (functions, tables, memories, globals, tags and types) +and 2 additional core index spaces (modules and module instances) that are not in +the Core WebAssembly specification at the moment, but, with [module-linking], +could be in the future. These 13 index spaces correspond 1:1 with the terminals +of `sort` shown below and so "sort" and "index space" can be used +interchangeably. The grammar rule for `sort` prefixes the Core WebAssembly sorts +with `core` to disambiguate current and future overlaps. +```ebnf +core:sort ::= func + | table + | memory + | global + | tag + | type + | module + | instance +componentsort ::= component + | instance + | func + | type + | value 🪙 +sort ::= componentsort + | core +``` + +As with Core WebAssembly, for each index space `X`, there is an `{X}idx` grammar +rule that parses indices as either a direct `u32` or a [`core:id`] that is +resolved to a `u32` according to the usual Core WebAssembly text format rules. +When the text format accepts indices in one of several different index spaces, +the sort is required to be written explicitly using the `externidx` rule, which +embeds an expanded version of [`core:externidx`] prefixed with `core`. +```ebnf +idx ::= | +core:{Xᵢ}sortidx ::= (Xᵢ ) (for Xᵢ ∈ ) +core:{Xᵢ}idx ::= | (for Xᵢ ∈ ) +core:externidx ::= | ... | (for X₁,…,Xₙ ∈ ) +{Xᵢ}sortidx ::= (Xᵢ ) (for Xᵢ ∈ ) +{Xᵢ}idx ::= | <{Xᵢ}sortidx> (for Xᵢ ∈ ) +externidx ::= <{X₁}sortidx> | ... | <{Xₙ}sortidx> (for X₁,…,Xₙ ∈ ) + | core-prefix() + +where core-prefix() is defined: + if parses (), then core-prefix() parses (core ) + if parses , then core-prefix() parses +``` +The -`sortidx` schemas generate rules like `funcsortidx` and `core:funcsortidx` +which both parse `(func $foo)` and `(func 0)`.
The -`idx` schemas generate rules +like `funcidx`/`core:funcidx` that extend `funcsortidx`/`core:funcsortidx` to +also parse plain identifiers (like `$foo`) and `u32` indices (like `0`). + +The `core-prefix` meta-function is used here and below to transform a rule for +parsing a parenthesized definition without `core` into the same definition but +with a `core` token right after the leftmost parenthesis. For example, since +`` parses `(func $f)`, `core-prefix()` parses +`(core func $f)`. This allows `` rules (which don't include a `core` +token) to be used inside `(core ...)` expressions and `core-prefix()` +rules to be used outside `(core ...)` expressions (at component-level) to +explicitly mark where a `core` token might be required. The exception in +`core-prefix` for `` means that, for example, `core-prefix()` +accepts both `(core func $foo)` and plain `$foo`. + ### Component Definitions At the top-level, a `component` is a sequence of definitions of various kinds: @@ -104,21 +167,17 @@ definition ::= core-prefix() | | | 🪙 - -where core-prefix(X) parses '(' 'core' Y ')' when X parses '(' Y ')' ``` Components are like Core WebAssembly modules in that their contained definitions are acyclic: definitions can only refer to preceding definitions (in the AST, text format and binary format). However, unlike modules, components can arbitrarily interleave different kinds of definitions. -The `core-prefix` meta-function transforms a grammatical rule for parsing a -Core WebAssembly definition into a grammatical rule for parsing the same -definition, but with a `core` token added right after the leftmost paren. -For example, `core:module` accepts `(module (func))` so -`core-prefix()` accepts `(core module (func))`. Note that the -inner `func` doesn't need a `core` prefix; the `core` token is used to mark the -*transition* from parsing component definitions into core definitions.
+As defined in the previous section, `core-prefix()` adds a `core` token after +the leftmost parenthesis of `` and so, e.g., `core-prefix()` +parses `(core module ...)` instead of `(module ...)`. The `core` prefix +disambiguates current and future overlaps between core- and component-level +definitions (like `func`, `instance` and `type`). The [`core:module`] production is unmodified by the Component Model and thus components embed Core WebAssembly (text and binary format) modules as currently @@ -158,64 +217,6 @@ next), nothing will be instantiated or executed at runtime; everything here is dead code. -#### Index Spaces - -[Like Core WebAssembly][Core Indices], the Component Model places each -`definition` into one of a fixed set of *index spaces*, allowing the -definition to be referred to by subsequent definitions (in the text and binary -format) via a nonnegative integral *index*. When defining, validating and -executing a component, there are 5 component-level index spaces: -* (component) functions -* (component) values 🪙 -* (component) types -* component instances -* components - -6 core index spaces that also exist in the Core WebAssembly specification: -* (core) functions -* (core) tables -* (core) memories -* (core) globals -* (core) tags -* (core) types - -and 2 additional core index spaces that contain core definition introduced by -the Component Model that are not in Core WebAssembly (yet: the [module-linking] -proposal would add them): -* module instances -* modules - -for a total of 13 index spaces that need to be maintained by an implementation -when, e.g., validating a component. These 13 index spaces correspond 1:1 with -the terminals of the `sort` production defined below and thus "sort" and -"index space" can be used interchangeably.
- -Also [like Core WebAssembly][Core Identifiers], the Component Model text format -allows *identifiers* to be used in place of these indices, which are resolved -when parsing into indices in the AST (upon which validation and execution is -defined). Thus, the following two components are equivalent: -```wat -(component - (core module (; empty ;)) - (component (; empty ;)) - (core module (; empty ;)) - (export "C" (component 0)) - (export "M1" (core module 0)) - (export "M2" (core module 1)) -) -``` -```wat -(component - (core module $M1 (; empty ;)) - (component $C (; empty ;)) - (core module $M2 (; empty ;)) - (export "C" (component $C)) - (export "M1" (core module $M1)) - (export "M2" (core module $M2)) -) -``` - - ### Instance Definitions Whereas modules and components represent immutable *code*, instances associate @@ -229,21 +230,12 @@ multiple different styles of traditional [linking](Linking.md). The syntax for defining a core module instance is: ```ebnf -core:instance ::= (instance ? ) +core:instance ::= (instance ? ) core:instanceexpr ::= (instantiate *) | * -core:instantiatearg ::= (with (instance )) +core:instantiatearg ::= (with (instance )) | (with (instance *)) -core:sortidx ::= ( ) -core:sort ::= func - | table - | memory - | global - | tag - | type - | module - | instance -core:inlineexport ::= (export ) +core:inlineexport ::= (export ) ``` When instantiating a module via `instantiate`, the two-level imports of the core modules are resolved as follows: @@ -255,10 +247,6 @@ core modules are resolved as follows: exports of the core module instance found by the first step to select the imported core definition. -Each `core:sort` corresponds 1:1 with a distinct [index space] that contains -only core definitions of that *sort*. The `u32` field of `core:sortidx` -indexes into the sort's associated index space to select a definition. 
- Based on this, we can link two core modules `$A` and `$B` together with the following component: ```wat @@ -285,41 +273,26 @@ an example of these, we'll also need the `alias` definitions introduced in the next section. The syntax for defining component instances is symmetric to core module -instances, but with an expanded component-level definition of `sort`: +instances above: ```ebnf instance ::= (instance ? ) instanceexpr ::= (instantiate *) | * -instantiatearg ::= (with ) +instantiatearg ::= (with ) | (with (instance *)) name ::= -sortidx ::= ( ) -sort ::= core - | func - | value 🪙 - | type - | component - | instance -inlineexport ::= (export "" ) - | (export "" ) 🔗 -``` -Because component-level function, type and instance definitions are different -than core-level function, type and instance definitions, they are put into -disjoint index spaces which are indexed separately. Components may import -and export various core definitions (when they are compatible with the -[shared-nothing] model, which currently means only `module`, but may in the -future include `data`). Thus, component-level `sort` injects the full set -of `core:sort`, so that they may be referenced (leaving it up to validation -rules to throw out the core sorts that aren't allowed in various contexts). - -The `name` production reuses the `core:name` quoted-string-literal syntax of +inlineexport ::= (export "" ) + | (export "" ) 🔗 +``` +Components may import and export various core definitions when they are +compatible with the [shared-nothing] model, which currently means only `module` +(since modules are pure immutable code) but may in the future include, e.g., +`data` (since data segments are pure immutable data). + +The `name` production reuses the [`core:name`] quoted-string-literal syntax of Core WebAssembly (which appears in core module imports and exports and can contain any valid UTF-8 string).
-🪙 The `value` sort refers to a value that is provided and consumed during -instantiation. How this works is described in the -[value definitions](#value-definitions) section. - To see a non-trivial example of component instantiation, we'll first need to introduce a few other definitions below that allow components to import, define and export component functions. @@ -334,22 +307,65 @@ instance, the `core export` of a core module instance and a definition of an `outer` component (containing the current component): ```ebnf alias ::= (alias ( ?)) -aliastarget ::= export - | core export - | outer +aliastarget ::= export + | core export + | outer +``` +An `alias` definition adds a new index into the index space of ``, binding +the optional `?`, if present, to the new index. + +In the case of `export` aliases, validation requires `` to select an +instance in the instance index space and `name` must match an export in the type +of that instance. For example, the following export alias allows `$P` to extract +the `foo` function of its child component `$C` and re-export it directly from +`$P`: +```wat +(component $P + (component $C + ... + (export "foo" (func $foo-impl)) + ) + (instance $c (instantiate $C)) + (alias export $c "foo" (func $foo-alias)) + (export "foo" (func $foo-alias)) +) ``` -If present, the `id` of the alias is bound to the new index added by the alias -and can be used anywhere a normal `id` can be used. -In the case of `export` aliases, validation ensures `name` is an export in the -target instance and has a matching sort. +Additional syntactic sugar is added for allowing export aliases to be defined +*inline* as a syntactic generalization of the `{X}sortidx` grammar rules +defined [above](#index-spaces) for each core- and component-level sort `X`: +```ebnf +core:{Xᵢ}sortidx ::= ... | (Xᵢ +) (for Xᵢ ∈ ) +{Xᵢ}sortidx ::= ...
| (Xᵢ +) (for Xᵢ ∈ ) +``` +Each `` projects an export of an instance: the first `` projects +from `` (which must select an instance) and each subsequent `` +projects from the (instance) result of the preceding alias. Because the +`{X}sortidx` rules are included in all the core- and component-level `{X}idx` +and `externidx` rules defined [above](#index-spaces), an inline export alias `(X +$id "name1" "name2" ...)` can be used anywhere that a normal `(X $id)` can be +used, and with the same `core`-prefixing rules. + +For example, the following component is equivalent to the previous example, +avoiding the need to explicitly define `$foo-alias`: +```wat +(component $P + (component $C + ... + (export "foo" (func $foo-impl)) + ) + (instance $c (instantiate $C)) + (export "foo" (func $c "foo")) +) +``` -In the case of `outer` aliases, the `u32` pair serves as a [de Bruijn -index], with first `u32` being the number of enclosing components/modules to -skip and the second `u32` being an index into the target's sort's index space. -In particular, the first `u32` can be `0`, in which case the outer alias refers -to the current component. To maintain the acyclicity of module instantiation, -outer aliases are only allowed to refer to *preceding* outer definitions. +In the case of `outer` aliases, the `` pair serves as a [de Bruijn index]. +If the first `` is a ``, it indicates the number of enclosing +components/modules to skip (with `0` referring to the current component/module). +Otherwise, if the first `` is an ``, it references the enclosing +component/module to skip *to*. The second `` indexes into the `` +index space of the target component/module and can only refer to +*preceding* definitions. Components containing outer aliases effectively produce a [closure] at instantiation time, including a copy of the outer-aliased definitions. Because @@ -359,37 +375,9 @@ modules and components.
(In the future, outer aliases to all sorts of definitions could be allowed by recording the statefulness of the resulting component in its type via some kind of "`stateful`" type attribute.) -Both kinds of aliases come with syntactic sugar for implicitly declaring them -inline: - -For `export` aliases, the inline sugar extends the definition of `sortidx` -and the various sort-specific indices: -```ebnf -sortidx ::= ( ) ;; as above - | -Xidx ::= ;; as above - | -inlinealias ::= ( +) -``` -If `` refers to a ``, then the `` of `inlinealias` is a -``; otherwise it's an ``. For example, the -following snippet uses two inline function aliases: -```wat -(instance $j (instantiate $J (with "f" (func $i "f")))) -(export "x" (func $j "g" "h")) -``` -which are desugared into: -```wat -(alias export $i "f" (func $f_alias)) -(instance $j (instantiate $J (with "f" (func $f_alias)))) -(alias export $j "g" (instance $g_alias)) -(alias export $g_alias "h" (func $h_alias)) -(export "x" (func $h_alias)) -``` - -For `outer` aliases, the inline sugar is simply the identifier of the outer -definition, resolved using normal lexical scoping rules. For example, the -following component: +For `outer` aliases, there is also inline syntactic sugar, which is simply to +use the identifier of the outer definition, resolved using normal lexical +scoping rules. For example, the following component: ```wat (component (component $C ...) 
@@ -441,7 +429,7 @@ With what's defined so far, we're able to link modules with arbitrary renamings: )) (core instance $b3 (instantiate $B (with "a" (instance - (export "one" (func $a "three")) ;; renaming, using + (export "one" (func $a "three")) ;; renaming, using inline alias sugar )) )) ) @@ -466,7 +454,7 @@ core:moduledecl ::= | | core:alias ::= (alias ( ?)) -core:aliastarget ::= outer +core:aliastarget ::= outer core:importdecl ::= (import ) core:exportdecl ::= (export ) core:exportdesc ::= strip-id() @@ -572,8 +560,8 @@ defvaltype ::= bool valtype ::= | keytype ::= bool | s8 | u8 | s16 | u16 | s32 | u32 | s64 | u64 | char | string 🗺️ -resourcetype ::= (resource (rep i32) (dtor )?) - | (resource (rep i64) (dtor )?) 🐘 +resourcetype ::= (resource (rep i32) (dtor core-prefix())?) + | (resource (rep i64) (dtor core-prefix())?) 🐘 functype ::= (func async? (param "