
Commit 9dc41f7

Inline context.{get,set} in components (#13194)
* Inline `context.{get,set}` in components

This commit reimplements the `context.{get,set}` intrinsics in the component model, introduced in the component-model-async and component-model-threading proposals. In WASIp3, for example, these intrinsics are intended to replace the `global`s used for the stack pointer and TLS base in previous modules. Loading from a `global` is a single load instruction, whereas the previous implementation of `context.get` was a full libcall, which is significantly more expensive. The goal of this PR is to ensure that the transition to using `context.get` and `context.set` for high-performance uses retains the same performance as the WASIp2 constructs.

Specifically, the storage for the `context.{get,set}` slots has been moved into the `VMStoreContext` structure, which has a layout known to compiled code. Storage still remains within each `GuestThread` because there's only one store. As a result, switching between threads is now slightly more expensive, since the switch has to update and maintain the state in the store. The rationale is that these values will be accessed far more often than threads are switched.

The implementation chosen in this commit models the `context.{get,set}` intrinsics as `UnsafeIntrinsic`s. This is a bit of a shoehorn, since they're not actually unsafe, but all of the plumbing and support for `UnsafeIntrinsic` is effectively exactly what these need. They now reside there to avoid duplicating lots of infrastructure.

The `concurrent.rs` implementation has been updated to save/restore context from the store. This additionally updates a few other switch points to ensure that the store never switches away from or to a deleted thread. This niche situation previously happened in a few scenarios with no impact, but now that the switching implementation has to access threads, it is load-bearing that they be valid.

The end result is that with `-Cinlining` the `context.{get,set}` intrinsics compile to two instructions instead of a full libcall. One instruction loads `VMStoreContext`, which is GVN-able and hoist-able, while the other is the actual load/store. This matches the performance of the stack pointer being in an imported global, for example.

* Fix build issues

* Review comments
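The trade-off described above (cheap slot access, slightly costlier thread switches) can be sketched as a small model. This is only an illustration under stated assumptions: `StoreContext`, `GuestThread`, `Store`, and `switch_to` here are hypothetical stand-ins, not Wasmtime's actual types or the code in `concurrent.rs`.

```rust
/// Number of `context.{get,set}` slots (two, as in the diff).
const NUM_SLOTS: usize = 2;

/// Hypothetical stand-in for `VMStoreContext`: the single copy of the slots
/// that compiled code reads and writes directly.
#[derive(Default)]
struct StoreContext {
    component_context: [u32; NUM_SLOTS],
}

/// Hypothetical stand-in for `GuestThread`: holds a thread's slots while
/// that thread is not running.
#[derive(Default, Clone)]
struct GuestThread {
    saved_context: [u32; NUM_SLOTS],
}

struct Store {
    ctx: StoreContext,
    threads: Vec<GuestThread>,
    current: usize,
}

impl Store {
    fn new(num_threads: usize) -> Store {
        assert!(num_threads > 0);
        Store {
            ctx: StoreContext::default(),
            threads: vec![GuestThread::default(); num_threads],
            current: 0,
        }
    }

    /// Fast path: models the inlined `context.get` as a plain load.
    fn context_get(&self, slot: usize) -> u32 {
        self.ctx.component_context[slot]
    }

    /// Fast path: models the inlined `context.set` as a plain store.
    fn context_set(&mut self, slot: usize, val: u32) {
        self.ctx.component_context[slot] = val;
    }

    /// Slow path: save the outgoing thread's slots, then restore the
    /// incoming thread's slots into the store.
    fn switch_to(&mut self, next: usize) {
        // Both the outgoing and incoming thread must exist for this to work.
        assert!(next < self.threads.len(), "must switch to a live thread");
        self.threads[self.current].saved_context = self.ctx.component_context;
        self.ctx.component_context = self.threads[next].saved_context;
        self.current = next;
    }
}
```

In this model every access touches only the store's copy, while each switch pays for two small array copies, which mirrors the rationale that accesses are expected to be far more frequent than switches.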
1 parent a0dd8b3 commit 9dc41f7

17 files changed

Lines changed: 280 additions & 174 deletions

crates/cranelift/src/compiler/component.rs

Lines changed: 78 additions & 43 deletions
@@ -724,28 +724,6 @@ impl<'a> TrampolineCompiler<'a> {
                     |_, _| {},
                 );
             }
-            Trampoline::ContextGet { instance, slot } => {
-                self.translate_libcall(
-                    host::context_get,
-                    TrapSentinel::NegativeOne,
-                    WasmArgs::InRegisters,
-                    |me, params| {
-                        params.push(me.index_value(*instance));
-                        params.push(me.builder.ins().iconst(ir::types::I32, i64::from(*slot)));
-                    },
-                );
-            }
-            Trampoline::ContextSet { instance, slot } => {
-                self.translate_libcall(
-                    host::context_set,
-                    TrapSentinel::Falsy,
-                    WasmArgs::InRegisters,
-                    |me, params| {
-                        params.push(me.index_value(*instance));
-                        params.push(me.builder.ins().iconst(ir::types::I32, i64::from(*slot)));
-                    },
-                );
-            }
             Trampoline::ThreadIndex => {
                 self.translate_libcall(
                     host::thread_index,
@@ -1464,9 +1442,7 @@ impl<'a> TrampolineCompiler<'a> {
             Trampoline::ResourceRep { .. }
             | Trampoline::ThreadIndex
             | Trampoline::BackpressureInc { .. }
-            | Trampoline::BackpressureDec { .. }
-            | Trampoline::ContextGet { .. }
-            | Trampoline::ContextSet { .. } => return,
+            | Trampoline::BackpressureDec { .. } => return,
 
             // Intrinsics used in adapters generated by FACT that aren't called
             // directly from guest wasm, so no check is needed.
@@ -1552,6 +1528,74 @@ impl<'a> TrampolineCompiler<'a> {
             &mut self.builder,
         )
     }
+
+    /// Loads `*mut VMStoreContext` and returns it.
+    ///
+    /// Note that the `*mut VMStoreContext` value is the same for all
+    /// `VMContext`-like structures in a store. In this case it's loaded from
+    /// the *caller* vmctx rather than the *callee* vmctx. The caller is using a
+    /// `VMContext` for core wasm which is passed in a register, where the
+    /// callee is a `VMComponentContext` loaded from the `VMContext`. By using
+    /// the caller vmctx we're able to possibly eliminate the dead load of the
+    /// `VMComponentContext` if it's otherwise unused.
+    fn load_vm_store_context(&mut self) -> ir::Value {
+        let caller_vmctx = self.abi_load_params()[1];
+        self.builder.ins().load(
+            self.isa.pointer_type(),
+            ir::MemFlags::trusted()
+                .with_readonly()
+                .with_alias_region(Some(ir::AliasRegion::Vmctx))
+                .with_can_move(),
+            caller_vmctx,
+            i32::from(self.offsets.ptr.vmctx_store_context()),
+        )
+    }
+
+    fn translate_context_intrinsic(&mut self, intrinsic: UnsafeIntrinsic) {
+        // This is the width of the type being loaded from Wasmtime's
+        // `VMStoreContext` slot and it depends on the intrinsic.
+        let ty = match intrinsic {
+            UnsafeIntrinsic::ContextGetI32_0
+            | UnsafeIntrinsic::ContextSetI32_0
+            | UnsafeIntrinsic::ContextGetI32_1
+            | UnsafeIntrinsic::ContextSetI32_1 => ir::types::I32,
+            _ => unreachable!(),
+        };
+
+        let slot = match intrinsic {
+            UnsafeIntrinsic::ContextGetI32_0 | UnsafeIntrinsic::ContextSetI32_0 => 0,
+            UnsafeIntrinsic::ContextGetI32_1 | UnsafeIntrinsic::ContextSetI32_1 => 1,
+            _ => unreachable!(),
+        };
+        let offset = self
+            .offsets
+            .ptr
+            .vmstore_context_component_context_slot(slot);
+        let params = self.abi_load_params();
+        let vmstore_context = self.load_vm_store_context();
+        match intrinsic {
+            UnsafeIntrinsic::ContextGetI32_0 | UnsafeIntrinsic::ContextGetI32_1 => {
+                let context = self.builder.ins().load(
+                    ty,
+                    MemFlags::trusted(),
+                    vmstore_context,
+                    i32::from(offset),
+                );
+                self.abi_store_results(&[context]);
+            }
+            UnsafeIntrinsic::ContextSetI32_0 | UnsafeIntrinsic::ContextSetI32_1 => {
+                let new_context = params[2];
+                self.builder.ins().store(
+                    MemFlags::trusted(),
+                    new_context,
+                    vmstore_context,
+                    i32::from(offset),
+                );
+                self.abi_store_results(&[]);
+            }
+            _ => unreachable!(),
+        }
+    }
 }
 
 // Helper structure to implement `TranslateTrap`. This isn't possible to do
@@ -1633,12 +1677,7 @@ impl ComponentCompiler for Compiler {
                 vmctx,
                 wasmtime_environ::component::VMCOMPONENT_MAGIC,
             );
-            let vm_store_context = c.builder.ins().load(
-                pointer_type,
-                MemFlags::trusted(),
-                vmctx,
-                i32::try_from(c.offsets.vm_store_context()).unwrap(),
-            );
+            let vm_store_context = c.load_vm_store_context();
             super::save_last_wasm_exit_fp_and_pc(
                 &mut c.builder,
                 pointer_type,
@@ -1716,21 +1755,10 @@ impl ComponentCompiler for Compiler {
             | UnsafeIntrinsic::U32NativeStore
             | UnsafeIntrinsic::U64NativeStore => c.translate_store_intrinsic(intrinsic)?,
             UnsafeIntrinsic::StoreDataAddress => {
-                let [callee_vmctx, _caller_vmctx] = *c.abi_load_params() else {
-                    unreachable!()
-                };
                 let pointer_type = self.isa.pointer_type();
 
                 // Load the `*mut VMStoreContext` out of our vmctx.
-                let store_ctx = c.builder.ins().load(
-                    pointer_type,
-                    ir::MemFlags::trusted()
-                        .with_readonly()
-                        .with_alias_region(Some(ir::AliasRegion::Vmctx))
-                        .with_can_move(),
-                    callee_vmctx,
-                    i32::try_from(c.offsets.vm_store_context()).unwrap(),
-                );
+                let store_ctx = c.load_vm_store_context();
 
                 // Load the `*mut T` out of the `VMStoreContext`.
                 let data_address = c.builder.ins().load(
@@ -1752,6 +1780,13 @@ impl ComponentCompiler for Compiler {
 
                 c.abi_store_results(&[data_address]);
             }
+
+            UnsafeIntrinsic::ContextGetI32_0
+            | UnsafeIntrinsic::ContextGetI32_1
+            | UnsafeIntrinsic::ContextSetI32_0
+            | UnsafeIntrinsic::ContextSetI32_1 => {
+                c.translate_context_intrinsic(intrinsic);
+            }
         }
 
         c.builder.finalize();

crates/environ/src/component.rs

Lines changed: 0 additions & 4 deletions
@@ -186,10 +186,6 @@ macro_rules! foreach_builtin_component_function {
             #[cfg(feature = "component-model-async")]
             error_context_transfer(vmctx: vmctx, src_idx: u32, src_table: u32, dst_table: u32) -> u64;
             #[cfg(feature = "component-model-async")]
-            context_get(vmctx: vmctx, caller_instance: u32, slot: u32) -> u64;
-            #[cfg(feature = "component-model-async")]
-            context_set(vmctx: vmctx, caller_instance: u32, slot: u32, val: u32) -> bool;
-            #[cfg(feature = "component-model-async")]
             thread_index(vmctx: vmctx) -> u64;
             #[cfg(feature = "component-model-async")]
             thread_new_indirect(vmctx: vmctx, caller_instance: u32, func_ty_id: u32, func_table_idx: u32, func_idx: u32, context: u32) -> u64;

crates/environ/src/component/dfg.rs

Lines changed: 0 additions & 16 deletions
@@ -476,14 +476,6 @@ pub enum Trampoline {
     Trap,
     EnterSyncCall,
     ExitSyncCall,
-    ContextGet {
-        instance: RuntimeComponentInstanceIndex,
-        slot: u32,
-    },
-    ContextSet {
-        instance: RuntimeComponentInstanceIndex,
-        slot: u32,
-    },
     ThreadIndex,
     ThreadNewIndirect {
         instance: RuntimeComponentInstanceIndex,
@@ -1167,14 +1159,6 @@ impl LinearizeDfg<'_> {
             Trampoline::Trap => info::Trampoline::Trap,
             Trampoline::EnterSyncCall => info::Trampoline::EnterSyncCall,
             Trampoline::ExitSyncCall => info::Trampoline::ExitSyncCall,
-            Trampoline::ContextGet { instance, slot } => info::Trampoline::ContextGet {
-                instance: *instance,
-                slot: *slot,
-            },
-            Trampoline::ContextSet { instance, slot } => info::Trampoline::ContextSet {
-                instance: *instance,
-                slot: *slot,
-            },
             Trampoline::ThreadIndex => info::Trampoline::ThreadIndex,
             Trampoline::ThreadNewIndirect {
                 instance,

crates/environ/src/component/info.rs

Lines changed: 0 additions & 24 deletions
@@ -1118,28 +1118,6 @@ pub enum Trampoline {
     /// pushed by `EnterSyncCall`.
     ExitSyncCall,
 
-    /// Intrinsic used to implement the `context.get` component model builtin.
-    ///
-    /// The payload here represents that this is accessing the Nth slot of local
-    /// storage.
-    ContextGet {
-        /// The specific component instance which is calling the intrinsic.
-        instance: RuntimeComponentInstanceIndex,
-        /// Which slot to access.
-        slot: u32,
-    },
-
-    /// Intrinsic used to implement the `context.set` component model builtin.
-    ///
-    /// The payload here represents that this is accessing the Nth slot of local
-    /// storage.
-    ContextSet {
-        /// The specific component instance which is calling the intrinsic.
-        instance: RuntimeComponentInstanceIndex,
-        /// Which slot to update.
-        slot: u32,
-    },
-
     /// Intrinsic used to implement the `thread.index` component model builtin.
     ThreadIndex,
 
@@ -1256,8 +1234,6 @@ impl Trampoline {
             Trap => format!("trap"),
             EnterSyncCall => format!("enter-sync-call"),
             ExitSyncCall => format!("exit-sync-call"),
-            ContextGet { .. } => format!("context-get"),
-            ContextSet { .. } => format!("context-set"),
             ThreadIndex => format!("thread-index"),
             ThreadNewIndirect { .. } => format!("thread-new-indirect"),
             ThreadSuspendToSuspended { .. } => format!("thread-suspend-to-suspended"),

crates/environ/src/component/intrinsic.rs

Lines changed: 5 additions & 0 deletions
@@ -22,6 +22,11 @@ macro_rules! for_each_unsafe_intrinsic {
 
             "u64-native-load" => U64NativeLoad : u64_native_load(address: u64) -> u64;
             "u64-native-store" => U64NativeStore : u64_native_store(address: u64, value: u64);
+
+            "context-get-i32-0" => ContextGetI32_0 : context_get_i32_0() -> u32;
+            "context-set-i32-0" => ContextSetI32_0 : context_set_i32_0(val: u32);
+            "context-get-i32-1" => ContextGetI32_1 : context_get_i32_1() -> u32;
+            "context-set-i32-1" => ContextSetI32_1 : context_set_i32_1(val: u32);
         }
     };
 }

crates/environ/src/component/translate/inline.rs

Lines changed: 16 additions & 16 deletions
@@ -1074,24 +1074,24 @@ impl<'a> Inliner<'a> {
                 frame.funcs.push((*func, dfg::CoreDef::Trampoline(index)));
             }
             ContextGet { func, i } => {
-                let index = self.result.trampolines.push((
-                    *func,
-                    dfg::Trampoline::ContextGet {
-                        instance: frame.instance,
-                        slot: *i,
-                    },
-                ));
-                frame.funcs.push((*func, dfg::CoreDef::Trampoline(index)));
+                let intrinsic = match i {
+                    0 => UnsafeIntrinsic::ContextGetI32_0,
+                    1 => UnsafeIntrinsic::ContextGetI32_1,
+                    _ => unreachable!(),
+                };
+                frame
+                    .funcs
+                    .push((*func, dfg::CoreDef::UnsafeIntrinsic(*func, intrinsic)));
             }
             ContextSet { func, i } => {
-                let index = self.result.trampolines.push((
-                    *func,
-                    dfg::Trampoline::ContextSet {
-                        instance: frame.instance,
-                        slot: *i,
-                    },
-                ));
-                frame.funcs.push((*func, dfg::CoreDef::Trampoline(index)));
+                let intrinsic = match i {
+                    0 => UnsafeIntrinsic::ContextSetI32_0,
+                    1 => UnsafeIntrinsic::ContextSetI32_1,
+                    _ => unreachable!(),
+                };
+                frame
+                    .funcs
+                    .push((*func, dfg::CoreDef::UnsafeIntrinsic(*func, intrinsic)));
             }
             ThreadIndex { func } => {
                 let index = self
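The slot-to-intrinsic mapping in the hunk above, together with the names registered in `intrinsic.rs`, can be sketched standalone as follows. `ContextIntrinsic`, `get_intrinsic`, `set_intrinsic`, and `name` are hypothetical names for illustration only; the real code uses `UnsafeIntrinsic` and `unreachable!()` for out-of-range slots, whereas this sketch returns `None` so the behavior is easy to check.

```rust
/// Hypothetical mirror of the four context-related `UnsafeIntrinsic` variants.
#[allow(non_camel_case_types)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ContextIntrinsic {
    GetI32_0,
    SetI32_0,
    GetI32_1,
    SetI32_1,
}

impl ContextIntrinsic {
    /// The string name each intrinsic is registered under in the diff.
    fn name(self) -> &'static str {
        match self {
            ContextIntrinsic::GetI32_0 => "context-get-i32-0",
            ContextIntrinsic::SetI32_0 => "context-set-i32-0",
            ContextIntrinsic::GetI32_1 => "context-get-i32-1",
            ContextIntrinsic::SetI32_1 => "context-set-i32-1",
        }
    }
}

/// Map a `context.get` slot index to its dedicated intrinsic.
fn get_intrinsic(slot: u32) -> Option<ContextIntrinsic> {
    match slot {
        0 => Some(ContextIntrinsic::GetI32_0),
        1 => Some(ContextIntrinsic::GetI32_1),
        _ => None, // the diff treats this case as unreachable
    }
}

/// Map a `context.set` slot index to its dedicated intrinsic.
fn set_intrinsic(slot: u32) -> Option<ContextIntrinsic> {
    match slot {
        0 => Some(ContextIntrinsic::SetI32_0),
        1 => Some(ContextIntrinsic::SetI32_1),
        _ => None, // the diff treats this case as unreachable
    }
}
```

Because the slot is baked into the variant, no instance or slot operand needs to be passed at runtime, which appears to be what lets the compiled form reduce to a fixed-offset load or store.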

crates/environ/src/vmoffsets.rs

Lines changed: 18 additions & 0 deletions
@@ -39,6 +39,10 @@ use crate::{
 };
 use cranelift_entity::packed_option::ReservedValue;
 
+/// Number of slots in for `component_context` in the `VMStoreContext`. This is
+/// defined by the component model's `context.{get,set}` intrinsics.
+pub const NUM_COMPONENT_CONTEXT_SLOTS: usize = 2;
+
 #[cfg(target_pointer_width = "32")]
 fn cast_to_u32(sz: usize) -> u32 {
     u32::try_from(sz).unwrap()
@@ -250,6 +254,20 @@ pub trait PtrSize {
         self.vmstore_context_stack_chain() + self.size_of_vmstack_chain()
     }
 
+    /// Return the offset of the `async_guard_range` field of `VMStoreContext`.
+    fn vmstore_context_async_guard_range(&self) -> u8 {
+        self.vmstore_context_store_data() + self.size()
+    }
+
+    /// Return the offset of the `component_context[i]` field of
+    /// `VMStoreContext`.
+    fn vmstore_context_component_context_slot(&self, i: u8) -> u8 {
+        assert!(usize::from(i) < NUM_COMPONENT_CONTEXT_SLOTS);
+        let base = self.vmstore_context_async_guard_range() + 2 * self.size();
+        let slot_size = 4;
+        base + i * slot_size
+    }
+
     // Offsets within `VMMemoryDefinition`
 
     /// The offset of the `base` field.
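The offset arithmetic added to `PtrSize` can be checked with a small worked example. This is a sketch only: `Layout` is a hypothetical stand-in for the trait, and the `store_data_offset` value of 64 is an arbitrary assumption, since the real value of `vmstore_context_store_data()` is not shown in this diff.

```rust
/// Number of slots, as defined in the diff.
const NUM_COMPONENT_CONTEXT_SLOTS: usize = 2;

/// Hypothetical stand-in for the `PtrSize` trait.
struct Layout {
    /// Pointer size in bytes (what `size()` returns in the diff).
    ptr_size: u8,
    /// Assumed offset of the `store_data` field. Not taken from Wasmtime.
    store_data_offset: u8,
}

impl Layout {
    /// `async_guard_range` follows the pointer-sized `store_data` field.
    fn async_guard_range(&self) -> u8 {
        self.store_data_offset + self.ptr_size
    }

    /// The slots follow `async_guard_range`, which occupies two pointers;
    /// each slot is 4 bytes wide (an `i32`).
    fn component_context_slot(&self, i: u8) -> u8 {
        assert!(usize::from(i) < NUM_COMPONENT_CONTEXT_SLOTS);
        let base = self.async_guard_range() + 2 * self.ptr_size;
        let slot_size = 4;
        base + i * slot_size
    }
}
```

With 8-byte pointers and the assumed `store_data_offset` of 64, `async_guard_range` lands at 72, slot 0 at 72 + 16 = 88, and slot 1 at 92.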
