Non-blocking suggestion: when a model sets `weights_prepacked=0` (raw `[E, N, K/pack]` weights) and the session has `session.disable_prepacking` set, `PrePack()` never runs. As a result, `packed_fc{1,2}_weights_` stay null and `int_weights_consumed_by_prepack` is false. The round-3 fix correctly avoids the null-pointer crash by falling through to the raw initializer pointers, but those raw bytes are not in CUTLASS layout. The runner consumes them as if they were prepacked and silently produces wrong output with no diagnostic.
This is exactly the silent-failure mode this PR set out to remove for the offline path. The earlier thread fixed the crash, but the misleading-success outcome is now newly reachable through the user-facing `weights_prepacked=0` contract. A cheap defensive check would make it loud:
```cpp
// Raw-layout int weights must have been repacked by PrePack(); if they were not,
// fail loudly instead of feeding non-CUTLASS bytes to the runner.
if (is_int && !weights_prepacked_ &&
    (packed_fc1_weights_ == nullptr || packed_fc2_weights_ == nullptr)) {
  return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT,
                         "QMoE weights_prepacked=0 requires PrePack to run, but the int weight "
                         "buffers were not produced (is session.disable_prepacking set?). Provide "
                         "CUTLASS-prepacked weights with weights_prepacked=1, or enable prepacking.");
}
```
(The `wfp4afp8` weight branch has the same fall-through, so this is partly a pre-existing QMoE pattern. I'm flagging it here only because the new opt-in contract makes it user-reachable.)
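To make the three reachable cases concrete, here is a small, self-contained C++ model of the buffer-selection logic with the proposed guard in place. It is a sketch under assumptions: `WeightState`, `Selected`, `SelectWeights`, and the simplified `Status` are hypothetical stand-ins, not ONNX Runtime types, and the real kernel's fall-through may be structured differently.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical, simplified model of the QMoE weight-selection path.
// None of these names are ONNX Runtime APIs.
struct Status {
  bool ok;
  std::string message;
};

struct WeightState {
  bool is_int;                 // quantized int weights (runner expects CUTLASS layout)
  bool weights_prepacked;      // the model's weights_prepacked attribute
  const uint8_t* packed_fc1;   // produced by PrePack(); null if PrePack never ran
  const uint8_t* packed_fc2;
  const uint8_t* raw_fc1;      // initializer bytes exactly as stored in the model
  const uint8_t* raw_fc2;
};

struct Selected {
  const uint8_t* fc1;
  const uint8_t* fc2;
};

// Chooses the buffers handed to the GEMM runner. The guard rejects the case
// where raw [E, N, K/pack] bytes would otherwise reach a runner that assumes
// CUTLASS layout (weights_prepacked=0 with PrePack skipped).
Status SelectWeights(const WeightState& s, Selected* out) {
  if (s.is_int && !s.weights_prepacked &&
      (s.packed_fc1 == nullptr || s.packed_fc2 == nullptr)) {
    return {false, "weights_prepacked=0 requires PrePack to run"};
  }
  // Past the guard, either PrePack produced CUTLASS-layout buffers, or the
  // initializers can be used directly (already prepacked offline, or not int).
  out->fc1 = s.packed_fc1 != nullptr ? s.packed_fc1 : s.raw_fc1;
  out->fc2 = s.packed_fc2 != nullptr ? s.packed_fc2 : s.raw_fc2;
  return {true, ""};
}
```

Without the guard, the first case (raw int weights, `PrePack()` skipped) would take the same fall-through as offline-prepacked weights and hand the raw pointers to the runner; with it, that case returns an error while the other two select the expected buffers.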
Originally posted by @tianleiwu in #28749 (comment)