QMoE: when a model sets weights_prepacked=0, the runner can consume raw weights as if prepacked and silently produce wrong output with no diagnostic #28964

Description

@justinchuby

Non-blocking suggestion: when a model sets weights_prepacked=0 (raw [E, N, K/pack] weights) and the session has session.disable_prepacking set, PrePack() never runs, so packed_fc{1,2}_weights_ stay null and int_weights_consumed_by_prepack is false. The round-3 fix correctly avoids the null-pointer crash by falling through to the raw initializer pointers. However, those raw bytes are not in CUTLASS layout, so the runner consumes them as if they were prepacked and silently produces wrong output with no diagnostic.

This is exactly the silent-failure mode this PR set out to remove for the offline path; the earlier thread fixed the crash, but the misleading-success outcome is newly reachable via the user-facing weights_prepacked=0 contract. A cheap defensive check would make it loud:

if (is_int && !weights_prepacked_ &&
    (packed_fc1_weights_ == nullptr || packed_fc2_weights_ == nullptr)) {
  return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT,
      "QMoE weights_prepacked=0 requires PrePack to run, but the int weight "
      "buffers were not produced (is session.disable_prepacking set?). Provide "
      "CUTLASS-prepacked weights with weights_prepacked=1, or enable prepacking.");
}

(The wfp4afp8 weight branch has the same fall-through, so this is partly a pre-existing QMoE pattern — flagging it here only because the new opt-in contract makes it user-reachable.)
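For anyone who wants to try the condition outside the kernel, here is a minimal standalone sketch of the same guard. QMoEWeightState, CheckIntWeightsUsable, and DemoGuard are hypothetical names made up for illustration; only the field names and the predicate mirror the snippet above, and the real check would return an ORT status rather than an optional string.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for the kernel state named in this issue.
// This is NOT the ONNX Runtime class; it only mirrors the fields the guard reads.
struct QMoEWeightState {
  bool is_int = false;                      // integer-quantized weight path
  bool weights_prepacked = false;           // the weights_prepacked attribute
  const void* packed_fc1_weights = nullptr; // set only if PrePack() ran
  const void* packed_fc2_weights = nullptr; // set only if PrePack() ran
};

// Returns an error message when raw integer weights would reach the runner
// without having been repacked; returns std::nullopt when it is safe to go on.
std::optional<std::string> CheckIntWeightsUsable(const QMoEWeightState& s) {
  if (s.is_int && !s.weights_prepacked &&
      (s.packed_fc1_weights == nullptr || s.packed_fc2_weights == nullptr)) {
    return std::string(
        "QMoE weights_prepacked=0 requires PrePack to run, but the int weight "
        "buffers were not produced (is session.disable_prepacking set?).");
  }
  return std::nullopt;
}

// Usage sketch: the guard fires only for the raw-int, not-prepacked case.
inline void DemoGuard() {
  QMoEWeightState raw;   // models weights_prepacked=0 with PrePack() skipped
  raw.is_int = true;
  assert(CheckIntWeightsUsable(raw).has_value());

  QMoEWeightState offline;  // models offline CUTLASS-prepacked weights
  offline.is_int = true;
  offline.weights_prepacked = true;
  assert(!CheckIntWeightsUsable(offline).has_value());
}
```

The sketch requires both packed buffers to be present, so a partially prepacked state (only one of fc1/fc2 produced) is rejected as well.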

Originally posted by @tianleiwu in #28749 (comment)
