This repository was archived by the owner on Feb 24, 2026. It is now read-only.

Output wrong result when using w4a8 linear #313

@wh-ph

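I am trying to use bitblas.Linear with A_dtype="int8" and W_dtype="int4".
Since the inputs and weights are all ones and in_features is 16, I expected every output element to be 16.0. Instead, the row for the first token is 32.0 everywhere except the last element, which is -104.0; only the row for the second token is correct.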

import torch
import bitblas

batch_size = 1
num_tokens = 2
in_features = 16
out_features = 32

model = bitblas.Linear(
    in_features=in_features,
    out_features=out_features,
    bias=False,
    A_dtype="int8",  # activation A dtype
    W_dtype="int4",  # weight W dtype
    accum_dtype="int32",  # accumulation dtype
    out_dtype="float32",  # output dtype
    # configs for weight only quantization
    group_size=None,  # setting for grouped quantization
    with_scaling=False,  # setting for scaling factor
    with_zeros=False,  # setting for zeros
    zeros_mode=None,  # setting for how to calculate zeros
    # Target optimization var for dynamic symbolic.
    # For detailed information please check out docs/PythonAPI.md
    # By default, the optimization var is [1, 16, 32, 64, 128, 256, 512]
    opt_M=[1, 16, 32, 64, 128],
)

x = torch.ones((batch_size, num_tokens, in_features)).to(torch.int8)
w = torch.ones((out_features, in_features)).to(torch.int8)

x = x.cuda()
w = w.cuda()
model = model.cuda()

model.load_and_transform_weight(w)
model.eval()

with torch.no_grad():
    y = model(x)
print(y)

result:

tensor([[[  32., 32., 32., 32., 32., 32., 32., 32., 32., 32.,
            32., 32., 32., 32., 32., 32., 32., 32., 32., 32.,
            32., 32., 32., 32., 32., 32., 32., 32., 32., 32.,
            32., -104.],
         [  16., 16., 16., 16., 16., 16., 16., 16., 16., 16.,
            16., 16., 16., 16., 16., 16., 16., 16., 16., 16.,
            16., 16., 16., 16., 16., 16., 16., 16., 16., 16.,
            16., 16.]]], device='cuda:0')
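
For comparison, here is a minimal sketch of the expected values. It uses plain Python integer arithmetic rather than bitblas or torch, so it runs without a GPU. The helper `reference_linear` and the hand-transcribed `reported` list are illustrative names of mine and not part of any library; the batch dimension of 1 is dropped for brevity.

```python
def reference_linear(x, w):
    """Integer reference for y = x @ w.T.

    x is a list of rows shaped (num_tokens, in_features),
    w is a list of rows shaped (out_features, in_features).
    """
    return [[float(sum(a * b for a, b in zip(row, col))) for col in w] for row in x]


num_tokens, in_features, out_features = 2, 16, 32

# Same all-ones inputs and weights as in the repro above.
x = [[1] * in_features for _ in range(num_tokens)]
w = [[1] * in_features for _ in range(out_features)]
expected = reference_linear(x, w)

# Output printed by the repro, transcribed by hand (assumption: no copy errors).
reported = [[32.0] * 31 + [-104.0], [16.0] * 32]

# Collect (token, output_feature) positions where the two disagree.
mismatches = [
    (t, o)
    for t in range(num_tokens)
    for o in range(out_features)
    if reported[t][o] != expected[t][o]
]

print("expected value per element:", expected[0][0])
print("mismatching positions:", len(mismatches))
print("tokens affected:", sorted({t for t, _ in mismatches}))
```

Against this reference, every mismatch falls in the row for token 0, and the row for token 1 agrees exactly.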

environment:

torch: 2.3.0+cu121
cuda version: 12.1
GPU: NVIDIA GeForce RTX 4090
