This repository was archived by the owner on Feb 24, 2026. It is now read-only.

BitBLAS's compilation seems to conflict with vLLM's torch.compile integration? #315

@xxxxyu

Running the GPTQ model https://huggingface.co/JunHowie/Qwen3-8B-GPTQ-Int4 on the latest vLLM with quantization="gptq_bitblas" fails.

Got:

(EngineCore_DP0 pid=506800) torch._dynamo.exc.Unsupported: Attempted to call function marked as skipped
(EngineCore_DP0 pid=506800)   Explanation: Dynamo does not know how to trace the builtin `<unknown module>._SimpleCData.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
... (omitted)
(APIServer pid=506583) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
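
The `_SimpleCData.__new__` named in the error appears to correspond to the `ctypes.c_void_p(arr.data_ptr())` call at the bottom of the user-code trace in the complete log: `c_void_p` is a ctypes type implemented in C, which Dynamo reports it cannot trace. A minimal standard-library sketch of that relationship follows. The ctypes buffer address is only a stand-in for a tensor's `data_ptr()` (an assumption for illustration; no BitBLAS or torch code is involved):

```python
import ctypes

# c_void_p is defined in the _ctypes C extension and derives from _SimpleCData,
# the class named in the Dynamo error message.
is_simple_cdata = issubclass(ctypes.c_void_p, ctypes._SimpleCData)

# Stand-in for tensor.data_ptr(): the integer address of a small ctypes buffer.
buf = ctypes.create_string_buffer(16)
addr = ctypes.addressof(buf)

# Same pattern as the BitBLAS line in the trace: wrap an integer address
# in a c_void_p so it can be passed to a prebuilt shared library.
ptr = ctypes.c_void_p(addr)

print(is_simple_cdata, ptr.value == addr)
```

Constructing `ptr` runs C code in `_ctypes`, so there is no Python bytecode for Dynamo to follow at that point.
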
Complete log (after tuning):
Loading safetensors checkpoint shards:   0% Completed | 0/2 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  50% Completed | 1/2 [00:00<00:00,  1.38it/s]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:01<00:00,  1.58it/s]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:01<00:00,  1.54it/s]
(EngineCore_DP0 pid=506800) 
(EngineCore_DP0 pid=506800) INFO 10-14 21:31:21 [default_loader.py:314] Loading weights took 1.41 seconds
(EngineCore_DP0 pid=506800) INFO 10-14 21:31:50 [gpu_model_runner.py:2910] Model loading took 5.6824 GiB and 489.310448 seconds
(EngineCore_DP0 pid=506800) /home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/torch/_dynamo/variables/functions.py:1481: UserWarning: Dynamo does not know how to trace the builtin `<unknown module>._SimpleCData.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
(EngineCore_DP0 pid=506800) If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
(EngineCore_DP0 pid=506800) If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
(EngineCore_DP0 pid=506800)   torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] EngineCore failed to start.
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] Traceback (most recent call last):
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/engine/core.py", line 783, in run_engine_core
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/engine/core.py", line 555, in __init__
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]     super().__init__(
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/engine/core.py", line 112, in __init__
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]     num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/engine/core.py", line 223, in _initialize_kv_caches
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]     available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/executor/abstract.py", line 88, in determine_available_memory
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]     return self.collective_rpc("determine_available_memory")
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/executor/uniproc_executor.py", line 74, in collective_rpc
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]     return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/utils/__init__.py", line 2977, in run_method
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]     return func(*args, **kwargs)
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]   File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]     return func(*args, **kwargs)
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/worker/gpu_worker.py", line 280, in determine_available_memory
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]     self.model_runner.profile_run()
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/worker/gpu_model_runner.py", line 3722, in profile_run
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]     hidden_states, last_hidden_states = self._dummy_run(
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]                                         ^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]   File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]     return func(*args, **kwargs)
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/worker/gpu_model_runner.py", line 3455, in _dummy_run
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]     outputs = self.model(
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]               ^^^^^^^^^^^
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/compilation/cuda_graph.py", line 126, in __call__
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]     return self.runnable(*args, **kwargs)
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]   File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]     return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]   File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]     return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/model_executor/models/qwen3.py", line 321, in forward
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]     hidden_states = self.model(
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]                     ^^^^^^^^^^^
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/compilation/decorators.py", line 407, in __call__
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]     output = self.compiled_callable(*args, **kwargs)
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]   File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 745, in compile_wrapper
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]     raise e.with_traceback(None) from e.__cause__  # User compiler error
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] torch._dynamo.exc.Unsupported: Attempted to call function marked as skipped
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]   Explanation: Dynamo does not know how to trace the builtin `<unknown module>._SimpleCData.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]   Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]   Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] 
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]   Developer debug context: module: <unknown module>, qualname: _SimpleCData.__new__, skip reason: <missing reason>
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] 
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] 
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] from user code:
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]    File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/model_executor/models/qwen2.py", line 385, in forward
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]     hidden_states, residual = layer(positions, hidden_states, residual)
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/model_executor/models/qwen3.py", line 225, in forward
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]     hidden_states = self.self_attn(
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/model_executor/models/qwen3.py", line 144, in forward
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]     qkv, _ = self.qkv_proj(hidden_states)
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/model_executor/layers/linear.py", line 582, in forward
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]     output_parallel = self.quant_method.apply(self, input_, bias)
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/model_executor/layers/quantization/gptq_bitblas.py", line 479, in apply
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]     out = self.kernel.apply_gptq_bitblas_linear(layer, x)
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/model_executor/layers/quantization/kernels/mixed_precision/bitblas.py", line 315, in apply_gptq_bitblas_linear
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]     output = self.bitblas_matmul(*args)  # type: ignore[operator]
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]   File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/bitblas/ops/general_matmul/__init__.py", line 756, in __call__
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]     return self.forward(*args, **kwds)
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]   File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/bitblas/ops/general_matmul/__init__.py", line 751, in forward
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]     self._forward_from_prebuild_lib(*args, stream=stream.cuda_stream)
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]   File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/bitblas/ops/operator.py", line 459, in _forward_from_prebuild_lib
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]     ctypes.c_void_p(arr.data_ptr()) if not isinstance(arr, int) else arr for arr in args
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]   File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/torch/_dynamo/polyfills/__init__.py", line 204, in instantiate_user_defined_class_object
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792]     obj = cls.__new__(cls, *args, **kwargs)
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] 
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
(EngineCore_DP0 pid=506800) ERROR 10-14 21:31:51 [core.py:792] 
(EngineCore_DP0 pid=506800) Process EngineCore_DP0:
(EngineCore_DP0 pid=506800) Traceback (most recent call last):
(EngineCore_DP0 pid=506800)   File "/home/lixiangyu/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=506800)     self.run()
(EngineCore_DP0 pid=506800)   File "/home/lixiangyu/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=506800)     self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=506800)   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/engine/core.py", line 796, in run_engine_core
(EngineCore_DP0 pid=506800)     raise e
(EngineCore_DP0 pid=506800)   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/engine/core.py", line 783, in run_engine_core
(EngineCore_DP0 pid=506800)     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=506800)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800)   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/engine/core.py", line 555, in __init__
(EngineCore_DP0 pid=506800)     super().__init__(
(EngineCore_DP0 pid=506800)   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/engine/core.py", line 112, in __init__
(EngineCore_DP0 pid=506800)     num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
(EngineCore_DP0 pid=506800)                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800)   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/engine/core.py", line 223, in _initialize_kv_caches
(EngineCore_DP0 pid=506800)     available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore_DP0 pid=506800)                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800)   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/executor/abstract.py", line 88, in determine_available_memory
(EngineCore_DP0 pid=506800)     return self.collective_rpc("determine_available_memory")
(EngineCore_DP0 pid=506800)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800)   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/executor/uniproc_executor.py", line 74, in collective_rpc
(EngineCore_DP0 pid=506800)     return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP0 pid=506800)             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800)   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/utils/__init__.py", line 2977, in run_method
(EngineCore_DP0 pid=506800)     return func(*args, **kwargs)
(EngineCore_DP0 pid=506800)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800)   File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=506800)     return func(*args, **kwargs)
(EngineCore_DP0 pid=506800)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800)   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/worker/gpu_worker.py", line 280, in determine_available_memory
(EngineCore_DP0 pid=506800)     self.model_runner.profile_run()
(EngineCore_DP0 pid=506800)   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/worker/gpu_model_runner.py", line 3722, in profile_run
(EngineCore_DP0 pid=506800)     hidden_states, last_hidden_states = self._dummy_run(
(EngineCore_DP0 pid=506800)                                         ^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800)   File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=506800)     return func(*args, **kwargs)
(EngineCore_DP0 pid=506800)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800)   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/worker/gpu_model_runner.py", line 3455, in _dummy_run
(EngineCore_DP0 pid=506800)     outputs = self.model(
(EngineCore_DP0 pid=506800)               ^^^^^^^^^^^
(EngineCore_DP0 pid=506800)   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/compilation/cuda_graph.py", line 126, in __call__
(EngineCore_DP0 pid=506800)     return self.runnable(*args, **kwargs)
(EngineCore_DP0 pid=506800)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800)   File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(EngineCore_DP0 pid=506800)     return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=506800)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800)   File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(EngineCore_DP0 pid=506800)     return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=506800)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800)   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/model_executor/models/qwen3.py", line 321, in forward
(EngineCore_DP0 pid=506800)     hidden_states = self.model(
(EngineCore_DP0 pid=506800)                     ^^^^^^^^^^^
(EngineCore_DP0 pid=506800)   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/compilation/decorators.py", line 407, in __call__
(EngineCore_DP0 pid=506800)     output = self.compiled_callable(*args, **kwargs)
(EngineCore_DP0 pid=506800)              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800)   File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 745, in compile_wrapper
(EngineCore_DP0 pid=506800)     raise e.with_traceback(None) from e.__cause__  # User compiler error
(EngineCore_DP0 pid=506800)     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=506800) torch._dynamo.exc.Unsupported: Attempted to call function marked as skipped
(EngineCore_DP0 pid=506800)   Explanation: Dynamo does not know how to trace the builtin `<unknown module>._SimpleCData.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
(EngineCore_DP0 pid=506800)   Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
(EngineCore_DP0 pid=506800)   Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
(EngineCore_DP0 pid=506800) 
(EngineCore_DP0 pid=506800)   Developer debug context: module: <unknown module>, qualname: _SimpleCData.__new__, skip reason: <missing reason>
(EngineCore_DP0 pid=506800) 
(EngineCore_DP0 pid=506800) 
(EngineCore_DP0 pid=506800) from user code:
(EngineCore_DP0 pid=506800)    File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/model_executor/models/qwen2.py", line 385, in forward
(EngineCore_DP0 pid=506800)     hidden_states, residual = layer(positions, hidden_states, residual)
(EngineCore_DP0 pid=506800)   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/model_executor/models/qwen3.py", line 225, in forward
(EngineCore_DP0 pid=506800)     hidden_states = self.self_attn(
(EngineCore_DP0 pid=506800)   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/model_executor/models/qwen3.py", line 144, in forward
(EngineCore_DP0 pid=506800)     qkv, _ = self.qkv_proj(hidden_states)
(EngineCore_DP0 pid=506800)   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/model_executor/layers/linear.py", line 582, in forward
(EngineCore_DP0 pid=506800)     output_parallel = self.quant_method.apply(self, input_, bias)
(EngineCore_DP0 pid=506800)   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/model_executor/layers/quantization/gptq_bitblas.py", line 479, in apply
(EngineCore_DP0 pid=506800)     out = self.kernel.apply_gptq_bitblas_linear(layer, x)
(EngineCore_DP0 pid=506800)   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/model_executor/layers/quantization/kernels/mixed_precision/bitblas.py", line 315, in apply_gptq_bitblas_linear
(EngineCore_DP0 pid=506800)     output = self.bitblas_matmul(*args)  # type: ignore[operator]
(EngineCore_DP0 pid=506800)   File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/bitblas/ops/general_matmul/__init__.py", line 756, in __call__
(EngineCore_DP0 pid=506800)     return self.forward(*args, **kwds)
(EngineCore_DP0 pid=506800)   File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/bitblas/ops/general_matmul/__init__.py", line 751, in forward
(EngineCore_DP0 pid=506800)     self._forward_from_prebuild_lib(*args, stream=stream.cuda_stream)
(EngineCore_DP0 pid=506800)   File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/bitblas/ops/operator.py", line 459, in _forward_from_prebuild_lib
(EngineCore_DP0 pid=506800)     ctypes.c_void_p(arr.data_ptr()) if not isinstance(arr, int) else arr for arr in args
(EngineCore_DP0 pid=506800)   File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/torch/_dynamo/polyfills/__init__.py", line 204, in instantiate_user_defined_class_object
(EngineCore_DP0 pid=506800)     obj = cls.__new__(cls, *args, **kwargs)
(EngineCore_DP0 pid=506800) 
(EngineCore_DP0 pid=506800) Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
(EngineCore_DP0 pid=506800) 
[rank0]:[W1014 21:31:51.724849712 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(APIServer pid=506583) Traceback (most recent call last):
(APIServer pid=506583)   File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/bin/vllm", line 10, in <module>
(APIServer pid=506583)     sys.exit(main())
(APIServer pid=506583)              ^^^^^^
(APIServer pid=506583)   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/entrypoints/cli/main.py", line 73, in main
(APIServer pid=506583)     args.dispatch_function(args)
(APIServer pid=506583)   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/entrypoints/cli/serve.py", line 62, in cmd
(APIServer pid=506583)     uvloop.run(run_server(args))
(APIServer pid=506583)   File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/uvloop/__init__.py", line 109, in run
(APIServer pid=506583)     return __asyncio.run(
(APIServer pid=506583)            ^^^^^^^^^^^^^^
(APIServer pid=506583)   File "/home/lixiangyu/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=506583)     return runner.run(main)
(APIServer pid=506583)            ^^^^^^^^^^^^^^^^
(APIServer pid=506583)   File "/home/lixiangyu/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=506583)     return self._loop.run_until_complete(task)
(APIServer pid=506583)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=506583)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=506583)   File "/home/lixiangyu/repos/xxxxyu/vllm/.venv/lib/python3.12/site-packages/uvloop/__init__.py", line 61, in wrapper
(APIServer pid=506583)     return await main
(APIServer pid=506583)            ^^^^^^^^^^
(APIServer pid=506583)   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/entrypoints/openai/api_server.py", line 1917, in run_server
(APIServer pid=506583)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=506583)   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/entrypoints/openai/api_server.py", line 1933, in run_server_worker
(APIServer pid=506583)     async with build_async_engine_client(
(APIServer pid=506583)                ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=506583)   File "/home/lixiangyu/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=506583)     return await anext(self.gen)
(APIServer pid=506583)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=506583)   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/entrypoints/openai/api_server.py", line 191, in build_async_engine_client
(APIServer pid=506583)     async with build_async_engine_client_from_engine_args(
(APIServer pid=506583)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=506583)   File "/home/lixiangyu/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=506583)     return await anext(self.gen)
(APIServer pid=506583)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=506583)   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/entrypoints/openai/api_server.py", line 238, in build_async_engine_client_from_engine_args
(APIServer pid=506583)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=506583)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=506583)   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/utils/__init__.py", line 1336, in inner
(APIServer pid=506583)     return fn(*args, **kwargs)
(APIServer pid=506583)            ^^^^^^^^^^^^^^^^^^^
(APIServer pid=506583)   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/engine/async_llm.py", line 208, in from_vllm_config
(APIServer pid=506583)     return cls(
(APIServer pid=506583)            ^^^^
(APIServer pid=506583)   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/engine/async_llm.py", line 130, in __init__
(APIServer pid=506583)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=506583)                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=506583)   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/engine/core_client.py", line 121, in make_async_mp_client
(APIServer pid=506583)     return AsyncMPClient(*client_args)
(APIServer pid=506583)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=506583)   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/engine/core_client.py", line 807, in __init__
(APIServer pid=506583)     super().__init__(
(APIServer pid=506583)   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/engine/core_client.py", line 468, in __init__
(APIServer pid=506583)     with launch_core_engines(vllm_config, executor_class, log_stats) as (
(APIServer pid=506583)          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=506583)   File "/home/lixiangyu/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=506583)     next(self.gen)
(APIServer pid=506583)   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/engine/utils.py", line 816, in launch_core_engines
(APIServer pid=506583)     wait_for_engine_startup(
(APIServer pid=506583)   File "/home/lixiangyu/repos/xxxxyu/vllm/vllm/v1/engine/utils.py", line 873, in wait_for_engine_startup
(APIServer pid=506583)     raise RuntimeError(
(APIServer pid=506583) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}

When enforce-eager=True is set, the error does not occur. It seems BitBLAS's compilation conflicts with vLLM's torch.compile integration?

If so, what is the best practice for running vLLM + BitBLAS?
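
For reference, the eager-mode workaround mentioned above can be sketched as follows. `build_engine_kwargs` and `launch` are hypothetical helper names used only for illustration; `LLM`, `quantization`, and `enforce_eager` are the vLLM names as I understand them. This only sidesteps torch.compile (and, as far as I know, CUDA graphs as well), so it is a workaround rather than a fix:

```python
def build_engine_kwargs(model: str = "JunHowie/Qwen3-8B-GPTQ-Int4") -> dict:
    """Keyword arguments for vllm.LLM that force eager execution."""
    return {
        "model": model,
        "quantization": "gptq_bitblas",  # backend used in this report
        "enforce_eager": True,           # skip torch.compile / graph capture
    }


def launch():
    """Hypothetical entry point; needs vLLM, BitBLAS, and a GPU to run."""
    # Imported lazily so the helper above can be inspected without vLLM installed.
    from vllm import LLM

    return LLM(**build_engine_kwargs())
```

Calling `launch()` should give the offline equivalent of serving with eager mode enforced, which is the configuration that did not raise the Dynamo error for me.
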
