Add CPU GDN kernel for Qwen3.5 with correctness test and benchmark#687
Add CPU GDN kernel for Qwen3.5 with correctness test and benchmark#687Misscheng-eng wants to merge 1 commit into
Conversation
📝 WalkthroughWalkthroughAdds a new CPU GDN (Gated Delta Network) forward kernel ( ChangesCPU GDN Kernel, Op Wiring, and Tests
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
🧹 Nitpick comments (3)
mllm/backends/cpu/ops/test_gdn_kernel.cpp (1)
47-58: ⚡ Quick winSmoke test never fails on wrong output.
Line 57 always prints success without validating
s_new, so this binary won’t catch regressions.Proposed fix
+#include <cmath> `#include` <iostream> `#include` <vector> -#include <cmath> @@ int main() { @@ gdn_forward(s_prev, k, v, gate, beta, s_new); - std::cout << "GDN kernel OK\n"; + // With zero s_prev and k/v initialized as {1,0,0,0} per head: + // s_new[h,0,0] should equal beta[h], and all other entries remain 0. + for (int h = 0; h < H; ++h) { + const int base = h * D * D; + if (std::fabs(s_new[base] - beta[h]) > 1e-6f) { + std::cerr << "Mismatch at head " << h << ": got " << s_new[base] + << ", expected " << beta[h] << "\n"; + return 1; + } + } + std::cout << "GDN kernel OK\n"; + return 0; }As per coding guidelines, “Suggest adding unit tests for untested complex logic or edge cases.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@mllm/backends/cpu/ops/test_gdn_kernel.cpp` around lines 47 - 58, The main function in the test calls gdn_forward with test inputs but never validates the output array s_new, so the test always succeeds regardless of correctness. Add validation checks after the gdn_forward call that compare the computed values in s_new against expected results using assertions or similar checks. This will ensure regressions in the gdn_forward function are caught when the output differs from expected values.Source: Coding guidelines
mllm/backends/cpu/kernels/GDNKernel.hpp (2)
10-24: ⚡ Quick winDocument the raw-pointer layout contract.
The formula comment does not state required shapes/layouts, required non-null pointers,
batch_sizeconstraints, optionalq/outputsemantics, or failure behavior for this public raw-pointer API.As per coding guidelines, “Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@mllm/backends/cpu/kernels/GDNKernel.hpp` around lines 10 - 24, The forward method in the GDNKernel struct lacks documentation of its raw-pointer API contract. Add a comprehensive docstring to the forward method that specifies the required shapes and memory layouts of all pointer parameters (s_prev, k, v, gate, beta, s_new, output, q), which pointers must be non-null and which are optional, the valid range for batch_size, the semantics of the optional output and q parameters, and any failure modes or error behavior. This will ensure users understand the memory requirements and constraints for calling this public API correctly.Source: Coding guidelines
5-6: ⚡ Quick winAvoid heap allocation inside the per-head kernel loop.
std::vector<T> proj(D, 0)allocates for every batch/head. SinceDis constexpr, use stack storage and avoid allocator overhead in this CPU hot path.Proposed fix
-#include <vector> +#include <array> `#include` <cmath>- std::vector<T> proj(D, 0); + std::array<T, D> proj;As per coding guidelines, “Avoid unnecessary object creation in loops or hot paths.”
Also applies to: 47-53
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@mllm/backends/cpu/kernels/GDNKernel.hpp` around lines 5 - 6, The GDNKernel.hpp file allocates a std::vector in the per-head kernel loop (lines 47-53 and similar locations), which causes heap allocation overhead in a CPU-intensive hot path. Since D is constexpr, replace the std::vector<T> proj(D, 0) allocations with a stack-allocated std::array<T, D> instead. This eliminates the allocator overhead while maintaining the same functionality, as the size is known at compile time. Ensure all instances of this pattern throughout the kernel loop are updated to use the stack-allocated alternative.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@mllm/backends/cpu/kernels/GDNKernel.hpp`:
- Around line 20-23: Remove all trailing whitespace from the end of the lines in
the GDNKernel.hpp file. Specifically, trim any extra spaces at the end of the
parameter declaration lines in the function signature (around the s_prev, k, v,
gate, beta, s_new, output, and q parameters), and also check lines 36-38 and
line 69 for the same issue. Ensure each line ends cleanly without any trailing
spaces, as per the coding guidelines.
- Around line 18-24: The forward method in the GDNKernel template class has a
template parameter mismatch where the output parameter is declared as float* and
the q parameter is declared as const float*, but they should be T* and const T*
respectively to match the template parameter T used throughout the function.
Change the output parameter type from float* to T* and the q parameter type from
const float* to const T* in the forward method signature to ensure type
consistency and avoid undefined behavior when the template is instantiated with
types other than float.
In `@mllm/backends/cpu/ops/GDNOp.hpp`:
- Line 24: Line 24 contains trailing whitespace on a blank line, which violates
coding guidelines that prohibit lines from ending with whitespace. Navigate to
line 24 in the GDNOp.hpp file and remove all trailing whitespace (spaces or
tabs) from that blank line, leaving it completely empty.
- Around line 17-35: The forward() method in the GDNOp class directly accesses
raw float pointers from input and output tensors without validating their
correctness. Before calling GDNKernel::forward(), add validation checks to
ensure: the inputs vector contains exactly 5 tensors and outputs vector contains
exactly 1 tensor, all tensors have float dtype, all tensors are contiguous or
layout-compatible, the batch size B extracted from s_prev matches across all
input/output tensors, and each tensor has the expected shape (specifically
[B,16,128,*] or similar dimensions as per the kernel requirements). Perform
these checks after extracting B but before accessing any raw pointers via
.ptr<float>() calls.
In `@tests/cpu/test_gdn_correctness.py`:
- Line 9: The KERNEL_HEADER_PATH variable on line 9 uses a hardcoded absolute
path that is machine-specific and breaks on CI and other developer machines.
Replace the hardcoded path with a repo-relative path discovery approach by using
the test file's location (__file__) to dynamically construct the path to the
GDNKernel.hpp header. Use path manipulation functions like os.path.dirname,
os.path.join, or pathlib.Path to navigate from the current test file to the
kernel header location relative to the repository root. Apply the same fix to
line 17 which also has a hardcoded absolute path.
- Line 105: The print statement on line 105 uses an f-string prefix but contains
no placeholder variables, which triggers the Ruff F541 lint error. Remove the f
prefix from the string in the print function call since the string is a literal
string with no variables that need interpolation.
---
Nitpick comments:
In `@mllm/backends/cpu/kernels/GDNKernel.hpp`:
- Around line 10-24: The forward method in the GDNKernel struct lacks
documentation of its raw-pointer API contract. Add a comprehensive docstring to
the forward method that specifies the required shapes and memory layouts of all
pointer parameters (s_prev, k, v, gate, beta, s_new, output, q), which pointers
must be non-null and which are optional, the valid range for batch_size, the
semantics of the optional output and q parameters, and any failure modes or
error behavior. This will ensure users understand the memory requirements and
constraints for calling this public API correctly.
- Around line 5-6: The GDNKernel.hpp file allocates a std::vector in the
per-head kernel loop (lines 47-53 and similar locations), which causes heap
allocation overhead in a CPU-intensive hot path. Since D is constexpr, replace
the std::vector<T> proj(D, 0) allocations with a stack-allocated std::array<T,
D> instead. This eliminates the allocator overhead while maintaining the same
functionality, as the size is known at compile time. Ensure all instances of
this pattern throughout the kernel loop are updated to use the stack-allocated
alternative.
In `@mllm/backends/cpu/ops/test_gdn_kernel.cpp`:
- Around line 47-58: The main function in the test calls gdn_forward with test
inputs but never validates the output array s_new, so the test always succeeds
regardless of correctness. Add validation checks after the gdn_forward call that
compare the computed values in s_new against expected results using assertions
or similar checks. This will ensure regressions in the gdn_forward function are
caught when the output differs from expected values.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: aafc0fc2-4319-4c0e-9e1c-9f3b77f55371
📒 Files selected for processing (5)
mllm/backends/cpu/kernels/GDNKernel.hppmllm/backends/cpu/ops/GDNOp.cppmllm/backends/cpu/ops/GDNOp.hppmllm/backends/cpu/ops/test_gdn_kernel.cpptests/cpu/test_gdn_correctness.py
| template <typename T> | ||
| static void forward( | ||
| const T* s_prev, const T* k, const T* v, | ||
| const T* gate, const T* beta, | ||
| T* s_new, | ||
| float* output = nullptr, const float* q = nullptr, // 👈 改成 float* | ||
| int batch_size = 1 |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: List GDNKernel::forward call sites and explicit template arguments.
# Expectation: Existing call sites should all use float if this kernel is intended to remain float-only.
rg -nP -C3 'GDNKernel::forward\s*(<[^>]+>)?\s*\('Repository: UbiquitousLearning/mllm
Length of output: 1197
🏁 Script executed:
cat -n mllm/backends/cpu/kernels/GDNKernel.hpp | sed -n '15,80p'Repository: UbiquitousLearning/mllm
Length of output: 3141
🏁 Script executed:
rg -n 'GDNKernel' --type hpp --type cpp --type py | grep -E '(forward|::)' | head -20Repository: UbiquitousLearning/mllm
Length of output: 95
🏁 Script executed:
rg -n 'GDNKernel' --type-add 'hpp:*.{hpp,h}' --type hpp --type cpp --type pyRepository: UbiquitousLearning/mllm
Length of output: 637
Fix template parameter mismatch in GDNKernel::forward.
The forward<T> function signature declares output and q as float* and const float* (line 23), but uses them as T* and const T* at lines 72–73. This creates undefined behavior if instantiated with T other than float. All current call sites (test_gdn_correctness.py:22 and GDNOp.hpp:30) use float exclusively.
Choose one approach:
- Remove the template and declare the kernel as
float-only, or - Change
outputandqparameters toT*andconst T*to match the template contract.
Option 2 fix
- float* output = nullptr, const float* q = nullptr,
+ T* output = nullptr, const T* q = nullptr,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@mllm/backends/cpu/kernels/GDNKernel.hpp` around lines 18 - 24, The forward
method in the GDNKernel template class has a template parameter mismatch where
the output parameter is declared as float* and the q parameter is declared as
const float*, but they should be T* and const T* respectively to match the
template parameter T used throughout the function. Change the output parameter
type from float* to T* and the q parameter type from const float* to const T* in
the forward method signature to ensure type consistency and avoid undefined
behavior when the template is instantiated with types other than float.
| const T* s_prev, const T* k, const T* v, | ||
| const T* gate, const T* beta, | ||
| T* s_new, | ||
| float* output = nullptr, const float* q = nullptr, // 👈 改成 float* |
There was a problem hiding this comment.
Remove trailing whitespace from the new header.
These lines visibly end with extra spaces in the submitted content.
As per coding guidelines, “No line may end with trailing whitespace.”
Also applies to: 36-38, 69-69
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@mllm/backends/cpu/kernels/GDNKernel.hpp` around lines 20 - 23, Remove all
trailing whitespace from the end of the lines in the GDNKernel.hpp file.
Specifically, trim any extra spaces at the end of the parameter declaration
lines in the function signature (around the s_prev, k, v, gate, beta, s_new,
output, and q parameters), and also check lines 36-38 and line 69 for the same
issue. Ensure each line ends cleanly without any trailing spaces, as per the
coding guidelines.
Source: Coding guidelines
| void forward(const std::vector<Tensor>& inputs, std::vector<Tensor>& outputs) override { | ||
| // 获取输入输出 Tensor | ||
| const auto& s_prev = inputs[0]; | ||
| const auto& k = inputs[1]; | ||
| const auto& v = inputs[2]; | ||
| const auto& gate = inputs[3]; | ||
| const auto& beta = inputs[4]; | ||
|
|
||
| auto& s_new = outputs[0]; | ||
| int B = s_prev.size(0); // 获取 batch size | ||
|
|
||
| // 调用我们写好的 GDNKernel 纯计算逻辑 | ||
| // 注意:这里使用 .ptr<float>() 获取底层数据指针 | ||
| GDNKernel::forward( | ||
| s_prev.ptr<float>(), k.ptr<float>(), v.ptr<float>(), | ||
| gate.ptr<float>(), beta.ptr<float>(), | ||
| s_new.ptr<float>(), | ||
| nullptr, nullptr, // output 和 q 暂时不用,传空指针 | ||
| B |
There was a problem hiding this comment.
Validate tensor arity, shape, dtype, and layout before raw pointer access.
forward() indexes inputs[0..4]/outputs[0] and then calls a fixed-layout kernel that reads/writes [B,16,128,*] memory using raw float pointers. Add project-standard checks for 5 inputs, 1 output, float dtype, contiguous/layout-compatible tensors, matching B, and expected shapes before calling the kernel.
As per coding guidelines, “Validate inputs for public APIs and critical internal functions” and “Ensure functions that can fail return appropriate error codes or raise exceptions.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@mllm/backends/cpu/ops/GDNOp.hpp` around lines 17 - 35, The forward() method
in the GDNOp class directly accesses raw float pointers from input and output
tensors without validating their correctness. Before calling
GDNKernel::forward(), add validation checks to ensure: the inputs vector
contains exactly 5 tensors and outputs vector contains exactly 1 tensor, all
tensors have float dtype, all tensors are contiguous or layout-compatible, the
batch size B extracted from s_prev matches across all input/output tensors, and
each tensor has the expected shape (specifically [B,16,128,*] or similar
dimensions as per the kernel requirements). Perform these checks after
extracting B but before accessing any raw pointers via .ptr<float>() calls.
Source: Coding guidelines
| const auto& v = inputs[2]; | ||
| const auto& gate = inputs[3]; | ||
| const auto& beta = inputs[4]; | ||
|
|
There was a problem hiding this comment.
Remove trailing whitespace on the blank line.
Line 24 visibly contains indentation-only trailing whitespace.
As per coding guidelines, “No line may end with trailing whitespace.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@mllm/backends/cpu/ops/GDNOp.hpp` at line 24, Line 24 contains trailing
whitespace on a blank line, which violates coding guidelines that prohibit lines
from ending with whitespace. Navigate to line 24 in the GDNOp.hpp file and
remove all trailing whitespace (spaces or tabs) from that blank line, leaving it
completely empty.
Source: Coding guidelines
| # ========================================== | ||
| # ⚠️ 请务必替换成你机器上 GDNKernel.hpp 的真实绝对路径! | ||
| # ========================================== | ||
| KERNEL_HEADER_PATH = "/home/cjt/mllm/mllm-main/mllm/backends/cpu/kernels/GDNKernel.hpp" |
There was a problem hiding this comment.
Use repo-relative header discovery instead of a machine-local absolute path.
Line 9 hardcodes /home/..., which breaks on CI and every other developer machine.
Proposed fix
import torch
import time
from torch.utils.cpp_extension import load_inline
-import os
+from pathlib import Path
@@
-KERNEL_HEADER_PATH = "/home/cjt/mllm/mllm-main/mllm/backends/cpu/kernels/GDNKernel.hpp"
+KERNEL_HEADER_PATH = (
+ Path(__file__).resolve().parents[2]
+ / "mllm"
+ / "backends"
+ / "cpu"
+ / "kernels"
+ / "GDNKernel.hpp"
+)
+if not KERNEL_HEADER_PATH.exists():
+ raise FileNotFoundError(f"GDN header not found: {KERNEL_HEADER_PATH}")
@@
-#include "{KERNEL_HEADER_PATH}"
+#include "{KERNEL_HEADER_PATH.as_posix()}"Also applies to: 17-17
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/cpu/test_gdn_correctness.py` at line 9, The KERNEL_HEADER_PATH variable
on line 9 uses a hardcoded absolute path that is machine-specific and breaks on
CI and other developer machines. Replace the hardcoded path with a repo-relative
path discovery approach by using the test file's location (__file__) to
dynamically construct the path to the GDNKernel.hpp header. Use path
manipulation functions like os.path.dirname, os.path.join, or pathlib.Path to
navigate from the current test file to the kernel header location relative to
the repository root. Apply the same fix to line 17 which also has a hardcoded
absolute path.
| # 实际吞吐量取决于你的业务定义,这里以 Batch Size 作为每秒处理的 token 数参考 | ||
| tokens_per_second = (B * repeats) / (end_time - start_time) | ||
|
|
||
| print(f"\n📈 Benchmark 结果汇总:") |
There was a problem hiding this comment.
Remove redundant f prefix (Ruff F541).
This line is an f-string without placeholders and triggers lint error.
Proposed fix
- print(f"\n📈 Benchmark 结果汇总:")
+ print("\n📈 Benchmark 结果汇总:")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| print(f"\n📈 Benchmark 结果汇总:") | |
| print("\n📈 Benchmark 结果汇总:") |
🧰 Tools
🪛 Ruff (0.15.17)
[error] 105-105: f-string without any placeholders
Remove extraneous f prefix
(F541)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/cpu/test_gdn_correctness.py` at line 105, The print statement on line
105 uses an f-string prefix but contains no placeholder variables, which
triggers the Ruff F541 lint error. Remove the f prefix from the string in the
print function call since the string is a literal string with no variables that
need interpolation.
Source: Linters/SAST tools
Overview
This PR implements the CPU version of the Gated Delta Network (GDN) attention kernel used in Qwen3.5-0.8B.
The goal is to provide a minimal, high-performance CPU backend implementation for GDN, enabling correct forward inference and benchmarking within the mllm CPU execution framework.
Implementation
We implement a pure C++ CPU kernel:
The forward computation follows the Qwen3.5 GDN attention formulation:
S_t = α S_{t-1}
- αβ (S_{t-1} k) k^T
+ β v k^T
Where:
CPU Integration
We add a lightweight CPU operator:
No modification is made to the core CPU runtime.
Testing
Correctness Test
Benchmark Results (CPU)
Batch=1:
Batch=4:
Scope
This PR ONLY targets:
No changes to:
Summary
This implementation provides a correct and efficient CPU baseline for Qwen3.5 GDN attention inference and can serve as a reference backend for further optimization (SIMD / multithreading).