feat(simulator): support infinite-granularity updates (dw_min=0) by Zhaoxian-Wu · Pull Request #778 · IBM/aihwkit

Zhaoxian-Wu · 2026-06-20T19:59:21Z

Background

AIHWKit can simulate a variety of hardware non-idealities, such as limited state granularity (dw_min), the response function, variation, and noise.
For diagnostic purposes it is often necessary to isolate these effects. Concretely, a user studying device behavior wants to answer "how much of this error/degradation comes from finite dw_min granularity, versus from the response function or from variation?"
Today there is no clean way to remove only the dw_min contribution: shrinking dw_min toward zero makes the pulse count diverge, which is both prohibitively slow and dominated by accumulated stochastic noise — so the dw_min → 0 limit is never actually reachable.

This PR makes that limit a first-class, exactly-computed mode. Setting dw_min = 0 decouples the impact of dw_min from the other update non-idealities (response function and variation), so a user can compare runs with finite dw_min against the ideal zero-granularity baseline while keeping every other device characteristic fixed.

What it does

Setting dw_min = 0 on any PulsedDevice activates infinite-granularity (IG) mode: instead of simulating a stochastic pulse train, the tile applies the exact mean-field limit of the update in a single deterministic step. The result is a noise-free update that still respects bounds and the device's weight-dependent response, but carries no dw_min granularity and no dw_min-related variation.

In IG mode the per-coincidence stochastic update is replaced by its expectation:

w ← w − lr · (dᵀx) · q(w)

where q(w) is the device-specific response function (the weight-dependent scale normally applied per pulse coincidence), and x / d are the forward and backward signals.
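As a rough illustration of the rule above, the following pure-Python sketch applies the expectation in a single step. The helper name ig_update and both response functions are hypothetical stand-ins chosen for this example, not the simulator's C++/CUDA code; in particular, the soft-bounds-style q_soft is a simplified assumption that ignores up/down asymmetry.

```python
def ig_update(w, x_batch, d_batch, lr, q):
    """Return w - lr * (d^T x) * q(w), computed element-wise in one step.

    w       : list of rows (out_size x in_size)
    x_batch : list of input vectors (batch x in_size)
    d_batch : list of error vectors (batch x out_size)
    q       : hypothetical response function, called as q(w_ij)
    """
    out_size, in_size = len(w), len(w[0])
    new_w = []
    for i in range(out_size):
        row = []
        for j in range(in_size):
            # (d^T x)_ij: outer product accumulated over the batch.
            grad_ij = sum(d[i] * x[j] for d, x in zip(d_batch, x_batch))
            row.append(w[i][j] - lr * grad_ij * q(w[i][j]))
        new_w.append(row)
    return new_w


def q_constant(w_ij):
    # Weight-independent response (a ConstantStep-like assumption).
    return 1.0


def q_soft(w_ij, w_max=1.0):
    # Illustrative weight-dependent response that shrinks toward the bound.
    return 1.0 - abs(w_ij) / w_max


w0 = [[0.0, 0.5], [-0.5, 0.0]]
x = [[1.0, 2.0]]
d = [[1.0, -1.0]]
w_const = ig_update(w0, x, d, lr=0.1, q=q_constant)
w_soft = ig_update(w0, x, d, lr=0.1, q=q_soft)
```

With q_constant the step reduces to plain w - lr * (dᵀx); with q_soft the same gradient is scaled down for weights that are already away from zero, which is the kind of weight dependence IG mode is meant to preserve.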

To keep q(w) intact while removing only the granularity:

cycle-to-cycle noise and dw_min-related device-to-device variation (dw_min_dtod, dw_min_std) are dropped;
all other variation — bounds, gamma, slopes, up/down asymmetry — is preserved;
populate() treats dw_min = 0 as unit response (effective_dw_min = 1) so the per-element scales encode q(w) directly;
because there is no longer a stochastic pulse train, the pulse-count / bit-length parameters of UpdateParameters — desired_bl, update_bl_management, update_management, and fixed_bl — are bypassed and have no effect in IG mode (the dispatch returns before the bit-line maker is ever invoked).
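The rules in the list above can be summarized in a small sketch. The function resolve_ig_params and the ig_mode / effective_dw_min keys are hypothetical names used only for illustration; the dict merely mirrors a few PulsedDevice fields and is not how populate() is actually implemented.

```python
def resolve_ig_params(params):
    """Return the effective per-device parameters used for the update.

    `params` is a plain dict mirroring a few PulsedDevice fields. This
    helper is an illustration only, not aihwkit's populate().
    """
    effective = dict(params)
    if params["dw_min"] > 0.0:
        # Regular stochastic path: dw_min is used as given.
        effective["effective_dw_min"] = params["dw_min"]
        effective["ig_mode"] = False
        return effective
    # IG mode: unit response, so per-element scales encode q(w) directly.
    effective["effective_dw_min"] = 1.0
    effective["ig_mode"] = True
    # dw_min-related variation and cycle-to-cycle noise are dropped.
    effective["dw_min_dtod"] = 0.0
    effective["dw_min_std"] = 0.0
    # Everything else (bounds, up/down asymmetry, ...) is left untouched.
    return effective


device = {
    "dw_min": 0.0,
    "dw_min_dtod": 0.3,
    "dw_min_std": 0.3,
    "up_down_dtod": 0.01,
    "w_max_dtod": 0.3,
}
ig = resolve_ig_params(device)
finite = resolve_ig_params({**device, "dw_min": 0.001})
```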

Usage

IG mode is opt-in through a single config field — set dw_min = 0 on any pulsed device; no other API change is needed. It works the same at the layer level and the tile level:

import torch
from aihwkit.nn import AnalogLinear
from aihwkit.optim import AnalogSGD
from aihwkit.simulator.configs import SingleRPUConfig
from aihwkit.simulator.configs.devices import ConstantStepDevice

# The only switch: dw_min = 0 activates infinite-granularity (IG) mode.
rpu_config = SingleRPUConfig(device=ConstantStepDevice(dw_min=0.0))

layer = AnalogLinear(4, 5, bias=False, rpu_config=rpu_config)
opt = AnalogSGD(layer.parameters(), lr=0.1)

opt.zero_grad()
loss = ((layer(torch.randn(3, 4)) - torch.randn(3, 5)) ** 2).sum()
loss.backward()
opt.step()   # deterministic, noise-free mean-field update

Diagnostic use case: isolating the `dw_min` contribution

Run the same device at two granularities while keeping the response function and
variation fixed. The difference then isolates the effect of finite dw_min, up to the sampling noise of the single stochastic run:

import torch
from aihwkit.simulator.configs import SingleRPUConfig
from aihwkit.simulator.configs.devices import ConstantStepDevice
from aihwkit.simulator.tiles.analog import AnalogTile

def run_update(device, w0, x, d, lr=0.1):
    tile = AnalogTile(5, 4, SingleRPUConfig(device=device), bias=False)
    tile.tile.set_learning_rate(lr)
    tile.tile.set_weights(w0.clone())
    tile.update(x, d)
    return tile.tile.get_weights()

torch.manual_seed(0)
w0 = torch.randn(5, 4) * 0.1
x, d = torch.randn(3, 4), torch.randn(3, 5)

w_ideal = run_update(ConstantStepDevice(dw_min=0.0),   w0, x, d)  # IG baseline
w_real  = run_update(ConstantStepDevice(dw_min=0.001), w0, x, d)  # finite dw_min

# The gap reflects the dw_min granularity contribution (plus sampling noise of this one stochastic run).
print("dw_min granularity error:", (w_real - w_ideal).abs().mean().item())

Note that dw_min = 0 removes only the granularity; variation is still applied. To reach the exact mean-field limit w -= lr * (dᵀx), disable the remaining non-idealities as well:

ideal = ConstantStepDevice(
    dw_min=0.0,                                          # remove granularity
    up_down_dtod=0.0, dw_min_dtod=0.0, dw_min_std=0.0,   # remove variation ...
    w_max_dtod=0.0, w_min_dtod=0.0,                      # ... and bound variation
)
# now a single update equals exactly  w -= lr * (d.T @ x)

Key changes

Dispatch (rpu_weight_updater.cpp) — when the weight granularity is ≤ 0, updateVectorWithDevice routes through the IG path (initUpdateCycle → doInfiniteGranularityUpdate → finishUpdateCycle) instead of the stochastic pulsed updater.
Base device (rpu_pulsed_device.{h,cpp}) — new virtual doInfiniteGranularityUpdate(...) with a default ConstantStep-style (weight-independent) implementation, plus the IG_UPDATE_W_LOOP_INNER helper macro. populate() switches to unit response and disables dw_min d-to-d variation when dw_min = 0.
Per-device overrides (CPU .cpp + CUDA .cu) — weight-dependent response for ConstantStep, LinearStep, ExpStep, PowStep, PiecewiseStep, SoftBoundsReference, and PowStepReference devices.
Config docs (configs/devices.py) — documents the dw_min = 0 IG behavior on PulsedDevice.dw_min.
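The dispatch described above can be pictured with a short Python sketch. TraceDevice and update_vector_with_device are hypothetical stand-ins that only record the order of calls; the real logic lives in rpu_weight_updater.cpp, and the stochastic branch is collapsed into a single placeholder call here.

```python
class TraceDevice:
    """Hypothetical stand-in that records which hooks the updater calls."""

    def __init__(self, dw_min):
        self.dw_min = dw_min
        self.calls = []

    def init_update_cycle(self):
        self.calls.append("initUpdateCycle")

    def do_infinite_granularity_update(self):
        self.calls.append("doInfiniteGranularityUpdate")

    def finish_update_cycle(self):
        self.calls.append("finishUpdateCycle")

    def do_pulsed_update(self, desired_bl):
        # Placeholder for the whole stochastic pulsed updater.
        self.calls.append(f"pulsedUpdate(bl={desired_bl})")


def update_vector_with_device(device, desired_bl=31):
    """Route to the IG path when the weight granularity is <= 0."""
    if device.dw_min <= 0.0:
        device.init_update_cycle()
        device.do_infinite_granularity_update()
        device.finish_update_cycle()
        return  # desired_bl is never consulted on this path
    device.do_pulsed_update(desired_bl)


ig_device = TraceDevice(dw_min=0.0)
pulsed_device = TraceDevice(dw_min=0.001)
update_vector_with_device(ig_device, desired_bl=100)
update_vector_with_device(pulsed_device, desired_bl=100)
```

The early return is the point of the sketch: on the IG branch the bit-length argument is accepted but never read, which matches the statement that the pulse-count parameters have no effect in IG mode.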

Test coverage

New tests/test_infinite_granularity.py covering:

IG mode activates and runs for every supported device family;
closed-form correctness (e.g. ConstantStep zero-weight update equals lr·dᵀx; LinearStep matches its known response formula);
dw_min > 0 still uses the stochastic path (no regression);
convergence — averaged stochastic updates stay within the dw_min scale of the IG result;
CPU/GPU numerical consistency across all device families;
write-noise smoke tests and performance / memory benchmarks vs. the stochastic path.
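To give a feel for the convergence check in the list above, here is a toy pure-Python model, not the actual test. It assumes a simplified pulse train in which each of bl slots fires independently with probability p and moves the weight by dw_min; the function names and the chosen numbers are illustrative assumptions.

```python
import random


def stochastic_change(delta, dw_min, bl, rng):
    """Weight change from one simulated pulse train.

    Each of the `bl` slots fires with probability p, chosen so that the
    expected change equals `delta`. This is a toy model of coincidences.
    """
    p = abs(delta) / (bl * dw_min)
    if p > 1.0:
        raise ValueError("delta too large for this bl and dw_min")
    n_fired = sum(1 for _ in range(bl) if rng.random() < p)
    sign = 1.0 if delta >= 0 else -1.0
    return sign * n_fired * dw_min


def ig_change(delta):
    # Infinite-granularity limit: the expectation, applied exactly.
    return delta


rng = random.Random(0)  # fixed seed so the sketch is reproducible
delta, dw_min, bl = 0.05, 0.001, 100
runs = [stochastic_change(delta, dw_min, bl, rng) for _ in range(2000)]
average = sum(runs) / len(runs)
gap = abs(average - ig_change(delta))
print("gap between averaged stochastic and IG change:", gap)
```

Individual runs scatter around the target in steps of dw_min, while the average over many runs lands well within one dw_min of the deterministic IG value.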

Built with make build_inplace_cuda (CUDA 12.9 + MKL, Python 3.10).

feat(simulator): support infinite-granularity updates when dw_min is …

b4e31f6

…zero Signed-off-by: Zhaoxian Wu <wuzhaoxian97@gmail.com>

Zhaoxian-Wu wants to merge 1 commit into IBM:master from Zhaoxian-Wu:feat/dw-min-zero-infinite-granularity
