
feat(simulator): support infinite-granularity updates (dw_min=0) #778

Open
Zhaoxian-Wu wants to merge 1 commit into IBM:master from Zhaoxian-Wu:feat/dw-min-zero-infinite-granularity

Background

AIHWKit can simulate various hardware non-idealities, such as a limited number of states (dw_min), the response function, variation, and noise.
For diagnostic purposes it is often necessary to isolate these effects. Concretely, a user studying device behavior wants to answer "how much of this error/degradation comes from finite dw_min granularity, versus from the response function or from variation?"
Today there is no clean way to remove only the dw_min contribution: shrinking dw_min toward zero makes the pulse count diverge, which is both prohibitively slow and dominated by accumulated stochastic noise — so the dw_min → 0 limit is never actually reachable.

This PR makes that limit a first-class, exactly-computed mode. Setting dw_min = 0 decouples the impact of dw_min from the other update non-idealities (response function and variation), so a user can compare runs with finite dw_min against the ideal zero-granularity baseline while keeping every other device characteristic fixed.
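To see why shrinking dw_min is impractical on the stochastic path, a back-of-envelope count helps. The sketch below is plain arithmetic, not AIHWKit code: `pulses_needed` and the target change `delta_w` are hypothetical names chosen for illustration. It only shows that the number of dw_min-sized steps needed for a fixed weight change grows as 1/dw_min.

```python
import math


def pulses_needed(delta_w: float, dw_min: float) -> int:
    """Minimum number of dw_min-sized steps needed to move a weight by delta_w."""
    if dw_min <= 0:
        raise ValueError("a pulse train needs dw_min > 0")
    return math.ceil(abs(delta_w) / dw_min)


delta_w = 0.05  # hypothetical target weight change
for dw_min in (1e-2, 1e-3, 1e-4, 1e-5):
    # Each tenfold reduction of dw_min costs roughly ten times more pulses.
    print(f"dw_min={dw_min:g}: at least {pulses_needed(delta_w, dw_min)} pulses")
```

At dw_min = 0 the count is undefined, which is why the limit has to be computed in closed form rather than simulated.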

What it does

Setting dw_min = 0 on any PulsedDevice activates infinite-granularity (IG) mode: instead of simulating a stochastic pulse train, the tile applies the exact mean-field limit of the update in a single deterministic step. The result is a noise-free update that still respects bounds and the device's weight-dependent response, but carries no dw_min granularity and no dw_min-related variation.

In IG mode the per-coincidence stochastic update is replaced by its expectation:

w ← w − lr · (dᵀx) · q(w)

where q(w) is the device-specific response function (the weight-dependent scale normally applied per pulse coincidence), and x / d are the forward and backward signals.
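The rule above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the C++/CUDA implementation from this PR: `ig_update`, `q_constant`, and `q_toy` are hypothetical names, the bounds of ±0.6 are assumed values, and `q_toy` is a made-up weight-dependent response used only to show that q(w) rescales each element's step.

```python
import numpy as np


def ig_update(w, x, d, lr, q, w_min=-0.6, w_max=0.6):
    """One deterministic step w <- w - lr * (d^T x) * q(w), clipped to the bounds."""
    outer = d.T @ x                # shape (out, in), summed over the batch
    w_new = w - lr * outer * q(w)  # q(w) scales the step element-wise
    return np.clip(w_new, w_min, w_max)


def q_constant(w):
    """Weight-independent response: every element takes the full step."""
    return np.ones_like(w)


def q_toy(w, w_max=0.6):
    """Made-up response that shrinks the step as |w| approaches the bound."""
    return 1.0 - np.abs(w) / w_max


rng = np.random.default_rng(0)
w0 = 0.1 * rng.standard_normal((5, 4))
x = rng.standard_normal((3, 4))  # forward signals, batch of 3
d = rng.standard_normal((3, 5))  # backward signals, batch of 3

w_const = ig_update(w0, x, d, lr=0.01, q=q_constant)
w_toy = ig_update(w0, x, d, lr=0.01, q=q_toy)
print("mean |step|, constant response:", np.abs(w_const - w0).mean())
print("mean |step|, toy response     :", np.abs(w_toy - w0).mean())
```

Calling the function twice with the same inputs returns the same weights, since no random pulse train is involved.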

To keep q(w) intact while removing only the granularity:

  • cycle-to-cycle noise and dw_min-related device-to-device variation (dw_min_dtod, dw_min_std) are dropped;
  • all other variation — bounds, gamma, slopes, up/down asymmetry — is preserved;
  • populate() treats dw_min = 0 as unit response (effective_dw_min = 1) so the per-element scales encode q(w) directly;
  • because there is no longer a stochastic pulse train, the pulse-count / bit-length parameters of UpdateParameters (desired_bl, update_bl_management, update_management, and fixed_bl) are bypassed and have no effect in IG mode (the dispatch returns before the bit-line maker is ever invoked).

Usage

IG mode is opt-in through a single config field — set dw_min = 0 on any pulsed device; no other API change is needed. It works the same at the layer level and the tile level:

import torch
from aihwkit.nn import AnalogLinear
from aihwkit.optim import AnalogSGD
from aihwkit.simulator.configs import SingleRPUConfig
from aihwkit.simulator.configs.devices import ConstantStepDevice

# The only switch: dw_min = 0 activates infinite-granularity (IG) mode.
rpu_config = SingleRPUConfig(device=ConstantStepDevice(dw_min=0.0))

layer = AnalogLinear(4, 5, bias=False, rpu_config=rpu_config)
opt = AnalogSGD(layer.parameters(), lr=0.1)

opt.zero_grad()
loss = ((layer(torch.randn(3, 4)) - torch.randn(3, 5)) ** 2).sum()
loss.backward()
opt.step()   # deterministic, noise-free mean-field update

Diagnostic use case: isolating the dw_min contribution

Run the same device at two granularities while keeping the response function and
variation fixed. The difference is then exactly the effect of finite dw_min:

import torch
from aihwkit.simulator.configs import SingleRPUConfig
from aihwkit.simulator.configs.devices import ConstantStepDevice
from aihwkit.simulator.tiles.analog import AnalogTile

def run_update(device, w0, x, d, lr=0.1):
    tile = AnalogTile(5, 4, SingleRPUConfig(device=device), bias=False)
    tile.tile.set_learning_rate(lr)
    tile.tile.set_weights(w0.clone())
    tile.update(x, d)
    return tile.tile.get_weights()

torch.manual_seed(0)
w0 = torch.randn(5, 4) * 0.1
x, d = torch.randn(3, 4), torch.randn(3, 5)

w_ideal = run_update(ConstantStepDevice(dw_min=0.0),   w0, x, d)  # IG baseline
w_real  = run_update(ConstantStepDevice(dw_min=0.001), w0, x, d)  # finite dw_min

# The gap reflects finite dw_min: granularity plus the pulse noise and dw_min variation that IG mode drops.
print("dw_min granularity error:", (w_real - w_ideal).abs().mean().item())

Note that dw_min = 0 removes only the granularity; variation is still applied. To reach the exact mean-field limit w -= lr * (dᵀx), disable the remaining non-idealities as well:

ideal = ConstantStepDevice(
    dw_min=0.0,                                          # remove granularity
    up_down_dtod=0.0, dw_min_dtod=0.0, dw_min_std=0.0,   # remove variation ...
    w_max_dtod=0.0, w_min_dtod=0.0,                      # ... and bound variation
)
# now a single update equals  w -= lr * (d.T @ x)  as long as no weight reaches a bound

Key changes

  1. Dispatch (rpu_weight_updater.cpp) — when the weight granularity is ≤ 0, updateVectorWithDevice routes through the IG path (initUpdateCycle → doInfiniteGranularityUpdate → finishUpdateCycle) instead of the stochastic pulsed updater.

  2. Base device (rpu_pulsed_device.{h,cpp}) — new virtual doInfiniteGranularityUpdate(...) with a default ConstantStep-style (weight-independent) implementation, plus the IG_UPDATE_W_LOOP_INNER helper macro. populate() switches to unit response and disables dw_min d-to-d variation when dw_min = 0.

  3. Per-device overrides (CPU .cpp + CUDA .cu) — weight-dependent response for ConstantStep, LinearStep, ExpStep, PowStep, PiecewiseStep, SoftBoundsReference, and PowStepReference devices.

  4. Config docs (configs/devices.py) — documents the dw_min = 0 IG behavior on PulsedDevice.dw_min.
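The control flow of the dispatch in item 1 can be mirrored in a small Python sketch. The real code is C++; `ToyDevice`, its snake_case methods, and `stochastic_pulsed_update` are hypothetical stand-ins that only record the order of calls, using the step names from the description above.

```python
class ToyDevice:
    """Hypothetical stand-in for a pulsed device; not an aihwkit class."""

    def __init__(self, dw_min: float):
        self.dw_min = dw_min
        self.calls = []  # records which steps ran, in order

    def init_update_cycle(self):
        self.calls.append("initUpdateCycle")

    def do_infinite_granularity_update(self):
        self.calls.append("doInfiniteGranularityUpdate")

    def finish_update_cycle(self):
        self.calls.append("finishUpdateCycle")

    def stochastic_pulsed_update(self):
        self.calls.append("stochasticPulsedUpdate")


def update_vector_with_device(device: ToyDevice) -> str:
    """Route to the IG path when the weight granularity is <= 0."""
    if device.dw_min <= 0:
        device.init_update_cycle()
        device.do_infinite_granularity_update()
        device.finish_update_cycle()
        return "ig"  # early return: the pulse-train machinery is never reached
    device.stochastic_pulsed_update()
    return "pulsed"


ig_device = ToyDevice(dw_min=0.0)
pulsed_device = ToyDevice(dw_min=0.001)
print(update_vector_with_device(ig_device), ig_device.calls)
print(update_vector_with_device(pulsed_device), pulsed_device.calls)
```

The early return is the reason the bit-length parameters have no effect in IG mode: nothing after the branch runs.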

Test coverage

New tests/test_infinite_granularity.py covering:

  • IG mode activates and runs for every supported device family;
  • closed-form correctness (e.g. ConstantStep zero-weight update equals lr·dᵀx;
    LinearStep matches its known response formula);
  • dw_min > 0 still uses the stochastic path (no regression);
  • convergence — averaged stochastic updates stay within the dw_min scale of the IG result;
  • CPU/GPU numerical consistency across all device families;
  • write-noise smoke tests and performance / memory benchmarks vs. the stochastic path.
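The idea behind the convergence check can be illustrated with a scalar toy model. This is not the test from tests/test_infinite_granularity.py: the binomial pulse model, `n_pulses`, `target`, and `n_trials` are assumptions chosen for illustration. Each trial draws the number of pulse coincidences in one train, each coincidence moves the weight by dw_min, and the average over many trials is compared with the mean-field step.

```python
import numpy as np

rng = np.random.default_rng(0)

dw_min = 1e-3    # finite step size
n_pulses = 100   # pulses per update (hypothetical bit length)
target = 0.05    # mean-field weight change the update should realize
p = target / (n_pulses * dw_min)  # coincidence probability per pulse

# One entry per trial: coincidence count in a pulse train times the step size.
n_trials = 10_000
steps = rng.binomial(n_pulses, p, size=n_trials) * dw_min

mean_step = steps.mean()
print("mean-field target :", target)
print("stochastic average:", mean_step)
print("single-trial std  :", steps.std())
```

Under this model a single stochastic update scatters around the target by several dw_min, while the average over trials lands well within one dw_min of it.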

Built with make build_inplace_cuda (CUDA 12.9 + MKL, Python 3.10).

…zero

Signed-off-by: Zhaoxian Wu <wuzhaoxian97@gmail.com>