
feat(nkigen): import NKIgen as workspace member with knob/tile/memspace fixes #61

Closed
ymwangg wants to merge 8 commits into main from fix/nkigen-builder-knob-tile-and-memspace

ymwangg commented Jun 4, 2026

Summary

  • Import NKIgen (NumPy-trace → NISA-dialect lowering frontend) as a top-level workspace member, replacing the kernelgen/ subfolder previously proposed in #55 (feat(kernelgen): import NKIPyKernelGen as a subfolder)
  • Add the compile_to_neff API and migrate to the nki 0.4.0 compile_mlir_to_neff signature
  • Fix the mem_space values emitted by the knob annotate() function (align them to the 1-based C++ dialect enum) and TileOp emission (use the loop_tile_size kwarg)
  • Fix test parallelism: partition Neuron cores across xdist workers and clean stale artifacts

Details

This branch brings NKIgen into the nkipy monorepo as a workspace member (nkigen/). Key changes:

  • Package rename: nkipy_kernelgen → nkigen, wired into the uv workspace
  • Dropped internal scaffolding: compiler_explorer, .claude/skills, scratch dirs, BIR-sim mode, LIT tests
  • Hardware-aware constants: HardwareConstants.h uses per-target values selected by a --target= pass option
  • 26-stage compilation pipeline documented in README with phase groupings
  • Bug fixes: mem_space 0-based→1-based alignment, TileOp kwarg correction, nki 0.4.0 API migration, test parallelism with core partitioning

Test plan

  • uv run pytest nkigen/tests/unit/ -n auto — unit tests pass
  • uv run pytest nkigen/tests/passes/ -n auto — pass-level tests pass
  • uv run pytest nkigen/tests/e2e/ -n auto — end-to-end tests pass on Neuron hardware

🤖 Generated with Claude Code

xiangelec and others added 8 commits May 20, 2026 23:20
NKIgen is the NumPy-trace -> NISA-dialect lowering frontend (formerly
"NKIPyKernelGen"). Imported as a top-level workspace member of the nkipy
monorepo, replacing the kernelgen/ subfolder previously proposed in #55.

Addresses review feedback on #55:

- Drop compiler_explorer/, .claude/skills/, internal scratch dirs
  (issues/, docs/, tools/kernel_agent/, docker/, qwen3_embedding example,
  Brazil Config). Compiler Explorer wrapper will land separately as a
  top-level component shared with other backends.
- Drop the duplicated nkipy tests under tests/e2e/nkipy_tests/; the
  top-level /tests folder is the single source of truth.
- Drop BIR-sim mode and its tests; on-device results are the golden
  reference, matching how nkipy's main backend tests are structured.
- Rename Python package nkipy_kernelgen -> nkigen, harness decorator
  nkipy_kernelgen_test -> nkigen_test.
- Make HardwareConstants.h target-aware: replace global
  MAX_PARTITION_DIM / MAX_FREE_DIM_MATMUL with
  getSbufNumPartitions(target) / getMatmulFreeDimTileCap(target).  The
  per-target values mirror nki.backends.mlir_tracer.target_info in the
  public nki Python wheel.  infer-layout, canonicalize-partition-dim,
  and legalize-layout each take a --target= option (default trn2).
- Move project metadata into pyproject.toml; setup.py is now just the
  CMake build extension.  Wire nkigen into the top-level uv workspace
  via pyproject.toml [tool.uv.workspace] members and [tool.uv.sources].
- Drop tests/debug, tests/outputs, tests/python (LIT) — internal
  scaffolding that doesn't belong in the open-source merge.
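The target-aware constants described above can be sketched in Python. This is a hypothetical mirror of the C++ helpers getSbufNumPartitions(target) and getMatmulFreeDimTileCap(target); the table layout, the error on unknown targets, and the trn2 numbers (128 SBUF partitions, a 512-element matmul free-dimension cap) are assumptions to check against nki.backends.mlir_tracer.target_info, not values taken from this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetInfo:
    sbuf_num_partitions: int
    matmul_free_dim_tile_cap: int


# Hypothetical per-target table. Values are assumptions for illustration;
# the real ones should mirror nki.backends.mlir_tracer.target_info.
_TARGETS = {
    "trn2": TargetInfo(sbuf_num_partitions=128, matmul_free_dim_tile_cap=512),
}

DEFAULT_TARGET = "trn2"  # matches the --target= default described above


def _lookup(target: str) -> TargetInfo:
    try:
        return _TARGETS[target]
    except KeyError:
        # Fail loudly instead of silently reusing another target's limits.
        raise ValueError(f"unknown target {target!r}; known: {sorted(_TARGETS)}") from None


def get_sbuf_num_partitions(target: str = DEFAULT_TARGET) -> int:
    return _lookup(target).sbuf_num_partitions


def get_matmul_free_dim_tile_cap(target: str = DEFAULT_TARGET) -> int:
    return _lookup(target).matmul_free_dim_tile_cap
```

Passing the target explicitly, rather than reading global MAX_* constants, means supporting a new target is one table entry, and passes such as infer-layout can thread their --target= value straight through.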

Expand the README into a full reference covering the knob API, the
nkipy dialect, the 26-stage compilation pipeline, IR inspection, and
project structure. Remove the now-stale nkigen/CLAUDE.md.

Reorganize the compilation pipeline section around what each phase
accomplishes: Phase 1 IR Preparation (1-6, rewrites + layout inference),
Phase 2 Tiling (tile_op, 7-8), Phase 3 Loop Fusion (fuse_op, 9),
Phase 4 Layout (layout knob, 10-20, with bufferization as its prep),
Phase 5 Scheduling (21-25). Reframe pass 26 as a pluggable backend
(today: in-memory NISA bindings; planned: editable Python NKI source).
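The phase grouping above maps stage numbers to phases, which can be captured as a small lookup table. This sketch models only the stage numbers given in the commit message; the individual pass names are not listed there, so none are assumed.

```python
# Stage ranges per phase, as described in the commit message above.
PHASES = [
    ("Phase 1: IR Preparation", range(1, 7)),  # rewrites + layout inference
    ("Phase 2: Tiling", range(7, 9)),          # driven by tile_op
    ("Phase 3: Loop Fusion", range(9, 10)),    # driven by fuse_op
    ("Phase 4: Layout", range(10, 21)),        # layout knob; bufferization first
    ("Phase 5: Scheduling", range(21, 26)),
    ("Backend", range(26, 27)),                # pluggable; today NISA bindings
]


def phase_of(stage: int) -> str:
    """Return the phase that owns a 1-based pipeline stage number."""
    for name, stages in PHASES:
        if stage in stages:
            return name
    raise ValueError(f"stage {stage} is outside the 26-stage pipeline")
```

Keeping the ranges contiguous makes it easy to check that the phases cover all 26 stages with no gaps or overlaps.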

Restores the compile_to_neff() function from NKIPyKernelGen commit
7bbd0ca, updated for the nkigen package rename. This module
encapsulates the full MLIR pass pipeline + NKI compilation into a
single call.
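A minimal sketch of that "one call" shape: lower the module through the pass pipeline, then hand it to the NKI compiler using the 3-argument form (module, function_name, compile_opts). The real compile_to_neff signature is not shown in this PR, so the parameters here (run_pipeline, an injected compile_mlir_to_neff callable, compile_opts) are hypothetical; the callables are injected so the sketch runs without nki or the MLIR bindings installed.

```python
from typing import Any, Callable, Mapping, Optional

PipelineRunner = Callable[[Any], Any]
NeffCompiler = Callable[[Any, str, Mapping[str, Any]], Any]


def compile_to_neff(
    module: Any,
    function_name: str,
    run_pipeline: PipelineRunner,
    compile_mlir_to_neff: NeffCompiler,
    compile_opts: Optional[Mapping[str, Any]] = None,
) -> Any:
    """Hypothetical one-call wrapper: lower through MLIR, then compile to NEFF."""
    # Step 1: run the full MLIR pass pipeline on the traced module.
    lowered = run_pipeline(module)
    # Step 2: compile the lowered module (module, function_name, compile_opts).
    return compile_mlir_to_neff(lowered, function_name, dict(compile_opts or {}))
```

Injecting both stages keeps the wrapper testable with fakes; in real use they would be bound to the nkigen pass pipeline and nki's compiler entry point.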

nki 0.4.0 changed compile_mlir_to_neff from a 6-arg signature
(module, function_name, input_arrays, argument_names, output_arg_names,
compile_opts) to a 3-arg form (module, function_name, compile_opts) that
derives argument/output names from BIR internally.

Update harness.py and compile.py to use the new API. Add two dev
dependencies: pyyaml, needed by the MLIR Python bindings, and
pytest-xdist, for parallel test execution.
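For call sites still written against the old 6-argument signature, the migration amounts to dropping the three arguments that nki 0.4.0 now derives from BIR. The adapter below is a hypothetical illustration of that change (it is not part of nki or this PR); the compiler is passed in as a callable so the sketch runs without nki installed.

```python
from typing import Any, Callable, Mapping, Sequence


def compile_with_legacy_args(
    compile_mlir_to_neff: Callable[[Any, str, Mapping[str, Any]], Any],
    module: Any,
    function_name: str,
    input_arrays: Sequence[Any],
    argument_names: Sequence[str],
    output_arg_names: Sequence[str],
    compile_opts: Mapping[str, Any],
) -> Any:
    """Accept the pre-0.4.0 argument list and issue the 0.4.0 3-arg call."""
    # nki 0.4.0 derives argument and output names from BIR itself, so these
    # three values are intentionally dropped instead of being forwarded.
    del input_arrays, argument_names, output_arg_names
    return compile_mlir_to_neff(module, function_name, compile_opts)
```

Updating the call sites directly, as the commit does for harness.py and compile.py, is simpler long term; an adapter like this only helps if many old-style calls need to keep working during a transition.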

- Clean neff/ artifact dir before compilation to prevent neuronx-cc
  from failing on stale artifacts from previous runs
- Partition Neuron cores across xdist workers (NEURON_LOGICAL_NC_CONFIG=1
  gives 128 cores on trn2, one per worker) to prevent NRT init races
- Cap --maxprocesses=128 to match the available core count
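The three test-infrastructure changes above can be sketched as conftest helpers. PYTEST_XDIST_WORKER (set by pytest-xdist to ids like "gw0") and NEURON_RT_VISIBLE_CORES (a Neuron runtime core-selection variable) are real, but using NEURON_RT_VISIBLE_CORES for pinning, and the helper names core_for_worker, configure_worker_env, and clean_artifact_dir, are assumptions for illustration rather than the PR's actual code.

```python
import re
import shutil
from pathlib import Path
from typing import MutableMapping

# trn2 exposes 128 logical cores with NEURON_LOGICAL_NC_CONFIG=1 (per the commit above).
NUM_CORES = 128


def core_for_worker(worker_id: str, num_cores: int = NUM_CORES) -> int:
    """Map a pytest-xdist worker id such as 'gw5' to a dedicated core index."""
    if worker_id == "master":  # xdist not active: the single process uses core 0
        return 0
    match = re.fullmatch(r"gw(\d+)", worker_id)
    if match is None:
        raise ValueError(f"unexpected xdist worker id {worker_id!r}")
    index = int(match.group(1))
    if index >= num_cores:
        # Capping --maxprocesses at the core count keeps this from happening.
        raise ValueError(f"worker {worker_id} has no core left (only {num_cores})")
    return index


def configure_worker_env(env: MutableMapping[str, str]) -> None:
    """Pin this worker to one core before the Neuron runtime initializes."""
    worker = env.get("PYTEST_XDIST_WORKER", "master")
    env["NEURON_LOGICAL_NC_CONFIG"] = "1"
    # Assumption: per-worker core selection via NEURON_RT_VISIBLE_CORES.
    env["NEURON_RT_VISIBLE_CORES"] = str(core_for_worker(worker))


def clean_artifact_dir(path: Path) -> None:
    """Delete stale outputs so neuronx-cc starts from an empty directory."""
    if path.exists():
        shutil.rmtree(path)
    path.mkdir(parents=True)
```

In a conftest.py, calling configure_worker_env(os.environ) at import time would run in each xdist worker before any test touches the runtime; giving every worker its own core is what avoids the NRT initialization races.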

Two bugs in builder.py's annotate() function:

1. ms_map used 0-based values (Hbm=0, Psum=1, Sbuf=2, SharedHbm=3) but
   the C++ dialect defines them starting at 1 (NkipyAttrs.td). This caused
   annotate-memory-space pass to fail with type mismatches on return ops.

2. TileOp was called with a non-existent `reduction_tile` kwarg. The
   dialect only supports `loop_tile_size` which should contain the full
   iteration space (e.g. [M, N, K] for matmul). Now merges tile_size and
   reduction_tile into a single loop_tile_size array before emission.
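Both fixes can be sketched in a few lines. The 1-based values assume the dialect keeps the same order as the old 0-based ms_map (Hbm, Psum, Sbuf, SharedHbm); check NkipyAttrs.td for the authoritative numbering. The function names mem_space_value and merge_loop_tile_size are hypothetical.

```python
from enum import IntEnum
from typing import List, Optional, Sequence


class MemSpace(IntEnum):
    # 1-based to match the C++ dialect (NkipyAttrs.td). The order is assumed
    # to be the same as the old 0-based ms_map, so each value is shifted by 1.
    Hbm = 1
    Psum = 2
    Sbuf = 3
    SharedHbm = 4


def mem_space_value(name: str) -> int:
    """Look up the dialect integer for a memory-space name such as 'Sbuf'."""
    return int(MemSpace[name])


def merge_loop_tile_size(
    tile_size: Sequence[int], reduction_tile: Optional[Sequence[int]] = None
) -> List[int]:
    """Concatenate parallel and reduction tiles into one loop_tile_size list.

    For a matmul with tile_size [M, N] and reduction_tile [K] this yields
    [M, N, K], covering the full iteration space as TileOp expects.
    """
    return [*tile_size, *(reduction_tile or [])]
```

Using an IntEnum instead of a hand-written dict makes it harder for the Python side to drift from the dialect's numbering again, since every value is defined in one place.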

ymwangg requested a review from a team on June 4, 2026 17:16
ymwangg closed this on Jun 4, 2026