feat(nkigen): import NKIgen as workspace member with knob/tile/memspace fixes #61
Closed
ymwangg wants to merge 8 commits into
NKIgen is the NumPy-trace -> NISA-dialect lowering frontend (formerly "NKIPyKernelGen"). Imported as a top-level workspace member of the nkipy monorepo, replacing the kernelgen/ subfolder previously proposed in #55.

Addresses review feedback on #55:
- Drop compiler_explorer/, .claude/skills/, internal scratch dirs (issues/, docs/, tools/kernel_agent/, docker/, qwen3_embedding example, Brazil Config). Compiler Explorer wrapper will land separately as a top-level component shared with other backends.
- Drop the duplicated nkipy tests under tests/e2e/nkipy_tests/; the top-level /tests folder is the single source of truth.
- Drop BIR-sim Mode and tests; device is the golden, matching how nkipy's main backend tests are structured.
- Rename Python package nkipy_kernelgen -> nkigen, harness decorator nkipy_kernelgen_test -> nkigen_test.
- Make HardwareConstants.h target-aware: replace global MAX_PARTITION_DIM / MAX_FREE_DIM_MATMUL with getSbufNumPartitions(target) / getMatmulFreeDimTileCap(target). The per-target values mirror nki.backends.mlir_tracer.target_info in the public nki Python wheel. infer-layout, canonicalize-partition-dim, and legalize-layout each take a --target= option (default trn2).
- Move project metadata into pyproject.toml; setup.py is now just the CMake build extension. Wire nkigen into the top-level uv workspace via pyproject.toml [tool.uv.workspace] members and [tool.uv.sources].
- Drop tests/debug, tests/outputs, tests/python (LIT), internal scaffolding that doesn't belong in the open-source merge.
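To illustrate the target-aware HardwareConstants.h item above, here is a minimal Python sketch of replacing global constants with per-target accessors. The real code is C++; the function names below are snake_case stand-ins for getSbufNumPartitions / getMatmulFreeDimTileCap, and the numeric values are assumptions for illustration rather than values taken from this PR.

```python
# Sketch only: a target-keyed lookup instead of module-level constants.
# The numbers are illustrative assumptions; the PR says the real values
# mirror nki.backends.mlir_tracer.target_info, which is not reproduced here.
_TARGET_INFO = {
    "trn2": {"sbuf_num_partitions": 128, "matmul_free_dim_tile_cap": 512},
}


def _lookup(target, key):
    """Return one hardware limit for `target`, failing loudly on unknown targets."""
    try:
        return _TARGET_INFO[target][key]
    except KeyError:
        raise ValueError(f"unknown target or key: {target!r}, {key!r}") from None


def get_sbuf_num_partitions(target="trn2"):
    # Replaces a global like MAX_PARTITION_DIM.
    return _lookup(target, "sbuf_num_partitions")


def get_matmul_free_dim_tile_cap(target="trn2"):
    # Replaces a global like MAX_FREE_DIM_MATMUL.
    return _lookup(target, "matmul_free_dim_tile_cap")
```

Defaulting the argument to trn2 matches the stated default of the --target= option, and an unknown target raises instead of silently falling back.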
Expand the README into a full reference covering the knob API, the nkipy dialect, the 26-stage compilation pipeline, IR inspection, and project structure. Remove the now-stale nkigen/CLAUDE.md.
Reorganize the compilation pipeline section around what each phase realizes: Phase 1 IR Preparation (1-6, rewrites + layout inference), Phase 2 Tiling (tile_op, 7-8), Phase 3 Loop Fusion (fuse_op, 9), Phase 4 Layout (layout knob, 10-20, with bufferization as its prep), Phase 5 Scheduling (21-25). Reframe pass 26 as a pluggable backend (today: in-memory NISA bindings; planned: editable Python NKI source).
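The phase layout described above can be written down as a small lookup table. This is only a sketch of the numbering given in this commit message; the `PHASES` table and `phase_of` helper are hypothetical names and are not part of nkigen.

```python
# Pass-number ranges per phase, as listed in the commit message.
# range(a, b) covers passes a..b-1 inclusive.
PHASES = [
    ("IR Preparation", range(1, 7)),    # passes 1-6: rewrites + layout inference
    ("Tiling", range(7, 9)),            # passes 7-8: realizes tile_op
    ("Loop Fusion", range(9, 10)),      # pass 9: realizes fuse_op
    ("Layout", range(10, 21)),          # passes 10-20: layout knob, bufferization as prep
    ("Scheduling", range(21, 26)),      # passes 21-25
    ("Backend", range(26, 27)),         # pass 26: pluggable backend
]


def phase_of(pass_number):
    """Return the phase name that a 1-based pass number belongs to."""
    for name, passes in PHASES:
        if pass_number in passes:
            return name
    raise ValueError(f"pass number out of range: {pass_number}")
```

For example, `phase_of(9)` returns `"Loop Fusion"` and `phase_of(26)` returns `"Backend"`.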
Restores the compile_to_neff() function from NKIPyKernelGen commit 7bbd0ca, updated for the nkigen package rename. This module encapsulates the full MLIR pass pipeline + NKI compilation into a single call.
nki 0.4.0 changed compile_mlir_to_neff from a 6-arg signature (module, function_name, input_arrays, argument_names, output_arg_names, compile_opts) to a 3-arg form (module, function_name, compile_opts) that derives argument/output names from BIR internally. Update harness.py and compile.py to use the new API. Add pyyaml dev dependency needed by MLIR Python bindings and pytest-xdist for parallel test execution.
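The signature change can be illustrated with a small adapter that inspects the callable it is given. This is a hypothetical shim for illustration only: the commit itself switches the call sites in harness.py and compile.py to the 3-arg form, and `call_compile_mlir_to_neff` and `legacy_args` are names invented here. The parameter orders are the ones quoted in the commit message.

```python
import inspect


def call_compile_mlir_to_neff(compile_fn, module, function_name, compile_opts,
                              legacy_args=None):
    """Call `compile_fn` using whichever of the two described signatures it has.

    legacy_args, if given, is (input_arrays, argument_names, output_arg_names)
    and is only used for the older 6-arg form.
    """
    n_params = len(inspect.signature(compile_fn).parameters)
    if n_params == 3:
        # Newer form: names are derived internally, so nothing extra is passed.
        return compile_fn(module, function_name, compile_opts)
    if n_params == 6:
        if legacy_args is None:
            raise TypeError("6-arg form needs input arrays and argument names")
        input_arrays, argument_names, output_arg_names = legacy_args
        return compile_fn(module, function_name, input_arrays,
                          argument_names, output_arg_names, compile_opts)
    raise TypeError(f"unexpected parameter count: {n_params}")
```

Passing the compile function in as an argument keeps the sketch independent of any installed nki version.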
- Clean neff/ artifact dir before compilation to prevent neuronx-cc from failing on stale artifacts from previous runs
- Partition Neuron cores across xdist workers (NEURON_LOGICAL_NC_CONFIG=1 gives 128 cores on trn2, one per worker) to prevent NRT init races
- Cap --maxprocesses=128 to match available core count
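A minimal sketch of the two ideas above: wiping the artifact directory before a compile, and giving each xdist worker its own core. It assumes the worker id comes from the `PYTEST_XDIST_WORKER` environment variable that pytest-xdist sets (`gw0`, `gw1`, ...), and it assumes the core is pinned through `NEURON_RT_VISIBLE_CORES`; the commit message does not say which mechanism the harness actually uses, and the helper names are hypothetical.

```python
import os
import shutil
from pathlib import Path


def clean_artifact_dir(path):
    """Remove a stale artifact directory and recreate it empty."""
    shutil.rmtree(path, ignore_errors=True)
    Path(path).mkdir(parents=True, exist_ok=True)


def core_for_worker(worker_id, total_cores=128):
    """Map an xdist worker id such as 'gw7' to a core index; non-xdist runs get core 0."""
    if not worker_id.startswith("gw"):
        return 0
    index = int(worker_id[2:])
    if index >= total_cores:
        raise ValueError(f"worker {worker_id} exceeds {total_cores} cores")
    return index


def pin_worker_to_core(env=None, total_cores=128):
    """Set the (assumed) core-visibility variable for the current worker."""
    env = os.environ if env is None else env
    core = core_for_worker(env.get("PYTEST_XDIST_WORKER", "master"), total_cores)
    env["NEURON_RT_VISIBLE_CORES"] = str(core)  # assumption: pinning mechanism
    return core
```

With one core per worker, capping the worker count at the core count keeps `core_for_worker` from ever raising.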
Two bugs in builder.py's annotate() function:
1. ms_map used 0-based values (Hbm=0, Psum=1, Sbuf=2, SharedHbm=3) but the C++ dialect defines them starting at 1 (NkipyAttrs.td). This caused annotate-memory-space pass to fail with type mismatches on return ops.
2. TileOp was called with a non-existent `reduction_tile` kwarg. The dialect only supports `loop_tile_size` which should contain the full iteration space (e.g. [M, N, K] for matmul). Now merges tile_size and reduction_tile into a single loop_tile_size array before emission.
Summary
- `compile_to_neff` API and upgrade to nki 0.4.0 `compile_mlir_to_neff` signature
- `annotate()` mem_space values (align to 1-based C++ dialect) and TileOp emission (`loop_tile_size` kwarg)

Details
This branch brings NKIgen into the nkipy monorepo as a workspace member (`nkigen/`). Key changes:
- `nkipy_kernelgen` → `nkigen`, wired into `uv` workspace
- `HardwareConstants.h` uses per-target values via `--target=` option

Test plan
- `uv run pytest nkigen/tests/unit/ -n auto`: unit tests pass
- `uv run pytest nkigen/tests/passes/ -n auto`: pass-level tests pass
- `uv run pytest nkigen/tests/e2e/ -n auto`: end-to-end tests pass on Neuron hardware

🤖 Generated with Claude Code