Linalg/XeGPU nanoGPT forward-pass example#196

Draft
nbpatel wants to merge 6 commits into
llvm:mainfrom
nbpatel:nanoGPT
nbpatel commented Jun 17, 2026

Contributor

This PR adds a nanoGPT model script under examples/xegpu: a full GPT-2-style forward pass (6 layers, C=256, H=4, T=256) that runs end-to-end on an Intel GPU via the XeGPU lowering path. Multi-head attention is computed by a single fused flash-attention kernel (online softmax) per block. The model lowers to one MLIR module and is verified against a NumPy reference.
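For readers unfamiliar with the online-softmax formulation, the idea can be sketched in plain NumPy. This is an illustrative sketch only, not the code in this PR: the function names `reference_attention` and `online_softmax_attention` and the `block` parameter are hypothetical, causal masking is assumed (GPT-2 style), and the per-head dimension of 64 is derived from the C=256, H=4 stated above. It tiles only over keys; a real fused kernel would typically tile over queries as well.

```python
import numpy as np


def reference_attention(q, k, v):
    """Causal attention that materializes the full T x T score matrix.

    q, k, v have shape (H, T, D).
    """
    _, T, D = q.shape
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(D)        # (H, T, T)
    future = np.triu(np.ones((T, T), dtype=bool), k=1)     # True above the diagonal
    scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerically stable softmax
    p = np.exp(scores)
    p = p / p.sum(axis=-1, keepdims=True)
    return p @ v


def online_softmax_attention(q, k, v, block=64):
    """Same result, but keys/values are visited one block at a time.

    Only a running row maximum `m`, a running denominator `l`, and an
    unnormalized output accumulator are kept; the T x T matrix is never stored.
    """
    H, T, D = q.shape
    scale = 1.0 / np.sqrt(D)
    out = np.zeros((H, T, D))
    m = np.full((H, T), -np.inf)   # running max of the scores seen so far
    l = np.zeros((H, T))           # running sum of exp(score - m)
    rows = np.arange(T)[:, None]

    for start in range(0, T, block):
        end = min(start + block, T)
        s = (q @ k[:, start:end].transpose(0, 2, 1)) * scale   # (H, T, B)
        cols = np.arange(start, end)[None, :]
        s = np.where(cols > rows, -np.inf, s)                   # causal mask

        m_new = np.maximum(m, s.max(axis=-1))
        alpha = np.exp(m - m_new)              # rescales the old accumulators
        p = np.exp(s - m_new[..., None])
        l = alpha * l + p.sum(axis=-1)
        out = alpha[..., None] * out + p @ v[:, start:end]
        m = m_new

    return out / l[..., None]


# Shapes taken from the description above: H=4 heads, T=256, C=256 so D=C/H=64.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 256, 64)) for _ in range(3))
diff = np.abs(online_softmax_attention(q, k, v) - reference_attention(q, k, v)).max()
print("max abs difference vs reference:", diff)
```

Comparing the blocked result against the full-matrix version mirrors, in miniature, the kind of NumPy reference check the description mentions.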

The architecture follows the GPT-2 block stack from Andrej Karpathy's nanoGPT, adapted here for testing the XeGPU path.

Assisted by Claude Code

nbpatel and others added 2 commits June 17, 2026 20:33
The example was discovered by lit but had no RUN line, causing the
pre-commit suite to report UNRESOLVED ("Test has no 'RUN:' line").
Add a RUN line that dumps the xegpu-wg IR (1 layer, host-side, no GPU)
and FileChecks the module header, matching the sibling xegpu examples.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
tkarna commented Jun 18, 2026

Contributor

Neat, thanks for contributing this! The file is rather long, maybe it'd be clearer if split into pieces, e.g., separate file for the main nanoGPT.py, payload generation (nanoGPT_payload.py?), and the schedule in the same directory.

nbpatel commented Jun 18, 2026

Contributor Author

> Neat, thanks for contributing this! The file is rather long, maybe it'd be clearer if split into pieces, e.g., separate file for the main nanoGPT.py, payload generation (nanoGPT_payload.py?), and the schedule in the same directory.

makes sense. Done
