feat: add automatic device detection for non-CUDA backends by curnane-lab · Pull Request #559 · sgl-project/SpecForge

curnane-lab · 2026-05-25T06:19:03Z

Motivation

The DFlash training script (scripts/train_dflash.py) currently hardcodes "cuda" as the device type, making it impossible to run on Ascend NPU or CPU without manual code changes. This PR adds automatic device detection so that SpecForge can run on CUDA, Ascend NPU, or CPU out of the box, improving portability for users on non-NVIDIA hardware.

Modifications

specforge/utils.py:
- Add get_device_type(): auto-detects the device type in priority order: SPECFORGE_DEVICE environment variable → torch.cuda → torch.npu → cpu
- Add get_local_device(): returns a torch.device bound to the current LOCAL_RANK
scripts/train_dflash.py:
- Replace 5 occurrences of hardcoded .cuda() / device="cuda" with dynamic get_device_type() / get_local_device()
- Use .to(device, non_blocking=True) for tensor movement to support both CUDA and NPU
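
For reviewers who want the detection order in one place, here is a minimal sketch of how the two helpers could look. It is not the code in this PR. The validation of the override value, the lazy torch import, the explicit torch_npu import, and the LOCAL_RANK default of 0 are all assumptions made for illustration.

```python
import os

# Assumption: these are the only device types the helpers accept.
_SUPPORTED_DEVICE_TYPES = ("cuda", "npu", "cpu")


def get_device_type() -> str:
    """Return "cuda", "npu", or "cpu" following the order described above."""
    # 1. An explicit override via SPECFORGE_DEVICE wins over runtime probing.
    override = os.environ.get("SPECFORGE_DEVICE", "").strip().lower()
    if override:
        if override not in _SUPPORTED_DEVICE_TYPES:
            raise ValueError(f"Unsupported SPECFORGE_DEVICE value: {override!r}")
        return override

    # Imported lazily so the override path does not need to probe any backend.
    import torch

    # 2. CUDA stays the default when it is available.
    if torch.cuda.is_available():
        return "cuda"

    # 3. Ascend NPU: torch.npu only exists once torch_npu has been imported.
    try:
        import torch_npu  # noqa: F401
    except ImportError:
        pass
    npu = getattr(torch, "npu", None)
    if npu is not None and npu.is_available():
        return "npu"

    # 4. Fall back to CPU.
    return "cpu"


def get_local_device():
    """Return the torch.device for this process, indexed by LOCAL_RANK."""
    import torch

    device_type = get_device_type()
    if device_type == "cpu":
        # CPU has no per-rank device index.
        return torch.device("cpu")
    if device_type == "npu":
        # The "npu" device string is only recognized after this import.
        import torch_npu  # noqa: F401

    # Assumption: default to rank 0 when LOCAL_RANK is not set (single process).
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    return torch.device(device_type, local_rank)
```

With helpers like these, a call site that previously used `tensor.cuda()` would become `tensor.to(get_local_device(), non_blocking=True)`, which is the pattern the second bullet group describes.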

Related Issues

N/A (new feature)

Accuracy Test

Not applicable — no model architecture or kernel changes; this is a device-portability refactor.

Benchmark & Profiling

No performance impact expected; only changes device placement logic.
Verified backward compatibility: CUDA remains the default when both CUDA and NPU are available.

Checklist

Format your code according to the Code Formatting with Pre-Commit.
Add unit tests as outlined in the Running Unit Tests.
Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.

gemini-code-assist · 2026-05-25T06:19:09Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

- Add get_device_type() to auto-detect npu/cuda/cpu via runtime check
- Add get_local_device() to return torch.device for current LOCAL_RANK
- Replace hardcoded .cuda() and device='cuda' in train_dflash.py with dynamic device selection
- Use .to(device, non_blocking=True) for tensor movement to support both CUDA and Ascend NPU without code changes
- Maintain backward compatibility: CUDA remains default when available

curnane-lab requested review from FlamingoPg, shuaills and sleepcoo as code owners May 25, 2026 06:19

curnane-lab force-pushed the feat/device-auto-detect branch from df9c492 to 2b9f350 Compare May 25, 2026 06:51

curnane-lab wants to merge 1 commit into sgl-project:main from curnane-lab:feat/device-auto-detect
