
feat: add automatic device detection for non-CUDA backends #559

Open: curnane-lab wants to merge 1 commit into sgl-project:main from curnane-lab:feat/device-auto-detect



curnane-lab commented May 25, 2026

Motivation

The DFlash training script (scripts/train_dflash.py) currently hardcodes "cuda" as the device type, making it impossible to run on Ascend NPU or CPU without manual code changes. This PR adds automatic device detection so that SpecForge can run on CUDA, Ascend NPU, or CPU out of the box, improving portability for users on non-NVIDIA hardware.

Modifications

  • specforge/utils.py:
    • Add get_device_type(): auto-detects device via SPECFORGE_DEVICE env → torch.cuda → torch.npu → cpu
    • Add get_local_device(): returns torch.device bound to LOCAL_RANK
  • scripts/train_dflash.py:
    • Replace 5 occurrences of hardcoded .cuda() / device="cuda" with dynamic get_device_type() / get_local_device()
    • Use .to(device, non_blocking=True) for tensor movement to support both CUDA and NPU
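
The detection order described above can be sketched roughly as follows. This is an illustrative reconstruction, not the code in this PR: lower-casing the SPECFORGE_DEVICE value, defaulting LOCAL_RANK to 0, and importing torch_npu inside the helper are all assumptions made for the example.

```python
import os


def get_device_type() -> str:
    """Return "cuda", "npu", or "cpu" using the priority order listed above."""
    # 1. An explicit override wins. Lower-casing the value is an assumption.
    override = os.environ.get("SPECFORGE_DEVICE", "").strip().lower()
    if override:
        return override

    # Imported lazily so the override path does not need torch loaded.
    import torch

    # 2. CUDA stays the default whenever it is available.
    if torch.cuda.is_available():
        return "cuda"

    # 3. torch.npu only exists once the Ascend plugin (torch_npu) is imported.
    try:
        import torch_npu  # noqa: F401
    except ImportError:
        pass
    npu = getattr(torch, "npu", None)
    if npu is not None and npu.is_available():
        return "npu"

    # 4. Fall back to CPU.
    return "cpu"


def get_local_device():
    """Return a torch.device for this process, indexed by LOCAL_RANK."""
    import torch

    device_type = get_device_type()
    if device_type == "cpu":
        # CPU has no per-rank device index.
        return torch.device("cpu")
    # Assumption: a missing LOCAL_RANK means a single-process run on device 0.
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    return torch.device(device_type, local_rank)
```

Checking CUDA before NPU is what keeps existing CUDA setups unchanged when both backends are present. Setting SPECFORGE_DEVICE=cpu forces the CPU path regardless of available hardware.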

Related Issues

N/A (new feature)

Accuracy Test

  • Not applicable: this is a device-portability refactor with no model architecture or kernel changes.

Benchmark & Profiling

  • No performance impact expected; only changes device placement logic.
  • Verified backward compatibility: CUDA remains the default when both CUDA and NPU are available.

Checklist

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

- Add get_device_type() to auto-detect npu/cuda/cpu via runtime check
- Add get_local_device() to return torch.device for current LOCAL_RANK
- Replace hardcoded .cuda() and device='cuda' in train_dflash.py with
  dynamic device selection
- Use .to(device, non_blocking=True) for tensor movement to support
  both CUDA and Ascend NPU without code changes
- Maintain backward compatibility: CUDA remains default when available
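
As a rough illustration of the `.to(device, non_blocking=True)` replacement named in the commit message, the sketch below moves a nested batch to a target device. The helper name `move_to_device` is hypothetical and does not appear in this PR. It is duck-typed on a `.to` method, which is an assumption made so the example runs without an accelerator.

```python
from typing import Any


def move_to_device(obj: Any, device: Any) -> Any:
    """Recursively move tensors inside dicts, lists, and tuples to `device`."""
    if isinstance(obj, dict):
        return {key: move_to_device(value, device) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(move_to_device(item, device) for item in obj)
    # Duck-typed: torch.Tensor exposes .to(device, non_blocking=...).
    if hasattr(obj, "to"):
        return obj.to(device, non_blocking=True)
    # Plain Python values (ints, strings, None) are returned unchanged.
    return obj
```

In a training loop this would replace calls such as `batch["input_ids"].cuda()` with `move_to_device(batch, get_local_device())`. Note that `non_blocking=True` only makes host-to-device copies asynchronous when the source tensor is in pinned memory; otherwise the copy is synchronous.
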
curnane-lab force-pushed the feat/device-auto-detect branch from df9c492 to 2b9f350 on May 25, 2026, 06:51
