CPU-only sampling fails with CUDA error #40

@AFEScalante

TabSyn sampling fails on CPU-only machines despite --gpu -1 flag

Description

I'm unable to run TabSyn sampling on a machine without GPU access, even when explicitly setting --gpu -1 to force CPU usage. The sampling pipeline appears to have hardcoded CUDA dependencies that prevent CPU-only execution.

Steps to Reproduce

  1. Set up tabsyn on a machine without CUDA GPUs.
  2. Train both models with --gpu -1 (this succeeds):
       python main.py --dataname shoppers --method vae --mode train --gpu -1
       python main.py --dataname shoppers --method tabsyn --mode train --gpu -1
  3. Attempt to generate synthetic data:
       python main.py --dataname shoppers --method tabsyn --mode sample --gpu -1

Expected Behavior

The sampling should run successfully on CPU, generating synthetic data without requiring GPU access.

Actual Behavior

The following error occurs:

No NaNs in numerical features, skipping
Traceback (most recent call last):
  File "/home/angel-escalante/Escritorio/tabsyn/main.py", line 15, in <module>
    main_fn(args)
  File "/home/angel-escalante/Escritorio/tabsyn/tabsyn/sample.py", line 39, in main
    x_next = sample(model.denoise_fn_D, num_samples, sample_dim)
  File "/home/angel-escalante/Escritorio/tabsyn/tabsyn/diffusion_utils.py", line 23, in sample
    latents = torch.randn([num_samples, dim], device=device)
  File "/home/angel-escalante/miniconda3/envs/tabsyn/lib/python3.10/site-packages/torch/cuda/__init__.py", line 247, in _lazy_init
    torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available

Analysis

I've identified several issues that seem to be causing this problem:

  1. Main entry point: the device selection in main.py does not seem to handle the --gpu -1 case properly.
  2. Device parameter: the traceback shows sample.py (line 39) calling sample(model.denoise_fn_D, num_samples, sample_dim) without a device argument, so sample() in diffusion_utils.py falls back to its default, which appears to be device='cuda:0'.
  3. Model loading: the torch.load() calls might be loading CUDA tensors without map_location, which fails on a CPU-only machine when the checkpoint was saved from a GPU.
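To make point 1 concrete, here is a minimal sketch of how the --gpu flag could be mapped to a device string. It is not the current main.py code: `resolve_device` is a hypothetical helper, and treating any negative index as "use CPU" is an assumption about what --gpu -1 is meant to do. The CUDA check is passed in as a plain boolean so the logic can be read without PyTorch; in the real entry point it would presumably come from `torch.cuda.is_available()`.

```python
import argparse


def resolve_device(gpu: int, cuda_available: bool) -> str:
    """Map the --gpu flag to a torch-style device string (hypothetical helper).

    A negative index is treated as an explicit request for CPU. A
    non-negative index is honored only when CUDA is actually usable.
    """
    if gpu < 0 or not cuda_available:
        return "cpu"
    return f"cuda:{gpu}"


parser = argparse.ArgumentParser()
parser.add_argument("--gpu", type=int, default=0)

# Simulate the failing invocation: "--gpu -1" on a machine without CUDA.
args = parser.parse_args(["--gpu", "-1"])
device = resolve_device(args.gpu, cuda_available=False)
```

With this shape, the flag alone is enough to force CPU, and a missing GPU degrades to CPU instead of raising during CUDA initialization.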

Possible Solution Areas

I think the fix would involve:

  1. Fixing device detection in main.py so that --gpu -1 always selects the CPU
  2. Ensuring the device parameter is passed explicitly through the sampling pipeline instead of relying on a CUDA default
  3. Passing map_location to torch.load() so checkpoints can be loaded onto the CPU when needed
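Points 2 and 3 could look roughly like the sketch below. `sample_latents` is a hypothetical stand-in for the first step of sample() in diffusion_utils.py (the `torch.randn` call from the traceback), not the actual function, and the in-memory buffer stands in for a real checkpoint file. The only PyTorch calls used are `torch.randn`, `torch.save`, and `torch.load` with `map_location`.

```python
import io

import torch


def sample_latents(num_samples: int, dim: int, device: str = "cpu") -> torch.Tensor:
    # Hypothetical stand-in for the start of sample(): the noise tensor is
    # created on whatever device the caller passes in, defaulting to CPU.
    return torch.randn([num_samples, dim], device=device)


device = "cpu"  # e.g. the result of resolving --gpu -1

# Pass the device explicitly rather than relying on the function's default.
latents = sample_latents(4, 8, device=device)

# Round-trip a state dict through an in-memory buffer to show map_location.
# A real checkpoint path would be used in place of the buffer.
model = torch.nn.Linear(8, 8)
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# map_location remaps every stored tensor onto the requested device at load time.
state = torch.load(buffer, map_location=device)
model.load_state_dict(state)
```

Defaulting the helper to "cpu" rather than 'cuda:0' means a forgotten argument costs speed on a GPU machine instead of crashing on a CPU-only one.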

Environment

  • OS: Linux (Ubuntu)
  • Python: 3.10
  • PyTorch: CPU-only installation (no CUDA)
  • Hardware: Machine without CUDA GPUs
  • tabsyn: Latest version from main branch
