Checklist / 检查清单
Bug Description / Bug 描述
When running GRPO with distributed training and --report_to swanlab, non-main ranks can crash during rollout/vLLM profiling with:
RuntimeError: No active Run. Call swanlab.init() first.
The crash occurs during rollout, around _move_model_to_vllm():
File ".../swift/rlhf_trainers/rollout_mixin.py", line 943, in _fast_infer
self._move_model_to_vllm()
File ".../swift/rlhf_trainers/utils.py", line 626, in profiling_context
if 'swanlab' in trainer.args.report_to and swanlab.get_run() is not None and is_main_process:
RuntimeError: No active Run. Call swanlab.init() first.
How to Reproduce / 如何复现
My training script:
swift rlhf \
--rlhf_type grpo \
--model $MODEL \
--use_vllm true \
--vllm_mode colocate \
--vllm_gpu_memory_utilization 0.4 \
--vllm_tensor_parallel_size 8 \
--vllm_max_model_len 20480 \
--beta 0. \
--num_generations $NUM_GENERATIONS \
--external_plugins reward_plugin.py \
--reward_funcs some_random_reward \
--output_dir $OUTPUT_DIR \
--tuner_type full \
--dataset some_random_jsonl.files \
--dataloader_drop_last true \
--dataloader_persistent_workers true \
--dataloader_num_workers 8 \
--load_from_cache_file false \
--warmup_ratio 0.05 \
--num_train_epochs 4 \
--per_device_train_batch_size 1 \
--gradient_checkpointing true \
--gradient_accumulation_steps 8 \
--learning_rate 5e-7 \
--max_length 18000 \
--max_completion_length 2048 \
--save_steps 100 \
--save_total_limit 999 \
--save_only_model true \
--logging_steps 5 \
--report_to swanlab tensorboard \
--eval_strategy no \
--eval_steps 100 \
--deepspeed zero3
Additional Information / 补充信息
swift version:
[INFO:swift] Start time of running main: 2026-06-12 15:54:39.049291
[INFO:swift] swift.version: 4.4.0.dev0
This should be the latest code from the main branch.