
SaFo-Lab/SafeVL


SafeVL

SafeVL is a visual-prompting framework for driving-safety reasoning. It uses Grounding DINO and SAM2 to detect and track objects across dashcam frames, then a fine-tuned Qwen2.5-VL-7B to produce 4-step chain-of-thought reasoning and a calibrated safe/unsafe verdict.


Installation

1. Conda environment

conda create -n safevl python=3.10 -y
conda activate safevl
pip install -r requirements.txt

PyTorch ≥ 2.6 with CUDA support is required (tested with torch==2.6.0+cu124 on a CUDA 12.9 driver). If your installed torch was built for a different CUDA version, reinstall the cu124 build:

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
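A quick way to confirm this requirement from Python is sketched below. The helper names `meets_min_version` and `check_torch` are hypothetical (not part of SafeVL); the sketch compares only the major.minor part of `torch.__version__` against 2.6 and reports whether CUDA is visible.

```python
import re


def meets_min_version(version: str, minimum: tuple[int, int] = (2, 6)) -> bool:
    """Return True if a version string such as '2.6.0+cu124' is >= minimum (major, minor)."""
    # Read only the leading "major.minor"; any "+cu124" build suffix is ignored.
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= minimum


def check_torch() -> None:
    """Print the installed torch version, its CUDA build, and whether a GPU is visible."""
    try:
        import torch
    except ImportError:
        print("torch is not installed")
        return
    print("torch:", torch.__version__, "| built for CUDA:", torch.version.cuda)
    print("version >= 2.6:", meets_min_version(torch.__version__))
    print("CUDA available:", torch.cuda.is_available())


if __name__ == "__main__":
    check_torch()
```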

2. Install SAM2 (via Grounded-SAM-2)

git clone https://github.com/IDEA-Research/Grounded-SAM-2.git
cd Grounded-SAM-2
pip install -e .
cd ..
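After the editable install, you can check that the required modules resolve without actually importing them. This sketch assumes the Grounded-SAM-2 install exposes a top-level `sam2` module, as upstream SAM 2 does; `missing_modules` is a hypothetical helper name.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "sam2" is assumed to be the module provided by `pip install -e .` in Grounded-SAM-2.
    missing = missing_modules(["torch", "sam2"])
    print("missing:", missing or "none")
```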

3. Download the SAM2 checkpoint

Download sam2_hiera_large.pt (or sam2.1_hiera_large.pt) from the SAM2 release page and place it under ./checkpoints/.
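Since either checkpoint name is accepted, a small lookup like the following can confirm that a file is in place. The function name and the order in which the two names are tried are illustrative assumptions; which checkpoint the pipeline loads by default is not stated here, and SAM 2 and SAM 2.1 checkpoints each need their matching model config.

```python
from pathlib import Path
from typing import Optional

# Both file names mentioned above; trying SAM 2.1 first is an arbitrary choice.
CANDIDATES = ("sam2.1_hiera_large.pt", "sam2_hiera_large.pt")


def find_sam2_checkpoint(checkpoint_dir: str = "./checkpoints") -> Optional[Path]:
    """Return the first candidate SAM2 checkpoint found in checkpoint_dir, or None."""
    root = Path(checkpoint_dir)
    for name in CANDIDATES:
        path = root / name
        if path.is_file():
            return path
    return None


if __name__ == "__main__":
    ckpt = find_sam2_checkpoint()
    print("SAM2 checkpoint:", ckpt if ckpt else "not found under ./checkpoints/")
```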

4. SafeVL VLM checkpoint

The fine-tuned VLM is hosted on Hugging Face and is downloaded automatically the first time the pipeline runs:

👉 gray311/SafeVL (~16 GB)

To pre-download:

huggingface-cli download gray311/SafeVL --local-dir ./checkpoints/SafeVL
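The same pre-download can be done from Python with `huggingface_hub.snapshot_download`. In this sketch, `predownload` and `is_predownloaded` are hypothetical helpers, and treating the presence of `config.json` as "already downloaded" is an assumption: it is a heuristic that will not detect a partially completed download.

```python
from pathlib import Path


def is_predownloaded(local_dir: str = "./checkpoints/SafeVL") -> bool:
    """Heuristic: treat the snapshot as present if config.json exists in local_dir."""
    return (Path(local_dir) / "config.json").is_file()


def predownload(local_dir: str = "./checkpoints/SafeVL") -> str:
    """Download gray311/SafeVL into local_dir unless the heuristic says it is already there."""
    if is_predownloaded(local_dir):
        return local_dir
    # Imported lazily so the check above works even without huggingface_hub installed.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id="gray311/SafeVL", local_dir=local_dir)
```

Calling `predownload()` starts the ~16 GB download into ./checkpoints/SafeVL unless a `config.json` is already present there.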

Usage

Launch the quickstart notebook:

jupyter notebook quickstart.ipynb

It walks you through:

  1. Sanity-checking your environment
  2. Initializing SafeVLPipeline (loads SAM2, Grounding DINO, and Qwen2.5-VL once)
  3. Running on a clip (video file, frame list, or frame directory)
  4. Inspecting the 4-step reasoning and the verdict probabilities
  5. Visualizing the SoM (Set-of-Mark) annotated frames
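This README does not describe how the verdict probabilities are produced. One common technique, shown here purely as an assumption and not as SafeVL's actual method, is a two-way softmax over the model's logits for the "safe" and "unsafe" label tokens; `verdict_probabilities` is a hypothetical name.

```python
import math


def verdict_probabilities(logit_safe: float, logit_unsafe: float) -> dict:
    """Two-way softmax over the logits of the 'safe' and 'unsafe' label tokens."""
    # Subtract the larger logit before exponentiating to avoid overflow.
    m = max(logit_safe, logit_unsafe)
    e_safe = math.exp(logit_safe - m)
    e_unsafe = math.exp(logit_unsafe - m)
    total = e_safe + e_unsafe
    return {"safe": e_safe / total, "unsafe": e_unsafe / total}


if __name__ == "__main__":
    probs = verdict_probabilities(2.0, 0.5)
    verdict = max(probs, key=probs.get)
    print(verdict, probs)
```

With logits 2.0 and 0.5, the sketch assigns roughly 0.82 to "safe", since only the difference between the two logits matters.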

Acknowledgements

About

Official repository for the paper SafeVL: Driving Safety Evaluation via Meticulous Reasoning in Vision Language Models.
