SyntheticSSL (CVPR’26)

Official PyTorch implementation of our paper:

How Far Can We Go With Synthetic Data for Audio-Visual Sound Source Localization?

Arda Senocak*, Sooyoung Park*, Tae-Hyun Oh, Joon Son Chung (* Equal Contribution)

CVPR 2026

Introduction

This repository contains the implementation of SyntheticSSL. Our code is based on the Audio-Grounded Contrastive Learning (ACL) framework, adapted for synthetic data experiments.

Required packages

Python = 3.12
PyTorch = 2.5.1
transformers = 4.46.3

Installation

$ conda install python=3.12
$ pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
$ pip install transformers==4.46.3
$ pip install tensorboard opencv-python scikit-learn
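
After installation, it can help to confirm that the pinned versions are the ones actually present in the environment. The following is a minimal sketch, not part of the repository; the helper names `installed_version` and `find_mismatches` are hypothetical, and the check uses only the standard-library `importlib.metadata` module.

```python
from importlib import metadata

# Versions listed under "Required packages"; keys are PyPI distribution names.
EXPECTED = {"torch": "2.5.1", "transformers": "4.46.3"}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose version differs from `expected` to what was found."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        # Local build tags such as "+cu124" are ignored when comparing.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = found
    return mismatches


if __name__ == "__main__":
    for name, found in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {EXPECTED[name]}, found {found}")
```

An empty result from `find_mismatches(EXPECTED)` means both packages match the versions above.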

Data preparation

Important Note: All audio samples must be converted to 16 kHz. For detailed instructions, refer to the readme in each dataset-specific directory.
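
One way to check the 16 kHz requirement before training is to read the sample rate from each file header. The sketch below is an illustration under stated assumptions, not the repository's own preprocessing: it assumes the audio is stored as PCM WAV files, all function names are hypothetical, and ffmpeg is only one possible tool for the conversion itself.

```python
import wave
from pathlib import Path

TARGET_SR = 16_000  # sample rate required by the note above


def wav_sample_rate(path):
    """Read the sample rate from a PCM WAV header without decoding the audio."""
    with wave.open(str(path), "rb") as wav:
        return wav.getframerate()


def files_needing_resample(root, target_sr=TARGET_SR):
    """List .wav files under `root` whose sample rate differs from `target_sr`."""
    return sorted(
        p for p in Path(root).rglob("*.wav") if wav_sample_rate(p) != target_sr
    )


def ffmpeg_resample_cmd(src, dst, target_sr=TARGET_SR):
    """Build an ffmpeg command that rewrites `src` to `dst` at `target_sr`."""
    # -y overwrites an existing output file; -ar sets the output sample rate.
    return ["ffmpeg", "-y", "-i", str(src), "-ar", str(target_sr), str(dst)]
```

The command list returned by `ffmpeg_resample_cmd` can be passed to `subprocess.run` for each file reported by `files_needing_resample`.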

Dataset
- VGG-Sound: [Link]
  - VGG-SS: [Link]
- AVSBench: [Link]
- VPO: [Link]
- IS4: [Link]
  - Referred to as IS3 on the linked page.
- VGGSynth1: [Link]
  - Synthetic VGGSound Clone1
- VGGSynth2: [Link]
  - Synthetic VGGSound Clone2

Model preparation

Download the pretrained model (audio backbone) into the pretrain folder.

BEATs: https://github.com/microsoft/unilm/tree/master/beats
- BEATs_iter3_plus_AS2M_finedtuned_on_AS2M_cpt2.pt
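
A missing backbone file is easy to catch before launching a run. The sketch below is a hypothetical helper, not part of the repository; it assumes the checkpoint keeps the file name listed above and sits directly inside the pretrain folder at the repository root.

```python
from pathlib import Path

# File name as listed above; "pretrain" is the folder named in this section.
BEATS_CKPT = "BEATs_iter3_plus_AS2M_finedtuned_on_AS2M_cpt2.pt"


def find_beats_checkpoint(repo_root=".", folder="pretrain", name=BEATS_CKPT):
    """Return the checkpoint path, or raise if it has not been downloaded yet."""
    path = Path(repo_root) / folder / name
    if not path.is_file():
        raise FileNotFoundError(f"Expected BEATs checkpoint at {path}")
    return path
```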

Training

Check the .sh files and set $ export CUDA_VISIBLE_DEVICES="**" according to your hardware setup.
Make sure that --model_name corresponds to the configuration file located at ./config/model/{--model_name}.yaml.
Model files (.pth) will be saved in the directory {--save_path}/Train_record/{--model_name}_{--exp_name}/.
Review the configuration settings in ./config/train/{--train_config}.yaml to ensure they match your training requirements.
Choose one of the following methods to start training:

$ sh SingleGPU_Experiment.sh  # For single GPU setup
$ sh Distributed_Experiment.sh  # For multi-GPU setup (DDP)
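
The path conventions described in this section can be sketched as follows. This is an illustration of how the flag values map to paths, not code from the repository; the function names are hypothetical, and the argument values stand for whatever is passed to --model_name, --train_config, --save_path, and --exp_name.

```python
from pathlib import Path


def model_config_path(model_name, repo_root="."):
    """Config file that --model_name is expected to match."""
    return Path(repo_root) / "config" / "model" / f"{model_name}.yaml"


def train_config_path(train_config, repo_root="."):
    """Training settings file selected by --train_config."""
    return Path(repo_root) / "config" / "train" / f"{train_config}.yaml"


def train_record_dir(save_path, model_name, exp_name):
    """Directory where .pth files are saved during training."""
    return Path(save_path) / "Train_record" / f"{model_name}_{exp_name}"
```

Checking that `model_config_path(...)` and `train_config_path(...)` exist before starting a run catches a mistyped flag early.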

Test

Before testing, review the .sh file and set the $ export CUDA_VISIBLE_DEVICES="**" environment variable according to your hardware configuration.
Ensure that the --model_name parameter corresponds to the configuration file located at ./config/model/{--model_name}.yaml.
Model files (.pth) located at {--save_path}/{--model_name}_{--exp_name}/Param_{--epochs}.pth will be used for testing.
The --epochs parameter accepts either an integer or a list of integers (e.g., 1, 2, 3).
If --epochs is left unspecified (null), the default model file {--save_path}/Train_record/{--model_name}_{--exp_name}/Param_best.pth will be used for testing.

$ sh Test_PTModels.sh
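
The checkpoint selection rule for --epochs can be sketched as below. This is a hypothetical helper, not the logic in Test_PTModels.py; it assumes the epoch number is substituted directly into the file name (for example Param_3.pth) and returns file names only, leaving the directory to the paths given above.

```python
def checkpoint_names(epochs=None):
    """Return the checkpoint file names selected by an --epochs value."""
    if epochs is None:
        # No epoch given: fall back to the best checkpoint.
        return ["Param_best.pth"]
    if isinstance(epochs, int):
        # A single integer is treated as a one-element list.
        epochs = [epochs]
    return [f"Param_{epoch}.pth" for epoch in epochs]
```

For instance, `checkpoint_names([1, 2, 3])` yields one file name per requested epoch, while `checkpoint_names()` yields only the best-checkpoint name.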

Pretrained models

Important Note: After downloading the Param_best.pth file, move it to the directory {--save_path}/{--model_name}_{--exp_name}/ before use.

(Syn1_Image, Syn1_Audio): [Link]
- All images and audio from Synthetic VGGSound Clone1
(Syn1_Image, Real_Audio): [Link]
- All images from Synthetic VGGSound Clone1
3x scale (Table 3 (C)): [Link]
- (Real_Image, Real_Audio) $\cup$ (Syn1_Image, Real_Audio) $\cup$ (Syn2_Image, Real_Audio)

Citation

If you use this project, please cite it as:

@inproceedings{senocak2026howfar,
      title={How Far Can We Go With Synthetic Data for Audio-Visual Sound Source Localization?},
      author={Senocak, Arda and Park, Sooyoung and Oh, Tae-Hyun and Chung, Joon Son}, 
      booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
      year={2026},
}
