MuSAELab/AUDDT

AUDDT: Audio Unified Deepfake Detection Benchmark Toolkit

AUDDT is a benchmark toolkit for audio deepfake detection. The landscape of audio deepfake detection is fragmented with numerous datasets, each having its own data format and evaluation protocol. AUDDT addresses this by providing a unified platform to seamlessly benchmark pretrained models against a wide variety of public datasets. We make a dedicated effort to update it regularly to include more recent datasets. Please see below for current coverage.

The current version includes 33 datasets.

AUDDT Workflow Diagram

Supported Datasets

The full list of 30+ supported datasets is maintained in a public Google Sheet for easy viewing and filtering.

➡️ View Full Dataset List on Google Sheets

Update Log

We are actively developing AUDDT. See below for the latest updates.

  • 2026-05-20
    • Added multi-GPU inference support
    • Fixed metadata labels for a few datasets
  • 2026-05-18
    • Added deepfake audio event datasets: FakeSound, VCapAV, and EnvSDD.
    • Added more subgroups for easy selective benchmarking
  • 2025-09-19
    • Birth of AUDDT
    • Added 28 datasets to the benchmark
    • Added an exemplar baseline model

Installation

  1. Clone the repository:

    git clone https://github.com/MUSAELab/AUDDT.git
    cd AUDDT
  2. Create a virtual environment and install dependencies:

    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt

Benchmarking Your Detector

All commands below assume you are in the project root (AUDDT/) with your environment activated.

Step 1 — Download datasets

Set the data root in download/config.sh (default: data/). Then run the corresponding download script for each dataset you need:

bash download/get_XXX.sh    # e.g., bash download/get_fakesound.sh

By default, data is placed under:

data/
  DATASET_X/
    raw/          # compressed archives downloaded from source
    processed/    # extracted audio files
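
To confirm which datasets have finished downloading and extracting, the layout above can be inspected with a few lines of Python. This is a minimal sketch, not part of the toolkit: `dataset_status` is a hypothetical helper, and it assumes every dataset directory follows the `raw/` and `processed/` convention shown above.

```python
from pathlib import Path


def dataset_status(data_root):
    """Report, per dataset directory, whether raw/ and processed/ exist and are non-empty."""
    status = {}
    for dataset_dir in sorted(Path(data_root).iterdir()):
        if not dataset_dir.is_dir():
            continue
        status[dataset_dir.name] = {
            sub: (dataset_dir / sub).is_dir() and any((dataset_dir / sub).iterdir())
            for sub in ("raw", "processed")
        }
    return status


if __name__ == "__main__":
    root = Path("data")  # default data root named in download/config.sh
    if root.is_dir():
        for name, flags in dataset_status(root).items():
            print(f"{name}: raw={flags['raw']} processed={flags['processed']}")
```

A dataset whose `processed` flag is false has not been extracted yet and is not ready for the manifest step.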

Step 2 — Prepare manifests

Generate CSV manifest files for all configured datasets:

python preprocessing/prep_all_datasets.py --config preprocessing/dataset_list.yaml

Enable or disable individual datasets by commenting their entries in or out in preprocessing/dataset_list.yaml. Every enabled dataset must already be downloaded before you run this step.
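
After the manifests are generated, it can be worth checking the class balance of each one, since a manifest with a single class (as with FakeSound, see Limitations) cannot yield EER or AUC. The sketch below counts rows per label. The column name `label` is an assumption, not something this README documents; check the header of a generated manifest and pass the real column name.

```python
import csv
from collections import Counter


def label_counts(manifest_path, label_column="label"):
    """Count rows per label value in a CSV manifest.

    `label_column` is an assumed name; inspect the manifest header to confirm it.
    """
    with open(manifest_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if label_column not in (reader.fieldnames or []):
            raise KeyError(f"{label_column!r} not in manifest header: {reader.fieldnames}")
        return Counter(row[label_column] for row in reader)
```

A result with only one key means the manifest holds a single class.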

Step 3 — Configure your model

Place your model script in models/ and edit benchmark/evaluate_setup.yaml:

model:
  path: models/detector_wrapper.py
  class_name: AudioDeepfakeDetector
  checkpoint: models/Best_LA_model_for_DF.pth
  device: 'cuda:0'
  model_args:
    raw_model_path: models/baseline_model.py
    raw_model_class_name: Model
    raw_model_args:
      args: null
      model_device: 'cuda:0'
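
The `path` and `class_name` keys suggest that the detector class is resolved from a file at run time. One way such a lookup can be done with the standard library is sketched below. This illustrates the general technique only; `load_class` is a hypothetical helper, and the toolkit's own loader may work differently.

```python
import importlib.util
from pathlib import Path


def load_class(script_path, class_name):
    """Import a Python file by path and return the named class from it."""
    script_path = Path(script_path)
    spec = importlib.util.spec_from_file_location(script_path.stem, script_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot import {script_path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    try:
        return getattr(module, class_name)
    except AttributeError as exc:
        raise ImportError(f"{script_path} defines no {class_name!r}") from exc
```

Calling `load_class("models/detector_wrapper.py", "AudioDeepfakeDetector")` on your own wrapper is a quick way to catch a misspelled class name or a missing import before starting a long evaluation.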

Comparing against the exemplar baseline? baseline_model.py depends on fairseq, which requires Python 3.10 (incompatible with 3.12+). Set up a dedicated environment first:

conda create -n auddt python=3.10 -y
conda activate auddt
pip install -r requirements.txt

The XLSR-300M backbone weights also need to be present in the project root:

wget https://dl.fbaipublicfiles.com/fairseq/wav2vec/xlsr2_300m.pt

Verify everything loads before running full evaluation:

python benchmark/smoke_test.py
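
The two baseline prerequisites named above (a Python version older than 3.12 and the backbone weights in the project root) can also be checked separately. The following is a hypothetical preflight helper, not a replacement for benchmark/smoke_test.py; it assumes the weights keep the file name from the download URL.

```python
import sys
from pathlib import Path


def preflight(project_root=".", version=None):
    """Return a list of human-readable problems; an empty list means both checks passed."""
    version = tuple(version or sys.version_info[:2])
    problems = []
    if version >= (3, 12):
        # The README states that fairseq is incompatible with Python 3.12+.
        problems.append(f"Python {version[0]}.{version[1]} detected; fairseq needs 3.10")
    weights = Path(project_root) / "xlsr2_300m.pt"
    if not weights.is_file():
        problems.append(f"missing backbone weights: {weights}")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```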

Step 4 — Select datasets to evaluate

By default, evaluation runs on all datasets whose manifests exist. To evaluate on a specific subset, define a named group in benchmark/dataset_group.yaml:

audio-events:
  - name: FakeSound
    manifest_path: fakesound/processed/manifest_fakesound.csv
  - name: EnvSDD
    manifest_path: envssd/processed/manifest_envssd.csv
  - name: VCapAV-dev1
    manifest_path: vcapav/processed/dev1.csv
  - name: VCapAV-dev2
    manifest_path: vcapav/processed/dev2.csv
  - name: VCapAV-dev3
    manifest_path: vcapav/processed/dev3.csv

Then set group_name accordingly in benchmark/evaluate_setup.yaml:

data:
  group_name: audio-events
  groups_config_path: benchmark/dataset_group.yaml
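
Before launching a run, a group definition can be checked for missing manifests. The sketch below works on a plain dictionary with the same shape that PyYAML's `yaml.safe_load` would return for the group file above. It assumes that each `manifest_path` is relative to the data root (`data/` by default), which the layout in Step 1 suggests but this README does not state. Both helpers are hypothetical.

```python
from pathlib import Path

# Same structure as the YAML group definition, shortened to two entries.
GROUPS = {
    "audio-events": [
        {"name": "FakeSound", "manifest_path": "fakesound/processed/manifest_fakesound.csv"},
        {"name": "EnvSDD", "manifest_path": "envssd/processed/manifest_envssd.csv"},
    ]
}


def resolve_group(groups, group_name, data_root="data"):
    """Map each dataset name in a group to its manifest path under data_root (assumed base)."""
    if group_name not in groups:
        raise KeyError(f"unknown group {group_name!r}; available: {sorted(groups)}")
    return {
        entry["name"]: Path(data_root) / entry["manifest_path"]
        for entry in groups[group_name]
    }


def missing_manifests(resolved):
    """Return the dataset names whose manifest file does not exist yet."""
    return [name for name, path in resolved.items() if not path.is_file()]
```

Any name returned by `missing_manifests(resolve_group(GROUPS, "audio-events"))` points to a dataset that still needs Step 1 or Step 2.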

Step 5 — Run evaluation

python -m benchmark.evaluate --config benchmark/evaluate_setup.yaml

Results are written to results/. If latex_output_path is set in the config, a .tex table is also generated automatically.

You may need to adjust batch_size in evaluate_setup.yaml based on available GPU memory.

Example Output

Running the exemplar AASIST-style baseline (XLSR-300M + RawNet2 + GAT, trained on ASVspoof LA) on the audio-events group produces the results below. Near-random EER on the audio-event datasets is expected: the model was trained on speech deepfakes and does not generalize to out-of-domain audio events.

The LaTeX table is saved to results/examplar_table.tex:

% Required packages: \usepackage{booktabs}
\begin{table*}[htbp]
  \centering
  \caption{Evaluation Results}
  \label{tab:results}
\begin{tabular}{lrrrrrrrrrrr}
\toprule
 & EER (\%) & AUC & Acc (\%) & TPR (\%) & TNR (\%) & Pre (\%) & F1 & TP & TN & FP & FN \\
\midrule
FakeSound & N/A & N/A & N/A & N/A & N/A & N/A & N/A & N/A & N/A & N/A & N/A \\
EnvSDD & 42.70 & 0.5902 & 32.70 & 25.63 & 82.02 & 90.87 & 0.3999 & 637 & 292 & 64 & 1848 \\
VCapAV-dev1 & 57.79 & 0.4495 & 37.36 & 30.74 & 57.23 & 68.32 & 0.4240 & 3480 & 2160 & 1614 & 7842 \\
VCapAV-dev2 & 51.46 & 0.5060 & 44.45 & 38.31 & 56.73 & 63.91 & 0.4791 & 2892 & 2141 & 1633 & 4656 \\
VCapAV-dev3 & 54.80 & 0.4691 & 37.67 & 33.86 & 56.73 & 79.65 & 0.4752 & 6390 & 2141 & 1633 & 12480 \\
\midrule
\textbf{Average} & 51.69 & 0.5037 & 38.05 & 32.14 & 63.18 & 75.69 & 0.4445 & 13399 & 6734 & 4944 & 26826 \\
\bottomrule
\end{tabular}
\end{table*}

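FakeSound contains only spoof samples (no bona fide), so EER and AUC are undefined for it; in the run above, every FakeSound column is reported as N/A. Where accuracy is reported for a spoof-only dataset, it reflects the spoof detection rate only.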
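
The rate columns in the table follow from the four confusion counts (TP, TN, FP, FN). The sketch below uses the standard definitions and is not taken from the toolkit's code; `confusion_metrics` is a hypothetical helper. A rate with an empty denominator is returned as `None`, which is how a single-class dataset ends up with undefined entries.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Derive rate metrics (as percentages) and F1 from confusion counts."""

    def ratio(num, den):
        # Undefined when the denominator is empty, e.g. no negative samples at all.
        return num / den if den else None

    def pct(value):
        return None if value is None else round(100 * value, 2)

    tpr = ratio(tp, tp + fn)          # true positive rate (recall)
    tnr = ratio(tn, tn + fp)          # true negative rate
    precision = ratio(tp, tp + fp)
    accuracy = ratio(tp + tn, tp + tn + fp + fn)

    f1 = None
    if precision is not None and tpr is not None and precision + tpr > 0:
        f1 = round(2 * precision * tpr / (precision + tpr), 4)

    return {"acc": pct(accuracy), "tpr": pct(tpr), "tnr": pct(tnr),
            "pre": pct(precision), "f1": f1}


# EnvSDD row from the table: TP=637, TN=292, FP=64, FN=1848
print(confusion_metrics(637, 292, 64, 1848))
# {'acc': 32.7, 'tpr': 25.63, 'tnr': 82.02, 'pre': 90.87, 'f1': 0.3999}
```

These values match the EnvSDD row above. EER and AUC cannot be derived this way because they depend on the full score distribution, not on counts at a single threshold.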

Contributing

The team will keep extending the benchmark coverage. To suggest a dataset, please open an issue that links to the dataset source and the accompanying paper.

Limitations

  1. The FakeSound dataset included in this benchmark currently contains only spoofed audio samples. Obtaining the corresponding bona fide (real) data requires additional manual steps. We plan to resolve this limitation and provide the complete dataset in future updates.

Citation

@misc{zhu2025auddtaudiounifieddeepfake,
      title={AUDDT: Audio Unified Deepfake Detection Benchmark Toolkit}, 
      author={Yi Zhu and Heitor R. Guimarães and Arthur Pimentel and Tiago Falk},
      year={2025},
      eprint={2509.21597},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/2509.21597}
}

License

This project is licensed for academic and research use only.
Commercial use is strictly prohibited without prior written permission.
See the full LICENSE file for details.

Disclaimer

For transparency, we do not include proprietary datasets or datasets of unknown origin. We also encourage users to watch for potential training/test overlap: some datasets, such as ASVspoof2019 and ASVspoof5, are widely used as training sets. Results obtained with this toolkit should be used for research purposes only and not to advertise commercial products.
