AUDDT: Audio Unified Deepfake Detection Benchmark Toolkit

AUDDT is a benchmark toolkit for audio deepfake detection. The field is fragmented across numerous datasets, each with its own data format and evaluation protocol. AUDDT addresses this by providing a unified platform for benchmarking pretrained models against a wide variety of public datasets. We update it regularly to include more recent datasets; see below for current coverage.

The current version includes 33 datasets.

Supported Datasets

The full list of 30+ supported datasets is maintained in a public Google Sheet for easy viewing and filtering.

➡️ View Full Dataset List on Google Sheets

Update Log

We are actively developing AUDDT. See below for the latest updates.

2026-05-20
- Added multi-GPU inference support
- Fixed metadata labels for a few datasets
2026-05-18
- Added deepfake audio event datasets: FakeSound, VCapAV, and EnvSDD.
- Added more subgroups for easy selective benchmarking
2025-09-19
- Birth of AUDDT
- Added 28 datasets to the benchmark
- Added an exemplar baseline model

Installation

Clone the repository:

git clone https://github.com/MUSAELab/AUDDT.git
cd AUDDT

Create a virtual environment and install dependencies:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Benchmarking Your Detector

All commands below assume you are in the project root (AUDDT/) with your environment activated.

Step 1 — Download datasets

Set the data root in download/config.sh (default: data/). Then run the corresponding download script for each dataset you need:

bash download/get_XXX.sh    # e.g., bash download/get_fakesound.sh

By default, data is placed under:

data/
  DATASET_X/
    raw/          # compressed archives downloaded from source
    processed/    # extracted audio files
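Before moving on to Step 2, it can help to confirm which datasets actually follow this layout. The snippet below is a minimal sketch, not part of AUDDT: the function name summarize_data_root is hypothetical, and it only assumes the raw/ and processed/ subfolders described above.

```python
from pathlib import Path


def summarize_data_root(data_root):
    """Return {dataset_name: {"raw": bool, "processed": bool}} for each subfolder.

    Hypothetical helper: it only checks that the two subfolders exist,
    not that the download or extraction finished successfully.
    """
    root = Path(data_root)
    summary = {}
    if not root.is_dir():
        return summary
    for dataset_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        summary[dataset_dir.name] = {
            "raw": (dataset_dir / "raw").is_dir(),
            "processed": (dataset_dir / "processed").is_dir(),
        }
    return summary


if __name__ == "__main__":
    # "data" is the default data root mentioned above; adjust if you changed it.
    for name, status in summarize_data_root("data").items():
        print(f"{name}: raw={status['raw']} processed={status['processed']}")
```

A dataset that shows raw=True but processed=False was probably downloaded but not yet extracted.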

Step 2 — Prepare manifests

Generate CSV manifest files for all configured datasets:

python preprocessing/prep_all_datasets.py --config preprocessing/dataset_list.yaml

Enable or disable individual datasets by commenting them in or out in preprocessing/dataset_list.yaml. Each enabled dataset must already be downloaded before this step.

Step 3 — Configure your model

Place your model script in models/ and edit benchmark/evaluate_setup.yaml:

model:
  path: models/detector_wrapper.py
  class_name: AudioDeepfakeDetector
  checkpoint: models/Best_LA_model_for_DF.pth
  device: 'cuda:0'
  model_args:
    raw_model_path: models/baseline_model.py
    raw_model_class_name: Model
    raw_model_args:
      args: null
      model_device: 'cuda:0'
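The path and class_name keys suggest that the model class is looked up by name inside the given script. The sketch below shows one way such a lookup could work using the standard library; it is an assumption for illustration, not AUDDT's actual loader, and load_class_from_path is a hypothetical name. It can be handy for checking that your wrapper file imports cleanly before launching a full evaluation.

```python
import importlib.util
from pathlib import Path


def load_class_from_path(script_path, class_name):
    """Import the Python file at script_path and return the class named class_name.

    Hypothetical helper; AUDDT may resolve the config differently.
    """
    path = Path(script_path)
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot import a module from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the script's top-level code
    try:
        return getattr(module, class_name)
    except AttributeError as exc:
        raise ImportError(f"{path} does not define {class_name!r}") from exc
```

For example, load_class_from_path("models/detector_wrapper.py", "AudioDeepfakeDetector") would return the class object without instantiating it, so no checkpoint or GPU is needed for this check.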

Comparing against the exemplar baseline? baseline_model.py depends on fairseq, which requires Python 3.10 (incompatible with 3.12+). Set up a dedicated environment first:

conda create -n auddt python=3.10 -y
conda activate auddt
pip install -r requirements.txt

The XLSR-300M backbone weights also need to be present in the project root:

wget https://dl.fbaipublicfiles.com/fairseq/wav2vec/xlsr2_300m.pt

Verify everything loads before running full evaluation:

python benchmark/smoke_test.py

Step 4 — Select datasets to evaluate

By default, evaluation runs on all datasets whose manifests exist. To evaluate on a specific subset, define a named group in benchmark/dataset_group.yaml:

audio-events:
  - name: FakeSound
    manifest_path: fakesound/processed/manifest_fakesound.csv
  - name: EnvSDD
    manifest_path: envssd/processed/manifest_envssd.csv
  - name: VCapAV-dev1
    manifest_path: vcapav/processed/dev1.csv
  - name: VCapAV-dev2
    manifest_path: vcapav/processed/dev2.csv
  - name: VCapAV-dev3
    manifest_path: vcapav/processed/dev3.csv

Then set group_name accordingly in benchmark/evaluate_setup.yaml:

data:
  group_name: audio-events
  groups_config_path: benchmark/dataset_group.yaml
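Since each group entry points at a manifest file, a quick existence check can catch a missing Step 2 output before evaluation starts. The following is a rough sketch under two assumptions that the text above does not confirm: that manifest_path is resolved relative to the data root, and that the group is available as plain Python data (it is copied by hand here rather than parsed from the YAML file). The name find_missing_manifests is hypothetical.

```python
from pathlib import Path

# The audio-events group from above, copied by hand as plain Python data.
AUDIO_EVENTS = [
    {"name": "FakeSound", "manifest_path": "fakesound/processed/manifest_fakesound.csv"},
    {"name": "EnvSDD", "manifest_path": "envssd/processed/manifest_envssd.csv"},
    {"name": "VCapAV-dev1", "manifest_path": "vcapav/processed/dev1.csv"},
    {"name": "VCapAV-dev2", "manifest_path": "vcapav/processed/dev2.csv"},
    {"name": "VCapAV-dev3", "manifest_path": "vcapav/processed/dev3.csv"},
]


def find_missing_manifests(group, data_root="data"):
    """Return the names of group entries whose manifest file does not exist.

    Assumes manifest_path is relative to data_root (an assumption, see lead-in).
    """
    root = Path(data_root)
    return [
        entry["name"]
        for entry in group
        if not (root / entry["manifest_path"]).is_file()
    ]
```

An empty list from find_missing_manifests(AUDIO_EVENTS) would mean every manifest in the group was found under the assumed data root.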

Step 5 — Run evaluation

python -m benchmark.evaluate --config benchmark/evaluate_setup.yaml

Results are written to results/. If latex_output_path is set in the config, a .tex table is also generated automatically.

You may need to adjust batch_size in evaluate_setup.yaml based on available GPU memory.

Example Output

Running the exemplar AASIST-style baseline (XLSR-300M + RawNet2 + GAT, trained on ASVspoof LA) on the audio-events group produces the results below. Near-random EER on audio-event datasets is expected: this model was trained on speech deepfakes and does not generalize to out-of-domain audio events.

The LaTeX table is saved to results/examplar_table.tex:

% Required packages: \usepackage{booktabs}
\begin{table*}[htbp]
  \centering
  \caption{Evaluation Results}
  \label{tab:results}
\begin{tabular}{lrrrrrrrrrrr}
\toprule
 & EER (\%) & AUC & Acc (\%) & TPR (\%) & TNR (\%) & Pre (\%) & F1 & TP & TN & FP & FN \\
\midrule
FakeSound & N/A & N/A & N/A & N/A & N/A & N/A & N/A & N/A & N/A & N/A & N/A \\
EnvSDD & 42.70 & 0.5902 & 32.70 & 25.63 & 82.02 & 90.87 & 0.3999 & 637 & 292 & 64 & 1848 \\
VCapAV-dev1 & 57.79 & 0.4495 & 37.36 & 30.74 & 57.23 & 68.32 & 0.4240 & 3480 & 2160 & 1614 & 7842 \\
VCapAV-dev2 & 51.46 & 0.5060 & 44.45 & 38.31 & 56.73 & 63.91 & 0.4791 & 2892 & 2141 & 1633 & 4656 \\
VCapAV-dev3 & 54.80 & 0.4691 & 37.67 & 33.86 & 56.73 & 79.65 & 0.4752 & 6390 & 2141 & 1633 & 12480 \\
\midrule
\textbf{Average} & 51.69 & 0.5037 & 38.05 & 32.14 & 63.18 & 75.69 & 0.4445 & 13399 & 6734 & 4944 & 26826 \\
\bottomrule
\end{tabular}
\end{table*}
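The threshold-dependent columns in the table can be recomputed from the four confusion counts. The sketch below uses the standard textbook definitions, which reproduce the EnvSDD row; it is an illustration, not AUDDT's own metric code, and the function name confusion_metrics is hypothetical. Note that the TP, TN, FP, and FN entries in the Average row are sums over the datasets, while the other entries are means.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Compute accuracy, TPR, TNR, precision, and F1 from confusion counts.

    Ratios are returned as fractions in [0, 1]; a ratio with a zero
    denominator is returned as None instead of raising an error.
    """
    def ratio(num, den):
        return num / den if den else None

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # recall is the same quantity as TPR
    if precision is None or recall is None or precision + recall == 0:
        f1 = None
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {
        "acc": ratio(tp + tn, tp + tn + fp + fn),
        "tpr": recall,
        "tnr": ratio(tn, tn + fp),
        "precision": precision,
        "f1": f1,
    }


# Counts taken from the EnvSDD row of the table above.
envsdd = confusion_metrics(tp=637, tn=292, fp=64, fn=1848)
print({k: round(v, 4) for k, v in envsdd.items()})
```

Multiplying acc, tpr, tnr, and precision by 100 gives the percentage columns (32.70, 25.63, 82.02, 90.87), and f1 rounds to 0.3999.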

FakeSound contains only spoof samples (no bonafide), so EER and AUC are undefined and reported as N/A; accuracy would reflect the spoof detection rate only. In the table above, all FakeSound columns are shown as N/A.
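To illustrate why EER needs both classes, here is a small pure-Python sketch that approximates EER by sweeping thresholds over the observed scores. It is not AUDDT's implementation. It assumes that label 1 marks the positive class and that a higher score means "more likely positive"; the toolkit's actual score direction and label encoding are not stated here. The function name equal_error_rate is hypothetical.

```python
def equal_error_rate(scores, labels):
    """Approximate the equal error rate (EER) with a threshold sweep.

    labels: 1 for the positive class, 0 for the negative class (assumed).
    A sample is predicted positive when score >= threshold.
    Returns None when only one class is present, since one of the two
    error rates has no samples to be computed on.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None

    best_gap, eer = None, None
    for threshold in sorted(set(scores)):
        fnr = sum(s < threshold for s in pos) / len(pos)   # missed positives
        fpr = sum(s >= threshold for s in neg) / len(neg)  # false alarms
        gap = abs(fnr - fpr)
        if best_gap is None or gap < best_gap:
            # Report the midpoint at the threshold where the two rates are closest.
            best_gap, eer = gap, (fnr + fpr) / 2
    return eer
```

This sweep is quadratic in the number of samples, so it is only meant for small sanity checks; for large manifests a sorted, vectorized computation is the usual choice.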

Contributing

While the team will keep expanding the benchmark coverage, we strongly encourage you to suggest new datasets by opening an issue that points us to the source link and paper.

Limitations

The FakeSound dataset included in this benchmark currently contains only spoofed audio samples. Obtaining the corresponding bona fide (real) data requires additional manual steps. We plan to resolve this limitation and provide the complete dataset in future updates.

Citation

@misc{zhu2025auddtaudiounifieddeepfake,
      title={AUDDT: Audio Unified Deepfake Detection Benchmark Toolkit}, 
      author={Yi Zhu and Heitor R. Guimarães and Arthur Pimentel and Tiago Falk},
      year={2025},
      eprint={2509.21597},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/2509.21597}
}

License

This project is licensed for academic and research use only.
Commercial use is strictly prohibited without prior written permission.
See the full LICENSE file for details.

Disclaimer

For transparency, we do not include proprietary datasets or datasets with unknown sources. We also encourage users to watch for potential training/test overlap; for example, datasets such as ASVspoof2019 and ASVspoof5 are widely used as training sets. Results obtained with this toolkit should be used solely for research purposes, not for commercial advertising.
