MDMF (Micro-Defects expose Macro-Fakes) reframes AI-generated image detection as a local distributional problem rather than an image-level classification one. Instead of compressing each image into a single representation that tends to over-rely on global semantics, MDMF treats every image as a collection of patches, projects each patch into a learnable forensic latent space — the Patch Forensic Signature (PFS) — and measures the distributional discrepancy between the test image and a small reference bank of clean real images via a Maximum Mean Discrepancy (MMD) score.
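For reference, the image-level score is an empirical MMD between the test image's PFS set and the pooled PFS set of the reference bank. A standard unbiased estimator reads as follows (this is the textbook form; see the paper for the exact kernel and normalization MDMF uses):

$$
\widehat{\mathrm{MMD}}^2 = \frac{1}{m(m-1)} \sum_{i \neq j} k(x_i, x_j) + \frac{1}{n(n-1)} \sum_{i \neq j} k(y_i, y_j) - \frac{2}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} k(x_i, y_j),
$$

where $x_1, \dots, x_m$ are the PFS embeddings of the test image's patches, $y_1, \dots, y_n$ are the reference-bank PFS embeddings, and $k$ is a characteristic kernel such as the RBF kernel.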
We release the source code, the trained MDMF checkpoint, and the scripts needed to reproduce the main paper results.
- Patch Forensic Signature (PFS) — A learnable forensic reparameterization of frozen DINOv2 patch tokens that suppresses semantic invariances and amplifies generation-induced statistical irregularities.
- MDMF detector — A distribution-aware detection framework that aggregates patch-level evidence into a stable image-level score via MMD between PFS distributions, avoiding the per-patch decision boundary that destabilizes hard-voting baselines (a minimal scoring sketch follows this list).
- Theory-grounded — We prove that patch-wise PFS modeling yields a strictly larger MMD signal than global pooling whenever localized forensic cues are present, and we derive a finite-sample separation guarantee with a finite optimal patch count $K^\star$.
- Strong cross-generator generalization — Trained on a single 4-class ProGAN split, MDMF reaches an average AUROC of 95.65 on the ImageNet benchmark (9 generators spanning diffusion, GAN, and AR families) and remains state-of-the-art on LSUN-Bedroom, GenImage, WildRF, LDMFakeDetect, and an OpenSora video-frame stress test, with markedly gentler degradation under JPEG compression, blur, and noise post-processing than the strongest training-based baseline.
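The sketch below illustrates the two moving parts named above: a PFS projection head over frozen backbone patch tokens, and an MMD score against a real-only reference bank. It is a minimal illustration under assumed choices (an MLP head, an RBF kernel, made-up dimensions), not the released implementation in `src/`.

```python
import torch
import torch.nn as nn

class PFSHead(nn.Module):
    """Illustrative PFS projection: maps frozen backbone patch tokens
    (B, N, D) to a forensic latent space (B, N, d). The released module
    in src/train_MDMF.py may differ in depth and normalization."""
    def __init__(self, in_dim=1024, out_dim=256):
        super().__init__()
        self.proj = nn.Sequential(
            nn.LayerNorm(in_dim),
            nn.Linear(in_dim, out_dim),
            nn.GELU(),
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, patch_tokens):
        return self.proj(patch_tokens)

def rbf_mmd2(x, y, sigma=1.0):
    """Unbiased MMD^2 between patch sets x (m, d) and y (n, d) under an
    RBF kernel; the kernel and bandwidth are assumptions for this sketch."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma**2))
    kxx, kyy, kxy = k(x, x), k(y, y), k(x, y)
    m, n = x.shape[0], y.shape[0]
    xx = (kxx.sum() - kxx.diagonal().sum()) / (m * (m - 1))
    yy = (kyy.sum() - kyy.diagonal().sum()) / (n * (n - 1))
    return xx + yy - 2 * kxy.mean()

# Toy usage: a higher score against the real-only bank suggests a fake.
with torch.no_grad():
    pfs = PFSHead()
    test = pfs(torch.randn(1, 256, 1024)).squeeze(0)     # one test image
    bank = pfs(torch.randn(8, 256, 1024)).flatten(0, 1)  # 8 reference images
    print(f"MMD^2 score: {rbf_mmd2(test, bank).item():.4f}")
```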
```bash
git clone https://github.com/ZBox1005/MDMF.git
cd MDMF
conda create -n mdmf python=3.10 -y
conda activate mdmf
pip install -r requirements.txt
```

The code targets Python 3.10 + PyTorch 2.0+ on a single GPU. The DINOv2 ViT-L/14 backbone is fetched on first use via `torch.hub`.
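If you need to pre-fetch the backbone (e.g., on a machine without network access at run time), the standard DINOv2 hub entry point can be invoked directly; this is the usual `torch.hub` call, independent of this repository's scripts:

```python
import torch

# Downloads and caches the DINOv2 ViT-L/14 weights under the torch hub cache
# (~/.cache/torch/hub by default); subsequent loads reuse the cached copy.
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14")
backbone.eval()
```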
We organize the workflow into three steps: (1) precompute DINOv2 patch embeddings, (2) train the MDMF detector, and (3) evaluate on unseen generators. Each step uses one self-contained script under src/.
Organize datasets following the structure below; the scripts read from these paths via CLI flags.
```
data/
├── train/
│   ├── 0_real/    # training real images (e.g., LSUN classes used by ProGAN)
│   └── 1_fake/    # training fake images (e.g., 4-class ProGAN)
├── val/
│   ├── 0_real/
│   └── 1_fake/
├── ref/
│   └── 0_real/    # reference real images (3K is enough; 5K for the strongest setting)
└── test/
    ├── real/
    ├── adm/       # one folder per test generator
    ├── ldm/
    └── ...
```
We follow CNNDetection for the training split (4-class ProGAN) and the standard cross-generator evaluation suites for testing (ImageNet, LSUN-Bedroom, GenImage, WildRF, LDMFakeDetect).
```bash
# Training pairs
python src/precompute_embeddings.py \
  --real_dir data/train/0_real \
  --fake_dir data/train/1_fake \
  --output embeddings/train_patch32.pkl \
  --patch_size 32 --batch_size 256

# Validation pairs
python src/precompute_embeddings.py \
  --real_dir data/val/0_real \
  --fake_dir data/val/1_fake \
  --output embeddings/val_patch32.pkl \
  --patch_size 32 --batch_size 256

# Reference bank (real-only)
python src/precompute_embeddings.py \
  --real_dir data/ref/0_real \
  --output embeddings/ref_3k_patch32.pkl \
  --patch_size 32 --batch_size 256
```

Edit `config.json` to point at the embeddings produced in Step 1, then launch training with the command below.
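For orientation, a hypothetical `config.json` might look like the snippet below; every key name here is an illustrative guess, so treat the released template as the source of truth:

```json
{
  "train_embeddings": "embeddings/train_patch32.pkl",
  "val_embeddings": "embeddings/val_patch32.pkl",
  "pfs_dim": 256,
  "batch_size": 256,
  "lr": 1e-4,
  "epochs": 50
}
```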
```bash
python src/train_MDMF.py \
  --config config.json \
  --output_dir checkpoints/mdmf_imagenet
```

We provide the trained checkpoint at `checkpoints/model.pth`, so Step 2 can be skipped.
```bash
python src/test_MDMF.py \
  --model checkpoints/model.pth \
  --ref_embeddings embeddings/ref_3k_patch32.pkl \
  --test_real data/test/real \
  --test_fake data/test/adm data/test/ldm data/test/biggan ... \
  --output_dir results/mdmf_imagenet \
  --batch_size 256
```

Pass any number of fake-generator directories to `--test_fake`; the script reports per-generator AUROC, AP, and FPR@95TPR, plus the average over all listed generators.
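For reference, the three reported metrics can be recomputed from per-image scores and labels with scikit-learn; this is a generic sketch, not the repository's evaluation code:

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score, roc_curve

def summarize(scores, labels):
    """scores: higher means more likely fake; labels: 1 = fake, 0 = real."""
    auroc = roc_auc_score(labels, scores)
    ap = average_precision_score(labels, scores)
    fpr, tpr, _ = roc_curve(labels, scores)
    # FPR@95TPR: false-positive rate at the first threshold whose TPR >= 0.95.
    fpr95 = fpr[np.searchsorted(tpr, 0.95)]
    return {"AUROC": auroc, "AP": ap, "FPR@95TPR": fpr95}
```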
MDMF reaches an average AUROC of 95.65 / AP of 97.07 on the 9-generator ImageNet benchmark, outperforming every training-based detector we compare against. The advantage is consistent across diffusion, GAN, and AR generator families, and the per-generator gains are largest precisely on the hardest diffusion targets (ADM, ADMG, LDM, DiT-XL/2). See the paper for results on LSUN-Bedroom, GenImage, WildRF, LDMFakeDetect, and the OpenSora video-frame stress test.
```
MDMF/
├── README.md
├── LICENSE
├── requirements.txt
├── config.json                   # training hyperparameters template
├── src/
│   ├── precompute_embeddings.py  # DINOv2 patch-token extraction
│   ├── train_MDMF.py             # MDMF training loop (PFS + MMD objective)
│   └── test_MDMF.py              # evaluation against reference bank
├── checkpoints/
│   └── model.pth                 # released MDMF checkpoint
└── assets/
    ├── logo_full.png
    ├── framework.png
    └── main_table.png
```
If you find this work useful, please cite:
```bibtex
@article{zhang2026mdmf,
  title={Micro-Defects Expose Macro-Fakes: Detecting AI-Generated Images via Local Distributional Shifts},
  author={Zhang, Boxuan and Zhu, Jianing and Wang, Qifan and Liu, Jiang and Tang, Ruixiang},
  journal={arXiv preprint arXiv:2605.09296},
  year={2026}
}
```

The code is released under the MIT License. Datasets used in the paper retain their own original licenses; please follow the terms of the corresponding releases (ImageNet, LSUN-Bedroom, GenImage, WildRF, LDMFakeDetect, OpenSora, MSR-VTT).


