Deepfakes Detection

🇰🇷 Korean Version | 📈 Model Evaluation | 🔮 Try Demo

📌 Contents

💡 Install & Requirements
🛠 SetUp
📚 DeepFake Video BenchMark Datasets — Overview of Celeb-DF-v2, FF++, and KoDF datasets used for training.
⚙️ Data Preparation — Efficient face detection and landmark extraction pipeline using YOLOv8
🏗 Model Architecture — Detailed look into our hybrid CNN-ViT (MS-EffViT & MS-EffGCViT) designs.
🧬 Model Zoo — Comparison of model variants, parameter counts, and computational complexity (FLOPs).
🚀 Training - Step-by-step training scripts with Google Colab and W&B experiment tracking
📈 Model Evaluation - Benchmarking results
💻 Model Usage - How to integrate DeepGuard models into your own Python code or via timm
🔮 Predict Image & Video - Simple Inference examples for detecting deepfakes in image and video
📬 Authors
📝 Reference
⚖️ License

💡 Install & Requirements

To install requirements:

pip install -r requirements.txt

🛠 SetUp

Clone the repository and move into it:

git clone https://github.com/HanMoonSub/DeepGuard.git

cd DeepGuard

📚 DeepFake Video BenchMark Datasets

To evaluate the generalization and robustness of our deepfake detection model, we utilize three large-scale, widely recognized benchmark datasets. Each dataset presents unique challenges and covers different types of forgery methods.

Dataset	Real Videos	Fake Videos	Year	Participants	Description (Paper Title)	Details
Celeb-DF-v2	890	5,639	2019	59	A Large-scale Challenging Dataset for DeepFake Forensics	🔗 Readme
FaceForensics++	1,000	6,000	2019	1,000	Learning to Detect Manipulated Facial Images	🔗 Readme
KoDF	62,166	175,776	2020	400	Large-Scale Korean Deepfake Detection Dataset	🔗 Readme
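
All three datasets contain far more fake videos than real ones, which matters when choosing sampling strategies or reading accuracy figures. The snippet below is a small sketch that derives the class balance from the counts in the table above; the helper names are illustrative and not part of the DeepGuard package.

```python
# Video counts copied from the dataset table above.
DATASETS = {
    "Celeb-DF-v2": {"real": 890, "fake": 5639},
    "FaceForensics++": {"real": 1000, "fake": 6000},
    "KoDF": {"real": 62166, "fake": 175776},
}


def real_fraction(real: int, fake: int) -> float:
    """Share of all videos in a dataset that are real (unmanipulated)."""
    return real / (real + fake)


def fake_to_real_ratio(real: int, fake: int) -> float:
    """Number of fake videos per real video."""
    return fake / real


if __name__ == "__main__":
    for name, counts in DATASETS.items():
        share = real_fraction(**counts)
        ratio = fake_to_real_ratio(**counts)
        print(f"{name}: {share:.1%} real, {ratio:.2f} fake videos per real video")
```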

⚙️ Data Preparation

Our preprocessing pipeline is designed to efficiently extract facial features from videos and prepare them for high-accuracy deepfake detection.

Detect Original Face

To maximize preprocessing efficiency, face detection is performed only on original (real) videos. Since the manipulated videos in these benchmark datasets keep faces at the same spatial coordinates as their source videos, the bounding boxes detected on each original are reused for the corresponding deepfake versions.
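
The reuse idea can be sketched as a simple lookup. This is a minimal illustration, assuming a mapping from each fake video to its source video is available from the dataset metadata; the function name, the `source_of` mapping, and the box layout are hypothetical and may differ from the actual preprocessing code.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]       # (x1, y1, x2, y2) in pixels
FrameBoxes = Dict[int, List[Box]]     # frame index -> boxes found in that frame


def boxes_for_video(
    video_id: str,
    source_of: Dict[str, str],
    original_boxes: Dict[str, FrameBoxes],
) -> FrameBoxes:
    """Return the face boxes to use for a video.

    Fake videos reuse the boxes detected on their source video.
    Original videos are not listed in `source_of`, so they map to themselves.
    """
    source_id = source_of.get(video_id, video_id)
    return original_boxes[source_id]


if __name__ == "__main__":
    # Hypothetical example data: one original and one fake derived from it.
    original_boxes = {"real_001": {0: [(10, 20, 110, 140)]}}
    source_of = {"fake_001": "real_001"}
    print(boxes_for_video("fake_001", source_of, original_boxes))
```

With this layout the detector runs once per original video, and every derived fake costs only a dictionary lookup.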

🚀 Efficiency Optimizations

Lightweight Model: Uses yolov8n-face for high-speed inference without sacrificing accuracy.
Targeted Processing: By detecting faces only in original videos, the total detection workload is reduced by approximately 80%.
Dynamic Rescaling: To maintain consistent inference speed across different resolutions, frames are automatically resized based on their dimensions:

Frame Size (Longest Side)	Scale Factor	Action
< 300px	2.0	Upscale
300px - 700px	1.0	Keep original size
700px - 1500px	0.5	Downscale
> 1500px	0.33	Downscale
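
The table can be expressed as a small lookup function. This is a sketch, not the repository's implementation: the function names are illustrative, and the table leaves the 700px boundary ambiguous, so assigning exactly 700px to the 0.5 bucket is an assumption.

```python
from typing import Tuple


def scale_factor(height: int, width: int) -> float:
    """Pick the resize factor from the longest side of the frame."""
    longest = max(height, width)
    if longest < 300:
        return 2.0
    if longest < 700:       # ASSUMPTION: exactly 700px falls in the next bucket
        return 1.0
    if longest <= 1500:
        return 0.5
    return 0.33


def rescaled_size(height: int, width: int) -> Tuple[int, int]:
    """Frame size (height, width) after applying the scale factor."""
    s = scale_factor(height, width)
    return round(height * s), round(width * s)


if __name__ == "__main__":
    for h, w in [(240, 200), (480, 640), (720, 1280), (1080, 1920)]:
        print((h, w), scale_factor(h, w), rescaled_size(h, w))
```

Boxes detected on a rescaled frame would need to be divided by the same factor to map them back to the original resolution.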

Face Cropping & Landmark Extraction

This module extracts face crops from both original and deepfake videos using the bounding boxes generated in the previous step. It also performs landmark detection to support advanced augmentations such as Landmark-based Cutout.

🛠 Key Features

Dynamic Margin with Jitter: Adds a configurable margin around the face. The margin_jitter parameter introduces random variance to the crop size, making the model more robust to different face scales.
Landmark Localization: Detects 5 primary facial landmarks (eyes, nose, mouth corners) and saves them as .npy files.
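
The margin logic can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `expand_box` is hypothetical, and treating `margin_jitter` as a uniform offset added to the margin ratio is one plausible reading of the description above, not necessarily how the preprocessing scripts implement it.

```python
import random
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def expand_box(
    box: Box,
    frame_w: int,
    frame_h: int,
    margin_ratio: float = 0.2,
    margin_jitter: float = 0.0,
    rng: Optional[random.Random] = None,
) -> Box:
    """Grow a face box by a (possibly jittered) margin, clipped to the frame."""
    rng = rng or random.Random()
    x1, y1, x2, y2 = box
    # ASSUMPTION: the effective margin is drawn uniformly from
    # [margin_ratio - margin_jitter, margin_ratio + margin_jitter], never below 0.
    ratio = max(0.0, margin_ratio + rng.uniform(-margin_jitter, margin_jitter))
    pad_x = round((x2 - x1) * ratio)
    pad_y = round((y2 - y1) * ratio)
    return (
        max(0, x1 - pad_x),
        max(0, y1 - pad_y),
        min(frame_w, x2 + pad_x),
        min(frame_h, y2 + pad_y),
    )


if __name__ == "__main__":
    face = (100, 100, 200, 200)
    print(expand_box(face, 640, 480, margin_ratio=0.2))
    print(expand_box(face, 640, 480, margin_ratio=0.2, margin_jitter=0.1,
                     rng=random.Random(0)))
```

Passing a seeded `random.Random` keeps the jitter reproducible across preprocessing runs.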

DATA_ROOT/
├── crops/
│   └── {video_id}/
│       ├── 12.png
│       └── ...
├── landmarks/
│   └── {video_id}/
│       ├── 12.npy
│       └── ...
└── train_frame_metadata.csv
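
Because crops and landmarks share the same `{video_id}/{frame}` naming, the landmark file for a crop can be derived from the crop path alone. The helper below is an illustrative sketch based only on the layout shown above; its name is hypothetical.

```python
from pathlib import Path


def landmark_path_for(crop_path: Path, data_root: Path) -> Path:
    """Map DATA_ROOT/crops/{video_id}/{frame}.png to the matching .npy file."""
    # Path of the crop relative to the crops/ folder, e.g. vid001/12.png
    relative = crop_path.relative_to(data_root / "crops")
    return (data_root / "landmarks" / relative).with_suffix(".npy")


if __name__ == "__main__":
    root = Path("DATA_ROOT")
    crop = root / "crops" / "vid001" / "12.png"
    print(landmark_path_for(crop, root))
```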

Dataset-Specific Pipelines

Click the links below to view the specific preprocessing details for each dataset:

🏗 Model Architecture

The Multi Scale Efficient Global Context Vision Transformer (MS-EffGCViT) is an optimized multi-scale hybrid architecture. It combines CNN-driven spatial inductive bias with hierarchical attention mechanisms to identify both subtle (local) and macro (global) artifacts for robust deepfake forensics.

Explore More Details

Model Architecture: MS-EffViT - Multi Scale Efficient Vision Transformer
Advanced Architecture: MS-EffGCViT - Multi Scale Efficient Global Context Vision Transformer

We use two distinct types of self-attention to capture both short-range and long-range information across feature maps.

Local Window Attention: This module restricts attention to fixed-size windows, so it captures local textures and precise spatial details while keeping computational complexity linear in the image size.
Global Window Attention: Unlike Swin Transformer, this module uses global queries that interact with local window keys and values. This allows each local region to incorporate global context, capturing long-range dependencies and the overall spatial structure.
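
The difference between the two attention types can be illustrated with a stripped-down NumPy sketch. It is deliberately simplified and is not the model's implementation: it uses a single head and no learned projections, and the average-pooled global query is an assumption standing in for whatever query generator the real model uses. All function names are illustrative.

```python
import numpy as np


def window_partition(x: np.ndarray, window: int) -> np.ndarray:
    """Split an (H, W, C) feature map into (num_windows, window*window, C).

    H and W must be divisible by `window`.
    """
    h, w, c = x.shape
    x = x.reshape(h // window, window, w // window, window, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window * window, c)


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def pooled_global_query(x: np.ndarray, window: int) -> np.ndarray:
    """ASSUMPTION: summarize the whole map as window*window pooled tokens."""
    h, w, c = x.shape
    pooled = x.reshape(window, h // window, window, w // window, c).mean(axis=(1, 3))
    return pooled.reshape(window * window, c)


def window_attention(x, window, global_query=None):
    """Attention inside each window; keys and values are always local.

    Local mode: queries come from the window's own tokens.
    Global mode: every window shares the same `global_query` tokens.
    """
    tokens = window_partition(x, window)                     # (nW, N, C)
    q = tokens if global_query is None else np.broadcast_to(global_query, tokens.shape)
    scores = q @ tokens.transpose(0, 2, 1) / np.sqrt(tokens.shape[-1])
    return softmax(scores) @ tokens                          # (nW, N, C)


if __name__ == "__main__":
    feat = np.random.default_rng(0).normal(size=(8, 8, 4))
    local = window_attention(feat, window=4)
    glob = window_attention(feat, window=4, global_query=pooled_global_query(feat, 4))
    print(local.shape, glob.shape)
```

Each window costs a fixed amount of work and the number of windows grows linearly with the feature map area, which is where the linear complexity comes from.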

🧬 Model Zoo

Model	Resolution	# Total Params(M)	# Backbone(M)	# L-ViT(M)	# H-ViT(M)	FLOPs (G)	Model Config
⚡ ms_eff_gcvit_b0	224 X 224	8.7	3.6(41.4%)	1.7(19.5%)	3.3(37.9%)	0.87	spec
🔥 ms_eff_gcvit_b5	384 X 384	50.3	27.3(54.3%)	6.6(13.1%)	16.1(32.0%)	13.64	spec

🚀 Training

We provide training scripts for both ms_eff_vit and ms_eff_gcvit. We recommend using Google Colab for free GPU access and Weights & Biases (W&B) for experiment tracking.

📊 Weights & Biases Experiments

ms_eff_vit_b0: Celeb-DF-v2 🚀 | FaceForensics++ 🚀 | KoDF 🚀
ms_eff_vit_b5: Celeb-DF-v2 🚀 | FaceForensics++ 🚀 | KoDF 🚀
ms_eff_gcvit_b0: Celeb-DF-v2 🚀 | FaceForensics++ 🚀 | KoDF 🚀
ms_eff_gcvit_b5: Celeb-DF-v2 🚀 | FaceForensics++ 🚀 | KoDF 🚀

# Use train_eff_gcvit instead of train_eff_vit for the ms_eff_gcvit_* models.
# --model-ver: ms_eff_vit_b0, ms_eff_vit_b5, ms_eff_gcvit_b0, ms_eff_gcvit_b5
# --dataset: ff++, celeb_df_v2, kodf
# --seed: fixed for reproducibility
# --wandb-api-key: replace with your own W&B API key
!python -m train_eff_vit \
    --root-dir DATA_ROOT \
    --model-ver "ms_eff_vit_b5" \
    --dataset "ff++" \
    --seed 2025 \
    --wandb-api-key "your-api-key"

📈 Model Evaluation

# --model-name: ms_eff_vit_b0, ms_eff_vit_b5, ms_eff_gcvit_b0, ms_eff_gcvit_b5
# --model-dataset: ff++, celeb_df_v2, kodf
!python -m inference.predict_video \
    --root-dir DATA_ROOT \
    --margin-ratio 0.2 \
    --conf-thres 0.5 \
    --min-face-ratio 0.01 \
    --model-name "ms_eff_gcvit_b0" \
    --model-dataset "kodf" \
    --num-frames 20 \
    --tta-hflip 0.0 \
    --agg-mode "conf"

Celeb DF(v2) Pretrained Models

Model Variant	Test@Acc	Test@Auc	Test@log_loss	Download	Train Config
ms_eff_gcvit_b0	0.9842	0.9965	0.0283	model	recipe
ms_eff_gcvit_b5	0.9981	0.9984	0.0089	model	recipe

FaceForensics++ Pretrained Models

Model Variant	Test@Acc	Test@Auc	Test@log_loss	Download	Train Config
ms_eff_gcvit_b0	0.9808	0.9969	0.0637	model	recipe
ms_eff_gcvit_b5	0.9850	0.9974	0.0492	model	recipe

KoDF Pretrained Models

Model Variant	Test@Acc	Test@Auc	Test@log_loss	Download	Train Config
ms_eff_gcvit_b0	0.9655	0.9792	0.1237	model	recipe
ms_eff_gcvit_b5	0.9850	0.9974	0.0492	model	recipe
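
For readers who want to sanity-check numbers like those in the tables, the three reported metrics can be computed from labels and predicted probabilities with a few lines of plain Python. This is a reference sketch, not the repository's evaluation code; the 0.5 decision threshold and the convention that label 1 means fake are assumptions.

```python
import math
from typing import Sequence


def accuracy(labels: Sequence[int], probs: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum(int(p >= threshold) == y for y, p in zip(labels, probs))
    return hits / len(labels)


def log_loss(labels: Sequence[int], probs: Sequence[float], eps: float = 1e-15) -> float:
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def roc_auc(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Probability that a random fake scores higher than a random real.

    Ties count as half. Quadratic in the number of samples, which is fine
    for a sanity check but slow for very large test sets.
    """
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [0, 0, 1, 1]                 # toy labels: 1 = fake (assumption)
    y_prob = [0.1, 0.4, 0.35, 0.8]        # toy predicted fake probabilities
    print(accuracy(y_true, y_prob), roc_auc(y_true, y_prob), log_loss(y_true, y_prob))
```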

💻 Model Usage

Quick Start: You can load the models directly via the DeepGuard package or through the timm interface.

Available Datasets: celeb_df_v2, ff++, kodf

Installation

pip install -U git+https://github.com/HanMoonSub/DeepGuard.git

Option A: Direct Import (via DeepGuard)

from deepguard import ms_eff_gcvit_b0, ms_eff_gcvit_b5

model = ms_eff_gcvit_b0(pretrained=True, dataset="celeb_df_v2")
model = ms_eff_gcvit_b5(pretrained=True, dataset="ff++")

Option B: Using timm Interface (via timm)

import timm
import deepguard

model = timm.create_model("ms_eff_gcvit_b0", pretrained=True, dataset="ff++")
model = timm.create_model("ms_eff_gcvit_b5", pretrained=True, dataset="kodf")

🔮 Predict Image & Video

Predict DeepFake Image

from inference import ImagePredictor

# Initialize the predictor
predictor = ImagePredictor(
            margin_ratio = 0.2, #  Margin ratio around the detected face crop
            conf_thres = 0.5, # Confidence threshold for face detection
            min_face_ratio = 0.01, # Minimum face-to-frame size ratio to process
            model_name = "ms_eff_vit_b0", #  ms_eff_vit_b5, ms_eff_gcvit_b0, ms_eff_gcvit_b5  
            dataset = "celeb_df_v2" # ff++, kodf
            )

# Run Inference
result = predictor.predict_img(
            img_path="path/to/image.jpg",
            tta_hflip=0.0 # Horizontal Flip for Test-Time Augmentation 
            )

print(f"Deepfake Probability: {result:.4f}")

Predict DeepFake Video

from inference import VideoPredictor

# Initialize the predictor
predictor = VideoPredictor(
            margin_ratio = 0.2, #  Margin ratio around the detected face crop
            conf_thres = 0.5, # Confidence threshold for face detection
            min_face_ratio = 0.01, # Minimum face-to-frame size ratio to process
            model_name = "ms_eff_vit_b0", #  ms_eff_vit_b5, ms_eff_gcvit_b0, ms_eff_gcvit_b5  
            dataset = "celeb_df_v2" # ff++, kodf
            )

# Run Inference
result = predictor.predict_video(
            video_path = "path/to/video.mp4",
            num_frames = 20, # Number of frames to sample per video
            agg_mode = "conf", # Aggregation Method: 'conf', 'mean', 'vote'
            tta_hflip=0.0 # Horizontal Flip for Test-Time Augmentation 
            )

print(f"Deepfake Probability: {result:.4f}")
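
To give a feel for what `agg_mode` controls, the sketch below combines per-frame fake probabilities into one video-level score. It is only an illustration: the 'mean' and 'vote' branches follow the usual meaning of those words, while the 'conf' branch (weighting each frame by its distance from 0.5) is a guess at the intent, and the actual VideoPredictor may compute all three differently.

```python
from statistics import mean
from typing import Sequence


def aggregate(frame_probs: Sequence[float], mode: str = "mean", threshold: float = 0.5) -> float:
    """Combine per-frame fake probabilities into a single video-level score."""
    if not frame_probs:
        raise ValueError("frame_probs must not be empty")
    if mode == "mean":
        # Plain average of the frame scores.
        return mean(frame_probs)
    if mode == "vote":
        # Fraction of frames classified as fake at the given threshold.
        return sum(p >= threshold for p in frame_probs) / len(frame_probs)
    if mode == "conf":
        # ASSUMPTION: frames far from 0.5 are more confident and weigh more.
        weights = [abs(p - 0.5) for p in frame_probs]
        total = sum(weights)
        if total == 0:
            return 0.5
        return sum(w * p for w, p in zip(weights, frame_probs)) / total
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    probs = [0.2, 0.4, 0.9]  # toy per-frame scores
    for m in ("mean", "vote", "conf"):
        print(m, round(aggregate(probs, m), 4))
```

Under this reading, 'conf' lets a few strongly manipulated frames dominate the result, which can help when artifacts appear in only part of a video.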

📬 Authors

This project was developed as a Senior Graduation Project by the Department of Software at Chungbuk National University (CBNU), Republic of Korea.

한문섭: Data & Backend Engineering (Data Preprocessing Pipeline, DB Schema Design) — hanmoon3054@gmail.com
이예솔: UI/UX & Frontend Engineering (UI/UX Design, User Dashboard, Model Visualization) — yesol4138@chungbuk.ac.kr
서윤제: AI Engineering (AI Model Architecture, Inference API Design, Model Serving) — seoyunje2001@gmail.com

📝 Reference

facenet-pytorch - Pretrained Face Detection(MTCNN) and Recognition(InceptionResNet) Models by Tim Esler
face-cutout - Face Cutout Library by Sowmen
Celeb-DF++ - Celeb-DF++ Dataset by OUC-VAS Group
DeeperForensics-1.0 - DeeperForensics-1.0 Dataset by Endless Sora
Deepfake Detection - Detection of Video Deepfake using ResNext and LSTM by Abhijith Jadhav
deepfake-detection-project-v4 - Multiple Deep Learning Models by Ameen Caslam
Awesome-Deepfake-Detection - A curated list of tools, papers and code by Daisy Zhang

⚖️ License

This project is licensed under the terms of the MIT license.
