
HanMoonSub/DeepGuard


Deepfakes Detection


🇰🇷 Korean Version | 📈 Model Evaluation | 🔮 Try Demo

📌 Contents

💡 Install & Requirements

To install requirements:

pip install -r requirements.txt

🛠 Setup

Clone the repository and move into it:

git clone https://github.com/HanMoonSub/DeepGuard.git

cd DeepGuard

📚 Deepfake Video Benchmark Datasets

To evaluate the generalization and robustness of our deepfake detection model, we use three large-scale, widely recognized benchmark datasets. Each dataset poses different challenges and covers different forgery methods.

| Dataset | Real Videos | Fake Videos | Year | Participants | Description (Paper Title) | Details |
|---|---|---|---|---|---|---|
| Celeb-DF-v2 | 890 | 5,639 | 2019 | 59 | A Large-scale Challenging Dataset for DeepFake Forensics | 🔗 Readme |
| FaceForensics++ | 1,000 | 6,000 | 2019 | 1,000 | Learning to Detect Manipulated Facial Images | 🔗 Readme |
| KoDF | 62,166 | 175,776 | 2020 | 400 | Large-Scale Korean Deepfake Detection Dataset | 🔗 Readme |

โš™๏ธ Data Preparation

Our preprocessing pipeline is designed to efficiently extract facial features from videos and prepare them for high-accuracy deepfake detection.

Detect Faces in Original Videos

To maximize preprocessing efficiency, face detection is run only on the original (real) videos. Because each manipulated video in these benchmark datasets keeps the face at the same spatial position as its source video, the bounding boxes detected on an original video are reused for all of its deepfake versions.
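The bounding-box reuse described above can be sketched in a few lines. This is a minimal illustration, not the repository's code: it assumes per-frame boxes for each original video are kept in a dict keyed by frame index, and that a mapping from each fake video to its source video is available (for example from dataset metadata). The names `build_fake_boxes`, `original_boxes` and `fake_to_original` are hypothetical.

```python
from typing import Dict, List, Tuple

# A face box as (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[int, int, int, int]
# Per-video detections: frame index -> list of face boxes in that frame.
FrameBoxes = Dict[int, List[Box]]


def build_fake_boxes(
    original_boxes: Dict[str, FrameBoxes],
    fake_to_original: Dict[str, str],
) -> Dict[str, FrameBoxes]:
    """Reuse the boxes detected on each original video for its fakes.

    Hypothetical helper: assumes every fake video is frame-aligned with the
    original it was generated from, so the same boxes apply frame by frame.
    """
    fake_boxes: Dict[str, FrameBoxes] = {}
    for fake_id, orig_id in fake_to_original.items():
        if orig_id not in original_boxes:
            # No detections for the source video, so nothing to reuse.
            continue
        # Copy the per-frame lists so videos never share mutable state.
        fake_boxes[fake_id] = {
            frame: list(boxes) for frame, boxes in original_boxes[orig_id].items()
        }
    return fake_boxes


if __name__ == "__main__":
    originals = {"000": {0: [(10, 20, 110, 140)], 12: [(12, 22, 112, 142)]}}
    mapping = {"000_003": "000", "000_017": "000", "999_001": "999"}
    result = build_fake_boxes(originals, mapping)
    print(sorted(result))         # ['000_003', '000_017']
    print(result["000_003"][12])  # [(12, 22, 112, 142)]
```

Only one detector pass per original video is needed; every fake derived from it gets its boxes by lookup.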

🚀 Efficiency Optimizations

  • Lightweight Model: Uses yolov8n-face for high-speed face detection while retaining good detection accuracy.

  • Targeted Processing: By detecting faces only in original videos, the total detection workload is reduced by approximately 80%.

  • Dynamic Rescaling: To keep inference speed consistent across different resolutions, frames are automatically resized according to the length of their longest side:

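| Frame Size (Longest Side) | Scale Factor | Action |
|---|---|---|
| < 300px | 2.0 | Upscale |
| 300px - 700px | 1.0 | Keep original size |
| 700px - 1500px | 0.5 | Downscale |
| > 1500px | 0.33 | Downscale |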
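The rescaling rule in the table can be written as a small lookup. This sketch is an assumption-laden illustration rather than the pipeline's code: the table does not say which bucket a frame exactly on a boundary falls into, so here each lower bound is inclusive and each upper bound exclusive. The function names are hypothetical.

```python
from typing import Tuple


def rescale_factor(width: int, height: int) -> float:
    """Pick a resize factor from the frame's longest side (see the table above).

    Assumption: lower bounds are inclusive and upper bounds exclusive,
    so a 300px frame maps to 1.0 and a 1500px frame maps to 0.33.
    """
    longest = max(width, height)
    if longest < 300:
        return 2.0   # upscale small frames
    if longest < 700:
        return 1.0   # keep medium frames as they are
    if longest < 1500:
        return 0.5   # downscale large frames
    return 0.33      # downscale very large frames more aggressively


def rescaled_size(width: int, height: int) -> Tuple[int, int]:
    """Return the (width, height) a frame would be resized to before detection."""
    s = rescale_factor(width, height)
    return round(width * s), round(height * s)


if __name__ == "__main__":
    print(rescaled_size(1920, 1080))  # (634, 356)
    print(rescaled_size(256, 144))    # (512, 288)
```

Boxes detected on a resized frame must be divided by the same factor to map them back to original-frame coordinates before cropping.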

Face Cropping & Landmark Extraction

This module extracts face crops from both original and deepfake videos using the bounding boxes generated in the previous step. It also performs landmark detection to support advanced augmentations such as Landmark-based Cutout.

🛠 Key Features

  • Dynamic Margin with Jitter: Adds a configurable margin around the face. The margin_jitter parameter introduces random variance to the crop size, making the model more robust to different face scales.

  • Landmark Localization: Detects 5 primary facial landmarks (eyes, nose, mouth corners) and saves them as .npy files.
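A margin with jitter can be applied to a face box as sketched below. This is a hypothetical illustration of the idea, not the module's implementation: it assumes `margin` is a fraction of the box width and height added on every side, and that `margin_jitter` perturbs that fraction uniformly within `[-margin_jitter, +margin_jitter]`. The function name `expand_box` is illustrative.

```python
import random
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


def expand_box(
    box: Box,
    frame_w: int,
    frame_h: int,
    margin: float = 0.2,
    margin_jitter: float = 0.0,
    rng: Optional[random.Random] = None,
) -> Box:
    """Grow a face box by a randomly jittered margin and clip it to the frame.

    Hypothetical sketch: `margin` is a fraction of the box size added on each
    side; `margin_jitter` shifts that fraction uniformly in
    [-margin_jitter, +margin_jitter], so crop scale varies between calls.
    """
    rng = rng or random.Random()
    m = margin + rng.uniform(-margin_jitter, margin_jitter)
    m = max(m, 0.0)  # never shrink the crop below the detected face box
    x1, y1, x2, y2 = box
    pad_w = round((x2 - x1) * m)
    pad_h = round((y2 - y1) * m)
    return (
        max(0, x1 - pad_w),
        max(0, y1 - pad_h),
        min(frame_w, x2 + pad_w),
        min(frame_h, y2 + pad_h),
    )


if __name__ == "__main__":
    print(expand_box((100, 100, 200, 200), 640, 480, margin=0.2))  # (80, 80, 220, 220)
```

For a NumPy frame of shape (H, W, 3), the crop would then be `frame[y1:y2, x1:x2]`. Passing a seeded `random.Random` keeps the jitter reproducible.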

DATA_ROOT/
├── crops/
│   └── {video_id}/
│       ├── 12.png
│       └── ...
├── landmarks/
│   └── {video_id}/
│       ├── 12.npy
│       └── ...
└── train_frame_metadata.csv
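Given this layout, the crop and landmark files for one video can be paired by frame index. The helper below is a hypothetical sketch that assumes file names are integer frame indices (as in `12.png` / `12.npy`) and simply skips frames whose landmark file is missing; it is not part of the repository.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_frame_files(data_root: str, video_id: str) -> Iterator[Tuple[int, Path, Path]]:
    """Yield (frame_index, crop_path, landmark_path) for one video.

    Hypothetical helper assuming crops/{video_id}/{idx}.png and
    landmarks/{video_id}/{idx}.npy with integer file stems. Frames are
    yielded in numeric order (2 before 12), not lexicographic order.
    """
    root = Path(data_root)
    crop_dir = root / "crops" / video_id
    lm_dir = root / "landmarks" / video_id
    if not crop_dir.is_dir():
        return  # unknown video: yield nothing
    for crop_path in sorted(crop_dir.glob("*.png"), key=lambda p: int(p.stem)):
        lm_path = lm_dir / f"{crop_path.stem}.npy"
        if lm_path.exists():  # skip frames without saved landmarks
            yield int(crop_path.stem), crop_path, lm_path
```

Each landmark file could then be read with `numpy.load`; the array shape is not documented here, so check it before use.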

Dataset-Specific Pipelines

Click the links below to view the specific preprocessing details for each dataset:

๐Ÿ— Model Architecture

The Multi-Scale Efficient Global Context Vision Transformer (ms_eff_gcvit) is a multi-scale hybrid architecture that combines the spatial inductive bias of a CNN with hierarchical attention, allowing it to identify both subtle local artifacts and larger global artifacts for robust deepfake forensics.


We use two distinct types of self-attention to capture both short-range and long-range information across feature maps:

  • Local Window Attention: Self-attention is computed inside non-overlapping local windows, which captures local textures and precise spatial details while keeping the computational cost linear in the image size.

  • Global Window Attention: Unlike Swin Transformer, which exchanges information between neighboring windows by shifting them, this module uses global queries that interact with the keys and values of each local window. Every local region can therefore incorporate global context, capturing long-range dependencies across the entire spatial structure.
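The difference between the two attention types can be illustrated with the single-head NumPy sketch below. It is a simplified illustration under stated assumptions, not the DeepGuard implementation: there are no learned query/key/value projections, no multi-head split and no positional bias, and the global queries are obtained by average-pooling the whole feature map down to one window's size rather than by a learned query generator. All function names are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def window_partition(x: np.ndarray, w: int) -> np.ndarray:
    """(H, W, C) -> (num_windows, w*w, C); H and W must be divisible by w."""
    H, W, C = x.shape
    x = x.reshape(H // w, w, W // w, w, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, w * w, C)


def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention over the last two axes (broadcasts batches)."""
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def local_window_attention(x: np.ndarray, w: int) -> np.ndarray:
    """Each window attends only to itself: queries, keys and values are local."""
    win = window_partition(x, w)
    return attention(win, win, win)  # (num_windows, w*w, C)


def global_query_attention(x: np.ndarray, w: int) -> np.ndarray:
    """Queries summarize the whole map; keys and values stay per window.

    Simplification: global queries come from average-pooling the full
    feature map to a w x w grid, and the same queries are shared by all windows.
    """
    H, W, C = x.shape
    pooled = x.reshape(w, H // w, w, W // w, C).mean(axis=(1, 3))  # (w, w, C)
    q_global = pooled.reshape(1, w * w, C)                          # shared queries
    win = window_partition(x, w)                                    # local keys/values
    return attention(q_global, win, win)  # (num_windows, w*w, C)
```

For an 8 x 8 map with 4 x 4 windows, both functions return an array of shape (4, 16, C). Each window's cost depends only on the window size, so total cost grows linearly with the number of windows; the global variant adds whole-image context without attending across all H x W tokens.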

🧬 Model Zoo

| Model | Resolution | Total Params (M) | Backbone (M) | L-ViT (M) | H-ViT (M) | FLOPs (G) | Model Config |
|---|---|---|---|---|---|---|---|
| ⚡ ms_eff_gcvit_b0 | 224 × 224 | 8.7 | 3.6 (41.4%) | 1.7 (19.5%) | 3.3 (37.9%) | 0.87 | spec |
| 🔥 ms_eff_gcvit_b5 | 384 × 384 | 50.3 | 27.3 (54.3%) | 6.6 (13.1%) | 16.1 (32.0%) | 13.64 | spec |

🚀 Training

We provide training scripts for both ms_eff_vit and ms_eff_gcvit. We recommend Google Colab for free GPU access and Weights & Biases (W&B) for experiment tracking.

📊 Weights & Biases Experiments

# For the ms_eff_gcvit models, run train_eff_gcvit instead of train_eff_vit
# --model-ver: ms_eff_vit_b0, ms_eff_vit_b5, ms_eff_gcvit_b0, ms_eff_gcvit_b5
# --dataset: ff++, celeb_df_v2, kodf
# --seed: fixed random seed for reproducibility
# --wandb-api-key: your own W&B API key
!python -m train_eff_vit \
    --root-dir DATA_ROOT \
    --model-ver "ms_eff_vit_b5" \
    --dataset "ff++" \
    --seed 2025 \
    --wandb-api-key "your-api-key"

📈 Model Evaluation

# --model-name: ms_eff_vit_b0, ms_eff_vit_b5, ms_eff_gcvit_b0, ms_eff_gcvit_b5
# --model-dataset: ff++, celeb_df_v2, kodf
!python -m inference.predict_video \
    --root-dir DATA_ROOT \
    --margin-ratio 0.2 \
    --conf-thres 0.5 \
    --min-face-ratio 0.01 \
    --model-name "ms_eff_gcvit_b0" \
    --model-dataset "kodf" \
    --num-frames 20 \
    --tta-hflip 0.0 \
    --agg-mode "conf"

Celeb-DF (v2) Pretrained Models

| Model Variant | Test Acc | Test AUC | Test Log Loss | Download | Train Config |
|---|---|---|---|---|---|
| ms_eff_gcvit_b0 | 0.9842 | 0.9965 | 0.0283 | model | recipe |
| ms_eff_gcvit_b5 | 0.9981 | 0.9984 | 0.0089 | model | recipe |

FaceForensics++ Pretrained Models

| Model Variant | Test Acc | Test AUC | Test Log Loss | Download | Train Config |
|---|---|---|---|---|---|
| ms_eff_gcvit_b0 | 0.9808 | 0.9969 | 0.0637 | model | recipe |
| ms_eff_gcvit_b5 | 0.9850 | 0.9974 | 0.0492 | model | recipe |

KoDF Pretrained Models

| Model Variant | Test Acc | Test AUC | Test Log Loss | Download | Train Config |
|---|---|---|---|---|---|
| ms_eff_gcvit_b0 | 0.9655 | 0.9792 | 0.1237 | model | recipe |
| ms_eff_gcvit_b5 | 0.9850 | 0.9974 | 0.0492 | model | recipe |
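The three reported metrics can be reproduced from per-video fake probabilities with a few lines of plain Python. This is a reference sketch under assumptions: label 1 means fake, accuracy uses a 0.5 threshold, and log loss clips probabilities to [1e-7, 1 - 1e-7]; the repository's own evaluation code may use different conventions. The pairwise AUC below is quadratic in the number of videos, which is fine for test sets of a few thousand videos.

```python
import math
from typing import Sequence


def accuracy(y_true: Sequence[int], y_prob: Sequence[float], thr: float = 0.5) -> float:
    """Fraction of videos whose thresholded prediction matches the label (1 = fake)."""
    hits = sum(int(p >= thr) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Probability that a random fake scores above a random real (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((pp > pn) + 0.5 * (pp == pn) for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))


def log_loss(y_true: Sequence[int], y_prob: Sequence[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    p = [0.1, 0.4, 0.35, 0.8]
    print(accuracy(y, p), roc_auc(y, p))  # 0.75 0.75
```

Accuracy depends on the chosen threshold, while AUC only depends on how fakes are ranked relative to reals, which is why the two columns can diverge.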

💻 Model Usage

Quick start: you can load the models directly from the DeepGuard package or through the timm interface.

Available Datasets: celeb_df_v2, ff++, kodf

Installation

pip install -U git+https://github.com/HanMoonSub/DeepGuard.git

Option A: Direct Import (via DeepGuard)

from deepguard import ms_eff_gcvit_b0, ms_eff_gcvit_b5

model = ms_eff_gcvit_b0(pretrained=True, dataset="celeb_df_v2")
model = ms_eff_gcvit_b5(pretrained=True, dataset="ff++")

Option B: Using timm Interface (via timm)

import timm
import deepguard  # imported so that the DeepGuard models are available to timm.create_model

model = timm.create_model("ms_eff_gcvit_b0", pretrained=True, dataset="ff++")
model = timm.create_model("ms_eff_gcvit_b5", pretrained=True, dataset="kodf")

🔮 Predict Image & Video

Predict a Deepfake Image

from inference import ImagePredictor

# Initialize the predictor
predictor = ImagePredictor(
    margin_ratio=0.2,            # Margin ratio around the detected face crop
    conf_thres=0.5,              # Confidence threshold for face detection
    min_face_ratio=0.01,         # Minimum face-to-frame size ratio to process
    model_name="ms_eff_vit_b0",  # or ms_eff_vit_b5, ms_eff_gcvit_b0, ms_eff_gcvit_b5
    dataset="celeb_df_v2",       # or ff++, kodf
)

# Run inference
result = predictor.predict_img(
    img_path="path/to/image.jpg",
    tta_hflip=0.0,  # Horizontal flip for test-time augmentation
)

print(f"Deepfake Probability: {result:.4f}")

Predict a Deepfake Video

from inference import VideoPredictor

# Initialize the predictor
predictor = VideoPredictor(
    margin_ratio=0.2,            # Margin ratio around the detected face crop
    conf_thres=0.5,              # Confidence threshold for face detection
    min_face_ratio=0.01,         # Minimum face-to-frame size ratio to process
    model_name="ms_eff_vit_b0",  # or ms_eff_vit_b5, ms_eff_gcvit_b0, ms_eff_gcvit_b5
    dataset="celeb_df_v2",       # or ff++, kodf
)

# Run inference
result = predictor.predict_video(
    video_path="path/to/video.mp4",
    num_frames=20,   # Number of frames to sample per video
    agg_mode="conf", # Aggregation method: 'conf', 'mean', 'vote'
    tta_hflip=0.0,   # Horizontal flip for test-time augmentation
)

print(f"Deepfake Probability: {result:.4f}")
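The `agg_mode` option combines per-frame scores into a single video score. The sketch below shows what such an aggregation step can look like; only the three mode names come from the example above. `mean` and `vote` follow their usual meaning, while the `conf` branch here is a hypothetical confidence-weighted mean (frames far from 0.5 weigh more) and may differ from what DeepGuard's `conf` mode actually does. The `aggregate` function is illustrative and not part of the `inference` package.

```python
from statistics import mean
from typing import Sequence


def aggregate(frame_probs: Sequence[float], mode: str = "mean", thr: float = 0.5) -> float:
    """Combine per-frame fake probabilities into one video-level score.

    'mean' averages all frames; 'vote' returns the fraction of frames at or
    above `thr`. 'conf' is a hypothetical stand-in: a confidence-weighted
    mean where each frame's weight is its distance from 0.5.
    """
    if not frame_probs:
        raise ValueError("no frames to aggregate")
    if mode == "mean":
        return mean(frame_probs)
    if mode == "vote":
        return sum(p >= thr for p in frame_probs) / len(frame_probs)
    if mode == "conf":
        weights = [abs(p - 0.5) for p in frame_probs]
        if sum(weights) == 0:  # every frame is exactly 0.5
            return 0.5
        return sum(w * p for w, p in zip(weights, frame_probs)) / sum(weights)
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    probs = [0.9, 0.8, 0.45, 0.1]
    print(aggregate(probs, "vote"))  # 0.5
```

A confidence-weighted mean lets a few clearly manipulated frames dominate uncertain ones, which can help when only part of a video is faked; plain averaging treats every sampled frame equally.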

📬 Authors

This project was developed as a Senior Graduation Project by the Department of Software at Chungbuk National University (CBNU), Republic of Korea.

  • 한문섭 (hanmoon3054@gmail.com): Data & Backend Engineering (Data Preprocessing Pipeline, DB Schema Design)
  • 이예솔 (yesol4138@chungbuk.ac.kr): UI/UX & Frontend Engineering (UI/UX Design, User Dashboard, Model Visualization)
  • 서윤제 (seoyunje2001@gmail.com): AI Engineering (AI Model Architecture, Inference API Design, Model Serving)

๐Ÿ“ Reference

  1. facenet-pytorch - Pretrained Face Detection (MTCNN) and Recognition (InceptionResNet) Models by Tim Esler
  2. face-cutout - Face Cutout Library by Sowmen
  3. Celeb-DF++ - Celeb-DF++ Dataset by OUC-VAS Group
  4. DeeperForensics-1.0 - DeeperForensics-1.0 Dataset by Endless Sora
  5. Deepfake Detection - Detection of Video Deepfake using ResNext and LSTM by Abhijith Jadhav
  6. deepfake-detection-project-v4 - Multiple Deep Learning Models by Ameen Caslam
  7. Awesome-Deepfake-Detection - A curated list of tools, papers and code by Daisy Zhang

โš–๏ธ License

This project is licensed under the terms of the MIT license.

About

DeepGuard (Virtuose)
