57 changes: 39 additions & 18 deletions deepguard/MS_EffGCViT.md
@@ -14,7 +14,8 @@ This Repository presents the PyTorch implementation of **Multi Scale Efficient G

This model is a **frame-level**, **spatial-domain** architecture designed to classify both **static images** and **video sequences**.
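A frame-level model scores each frame on its own, so a video-level decision needs an aggregation rule, which this section does not specify. The sketch below assumes the simplest common choice: average the per-frame fake probabilities and threshold at 0.5. The helpers `video_fake_score` and `classify_video` and the threshold are hypothetical and are not taken from this repository.

```python
from statistics import fmean


def video_fake_score(frame_probs: list[float]) -> float:
    """Average per-frame fake probabilities into one video-level score.

    Hypothetical helper: mean pooling is an assumed aggregation rule,
    not necessarily the one MS-EFF-GCViT uses.
    """
    if not frame_probs:
        raise ValueError("need at least one frame score")
    return fmean(frame_probs)


def classify_video(frame_probs: list[float], threshold: float = 0.5) -> str:
    # The 0.5 decision threshold is an assumption for illustration.
    return "FAKE" if video_fake_score(frame_probs) >= threshold else "REAL"


# A static image is simply the one-frame case.
print(classify_video([0.91]))              # FAKE
print(classify_video([0.10, 0.20, 0.15]))  # REAL
```

Treating an image as a one-frame video lets the same scoring path serve both input types.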

<img src="../docs/benchmarks/celeb_df_v2_gcvit.png" width="900">

## 💥 News 💥

@@ -27,21 +28,49 @@ This model is a **frame-level** and **spatial-domain** architecture, designed to

## Model Performance

**MS-EFF-GCViT achieves state-of-the-art (SOTA) results across three DeepFake benchmarks: Celeb-DF (v2), FaceForensics++, and KoDF.**
The model ships in two variants built from a single architecture: **Fast (b0)**, with `8.7M` parameters, for real-time and edge
deployment, and **Pro (b5)**, with `50.3M` parameters, for enterprise-grade accuracy. Notably, **Fast** matches or exceeds
much larger SOTA models while using a fraction of their parameters and compute.
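The short calculation below puts a number on the efficiency trade-off between the two variants. It uses only the parameter counts and Celeb-DF (v2) accuracies quoted in this section.

```python
# Figures quoted in this section: parameter counts (millions) and Celeb-DF (v2) test accuracy.
fast = {"name": "ms_eff_gcvit_b0", "params_m": 8.7, "acc": 0.9842}
pro = {"name": "ms_eff_gcvit_b5", "params_m": 50.3, "acc": 0.9981}

# Share of Pro's parameter budget that Fast uses.
param_ratio = fast["params_m"] / pro["params_m"]
# Accuracy gap between the two variants, in percentage points.
acc_gap_pp = (pro["acc"] - fast["acc"]) * 100

print(f"Fast uses {param_ratio:.1%} of Pro's parameters")
print(f"Pro is {acc_gap_pp:.2f} pp more accurate on Celeb-DF (v2)")
```

With these figures, the script prints 17.3% and 1.39 pp. In other words, Fast uses under a fifth of Pro's parameters and trails it by roughly 1.4 accuracy points on Celeb-DF (v2).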

<p align="center">
<img src="../docs/benchmarks/gcvit_summary_bars.png" width="100%">
</p>

### Test Result of Celeb_DF(v2)
> On **Celeb-DF(v2)**, Pro reaches **0.9981 Acc** (rank #1) and Fast **0.9842** (rank #3) among 20 architectures.
> On the **KoDF competition** leaderboard, Pro ranks **#1** and Fast **#4** out of 49 entries.

<details>
<summary><b>📊 Celeb-DF (v2) — Accuracy & Efficiency</b></summary>
<br>
<img src="../docs/benchmarks/celeb_df_v2_gcvit_2.png" width="100%">

| Variant | Test@Acc | Test@AUC | Test@LogLoss |
| :------ | :------: | :------: | :----------: |
| ms_eff_gcvit_b0 | 0.9842 | 0.9965 | 0.0283 |
| ms_eff_gcvit_b5 | 0.9981 | 0.9984 | 0.0089 |
</details>

<details>
<summary><b>📊 FaceForensics++ — Accuracy & Efficiency</b></summary>
<br>
<img src="../docs/benchmarks/ff_gcvit.png" width="100%">

| Variant | Test@Acc | Test@AUC | Test@LogLoss |
| :------ | :------: | :------: | :----------: |
| ms_eff_gcvit_b0 | 0.9808 | 0.9969 | 0.0637 |
| ms_eff_gcvit_b5 | 0.9850 | 0.9974 | 0.0492 |
</details>

<details>
<summary><b>📊 KoDF Competition — Accuracy Ranking</b></summary>
<br>
<img src="../docs/benchmarks/kodf_gcvit.png" width="100%">

| Variant | Test@Acc | Test@AUC | Test@LogLoss |
| :------ | :------: | :------: | :----------: |
| ms_eff_gcvit_b0 | 0.9655 | 0.9792 | 0.1237 |
| ms_eff_gcvit_b5 | 0.9792 | 0.9974 | 0.0492 |
</details>
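Each table reports test accuracy, ROC AUC, and log loss. To help read these columns, here is a minimal, dependency-free sketch of the three binary metrics. It assumes label `1` means fake, `probs` holds predicted fake probabilities, and accuracy uses a 0.5 threshold. It is not the repository's evaluation code, which may threshold or aggregate differently.

```python
import math


def accuracy(labels, probs, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label (1 = fake)."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(labels, probs))
    return hits / len(labels)


def roc_auc(labels, probs):
    """ROC AUC as the chance a random fake outscores a random real sample (ties count 1/2)."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes")
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0 for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))


def log_loss(labels, probs, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


labels = [1, 1, 0, 0]
probs = [0.9, 0.4, 0.3, 0.1]
print(accuracy(labels, probs), roc_auc(labels, probs), round(log_loss(labels, probs), 4))
```

The toy example shows how the metrics diverge. The second fake, scored 0.4, is misclassified at the 0.5 threshold, so accuracy drops to 0.75. AUC stays at 1.0 because every fake still outranks every real sample. In practice, scikit-learn's `accuracy_score`, `roc_auc_score`, and `log_loss` compute the same quantities.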

## Model Introduction
@@ -77,6 +106,8 @@ While both **Xception** and **EfficientNet** show great results on DeepFake benc

### Part 4: Multi-Scale Feature Map Fusion

<img src="../docs/architectures/dual_branch.gif" width="900">

Modern DeepFakes can leave highly localized forgery regions. To capture them, we adopt a **multi-scale strategy** that extracts features from different levels of the backbone.
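The sketch below illustrates why maps from different backbone levels can be combined at all. It global-average-pools a high-resolution map from an early block and a low-resolution map from a late block into fixed-length vectors, then concatenates them. Pool-and-concatenate is a deliberately simple stand-in chosen for illustration, not the fusion module MS-EFF-GCViT uses. The helper names and toy shapes are hypothetical.

```python
Feature = list[list[list[float]]]  # [channels][height][width]


def global_avg_pool(fmap: Feature) -> list[float]:
    """Average each channel over its spatial positions, giving one value per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def fuse_multi_scale(low: Feature, high: Feature) -> list[float]:
    """Concatenate pooled low-level and high-level descriptors (simplified stand-in)."""
    return global_avg_pool(low) + global_avg_pool(high)


# Toy shapes: the early block keeps resolution (2 ch x 4x4), the late block keeps channels (3 ch x 1x1).
low_level = [[[float(i + j) for j in range(4)] for i in range(4)],
             [[1.0] * 4 for _ in range(4)]]
high_level = [[[0.5]], [[2.0]], [[-1.0]]]

fused = fuse_multi_scale(low_level, high_level)
print(len(fused), fused)
```

Pooling makes the fused vector length depend only on the channel counts (2 + 3 here), not on each map's spatial size. A real multi-scale design would usually keep more spatial detail from the low-level branch than a single average.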

- **![](https://img.shields.io/badge/Low_level_Branch-blue?style=flat-square) (_Subtle Artifacts_)**: High-resolution feature maps are extracted from early backbone blocks (`l_block_idx`) to capture fine-grained cues such as skin-texture and boundary artifacts.
@@ -177,16 +208,6 @@ model = timm.create_model("ms_eff_gcvit_b5", pretrained=True, dataset="kodf")

## 📊 Visual Results

<p align="center">
<table>
<tr>
<td><img src="../docs/architectures/low_branch.gif" width="100%"></td>
<td width="20%"></td>
<td><img src="../docs/architectures/high_branch.gif" width="100%"></td>
</tr>
</table>
</p>
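The branch-level tables compare three class-activation-map (CAM) variants. We assume these correspond to the `HiResCAM`, `GradCAMElementWise`, and `LayerCAM` methods of the open-source `pytorch-grad-cam` package; the tables do not say which implementation was used. The framework-free sketch below shows how each method turns one layer's activations and gradients into a heatmap, with plain Grad-CAM included for contrast. It omits the upsampling and [0, 1] normalisation a real implementation applies, and all names in it are illustrative.

```python
Maps = list[list[list[float]]]  # [channels][height][width]


def relu(x: float) -> float:
    return max(x, 0.0)


def _elementwise(acts: Maps, grads: Maps, fn) -> Maps:
    """Apply fn(gradient, activation) at every channel/pixel position."""
    return [[[fn(g, a) for g, a in zip(g_row, a_row)]
             for g_row, a_row in zip(g_ch, a_ch)]
            for g_ch, a_ch in zip(grads, acts)]


def _sum_channels(maps: Maps) -> list[list[float]]:
    """Sum over channels, then clamp negatives to zero (ReLU) at each pixel."""
    h, w = len(maps[0]), len(maps[0][0])
    return [[relu(sum(ch[y][x] for ch in maps)) for x in range(w)] for y in range(h)]


def grad_cam(acts: Maps, grads: Maps) -> list[list[float]]:
    # Grad-CAM (for contrast): one weight per channel = spatial mean of its gradient.
    weights = [sum(map(sum, g)) / (len(g) * len(g[0])) for g in grads]
    return _sum_channels([[[w * a for a in row] for row in ch] for w, ch in zip(weights, acts)])


def hires_cam(acts: Maps, grads: Maps) -> list[list[float]]:
    # HiResCAM: element-wise gradient x activation, no spatial averaging.
    return _sum_channels(_elementwise(acts, grads, lambda g, a: g * a))


def grad_cam_elementwise(acts: Maps, grads: Maps) -> list[list[float]]:
    # GradCAMElementWise: ReLU on each gradient x activation term before summing.
    return _sum_channels(_elementwise(acts, grads, lambda g, a: relu(g * a)))


def layer_cam(acts: Maps, grads: Maps) -> list[list[float]]:
    # LayerCAM: only positive gradients weight the activations.
    return _sum_channels(_elementwise(acts, grads, lambda g, a: relu(g) * a))


# Toy 2-channel, 1x2 layer with hand-picked activations and gradients.
acts = [[[1.0, -2.0]], [[3.0, 1.0]]]
grads = [[[-0.5, -1.0]], [[1.0, 2.0]]]
for cam in (grad_cam, hires_cam, grad_cam_elementwise, layer_cam):
    print(cam.__name__, cam(acts, grads))
```

Even on this 1x2 toy layer the four maps differ. Averaging gradients (Grad-CAM) blurs per-pixel evidence, whereas the element-wise variants keep it. This matters when the low-level branch targets small, localized artifacts.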

### MS-EFF-GCVIT — Low-Level Branch

| Model | Branch-Level | Image | HiresCam | GradCamElementwise | LayerCam |
Binary file modified docs/benchmarks/celeb_df_v2_gcvit.png
Binary file added docs/benchmarks/celeb_df_v2_gcvit_2.png
Binary file modified docs/benchmarks/ff_gcvit.png
Binary file added docs/benchmarks/gcvit_summary_bars.png
Binary file modified docs/benchmarks/kodf_gcvit.png