TRUST Content Moderation Classifier

TRUST is a Vision Transformer (ViT)-based image classification model designed for automated content moderation and media safety applications. Unlike traditional binary (safe/unsafe) classifiers, TRUST outputs a probability distribution over multiple content categories, enabling flexible moderation workflows and threshold-based decision making.

The model is trained to classify images into five moderation categories and is suitable for content filtering, review systems, and safety-focused applications.

Demo

🤗 Hugging Face Demo

Overview

Modern content moderation systems often require more nuanced decisions than simple safe/unsafe classification. TRUST addresses this challenge by producing probability scores across multiple content categories, allowing developers to define custom moderation thresholds and workflows.

The model was trained on the deepghs/nsfw_detect dataset and outputs, for each image, a probability distribution over the content categories. These scores can drive automated filtering, content restriction, or manual review pipelines.
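As a minimal sketch of a threshold-based workflow, the function below turns one image's category probabilities into an allow, review, or block decision. The label strings, the two threshold values, and the policy itself (summing the two explicit categories into one score) are assumptions for illustration, not TRUST's shipped behavior.

```python
from typing import Mapping

# Assumed label strings; match them to the deployed model's label names.
EXPLICIT_LABELS = ("Explicit", "Illustrated Explicit")
REVIEW_LABELS = ("Suggestive",)


def moderate(probs: Mapping[str, float],
             block_threshold: float = 0.8,
             review_threshold: float = 0.5) -> str:
    """Return "block", "review", or "allow" for one image's probabilities.

    The default thresholds are illustrative, not tuned values for TRUST.
    """
    # Combine real and illustrated explicit content into one score.
    explicit = sum(probs.get(label, 0.0) for label in EXPLICIT_LABELS)
    if explicit >= block_threshold:
        return "block"
    # Borderline explicit mass or a strong suggestive signal goes to a human.
    suggestive = sum(probs.get(label, 0.0) for label in REVIEW_LABELS)
    if explicit >= review_threshold or suggestive >= review_threshold:
        return "review"
    return "allow"


scores = {"Neutral": 0.05, "Drawing": 0.02, "Suggestive": 0.03,
          "Explicit": 0.75, "Illustrated Explicit": 0.15}
print(moderate(scores))  # explicit score 0.9 >= 0.8, prints: block
```

Summing the two explicit categories means an image whose probability is split between real and illustrated explicit content is still caught. Raising `review_threshold` sends fewer images to human review at the cost of allowing more borderline content.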

Model Summary

The classifier is built on the Vision Transformer (ViT) architecture, which splits each image into fixed-size patches and applies self-attention across all of them, letting the model use global image context when classifying.
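For a sense of scale, the arithmetic below counts the tokens the encoder attends over. It assumes the 16 × 16 patch size and 224 × 224 input indicated by the base model name `google/vit-base-patch16-224-in21k`, plus the [CLS] token that standard ViT prepends to the patch sequence.

```python
def vit_sequence_length(image_size: int = 224, patch_size: int = 16) -> int:
    """Number of tokens a ViT encoder sees for a square input image.

    The image is cut into non-overlapping patch_size x patch_size patches,
    each patch becomes one token, and a [CLS] token is prepended.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side + 1  # +1 for [CLS]


print(vit_sequence_length())  # 14 * 14 patches + 1 [CLS] token = 197
```

Because self-attention compares every token with every other token, each attention head in each layer computes a 197 × 197 score matrix for one image.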

Feature	Details
Base Model	`google/vit-base-patch16-224-in21k`
Input Resolution	224 × 224
Output Type	Softmax Probability Distribution
Categories	5
Framework	PyTorch, Hugging Face Transformers
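The softmax step that turns the classifier's raw scores (logits) into the output probability distribution can be sketched in plain Python. With PyTorch and Transformers the same step is typically `torch.softmax(outputs.logits, dim=-1)`. The logits in the example are made up for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw classifier logits into probabilities that sum to 1."""
    # Subtracting the max logit avoids overflow in exp() without changing the result.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative logits for the five categories (not real model output).
probs = softmax([2.0, 0.5, 0.1, -1.0, -1.5])
print([round(p, 3) for p in probs])
```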

Content Categories

Neutral — Safe-for-work everyday imagery
Drawing — Non-explicit artwork, illustrations, comics, and anime
Suggestive — Non-explicit suggestive content
Explicit — Explicit real-world adult content
Illustrated Explicit — Explicit illustrated or animated content
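To report the most likely categories, the probability vector has to be paired with label names. The index order below is an assumption; a Transformers checkpoint records its actual order in `config.id2label`, which should be read instead of hard-coding it.

```python
from typing import List, Sequence, Tuple

# Assumed index order of the model's output vector; verify against the
# checkpoint's config.id2label before relying on it.
CATEGORIES = ("Neutral", "Drawing", "Suggestive", "Explicit", "Illustrated Explicit")


def rank_categories(probs: Sequence[float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most likely categories as (label, probability) pairs."""
    if len(probs) != len(CATEGORIES):
        raise ValueError(f"expected {len(CATEGORIES)} probabilities, got {len(probs)}")
    pairs = zip(CATEGORIES, probs)
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:k]


print(rank_categories([0.10, 0.62, 0.20, 0.03, 0.05], k=2))
# -> [('Drawing', 0.62), ('Suggestive', 0.2)]
```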

Evaluation Results

Metrics are reported on the validation set.

Metric	Score
Accuracy	93.8%
Precision (Weighted)	98.5%
F1 Score	0.93
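The README does not say how the averaged metrics were computed. The sketch below assumes the common support-weighted definition (the same as scikit-learn's `average="weighted"`) and assumes the F1 row is weighted as well; both are assumptions.

```python
from collections import Counter
from typing import Dict, Sequence


def weighted_metrics(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy plus support-weighted precision and F1 over all true classes."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n = len(y_true)
    support = Counter(y_true)    # true examples per class (the weights)
    predicted = Counter(y_pred)  # how often each class was predicted
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision_w = f1_w = 0.0
    for cls, count in support.items():
        precision = tp[cls] / predicted[cls] if predicted[cls] else 0.0
        recall = tp[cls] / count
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        weight = count / n
        precision_w += weight * precision
        f1_w += weight * f1
    return {"accuracy": sum(tp.values()) / n,
            "precision_weighted": precision_w,
            "f1_weighted": f1_w}


m = weighted_metrics(["Neutral", "Neutral", "Explicit", "Explicit"],
                     ["Neutral", "Explicit", "Explicit", "Explicit"])
print(m)
```

Under this definition, weighted recall is always identical to accuracy, while weighted precision and F1 can differ from it.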

Confusion Matrix

The model was optimized to reduce false negatives in explicit-content categories while maintaining strong overall classification performance.
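One way to quantify false negatives in the explicit-content categories from a confusion matrix is sketched below. It assumes the label order shown and counts a miss whenever a truly explicit image (real or illustrated) is predicted as a non-explicit class; confusing the two explicit classes with each other is not counted. This definition is an assumption, not necessarily the one used during training.

```python
from typing import List, Sequence

# Assumed label order; read the real one from the model config.
CATEGORIES = ("Neutral", "Drawing", "Suggestive", "Explicit", "Illustrated Explicit")
EXPLICIT = frozenset({"Explicit", "Illustrated Explicit"})


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: Sequence[str] = CATEGORIES) -> List[List[int]]:
    """Count predictions: rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def explicit_miss_rate(matrix: List[List[int]],
                       labels: Sequence[str] = CATEGORIES,
                       explicit: frozenset = EXPLICIT) -> float:
    """Fraction of truly explicit images predicted as a non-explicit class."""
    missed = total = 0
    for i, true_label in enumerate(labels):
        if true_label not in explicit:
            continue  # only rows with explicit ground truth matter here
        for j, pred_label in enumerate(labels):
            total += matrix[i][j]
            if pred_label not in explicit:
                missed += matrix[i][j]
    return missed / total if total else 0.0


y_true = ["Explicit", "Explicit", "Illustrated Explicit", "Neutral"]
y_pred = ["Explicit", "Suggestive", "Explicit", "Neutral"]
cm = confusion_matrix(y_true, y_pred)
print(explicit_miss_rate(cm))  # 1 of 3 explicit images missed
```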

Model

The deployed model and inference API are available on Hugging Face Spaces:

👉 https://huggingface.co/spaces/Sagnikroy/TRUST_NSFW_Detection

Technical Stack

PyTorch
Hugging Face Transformers
Vision Transformer (ViT)
FastAPI
Hugging Face Spaces

Key Features

Multi-class content classification
Vision Transformer-based architecture
Probability-based moderation outputs
Real-time inference API
Cloud deployment through Hugging Face Spaces
Configurable moderation thresholds
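Configurable thresholds can be kept outside the code, for example as a JSON object mapping category names to cut-offs. The sketch below assumes that format, validates it, and flags every category whose probability meets its threshold; the category names and threshold values are illustrative.

```python
import json
from typing import Dict, List

# Assumed category names; keep them in sync with the model's labels.
CATEGORIES = ("Neutral", "Drawing", "Suggestive", "Explicit", "Illustrated Explicit")


def load_thresholds(text: str) -> Dict[str, float]:
    """Parse a JSON object of per-category thresholds and validate it."""
    raw = json.loads(text)
    if not isinstance(raw, dict):
        raise ValueError("threshold config must be a JSON object")
    unknown = set(raw) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    thresholds: Dict[str, float] = {}
    for label, value in raw.items():
        value = float(value)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"threshold for {label!r} must be in [0, 1], got {value}")
        thresholds[label] = value
    return thresholds


def flagged(probs: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """List the categories whose probability meets or exceeds its threshold."""
    return [label for label, cutoff in thresholds.items()
            if probs.get(label, 0.0) >= cutoff]


config = '{"Explicit": 0.6, "Illustrated Explicit": 0.6, "Suggestive": 0.85}'
thresholds = load_thresholds(config)
print(flagged({"Neutral": 0.1, "Suggestive": 0.2, "Explicit": 0.7}, thresholds))
# -> ['Explicit']
```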
