Fatemeh-ameri/ood-project

OOD Detection Experiments on CIFAR-10

This project explores simple out-of-distribution (OOD) detection experiments on CIFAR-10 using PyTorch.

The work starts with baseline image classification models and then moves toward known-vs-unknown class evaluation using confidence-based OOD scores.

What This Project Includes

  • Loading and visualizing CIFAR-10 images
  • Training baseline image classification models
  • Saving and loading model checkpoints
  • Measuring prediction confidence with softmax
  • Comparing confidence for correct and incorrect predictions
  • Training models only on selected known classes
  • Treating the remaining CIFAR-10 classes as unknown during evaluation
  • Testing Maximum Softmax Probability (MSP) as a simple OOD baseline
  • Comparing OOD scores such as MSP, Energy, Max Logit, and Logit Margin
  • Tracking experiments in experiment_log.md

Current Results

| Model | Training Setup | Test Accuracy |
| --- | --- | --- |
| Fully Connected Neural Network | CIFAR-10, all classes | 43.36% |
| Simple CNN | CIFAR-10, all classes | 64.05% |
| ResNet18 | CIFAR-10, all classes, normalization | 76.47% |
| ResNet18 | CIFAR-10, augmentation, Adam, StepLR, 20 epochs | 83.07% |
| Simple CNN | Known classes only | 61.02% |
| ResNet18 | Known classes only, improved training recipe | 79.70% |

Confidence Analysis

For the Simple CNN trained on all CIFAR-10 classes:

| Prediction Type | Average Confidence |
| --- | --- |
| Correct predictions | 0.739 |
| Wrong predictions | 0.530 |

The model is usually more confident when predictions are correct, but there is still overlap between correct and incorrect predictions.
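
The comparison above can be sketched without any framework. This is a minimal illustration, not the project's code: the helper names (`softmax`, `confidence_by_correctness`) and the toy logits are assumptions made up for the example.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def _mean(values):
    """Mean of a list, or NaN when the list is empty."""
    return sum(values) / len(values) if values else float("nan")

def confidence_by_correctness(all_logits, labels):
    """Return (mean confidence on correct, mean confidence on wrong)."""
    correct, wrong = [], []
    for logits, label in zip(all_logits, labels):
        probs = softmax(logits)
        conf = max(probs)            # confidence = largest softmax probability
        pred = probs.index(conf)     # predicted class index
        (correct if pred == label else wrong).append(conf)
    return _mean(correct), _mean(wrong)

# Toy logits and labels, invented for illustration only.
toy_logits = [[4.0, 0.0, 0.0], [0.0, 1.0, 0.5], [0.2, 0.0, 0.1]]
toy_labels = [0, 2, 0]
avg_correct, avg_wrong = confidence_by_correctness(toy_logits, toy_labels)
print(f"correct: {avg_correct:.3f}, wrong: {avg_wrong:.3f}")
```

The third toy sample is correct but has low confidence, which is the kind of overlap the averages hide.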

Known vs Unknown Experiment

For the OOD-style experiment, the models were trained only on six CIFAR-10 animal classes:

  • bird
  • cat
  • deer
  • dog
  • frog
  • horse

The remaining vehicle classes were treated as unknown during evaluation:

  • airplane
  • automobile
  • ship
  • truck
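
The known/unknown split can be sketched as follows. The class order is the standard CIFAR-10 label order; the names `label_map` and `split_known_unknown` are hypothetical and only illustrate one way to filter samples and remap the six known labels to 0..5.

```python
# Standard CIFAR-10 label order (index = integer label).
CIFAR10_CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]
KNOWN = ["bird", "cat", "deer", "dog", "frog", "horse"]

known_ids = [CIFAR10_CLASSES.index(name) for name in KNOWN]
unknown_ids = [i for i in range(len(CIFAR10_CLASSES)) if i not in known_ids]

# Map original labels of known classes to contiguous labels 0..5,
# so a 6-way classifier can be trained on them.
label_map = {orig: new for new, orig in enumerate(known_ids)}

def split_known_unknown(labels):
    """Return (known sample indices, remapped known labels, unknown sample indices)."""
    known_idx, new_labels, unknown_idx = [], [], []
    for i, y in enumerate(labels):
        if y in label_map:
            known_idx.append(i)
            new_labels.append(label_map[y])
        else:
            unknown_idx.append(i)
    return known_idx, new_labels, unknown_idx

# Toy label list, invented for illustration.
print(split_known_unknown([0, 2, 7, 9, 3]))
```

In a PyTorch pipeline, index lists like these could be passed to `torch.utils.data.Subset`, with the label remapping applied separately; how the project actually does this is not stated here.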

Average softmax confidence:

| Model | Known Confidence | Unknown Confidence |
| --- | --- | --- |
| Simple CNN | 0.631 | 0.552 |
| ResNet18, earlier setup | 0.861 | 0.753 |
| ResNet18, improved setup | 0.829 | 0.661 |

The improved ResNet18 model increased known-class accuracy and reduced average confidence on unknown samples (0.753 to 0.661), although the known and unknown confidence distributions still overlap.

[Figure: ResNet18 known vs unknown confidence]

MSP Thresholding

Maximum Softmax Probability (MSP) was used as a simple baseline for unknown detection.

A sample is treated as unknown when its maximum softmax confidence is below a selected threshold.

At threshold 0.8:

| Model | Known Accepted | Unknown Rejected |
| --- | --- | --- |
| Simple CNN | 25.47% | 88.05% |
| ResNet18, earlier setup | 71.37% | 51.88% |
| ResNet18, improved setup | 65.35% | 69.45% |

This shows the trade-off between accepting known samples and rejecting unknown samples. The improved ResNet18 setup rejects more unknown samples than the earlier ResNet18 setup, but still does not fully separate known and unknown samples using MSP alone.
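
The thresholding rule in this section can be sketched as below. The function name `msp_threshold_rates` and the confidence values are hypothetical; the rule itself (reject when the maximum softmax confidence is below the threshold) follows the description above.

```python
def msp_threshold_rates(known_conf, unknown_conf, threshold=0.8):
    """Return (fraction of known samples accepted, fraction of unknown samples rejected).

    A sample is accepted as known when its maximum softmax
    confidence is at least `threshold`, and rejected otherwise.
    """
    known_accepted = sum(c >= threshold for c in known_conf) / len(known_conf)
    unknown_rejected = sum(c < threshold for c in unknown_conf) / len(unknown_conf)
    return known_accepted, unknown_rejected

# Toy confidences, invented for illustration only.
toy_known = [0.95, 0.85, 0.60, 0.99]
toy_unknown = [0.90, 0.70, 0.50, 0.75]
accepted, rejected = msp_threshold_rates(toy_known, toy_unknown, threshold=0.8)
print(f"known accepted: {accepted:.2%}, unknown rejected: {rejected:.2%}")
```

Raising the threshold rejects more unknown samples but also rejects more known samples, which is the trade-off shown in the table.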

[Figure: Unknown rejection threshold comparison]

OOD Score Comparison

AUROC was calculated using several OOD scores on the improved known-only ResNet18 model.

Known samples were labeled as 1, and unknown samples were labeled as 0.
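
With this labeling, AUROC is the probability that a randomly chosen known sample scores higher than a randomly chosen unknown sample, with ties counted as half. The pairwise sketch below is an assumption-level illustration (the function name `auroc` and the toy scores are made up); it is quadratic in the number of samples and meant for clarity, not speed.

```python
def auroc(scores, labels):
    """AUROC via pairwise comparison: P(pos > neg) + 0.5 * P(pos == neg).

    labels: 1 for known samples, 0 for unknown samples.
    scores: higher values should indicate "known".
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one known and one unknown sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy scores, invented for illustration only.
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_labels = [1, 1, 1, 0, 0, 0]
print(f"AUROC: {auroc(toy_scores, toy_labels):.4f}")
```

For full test sets, a library routine such as scikit-learn's `roc_auc_score(y_true, y_score)` would be the practical choice.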

| OOD Score | AUROC |
| --- | --- |
| MSP confidence | 0.7427 |
| Energy score, T=1 | 0.7917 |
| Energy score, T=2 | 0.7940 |
| Max logit | 0.7862 |
| Logit margin | 0.7195 |

Energy-based scoring gave the best AUROC in this setup. Raising the temperature from T=1 to T=2 added only a small gain (0.7917 to 0.7940).
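
The four scores can be written per sample from the raw logits. This sketch rests on two assumptions that the text does not confirm: the energy score is taken as the negative free energy `T * logsumexp(logits / T)` so that higher means "more known-like" (consistent with known samples being labeled 1), and the logit margin is the gap between the two largest logits.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def msp(logits):
    """Maximum softmax probability."""
    return math.exp(max(logits) - logsumexp(logits))

def energy_score(logits, T=1.0):
    """Assumed convention: negative free energy, higher = more known-like."""
    return T * logsumexp([z / T for z in logits])

def max_logit(logits):
    """Largest raw logit."""
    return max(logits)

def logit_margin(logits):
    """Assumed definition: largest logit minus second-largest logit."""
    top_two = sorted(logits, reverse=True)[:2]
    return top_two[0] - top_two[1]

# Toy logits, invented for illustration only.
toy = [2.0, 1.0, 0.0]
for name, value in [("msp", msp(toy)),
                    ("energy T=1", energy_score(toy, T=1.0)),
                    ("energy T=2", energy_score(toy, T=2.0)),
                    ("max logit", max_logit(toy)),
                    ("logit margin", logit_margin(toy))]:
    print(f"{name}: {value:.4f}")
```

Unlike MSP, the energy and max-logit scores are not normalized across classes, so they keep information about the overall logit magnitude.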

[Figure: MSP vs Energy ROC curve]

FPR@95TPR

FPR@95TPR is the fraction of unknown samples wrongly accepted as known (false positive rate) at the score threshold where about 95% of known samples are accepted (true positive rate). Lower is better.

| OOD Score | FPR@95TPR |
| --- | --- |
| MSP confidence | 0.8315 |
| Energy score, T=2 | 0.7745 |

Energy reduced the false positive rate compared with MSP, but both scores still accepted many unknown samples as known.
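
One way to compute this metric is sketched below, assuming known samples are the positive class and higher scores mean "more known-like". The function name `fpr_at_tpr` and the toy scores are hypothetical, and the threshold is picked directly from the sorted known scores rather than from an ROC routine.

```python
import math

def fpr_at_tpr(known_scores, unknown_scores, target_tpr=0.95):
    """Fraction of unknown samples accepted at the threshold that
    accepts at least `target_tpr` of the known samples."""
    ranked = sorted(known_scores, reverse=True)
    # Number of known samples that must be accepted; the small epsilon
    # guards against floating-point round-up in the product.
    k = max(1, math.ceil(target_tpr * len(ranked) - 1e-9))
    threshold = ranked[k - 1]
    accepted_unknown = sum(s >= threshold for s in unknown_scores)
    return accepted_unknown / len(unknown_scores)

# Toy scores, invented for illustration only.
toy_known = [i / 20 for i in range(1, 21)]   # 0.05, 0.10, ..., 1.00
toy_unknown = [0.02, 0.5, 0.08, 0.9]
print(f"FPR@95TPR: {fpr_at_tpr(toy_known, toy_unknown):.2f}")
```

In the toy data the threshold has to drop very low to keep 95% of known samples, so half of the unknown samples pass it.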

Unknown Class Predictions

The Simple CNN trained only on known animal classes often maps unknown vehicle classes into known animal classes.

Examples from the confusion matrix:

  • airplane → bird
  • ship → bird
  • truck → horse

This suggests that the model assigns unknown samples to the closest known classes instead of recognizing them as unseen.
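
The mapping from unknown classes to predicted known classes can be tallied with a small counter table. This is an illustrative sketch; the function name `unknown_prediction_table` and the prediction lists are made up and do not reproduce the project's confusion matrix.

```python
from collections import Counter, defaultdict

def unknown_prediction_table(true_unknown_names, predicted_known_names):
    """Count, for each true unknown class, which known class was predicted."""
    table = defaultdict(Counter)
    for true_name, pred_name in zip(true_unknown_names, predicted_known_names):
        table[true_name][pred_name] += 1
    return table

# Toy data, invented for illustration only.
true_names = ["airplane", "airplane", "airplane", "ship", "truck", "truck"]
pred_names = ["bird", "bird", "deer", "bird", "horse", "horse"]

table = unknown_prediction_table(true_names, pred_names)
for unknown_class, counts in table.items():
    top_class, top_count = counts.most_common(1)[0]
    print(f"{unknown_class} -> {top_class} ({top_count}/{sum(counts.values())})")
```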

[Figure: Unknown confusion matrix]

Observations

  • ResNet18 performs better than the Simple CNN on known-class classification.
  • Better classification accuracy does not automatically solve overconfidence on unknown samples.
  • Data augmentation, longer training, and a learning rate scheduler improved ResNet18 accuracy.
  • The improved ResNet18 setup reduced average unknown confidence compared with the earlier ResNet18 setup.
  • MSP thresholding shows a clear trade-off between accepting known samples and rejecting unknown samples.
  • Energy-based scoring performed better than MSP in this setup, but the separation between known and unknown samples is still not complete.

How to Run

Clone the repository:

git clone https://github.com/Fatemeh-ameri/ood-project.git
cd ood-project

Create and activate a virtual environment:

python3 -m venv venv
source venv/bin/activate

Install the required packages:

pip install -r requirements.txt

Run the experiment scripts from the src folder.

Note: The exact scripts may change as the project is refactored. The experiment results are also summarized in experiment_log.md.
