
frangente/SITH


SITH: Semantic Inspection of Transformer Heads

From Weights to Concepts: Data-Free Interpretability of CLIP via Singular Vector Decomposition

CVPR 2026 arXiv

Python PyTorch License: MIT

Francesco Gentile¹, Nicola Dall'Asen¹,², Francesco Tonini¹,³, Massimiliano Mancini¹, Lorenzo Vaquero³, Elisa Ricci¹,³

¹University of Trento, ²University of Pisa, ³Fondazione Bruno Kessler


[Figure: Overview of SITH and COMP]

Overview

SITH is a data-free, training-free, weight-based interpretability framework for CLIP's vision transformer. For each attention head, it decomposes the Value-Output (VO) weight matrix via Singular Value Decomposition (SVD), revealing the head's dominant computational directions. Each singular vector is then interpreted using COMP (Coherent Orthogonal Matching Pursuit), a novel sparse decomposition algorithm that explains it as a non-negative combination of human-interpretable textual concepts, optimizing for both reconstruction fidelity and semantic coherence.
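As a rough illustration of the first step, the sketch below forms one head's VO matrix and takes its SVD with NumPy. It assumes a torch.nn.MultiheadAttention-style layout in which the value projection's rows and the output projection's columns are grouped by head, ignores biases, and uses random matrices in place of real CLIP weights; the function name head_vo_svd is hypothetical and not part of the SITH API.

```python
import numpy as np

def head_vo_svd(w_v, w_o, n_heads, head):
    """SVD of one head's value-output (VO) map.

    w_v: (d_model, d_model) value projection, rows grouped by head (v = w_v @ x).
    w_o: (d_model, d_model) output projection, columns grouped by head.
    Assumed layout; biases are ignored.
    """
    d_model = w_v.shape[1]
    d_head = d_model // n_heads
    sl = slice(head * d_head, (head + 1) * d_head)
    w_vo = w_o[:, sl] @ w_v[sl, :]      # (d_model, d_model), rank <= d_head
    u, s, vt = np.linalg.svd(w_vo)
    # Only the first d_head singular values can be non-zero, so keep that part.
    # Columns of u are output directions; rows of vt are input directions.
    return u[:, :d_head], s[:d_head], vt[:d_head]

rng = np.random.default_rng(0)
d_model, n_heads = 64, 8
w_v = rng.standard_normal((d_model, d_model))   # stand-in weights
w_o = rng.standard_normal((d_model, d_model))
u, s, vt = head_vo_svd(w_v, w_o, n_heads, head=3)
print(u.shape, s.shape, vt.shape)  # (64, 8) (8,) (8, 64)
```

Truncating to d_head components loses nothing here, because the VO matrix of a single head has rank at most d_head.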

Key Features

  • Data-free — analysis is performed directly on model weights, with no dataset or activations needed.
  • Training-free — no optimization or gradient computation required.
  • Fine-grained — provides intra-head explanations at the individual singular vector level.
  • Actionable — enables precise model edits (suppressing spurious correlations, removing unsafe concepts, improving classification) without retraining.
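To make the "actionable" point concrete, here is one simple, hypothetical kind of weight edit: rebuilding a head's VO matrix with a single singular component removed, so that the head stops reading that input direction. This is a generic rank-1 ablation on a toy matrix, not necessarily the editing procedure used in the paper; ablate_singular_direction is an illustrative name.

```python
import numpy as np

def ablate_singular_direction(u, s, vt, k):
    """Rebuild a VO matrix from its SVD with the k-th component zeroed out."""
    s_edit = s.copy()
    s_edit[k] = 0.0
    return u @ np.diag(s_edit) @ vt

rng = np.random.default_rng(0)
# Toy rank-4 "VO matrix" standing in for a real attention head.
w_vo = rng.standard_normal((16, 4)) @ rng.standard_normal((4, 16))
u, s, vt = np.linalg.svd(w_vo, full_matrices=False)
w_edit = ablate_singular_direction(u, s, vt, k=0)

# The edited head maps the removed input direction vt[0] to (numerically) zero,
# while the remaining components are untouched.
print(np.allclose(w_edit @ vt[0], 0.0))  # True
```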


Installation

SITH can be installed as a standalone library if you only need the core components (CLIP model utilities, sparse dictionary learning algorithms, etc.) and do not plan to reproduce the paper's experiments:

uv pip install git+https://github.com/frangente/SITH.git

To reproduce the paper's experiments, clone the repository and install the dependencies:

git clone https://github.com/frangente/SITH.git
cd SITH
uv sync --frozen                       # core dependencies
uv sync --group scripts --frozen       # + dependencies for data preparation scripts
uv sync --group experiments --frozen   # + dependencies for experiments

Data

All scripts and experiments expect intermediate data under data/ in the repository root. Pre-computed resources for the experiments in the paper are available on Google Drive and can be downloaded with:

uv run python scripts/download_model_data.py \
    --model-name ViT-L-14 \
    --pretrained laion2b_s32b_b82k

The expected layout is:

data/
├── dictionaries/                          # Textual concept pools
│   ├── conceptnet.txt
│   ├── wordnet.txt
│   └── ...
└── models/
    └── ViT-L-14/                          # Model architecture
        └── laion2b_s32b_b82k/             # Pretrained weights
            ├── image_mean.pt              # Mean image embedding (over CC12M)
            ├── dictionaries/              # Encoded dictionaries
            │   ├── conceptnet.pt
            │   └── ...
            └── decompositions/            # SITH decompositions
                ├── layer-20_right_foldln_conceptnet_comp-0.3_sparsity-5.pt
                ├── layer-20_left_nofoldln_conceptnet_omp_sparsity-10.pt
                └── ...

Decomposition filenames encode the key parameters for quick identification:

layer-{layer}_{left|right}_{foldln|nofoldln}_{dictionary}_{method}_sparsity-{K}.pt
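If you need to filter or group decomposition files programmatically, a small parser for this naming convention might look like the sketch below. It assumes dictionary and method names never contain underscores (true for the examples above, such as conceptnet and comp-0.3); the regex and the returned keys are our own choices, not part of the repository.

```python
import re

# Mirrors layer-{layer}_{left|right}_{foldln|nofoldln}_{dictionary}_{method}_sparsity-{K}.pt
DECOMP_RE = re.compile(
    r"layer-(?P<layer>\d+)_(?P<side>left|right)_(?P<fold>foldln|nofoldln)_"
    r"(?P<dictionary>[^_]+)_(?P<method>[^_]+)_sparsity-(?P<k>\d+)\.pt"
)

def parse_decomposition_name(name):
    """Split a decomposition filename into its parameters (hypothetical helper)."""
    m = DECOMP_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"not a decomposition filename: {name}")
    return {
        "layer": int(m["layer"]),
        "sv_type": m["side"],
        "fold_ln": m["fold"] == "foldln",
        "dictionary": m["dictionary"],
        "method": m["method"],
        "sparsity": int(m["k"]),
    }

print(parse_decomposition_name("layer-20_right_foldln_conceptnet_comp-0.3_sparsity-5.pt"))
```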

Each .pt file contains a dict with two tensors, scores and indices, which together encode the sparse decomposition of the selected (left or right) singular vectors for every head in the given layer. Each tensor has shape (num_heads, rank, K), where rank is the number of singular vectors decomposed per head and K is the sparsity level (number of concepts selected per vector).
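A minimal sketch of turning these tensors into readable explanations is shown below. It assumes that indices are row positions in the concept list (for example, the lines of data/dictionaries/conceptnet.txt in the order they were encoded) and that scores are the matching coefficients; both are assumptions, and synthetic NumPy arrays of the documented shape stand in for a real file. With a real file you would load it first, e.g. data = torch.load(path), and pass data["scores"] and data["indices"].

```python
import numpy as np

def explain(scores, indices, concepts, head, sv):
    """(concept, score) pairs for one singular vector of one head, highest score first."""
    pairs = [
        (concepts[int(i)], float(w))
        for i, w in zip(indices[head, sv], scores[head, sv])
    ]
    return sorted(pairs, key=lambda p: p[1], reverse=True)

# Synthetic stand-ins with the documented shape (num_heads, rank, K).
rng = np.random.default_rng(0)
num_heads, rank, k = 16, 64, 5
concepts = [f"concept_{j}" for j in range(1000)]  # hypothetical concept list
indices = rng.integers(0, len(concepts), size=(num_heads, rank, k))
scores = rng.random((num_heads, rank, k))

for name, w in explain(scores, indices, concepts, head=3, sv=0):
    print(f"{w:.3f}  {name}")
```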

Scripts

If you want to recompute the resources from scratch (e.g., for a different model or dictionary), the scripts under scripts/ cover the full pipeline.

1. Download the Concept Dictionary

Download one of the available textual concept pools (e.g., ConceptNet 5.5):

uv run python scripts/download_dictionary.py --dictionary conceptnet

Available dictionaries: conceptnet, efros, laion, wordnet.

2. Encode the Dictionary

Encode the textual dictionary into CLIP embeddings for a specific model:

uv run python scripts/encode_dictionary.py \
    --model-name ViT-L-14 \
    --pretrained laion2b_s32b_b82k \
    --dictionary conceptnet
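Once encoded, the concept embeddings are typically stacked into a matrix of unit-norm "atoms" so that inner products become cosine similarities. The sketch below does this and adds a naive top-k cosine lookup as a baseline for inspecting a vector; it is not COMP. Random vectors stand in for CLIP text embeddings (768 matches ViT-L-14's embedding width), and both function names are hypothetical.

```python
import numpy as np

def build_dictionary(text_embeddings):
    """Stack concept embeddings into a (num_concepts, dim) matrix with unit-norm rows."""
    d = np.asarray(text_embeddings, dtype=np.float64)
    return d / np.linalg.norm(d, axis=1, keepdims=True)

def nearest_concepts(vector, dictionary, k=5):
    """Indices and cosine similarities of the k atoms closest to `vector`."""
    sims = dictionary @ (vector / np.linalg.norm(vector))
    top = np.argsort(-sims)[:k]
    return top, sims[top]

rng = np.random.default_rng(0)
dictionary = build_dictionary(rng.standard_normal((1000, 768)))  # stand-in embeddings
query = 2.0 * dictionary[42] + 0.01 * rng.standard_normal(768)   # close to atom 42
top, sims = nearest_concepts(query, dictionary)
print(top[0], round(float(sims[0]), 3))
```

A top-k lookup like this scores each concept independently; a sparse pursuit such as COMP instead picks concepts jointly so that their combination reconstructs the vector.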

3. Compute the Image Embedding Mean

Compute the mean image embedding over CC12M, which is used to mitigate CLIP's modality gap:

uv run python scripts/compute_image_mean.py \
    --model-name ViT-L-14 \
    --pretrained laion2b_s32b_b82k
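A dataset the size of CC12M does not fit in memory as a single embedding matrix, so a mean like this is naturally accumulated batch by batch. The sketch below shows a streaming mean over embedding batches; random arrays stand in for CLIP image embeddings, and whether embeddings are L2-normalized before averaging is an assumption left to the caller. How the resulting mean is then applied is described in the paper and not reproduced here.

```python
import numpy as np

class RunningMean:
    """Accumulate the mean of embedding batches without keeping them in memory."""

    def __init__(self, dim):
        self.total = np.zeros(dim, dtype=np.float64)
        self.count = 0

    def update(self, batch):
        batch = np.asarray(batch, dtype=np.float64)  # (batch_size, dim)
        self.total += batch.sum(axis=0)
        self.count += batch.shape[0]

    @property
    def mean(self):
        return self.total / self.count

rng = np.random.default_rng(0)
batches = [rng.standard_normal((n, 768)) for n in (256, 256, 100)]  # stand-in batches
acc = RunningMean(768)
for b in batches:
    acc.update(b)
print(acc.count, acc.mean.shape)  # 612 (768,)
```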

4. Decompose Attention Heads

Run SITH to compute the SVD of each head's VO matrix and decompose the resulting singular vectors into textual concepts:

uv run python scripts/decompose.py \
    --model-name ViT-L-14 \
    --pretrained laion2b_s32b_b82k \
    --layers 20 21 22 23 \
    --rank 64 \
    --method comp-0.3 \
    --sparsity 5 \
    --dictionary conceptnet \
    --sv-type right \
    --fold-ln
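To give a feel for what a sparse decomposition with --sparsity 5 produces, the sketch below implements a plain greedy matching pursuit with a non-negativity constraint, refitting all selected weights with scipy.optimize.nnls at each step. It deliberately omits COMP's semantic-coherence term, whose exact form (and the meaning of the 0.3 in comp-0.3) is not reproduced here, so treat it as a simplified baseline rather than SITH's algorithm. The dictionary and target vector are synthetic, and nn_omp is a hypothetical name.

```python
import numpy as np
from scipy.optimize import nnls

def nn_omp(x, dictionary, k):
    """Greedy non-negative OMP: explain x with at most k atoms and weights >= 0.

    dictionary: (num_atoms, dim) array with unit-norm rows.
    Returns (selected atom indices, their non-negative coefficients).
    """
    residual = x.copy()
    selected = []
    coefs = np.zeros(0)
    for _ in range(k):
        corr = dictionary @ residual
        corr[selected] = -np.inf             # never pick the same atom twice
        best = int(np.argmax(corr))
        if corr[best] <= 0:                  # no atom helps with a positive weight
            break
        selected.append(best)
        a = dictionary[selected].T           # (dim, len(selected))
        coefs, _ = nnls(a, x)                # jointly refit all weights, constrained >= 0
        residual = x - a @ coefs
    return np.array(selected, dtype=int), coefs

rng = np.random.default_rng(0)
atoms = rng.standard_normal((500, 256))
atoms /= np.linalg.norm(atoms, axis=1, keepdims=True)      # unit-norm stand-in concepts
x = 0.7 * atoms[3] + 0.5 * atoms[120] + 0.3 * atoms[50]     # stand-in "singular vector"
idx, coefs = nn_omp(x, atoms, k=5)
print(idx, np.round(coefs, 3))
```

Note that singular vectors are only defined up to sign, so a real pipeline must decide how to treat v versus -v before a non-negative decomposition; this sketch does not address that.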

Reproducing the Paper

The experiments from the paper are organized under experiments/, with one subdirectory per experiment. See the README in each subdirectory for details on how to run that experiment.

Section  | Experiment                           | Directory                   | Status
Sec. 4.1 | Interpretability-Fidelity Analysis   | experiments/fidelity/       |
Sec. 4.2 | Grounding Singular Vectors to Images | experiments/grounding/      | 🚫
Sec. 5.1 | Suppressing Spurious Correlations    | experiments/spurious/       |
Sec. 5.2 | Removing NSFW Concepts               | experiments/nsfw/           |
Sec. 5.3 | Improving Classification Performance | experiments/classification/ |
Sec. 6   | Interpreting Model Adaptation        | experiments/finetune/       | 🚫

Citation

If you find this work useful, please cite our paper:

@inproceedings{gentile2026sith,
	title        = {From Weights to Concepts: Data-Free Interpretability of CLIP via Singular Vector Decomposition},
	author       = {Gentile, Francesco and Dall'Asen, Nicola and Tonini, Francesco and Mancini, Massimiliano and Vaquero, Lorenzo and Ricci, Elisa},
	year         = 2026,
	booktitle    = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}
}

License

This project is licensed under the MIT License. See the LICENSE file for details.
