From Weights to Concepts: Data-Free Interpretability of CLIP via Singular Vector Decomposition
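Francesco Gentile<sup>1</sup>, Nicola Dall'Asen<sup>1,2</sup>, Francesco Tonini<sup>1,3</sup>, Massimiliano Mancini<sup>1</sup>, Lorenzo Vaquero<sup>3</sup>, Elisa Ricci<sup>1,3</sup>

<sup>1</sup>University of Trento, <sup>2</sup>University of Pisa, <sup>3</sup>Fondazione Bruno Kessler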
SITH is a data-free, training-free, weight-based interpretability framework for CLIP's vision transformer. For each attention head, it decomposes the Value-Output (VO) weight matrix via Singular Value Decomposition (SVD), revealing the head's dominant computational directions. Each singular vector is then interpreted using COMP (Coherent Orthogonal Matching Pursuit), a novel sparse decomposition algorithm that explains it as a non-negative combination of human-interpretable textual concepts, optimizing for both reconstruction fidelity and semantic coherence.
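The two stages can be illustrated with a minimal NumPy sketch. It assumes the row-vector convention `y = x @ W`, so the VO matrix of head `h` is the product of that head's column slice of the value projection and its row slice of the output projection. The real CLIP checkpoint stores these weights differently (PyTorch's `nn.MultiheadAttention` packs Q, K and V into one `in_proj_weight` and applies weights transposed), and SITH may also fold LayerNorm parameters in. For the second stage the sketch substitutes plain non-negative OMP for COMP: it explains a vector with non-negative concept weights but omits COMP's semantic-coherence objective. All function names are hypothetical.

```python
import numpy as np


def head_vo_matrix(w_v, w_o, head, num_heads):
    """VO matrix of one head, assuming the head writes x @ w_v[:, sl] @ w_o[sl, :]."""
    d_head = w_v.shape[1] // num_heads
    sl = slice(head * d_head, (head + 1) * d_head)
    return w_v[:, sl] @ w_o[sl, :]  # (d_model, d_model), rank <= d_head


def top_singular_vectors(w_vo, rank):
    """Truncated SVD. Under y = x @ W, rows of vt are output-space (right) directions
    and columns of u are input-space (left) directions."""
    u, s, vt = np.linalg.svd(w_vo, full_matrices=False)
    return u[:, :rank], s[:rank], vt[:rank]


def nnls_pgd(a, b, iters=500):
    """Non-negative least squares min ||a @ x - b|| s.t. x >= 0, via projected gradient."""
    ata, atb = a.T @ a, a.T @ b
    step = 1.0 / np.linalg.norm(ata, 2)  # 1 / Lipschitz constant of the gradient
    x = np.zeros(a.shape[1])
    for _ in range(iters):
        x = np.maximum(0.0, x - step * (ata @ x - atb))
    return x


def nn_omp(target, dictionary, k, tol=1e-8):
    """Greedy non-negative OMP over unit-norm rows of `dictionary` (a stand-in for COMP)."""
    residual, support, coefs = target.copy(), [], np.zeros(0)
    for _ in range(k):
        if np.linalg.norm(residual) < tol:  # target already explained
            break
        corr = dictionary @ residual
        corr[support] = -np.inf  # never re-select an atom
        best = int(np.argmax(corr))
        if corr[best] <= 0:  # no atom helps with a positive weight
            break
        support.append(best)
        atoms = dictionary[support].T  # (d, |support|)
        coefs = nnls_pgd(atoms, target)
        residual = target - atoms @ coefs
    return np.array(support, dtype=int), coefs


rng = np.random.default_rng(0)
d_model, num_heads = 64, 4
w_v = rng.standard_normal((d_model, d_model))
w_o = rng.standard_normal((d_model, d_model))
u, s, vt = top_singular_vectors(head_vo_matrix(w_v, w_o, head=0, num_heads=num_heads), rank=16)

# Toy concept dictionary with orthonormal rows; real CLIP text embeddings are not orthogonal.
q, _ = np.linalg.qr(rng.standard_normal((d_model, 50)))
concepts = q.T  # (50, d_model)
target = 0.7 * concepts[3] + 0.3 * concepts[10]
idx, weights = nn_omp(target, concepts, k=5)
```

The projected-gradient solver only keeps the sketch dependency-free; `scipy.optimize.nnls` is the more usual choice for the inner non-negative least-squares step.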
- Data-free: analysis is performed directly on the model weights; no dataset or activations are needed.
- Training-free: no optimization or gradient computation is required.
- Fine-grained: explanations are given within each head, at the level of individual singular vectors.
- Actionable: enables precise model edits (suppressing spurious correlations, removing unsafe concepts, improving classification) without retraining.
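As an illustration of the last point, a head's VO matrix can be edited by removing individual singular components, so that the head no longer writes along the corresponding output direction. The sketch below assumes that an edit amounts to zeroing selected singular values and uses the row-vector convention `y = x @ W`; it is not necessarily how the paper's experiments implement their edits, and `ablate_components` is a hypothetical helper.

```python
import numpy as np


def ablate_components(w_vo, drop):
    """Return a copy of w_vo with the singular components listed in `drop` zeroed out."""
    u, s, vt = np.linalg.svd(w_vo, full_matrices=False)
    s[list(drop)] = 0.0  # s is a fresh array returned by svd, safe to modify
    return (u * s) @ vt


rng = np.random.default_rng(0)
w_vo = rng.standard_normal((32, 8)) @ rng.standard_normal((8, 32))  # toy rank-8 VO matrix
_, _, vt = np.linalg.svd(w_vo, full_matrices=False)

edited = ablate_components(w_vo, drop=[0])
x = rng.standard_normal((5, 32))  # a batch of residual-stream inputs
out = x @ edited                  # (5, 32) head outputs after the edit
leak = out @ vt[0]                # projection onto the removed output direction
```

After the edit, `leak` is numerically zero for any input, while all other singular components of the head are left untouched.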
If you only need the core components (CLIP model utilities, sparse dictionary learning algorithms, etc.) and do not plan to reproduce the paper's experiments, SITH can be installed as a standalone library:

```bash
uv pip install git+https://github.com/frangente/SITH.git
```

To reproduce the paper's experiments, clone the repository and install the dependencies:
```bash
git clone https://github.com/frangente/SITH.git
cd SITH
uv sync --frozen                       # core dependencies
uv sync --group scripts --frozen       # + dependencies for data preparation scripts
uv sync --group experiments --frozen   # + dependencies for experiments
```

All scripts and experiments expect intermediate data under `data/` in the repository root. Pre-computed resources for the paper's experiments are available on Google Drive and can be downloaded with:
```bash
uv run python scripts/download_model_data.py \
  --model-name ViT-L-14 \
  --pretrained laion2b_s32b_b82k
```

The expected layout is:
```
data/
├── dictionaries/                   # Textual concept pools
│   ├── conceptnet.txt
│   ├── wordnet.txt
│   └── ...
└── models/
    └── ViT-L-14/                   # Model architecture
        └── laion2b_s32b_b82k/      # Pretrained weights
            ├── image_mean.pt       # Mean image embedding (over CC12M)
            ├── dictionaries/       # Encoded dictionaries
            │   ├── conceptnet.pt
            │   └── ...
            └── decompositions/     # SITH decompositions
                ├── layer-20_right_foldln_conceptnet_comp-0.3_sparsity-5.pt
                ├── layer-20_left_nofoldln_conceptnet_omp_sparsity-10.pt
                └── ...
```
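`image_mean.pt` holds a mean image embedding computed over CC12M. This README does not spell out how SITH applies it; a common way to reduce CLIP's modality gap is to subtract a modality-specific mean from embeddings and re-normalize them. The sketch below shows that idea, together with an incremental mean that lets the average be accumulated batch by batch over a dataset as large as CC12M. Both the centering recipe and the names (`RunningMean`, `center_and_normalize`) are assumptions for illustration.

```python
import numpy as np


class RunningMean:
    """Accumulate the mean of (batch, dim) embeddings without holding them all in memory."""

    def __init__(self, dim):
        self.total = np.zeros(dim, dtype=np.float64)
        self.count = 0

    def update(self, batch):
        self.total += batch.sum(axis=0)
        self.count += batch.shape[0]

    @property
    def mean(self):
        if self.count == 0:
            raise ValueError("no embeddings seen yet")
        return self.total / self.count


def center_and_normalize(x, mean):
    """Subtract a modality mean and project the result back onto the unit sphere."""
    centered = x - mean
    return centered / np.linalg.norm(centered, axis=-1, keepdims=True)


rng = np.random.default_rng(0)
batches = [rng.standard_normal((n, 8)) for n in (4, 7, 5)]
running = RunningMean(dim=8)
for b in batches:
    running.update(b)
centered = center_and_normalize(np.concatenate(batches), running.mean)
```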
Decomposition filenames encode the key parameters for quick identification:
```
layer-{layer}_{left|right}_{foldln|nofoldln}_{dictionary}_{method}_sparsity-{K}.pt
```
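Because the pattern is fixed, filenames can be built and parsed programmatically. A sketch using a regular expression; it assumes dictionary and method names never contain underscores (true for the names used in this README, such as `conceptnet` and `comp-0.3`), and `DecompositionSpec` is a hypothetical helper, not part of the SITH package.

```python
import re
from dataclasses import dataclass

PATTERN = re.compile(
    r"layer-(?P<layer>\d+)_(?P<sv_type>left|right)_(?P<fold>foldln|nofoldln)"
    r"_(?P<dictionary>[^_]+)_(?P<method>[^_]+)_sparsity-(?P<k>\d+)\.pt"
)


@dataclass(frozen=True)
class DecompositionSpec:
    layer: int
    sv_type: str      # "left" or "right"
    fold_ln: bool
    dictionary: str
    method: str       # e.g. "omp" or "comp-0.3"
    sparsity: int     # K, concepts selected per singular vector

    def filename(self):
        fold = "foldln" if self.fold_ln else "nofoldln"
        return (f"layer-{self.layer}_{self.sv_type}_{fold}_{self.dictionary}"
                f"_{self.method}_sparsity-{self.sparsity}.pt")

    @classmethod
    def parse(cls, name):
        m = PATTERN.fullmatch(name)
        if m is None:
            raise ValueError(f"not a decomposition filename: {name!r}")
        return cls(int(m["layer"]), m["sv_type"], m["fold"] == "foldln",
                   m["dictionary"], m["method"], int(m["k"]))


spec = DecompositionSpec.parse("layer-20_right_foldln_conceptnet_comp-0.3_sparsity-5.pt")
```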
Each `.pt` file contains a dict with `scores` and `indices` tensors, representing the sparse decomposition of the specified singular vectors for all heads in the given layer. Each tensor has shape `(num_heads, rank, K)`, where `rank` is the number of singular vectors decomposed per head and `K` is the sparsity level (the number of concepts selected per vector).
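Given these two tensors and the encoded dictionary, each singular vector can be approximated as the score-weighted sum of its selected concept embeddings, and the selected concepts can be listed by score. The sketch assumes the tensors have already been converted to NumPy (e.g. `torch.load(path)` followed by `.numpy()` on each tensor), that the encoded dictionary is a `(num_concepts, dim)` matrix whose rows follow the order of the matching `.txt` file, and that the text file lists one concept per line; these layout details are not documented here. Function names are hypothetical.

```python
import numpy as np


def reconstruct(scores, indices, dictionary):
    """Approximate every decomposed singular vector.

    scores:     (num_heads, rank, K) non-negative concept weights
    indices:    (num_heads, rank, K) integer rows into `dictionary`
    dictionary: (num_concepts, dim) encoded concept embeddings
    returns:    (num_heads, rank, dim)
    """
    atoms = dictionary[indices]  # (num_heads, rank, K, dim)
    return (scores[..., None] * atoms).sum(axis=-2)


def describe(scores, indices, words, head, vec):
    """Concepts selected for one singular vector, highest score first."""
    order = np.argsort(-scores[head, vec])
    return [(words[indices[head, vec, j]], float(scores[head, vec, j])) for j in order]


words = ["dog", "cat", "car", "tree", "red"]
dictionary = np.eye(5)                         # toy embeddings, one axis per concept
scores = np.array([[[0.2, 0.8], [0.5, 0.1]]])  # num_heads=1, rank=2, K=2
indices = np.array([[[0, 4], [2, 3]]])
approx = reconstruct(scores, indices, dictionary)
top = describe(scores, indices, words, head=0, vec=0)
```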
If you want to recompute the resources from scratch (e.g., for a different model or dictionary), the scripts under `scripts/` cover the full pipeline.
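The four steps below can also be chained from Python, for example to process several dictionaries in one go. The sketch only reuses the commands and flags shown in this README; it assumes that omitting `--fold-ln` selects the non-folded variant, and `pipeline_commands` and `run_pipeline` are hypothetical helpers.

```python
import subprocess


def pipeline_commands(model, pretrained, dictionary, layers, rank=64,
                      method="comp-0.3", sparsity=5, sv_type="right", fold_ln=True):
    """Return the argv lists for the four pipeline steps, in execution order."""
    base = ["uv", "run", "python"]
    model_args = ["--model-name", model, "--pretrained", pretrained]
    decompose = (base + ["scripts/decompose.py"] + model_args
                 + ["--layers", *map(str, layers), "--rank", str(rank),
                    "--method", method, "--sparsity", str(sparsity),
                    "--dictionary", dictionary, "--sv-type", sv_type])
    if fold_ln:
        decompose.append("--fold-ln")
    return [
        base + ["scripts/download_dictionary.py", "--dictionary", dictionary],
        base + ["scripts/encode_dictionary.py"] + model_args + ["--dictionary", dictionary],
        base + ["scripts/compute_image_mean.py"] + model_args,
        decompose,
    ]


def run_pipeline(**kwargs):
    for cmd in pipeline_commands(**kwargs):
        subprocess.run(cmd, check=True)  # stop at the first failing step


cmds = pipeline_commands("ViT-L-14", "laion2b_s32b_b82k", "conceptnet", layers=[20, 21, 22, 23])
```

When looping over several dictionaries, the image mean only depends on the model, so that step can be run once and skipped afterwards.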
Download one of the available textual concept pools (e.g., ConceptNet 5.5):
```bash
uv run python scripts/download_dictionary.py --dictionary conceptnet
```

Available dictionaries: `conceptnet`, `efros`, `laion`, `wordnet`.
Encode the textual dictionary into CLIP embeddings for a specific model:
```bash
uv run python scripts/encode_dictionary.py \
  --model-name ViT-L-14 \
  --pretrained laion2b_s32b_b82k \
  --dictionary conceptnet
```

Compute the mean image embedding over CC12M, which is used to mitigate CLIP's modality gap:
```bash
uv run python scripts/compute_image_mean.py \
  --model-name ViT-L-14 \
  --pretrained laion2b_s32b_b82k
```

Run SITH to decompose the VO matrices of the selected layers and explain their singular vectors with the chosen dictionary:
```bash
uv run python scripts/decompose.py \
  --model-name ViT-L-14 \
  --pretrained laion2b_s32b_b82k \
  --layers 20 21 22 23 \
  --rank 64 \
  --method comp-0.3 \
  --sparsity 5 \
  --dictionary conceptnet \
  --sv-type right \
  --fold-ln
```

The experiments from the paper are organized under `experiments/`, with one subdirectory per experiment. Refer to the README in each directory for details on how to run it.
| Section | Experiment | Directory | Status |
|---|---|---|---|
| Sec. 4.1 | Interpretability-Fidelity Analysis | `experiments/fidelity/` | ✅ |
| Sec. 4.2 | Grounding Singular Vectors to Images | `experiments/grounding/` | 🚫 |
| Sec. 5.1 | Suppressing Spurious Correlations | `experiments/spurious/` | ✅ |
| Sec. 5.2 | Removing NSFW Concepts | `experiments/nsfw/` | ✅ |
| Sec. 5.3 | Improving Classification Performance | `experiments/classification/` | ✅ |
| Sec. 6 | Interpreting Model Adaptation | `experiments/finetune/` | 🚫 |
If you find this work useful, please cite our paper:
```bibtex
@inproceedings{gentile2026sith,
  title     = {From Weights to Concepts: Data-Free Interpretability of CLIP via Singular Vector Decomposition},
  author    = {Gentile, Francesco and Dall'Asen, Nicola and Tonini, Francesco and Mancini, Massimiliano and Vaquero, Lorenzo and Ricci, Elisa},
  year      = 2026,
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}
}
```

This project is licensed under the MIT License. See the `LICENSE` file for details.
