Human Things is a research codebase for testing whether human similarity judgments improve visual embeddings beyond image-only training. The project uses THINGS object images, THINGS human odd-one-out similarity judgments, and THINGSplus metadata to compare an image-only ResNet-50 baseline against several human-informed fine-tuning strategies.
The core question is:
Does adding human similarity knowledge improve the semantic quality and practical usefulness of visual embeddings beyond visual-only training?
- About
- Research Overview
- Figure Preview
- Repository Layout
- Technical Requirements
- Setup
- Data Setup
- Pipeline Usage
- Generated Figures
- Current Results Snapshot
- File Overview
- Git and Large Files
- Documentation
- License
- Citation / Acknowledgements
This repository was made by Sabrina Zaki Hansen as a data science / cognitive science research project. The work is centered on a controlled learning process rather than a single model leaderboard:
- Train a visual-only baseline on THINGS object classification.
- Add human similarity supervision.
- Add shuffled-similarity controls.
- Test alternative human-similarity injection strategies.
- Benchmark all embeddings on practical retrieval/classification tasks, THINGSplus transfer variables, and human-source diagnostics.
The project is called Human Things because it connects human similarity structure with the THINGS object image dataset.
The experiments use an ImageNet-pretrained ResNet-50 backbone. The image-only baseline is fine-tuned to classify each THINGS image into one of 1,854 object concepts. Human-informed variants start from the trained baseline checkpoint and add concept-level supervision from human odd-one-out similarity judgments.
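To make the concept-level supervision concrete, here is a minimal sketch of how one odd-one-out judgment could be turned into training triplets. It assumes a judgment is stored as three distinct concept ids plus the id picked as the odd one out. That record layout and the function name `judgment_to_triplets` are illustrative assumptions, not the format used by the repository's scripts.

```python
from typing import List, Tuple

Triplet = Tuple[int, int, int]  # (anchor, positive, negative) concept ids


def judgment_to_triplets(concepts: Tuple[int, int, int], odd_one_out: int) -> List[Triplet]:
    """Turn one odd-one-out judgment into two (anchor, positive, negative) triplets.

    The two concepts that were NOT picked are treated as the most similar pair,
    and the odd one out serves as the negative for both of them.
    """
    if len(set(concepts)) != 3:
        raise ValueError("a judgment needs three distinct concepts")
    if odd_one_out not in concepts:
        raise ValueError("odd_one_out must be one of the three concepts")
    first, second = [c for c in concepts if c != odd_one_out]
    return [(first, second, odd_one_out), (second, first, odd_one_out)]


# Hypothetical ids: concepts 10 and 42 were kept together, 7 was the odd one out.
triplets = judgment_to_triplets((10, 42, 7), odd_one_out=7)
```

Each judgment yields two triplets here because either member of the similar pair can act as the anchor; a real pipeline might keep only one of them.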
The tested model variants are:
| Model | Description |
|---|---|
| Image-only classifier | baseline: ResNet-50 trained on THINGS concept classification. |
| Fixed-prototype triplets | fixed_prototype_triplets: continued fine-tuning with cross-entropy plus weak human triplet regularization against fixed train-image prototypes. |
| Fixed-prototype control | fixed_prototype_control: matched shuffled-triplet control with preserved anchor frequency. |
| Batch-prototype triplets | batch_prototype_triplets: current-batch concept-prototype triplet loss, capped to 1,200 CPU-feasible batches. |
| High-pressure triplets | high_pressure_triplets: stronger human-similarity weighting with weaker classification loss. |
| Joint matrix alignment | joint_matrix_alignment: THINGS fine-tuning from ImageNet initialization with classification plus human similarity matrix loss from the first epoch. |
| Matrix control | matrix_control: matched shuffled-matrix control. |
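The triplet-based variants in the table combine a classification loss with a human-similarity term. The sketch below shows one plausible form of that combination for a single image, in plain Python so it runs without PyTorch. It is an illustration under stated assumptions: the hinge form, the cosine similarity, the `margin` and `triplet_weight` values, and all function names are hypothetical and are not taken from the training scripts.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for one example, using a stable log-sum-exp."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[target]


def triplet_hinge(embedding, positive_proto, negative_proto, margin: float = 0.1) -> float:
    """Penalize an embedding that is not closer to the positive prototype by `margin`."""
    gap = cosine(embedding, positive_proto) - cosine(embedding, negative_proto)
    return max(0.0, margin - gap)


def combined_loss(logits, target, embedding, positive_proto, negative_proto,
                  triplet_weight: float = 0.1, margin: float = 0.1) -> float:
    """Classification loss plus a weighted human-triplet regularizer (assumed form)."""
    return cross_entropy(logits, target) + triplet_weight * triplet_hinge(
        embedding, positive_proto, negative_proto, margin
    )
```

Under this reading, a "weak" variant corresponds to a small `triplet_weight`, while a "high-pressure" variant raises it relative to the classification term.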
The main result is nuanced: weak human-informed training improved practical metrics, but the shuffled control improved almost identically. Stronger human weighting improved within-source human-similarity alignment, but did not robustly improve practical utility or external THINGSplus transfer. The current evidence supports the claim that how human similarity is injected matters, and that shuffled controls are essential.
The editable draw.io sources live next to their exported PNG previews.
Figure previews (images not shown here): Data wrangling; Modeling and evaluation; Visual utility; Semantic structure; Human-specific signal; Qualitative evidence.
For an IKEA-facing communication of the results, the practical question is not simply whether a model retrieves more furniture from the same category. Customers may seek products that feel related in use, atmosphere, or aesthetic association. The client figure therefore shows examples where human-matrix alignment changes the suggested relationship for home-product queries, alongside held-out human similarity scores.
The selected cases illustrate potentially useful relational recommendations, while the query-level change panel makes the limitation explicit: the observed shifts are mixed across home products and should motivate targeted testing rather than a claim of overall recommender improvement.
human-things/
├── assets/
│ └── logo.svg
├── data/
│ ├── baseline/
│ ├── human_similarity/
│ ├── processed/
│ └── raw/ # local/raw data; not intended for normal Git tracking
├── docs/
│ └── METHODS_AND_RESULTS.md
├── outputs/
│ ├── baseline_resnet50/
│ ├── human_informed_resnet50*/
│ ├── docs/
│ ├── figures/
│ ├── reports/
│ ├── tables/
│ └── human_similarity/
├── scripts/
│ ├── 00_setup_things_data.py
│ ├── 01_make_metadata_csv.py
│ ├── ...
│ ├── 13_evaluate_triplet_satisfaction.py
│ ├── 14_make_figures.py
│ └── 15_train_joint_matrix_alignment.py
├── src/
│ └── human_things/
│   ├── __init__.py
│   ├── metadata.py
│   ├── paths.py
│   ├── project.py
│   └── utils.py
├── paper_context/
├── pyproject.toml
├── requirements.txt
└── README.md
Recommended:
- Python 3.10 or newer
- Windows, macOS, or Linux
- Enough disk space for THINGS images, embeddings, and checkpoints
- GPU optional, but strongly recommended for training
Python dependencies are listed in:
requirements.txt
pyproject.toml
Main libraries:
- PyTorch / torchvision
- pandas / numpy
- scikit-learn
- scipy
- matplotlib
- Pillow
- tqdm
- osfclient
Create and activate a virtual environment.
PowerShell:
python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install -e .
Bash:
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install -e .
Check the environment:
python -c "import torch; print(torch.__version__); print('cuda:', torch.cuda.is_available())"
python -c "import human_things; print(human_things.PROJECT_NAME)"
The raw data is expected under:
data/raw/THINGS-database/osfstorage
The setup script can fetch the tabular data from OSF:
python .\scripts\00_setup_things_data.py
To also fetch/extract image archives:
python .\scripts\00_setup_things_data.py --download-images
Large raw archives and image files should generally remain local. The repository is configured to keep raw data out of normal Git tracking.
Run the scripts in order.
python .\scripts\00_setup_things_data.py
python .\scripts\01_make_metadata_csv.py
python .\scripts\02_make_image_splits.py
Expected outputs:
data/processed/concepts.csv
data/processed/images.csv
data/baseline/image_metadata.csv
data/baseline/image_splits.csv
outputs/reports/image_metadata_report.json
outputs/reports/image_splits_report.json
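Before moving on to training, it can help to confirm that the preparation scripts produced every file listed above. The following is a small optional helper, not part of the repository; the function name `missing_outputs` is hypothetical, and the paths are copied from the expected-outputs list.

```python
from pathlib import Path
from typing import Iterable, List

# Paths copied from the expected-outputs list, relative to the repository root.
EXPECTED_OUTPUTS = [
    "data/processed/concepts.csv",
    "data/processed/images.csv",
    "data/baseline/image_metadata.csv",
    "data/baseline/image_splits.csv",
    "outputs/reports/image_metadata_report.json",
    "outputs/reports/image_splits_report.json",
]


def missing_outputs(repo_root: Path, expected: Iterable[str] = EXPECTED_OUTPUTS) -> List[str]:
    """Return the expected files that do not exist under repo_root, in list order."""
    return [rel for rel in expected if not (repo_root / rel).is_file()]


# Example: call from the repository root; an empty list means everything is in place.
missing = missing_outputs(Path.cwd())
```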
python .\scripts\03_train_resnet50_image_only.py
Useful CPU-safe or smoke-test options:
python .\scripts\03_train_resnet50_image_only.py --dry-run
python .\scripts\03_train_resnet50_image_only.py --head-epochs 1 --layer4-epochs 1 --max-train-batches 50
The full CPU run can take a long time. In the executed project run, full baseline training on CPU took about 157,743 seconds (roughly 43.8 hours).
python .\scripts\04_extract_resnet50_embeddings.py --batch-size 32 --num-workers 0
python .\scripts\05_evaluate_resnet50_embeddings.py
python .\scripts\06_prepare_human_similarity.py
python .\scripts\07_make_similarity_triplets.py
Expected outputs:
data/human_similarity/train_similarity_pairs.csv
data/human_similarity/val_similarity_pairs.csv
data/human_similarity/test_similarity_pairs.csv
data/human_similarity/train_triplets.csv
data/human_similarity/shuffled_train_triplets.csv
outputs/human_similarity/similarity_audit_report.json
outputs/human_similarity/triplet_audit_report.json
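The shuffled triplet file serves as the control for the human-informed runs. One simple way to build such a control while preserving anchor frequency is sketched below: the anchor column stays in place and the (positive, negative) pairs are permuted across rows. This illustrates the idea only; the repository's script may shuffle differently, and the function names are hypothetical. The default seed of 7 follows the seed stated for the main scripts.

```python
import random
from collections import Counter
from typing import List, Tuple

Triplet = Tuple[int, int, int]  # (anchor, positive, negative) concept ids


def anchor_frequency(triplets: List[Triplet]) -> Counter:
    """Count how often each concept appears as an anchor."""
    return Counter(anchor for anchor, _, _ in triplets)


def shuffle_triplets(triplets: List[Triplet], seed: int = 7) -> List[Triplet]:
    """Permute (positive, negative) pairs across rows while keeping every anchor in place.

    Anchor frequencies are unchanged, but the link between an anchor and its
    human-judged partners is broken. A shuffled row may pair an anchor with
    itself; a production control would need to resolve such collisions.
    """
    rng = random.Random(seed)
    anchors = [anchor for anchor, _, _ in triplets]
    partners = [(pos, neg) for _, pos, neg in triplets]
    rng.shuffle(partners)
    return [(anchor, pos, neg) for anchor, (pos, neg) in zip(anchors, partners)]
```

Because only the pairing changes, any gain that appears equally in the real and shuffled runs cannot be attributed to the human similarity structure itself.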
Fixed-prototype triplets:
python .\scripts\08_train_fixed_prototype_triplets.py
Fixed-prototype shuffled control:
python .\scripts\08_train_fixed_prototype_triplets.py `
--triplets data\human_similarity\shuffled_train_triplets.csv `
--output-dir outputs\human_informed_resnet50_shuffled
Batch-prototype triplets, CPU-capped:
python .\scripts\11_train_batch_prototype_triplets.py `
--epochs 1 `
--max-train-batches 1200 `
--triplets-per-batch 8 `
--images-per-concept 2 `
--output-dir outputs\human_informed_resnet50_v2_1200
High-pressure triplets:
python .\scripts\12_train_high_pressure_triplets.py
Joint matrix alignment from ImageNet initialization:
python .\scripts\15_train_joint_matrix_alignment.py
Matrix shuffled control:
python .\scripts\15_train_joint_matrix_alignment.py `
--shuffle-human-matrix `
--output-dir outputs\joint_matrix_resnet50_shuffled
Example:
python .\scripts\04_extract_resnet50_embeddings.py `
--checkpoint outputs\human_informed_resnet50\best_model.pt `
--output-dir outputs\human_informed_resnet50\embeddings `
--batch-size 32 `
--num-workers 0
Repeat for each model output directory.
python .\scripts\09_benchmark_embeddings.py all `
--model baseline=outputs\baseline_resnet50 `
--model fixed_prototype_triplets=outputs\human_informed_resnet50 `
--model fixed_prototype_control=outputs\human_informed_resnet50_shuffled `
--model batch_prototype_triplets=outputs\human_informed_resnet50_v2_1200 `
--model high_pressure_triplets=outputs\human_informed_resnet50_v3 `
--model joint_matrix_alignment=outputs\joint_matrix_resnet50 `
--model matrix_control=outputs\joint_matrix_resnet50_shuffled `
--output-json outputs\reports\embedding_benchmark_report_with_joint_matrix.json `
--output-csv outputs\tables\embedding_benchmark_summary_with_joint_matrix.csv
Use core, expanded, or all as the first argument. The expanded mode writes THINGSplus target-level and relational semantic-transfer outputs.
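For readers unfamiliar with the retrieval metric reported by the benchmark, the sketch below shows a common definition of Retrieval@1: the share of query embeddings whose nearest neighbour (by cosine similarity, excluding the query itself) has the same concept label. Whether the benchmark script uses exactly this definition, for example a separate query and gallery split, is an assumption, and the function names are hypothetical.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def retrieval_at_1(embeddings: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of items whose most similar other item shares their label."""
    if len(embeddings) != len(labels) or len(embeddings) < 2:
        raise ValueError("need at least two embeddings and one label per embedding")
    hits = 0
    for i, query in enumerate(embeddings):
        # Nearest neighbour among all other items (the query itself is excluded).
        best = max(
            (j for j in range(len(embeddings)) if j != i),
            key=lambda j: cosine(query, embeddings[j]),
        )
        hits += labels[best] == labels[i]
    return hits / len(embeddings)
```

This brute-force loop is quadratic in the number of items, so it is meant for understanding the metric on toy data rather than for the full image set.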
Compact comparison:
python .\scripts\10_compare_model_reports.py
Triplet satisfaction diagnostic:
python .\scripts\13_evaluate_triplet_satisfaction.py
python .\scripts\14_make_figures.py all
You can also generate a subset:
python .\scripts\14_make_figures.py paper
python .\scripts\14_make_figures.py improved combined
python .\scripts\14_make_figures.py client
python .\scripts\14_make_figures.py --list
Figures are written under outputs/figures/ by figure family.
scripts/14_make_figures.py is the single command-line entry point. The plotting implementations live in src/human_things/figure_generators/, including the IKEA client plot in client.py.
The editable Draw.io workflow figures are:
outputs/figures/drawio/figure_datawrangling.drawio
outputs/figures/drawio/figure_modeling_evaluation_flow.drawio
From outputs/tables/embedding_benchmark_summary_with_joint_matrix.csv:
| Model | Test top-1 | Retrieval@1 | Human-pair rho | Object-properties rho |
|---|---|---|---|---|
| Image-only classifier | 0.7274 | 0.7266 | 0.4173 | 0.5793 |
| Fixed-prototype triplets | 0.7430 | 0.7422 | 0.3897 | 0.5752 |
| Fixed-prototype control | 0.7430 | 0.7423 | 0.3880 | 0.5752 |
| Batch-prototype triplets | 0.7328 | 0.7334 | 0.4001 | 0.5747 |
| High-pressure triplets | 0.7330 | 0.7265 | 0.4478 | 0.5787 |
Interpretation:
- Fixed-prototype triplets improved practical utility, but the matched shuffled control was nearly identical.
- High-pressure triplets improved within-source human alignment, but not practical retrieval or THINGSplus transfer.
- The image-only baseline already satisfied much of the real human triplet structure.
- The strongest conclusion is about the importance of controls and injection strategy, not a broad claim that human similarity universally improves visual embeddings.
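The observation that the baseline already satisfies much of the human triplet structure rests on a triplet satisfaction rate. A minimal sketch of such a rate is given below, assuming a precomputed concept-by-concept model similarity matrix and triplets of concept indices. The function name and input layout are hypothetical and may not match scripts/13_evaluate_triplet_satisfaction.py.

```python
from typing import List, Sequence, Tuple

Triplet = Tuple[int, int, int]  # (anchor, positive, negative) concept indices


def triplet_satisfaction(similarity: Sequence[Sequence[float]], triplets: List[Triplet]) -> float:
    """Fraction of triplets where the model rates anchor-positive above anchor-negative."""
    if not triplets:
        raise ValueError("need at least one triplet")
    satisfied = sum(
        similarity[anchor][pos] > similarity[anchor][neg]
        for anchor, pos, neg in triplets
    )
    return satisfied / len(triplets)


# Toy 3-concept similarity matrix (symmetric, hypothetical values).
toy_similarity = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]
rate = triplet_satisfaction(toy_similarity, [(0, 1, 2), (2, 0, 1), (1, 0, 2)])
```

Running the same function on real and shuffled triplets gives a quick sense of how much of the rate reflects human structure rather than chance.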
See the full write-up:
docs/METHODS_AND_RESULTS.md
The single figure script can generate six figure families:
python scripts/14_make_figures.py --list
| Family | Output folder | Purpose |
|---|---|---|
| paper | outputs/figures/paper/ | Main result-section figures. |
| combined | outputs/figures/combined/ | Composite figures for the recommended results narrative. |
| improved | outputs/figures/improved/ | Polished alternative result plots and backing CSV files. |
| examples | outputs/figures/examples/ | Qualitative retrieval, triplet, and semantic-probe examples. |
| exploratory | outputs/figures/exploratory/ | Broad diagnostic plots for filtering and appendix use. |
| client | outputs/figures/client/ | Client-adjusted visual communication, including the IKEA relationship-discovery readout. |
The main paper figure set includes:
- figure_classification_top1
- figure_retrieval_curves
- figure_human_similarity_alignment
- figure_strategy_delta_heatmap
- figure_benchmark_rank_bump
- figure_model_profile_radar
- figure_model_similarity_map
- figure_alignment_vs_utility_tradeoff
- figure_thingsplus_transfer_focus
- figure_fixed_prototype_triplets_vs_control
- figure_joint_matrix_vs_shuffled_deltas
- figure_triplet_satisfaction
- figure_triplet_margin_intervals
- figure_expanded_thingsplus_all_benchmarks
The figure inventory and interpretation notes are saved in:
outputs/figures/paper/paper_figure_notes.json
src/human_things/ is a small package namespace for shared metadata, paths, labels, and helpers.
| File | Purpose |
|---|---|
| metadata.py | Model labels, colors, and figure styling constants. |
| project.py | Project name, version, research question, constants. |
| paths.py | Canonical repository paths. |
| utils.py | Small shared utilities used by scripts. |
| __init__.py | Package exports. |
scripts/ holds the numbered runnable pipeline entrypoints. These are the main way to reproduce the project.
docs/ holds the long-form methods/results documentation.
assets/ holds project branding and README assets.
data/ holds processed and local data. Raw THINGS files are expected locally and are not normally committed.
outputs/ holds model checkpoints, embeddings, reports, and figures. Large .pt and .npy files should be handled with Git LFS if tracked.
- Seed used throughout the main scripts: 7.
- Human similarity is concept-level supervision.
- THINGSplus variables are reserved for evaluation and are not used to train the human-informed losses.
- Human-pair Spearman is a within-source alignment diagnostic, not a fully independent semantic benchmark.
- The batch-prototype triplet run in the current results is CPU-capped at 1,200 batches.
- CPU training is possible but slow. GPU training is recommended for new full runs.
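The human-pair rho mentioned in these notes is a Spearman rank correlation between model similarities and human similarities over concept pairs. The sketch below computes it in plain Python, with average ranks for ties, so the calculation is easy to follow. It is a teaching aid under the assumption that both inputs are aligned lists of pair scores; the function names are hypothetical, and an established routine such as scipy.stats.spearmanr would be the usual choice in practice.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least two values")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)
```

Because the human scores come from the same source used for training the human-informed variants, a higher rho here indicates within-source alignment rather than independent semantic quality.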
This project can produce large outputs:
- .pt model checkpoints
- .npy embedding arrays
- extracted image archives
.gitattributes is configured for Git LFS tracking of .pt and .npy files. Raw data archives should usually stay local.
Main detailed write-up:
docs/METHODS_AND_RESULTS.md
Source papers and context PDFs are kept under:
paper_context/
This project is licensed under the GNU General Public License, Version 3 (GPL-3.0).
Third-party datasets, source papers, and externally supplied assets remain subject to their original terms and are not relicensed by this repository.
This project builds on the THINGS image database, THINGS human similarity work, and THINGSplus annotations.






