Human Things

Human Things is a research codebase for testing whether human similarity judgments improve visual embeddings beyond image-only training. The project uses THINGS object images, THINGS human odd-one-out similarity judgments, and THINGSplus metadata to compare an image-only ResNet-50 baseline against several human-informed fine-tuning strategies.

The core question is:

Does adding human similarity knowledge improve the semantic quality and practical usefulness of visual embeddings beyond visual-only training?

About

This repository was created by Sabrina Zaki Hansen as a data science / cognitive science research project. The work follows a controlled sequence of experiments rather than a single model leaderboard:

1. Train a visual-only baseline on THINGS object classification.
2. Add human similarity supervision.
3. Add shuffled-similarity controls.
4. Test alternative human-similarity injection strategies.
5. Benchmark all embeddings on practical retrieval/classification tasks, THINGSplus transfer variables, and human-source diagnostics.

The project is called Human Things because it connects human similarity structure with the THINGS object image dataset.

Research Overview

The experiments use an ImageNet-pretrained ResNet-50 backbone. The image-only baseline is fine-tuned to classify each THINGS image into one of 1,854 object concepts. Human-informed variants start from the trained baseline checkpoint and add concept-level supervision from human odd-one-out similarity judgments.

The tested model variants are:

Model	Description
Image-only classifier	`baseline`: ResNet-50 trained on THINGS concept classification.
Fixed-prototype triplets	`fixed_prototype_triplets`: continued fine-tuning with cross-entropy plus weak human triplet regularization against fixed train-image prototypes.
Fixed-prototype control	`fixed_prototype_control`: matched shuffled-triplet control with preserved anchor frequency.
Batch-prototype triplets	`batch_prototype_triplets`: current-batch concept-prototype triplet loss, capped to 1,200 CPU-feasible batches.
High-pressure triplets	`high_pressure_triplets`: stronger human-similarity weighting with weaker classification loss.
Joint matrix alignment	`joint_matrix_alignment`: THINGS fine-tuning from ImageNet initialization with classification plus human similarity matrix loss from the first epoch.
Matrix control	`matrix_control`: matched shuffled-matrix control.

The main result is nuanced. Weak human-informed training improved practical metrics (test top-1 rose from 0.7274 for the image-only classifier to 0.7430 for fixed-prototype triplets), but the matched shuffled control improved almost identically (0.7430). Stronger human weighting improved within-source human-similarity alignment (human-pair rho 0.4478 for high-pressure triplets versus 0.4173 for the baseline), but did not robustly improve practical utility or external THINGSplus transfer. The current evidence supports two claims: how human similarity is injected matters, and shuffled controls are essential.

Figure Preview

Workflow Diagrams

The editable draw.io sources live next to their exported PNG previews.

[Figure: data wrangling workflow]

[Figure: modeling and evaluation workflow]

Key Result Figures

[Figure: visual utility]

[Figure: semantic structure]

[Figure: human-specific signal]

[Figure: qualitative evidence]

Client Communication Figure

For an IKEA-facing communication of the results, the practical question is not simply whether a model retrieves more furniture from the same category. Customers may seek products that feel related in use, atmosphere, or aesthetic association. The client figure therefore shows examples where human-matrix alignment changes the suggested relationship for home-product queries, alongside held-out human similarity scores.

The selected cases illustrate potentially useful relational recommendations, while the query-level change panel makes the limitation explicit: the observed shifts are mixed across home products and should motivate targeted testing rather than a claim of overall recommender improvement.

Repository Layout

human-things/
├── assets/
│   └── logo.svg
├── data/
│   ├── baseline/
│   ├── human_similarity/
│   ├── processed/
│   └── raw/                         # local/raw data; not intended for normal Git tracking
├── docs/
│   └── METHODS_AND_RESULTS.md
├── outputs/
│   ├── baseline_resnet50/
│   ├── human_informed_resnet50*/
│   ├── docs/
│   ├── figures/
│   ├── reports/
│   ├── tables/
│   └── human_similarity/
├── scripts/
│   ├── 00_setup_things_data.py
│   ├── 01_make_metadata_csv.py
│   ├── ...
│   ├── 13_evaluate_triplet_satisfaction.py
│   ├── 14_make_figures.py
│   └── 15_train_joint_matrix_alignment.py
├── src/
│   └── human_things/
│       ├── figure_generators/
│       ├── __init__.py
│       ├── metadata.py
│       ├── paths.py
│       ├── project.py
│       └── utils.py
├── paper_context/
├── pyproject.toml
├── requirements.txt
└── README.md

Technical Requirements

Recommended:

Python 3.10 or newer
Windows, macOS, or Linux
Enough disk space for THINGS images, embeddings, and checkpoints
GPU optional, but strongly recommended for training

Python dependencies are listed in:

requirements.txt
pyproject.toml

Main libraries:

PyTorch / torchvision
pandas / numpy
scikit-learn
scipy
matplotlib
Pillow
tqdm
osfclient

Setup

Create and activate a virtual environment.

PowerShell:

python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install -e .

Bash:

python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install -e .

Check the environment:

python -c "import torch; print(torch.__version__); print('cuda:', torch.cuda.is_available())"
python -c "import human_things; print(human_things.PROJECT_NAME)"

Data Setup

The raw data is expected under:

data/raw/THINGS-database/osfstorage

The setup script can fetch the tabular data from OSF:

python .\scripts\00_setup_things_data.py

To also fetch/extract image archives:

python .\scripts\00_setup_things_data.py --download-images

Large raw archives and image files should generally remain local. The repository is configured to keep raw data out of normal Git tracking.

Pipeline Usage

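Run the scripts in order. The commands below use PowerShell syntax (backslash paths and backtick line continuations); in Bash, use forward slashes and a trailing backslash for line continuation.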

1. Build Processed Tables

python .\scripts\00_setup_things_data.py
python .\scripts\01_make_metadata_csv.py
python .\scripts\02_make_image_splits.py

Expected outputs:

data/processed/concepts.csv
data/processed/images.csv
data/baseline/image_metadata.csv
data/baseline/image_splits.csv
outputs/reports/image_metadata_report.json
outputs/reports/image_splits_report.json
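
For orientation, a per-concept image split of the kind that `02_make_image_splits.py` writes could look roughly like the sketch below. The function name, the input layout (a dict of image id to concept), and the 10% validation / 10% test fractions are assumptions for illustration; only the seed value 7 is taken from this README.

```python
import random
from collections import defaultdict


def split_images_by_concept(image_concepts, val_frac=0.1, test_frac=0.1, seed=7):
    """Assign every image to train/val/test, stratified by concept.

    image_concepts: dict mapping image id -> concept name.
    val_frac and test_frac are hypothetical; the real script may use other values.
    """
    rng = random.Random(seed)

    # Group image ids by concept in a deterministic order.
    by_concept = defaultdict(list)
    for image_id, concept in sorted(image_concepts.items()):
        by_concept[concept].append(image_id)

    splits = {}
    for concept in sorted(by_concept):
        ids = by_concept[concept]
        rng.shuffle(ids)
        # Keep at least one validation and one test image per concept.
        n_val = max(1, round(len(ids) * val_frac))
        n_test = max(1, round(len(ids) * test_frac))
        for position, image_id in enumerate(ids):
            if position < n_test:
                splits[image_id] = "test"
            elif position < n_test + n_val:
                splits[image_id] = "val"
            else:
                splits[image_id] = "train"
    return splits


# Toy input: two concepts with twelve images each.
toy_images = {f"{c}_{i:02d}": c for c in ("aardvark", "abacus") for i in range(12)}
toy_splits = split_images_by_concept(toy_images)
```

Splitting within each concept keeps all concepts present in every split, which matters when the classifier has one output per concept.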

2. Train the Image-Only Baseline

python .\scripts\03_train_resnet50_image_only.py

Useful CPU-safe or smoke-test options:

python .\scripts\03_train_resnet50_image_only.py --dry-run
python .\scripts\03_train_resnet50_image_only.py --head-epochs 1 --layer4-epochs 1 --max-train-batches 50

The full CPU run can take a long time. In the executed project run, full baseline training on CPU took about 157,743 seconds (roughly 43.8 hours).

3. Extract and Evaluate Baseline Embeddings

python .\scripts\04_extract_resnet50_embeddings.py --batch-size 32 --num-workers 0
python .\scripts\05_evaluate_resnet50_embeddings.py
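
As a rough illustration of the Retrieval@1 metric reported later, the sketch below scores each image by whether its nearest neighbour (by cosine similarity, excluding itself) has the same concept label. The query/gallery protocol used by `05_evaluate_resnet50_embeddings.py` is not described here, so treating every image as both query and gallery item is an assumption, as is the function name.

```python
import numpy as np


def retrieval_at_1(embeddings, labels):
    """Fraction of images whose nearest other image shares their label."""
    x = np.asarray(embeddings, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)  # unit length -> dot product is cosine
    sims = x @ x.T
    np.fill_diagonal(sims, -np.inf)  # an image must not retrieve itself
    nearest = sims.argmax(axis=1)
    labels = np.asarray(labels)
    return float((labels[nearest] == labels).mean())


# Toy embeddings: the last "dog" image sits closer to the "cat" cluster.
toy_embeddings = np.array(
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.8, 0.6]]
)
toy_labels = ["cat", "cat", "dog", "dog", "dog"]
toy_score = retrieval_at_1(toy_embeddings, toy_labels)
```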

4. Prepare Human Similarity Data

python .\scripts\06_prepare_human_similarity.py
python .\scripts\07_make_similarity_triplets.py

Expected outputs:

data/human_similarity/train_similarity_pairs.csv
data/human_similarity/val_similarity_pairs.csv
data/human_similarity/test_similarity_pairs.csv
data/human_similarity/train_triplets.csv
data/human_similarity/shuffled_train_triplets.csv
outputs/human_similarity/similarity_audit_report.json
outputs/human_similarity/triplet_audit_report.json
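
To make the link between odd-one-out judgments and triplets concrete: if a participant picks one of three concepts as the odd one out, the remaining two were judged more similar to each other than either is to the odd one. The sketch below expresses that as (anchor, positive, negative) triplets. The tuple layout and the choice to emit both anchor orientations are assumptions; `07_make_similarity_triplets.py` may build its triplets differently, for example from the similarity-pair tables.

```python
def odd_one_out_to_triplets(judgments):
    """Convert (a, b, odd) judgments into (anchor, positive, negative) triplets.

    Each judgment says: among {a, b, odd}, `odd` was chosen as the odd one out,
    so a and b form the similar pair.
    """
    triplets = []
    for a, b, odd in judgments:
        triplets.append((a, b, odd))  # a is closer to b than to odd
        triplets.append((b, a, odd))  # b is closer to a than to odd
    return triplets


toy_judgments = [("dog", "cat", "hammer"), ("spoon", "fork", "dog")]
toy_triplets = odd_one_out_to_triplets(toy_judgments)
```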

5. Train Human-Informed Models

Fixed-prototype triplets:

python .\scripts\08_train_fixed_prototype_triplets.py
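
One plausible reading of "cross-entropy plus weak human triplet regularization against fixed train-image prototypes" is sketched below in NumPy (the training script itself uses PyTorch). Prototypes are mean normalised train embeddings per concept, and the triplet term is a cosine hinge between an image embedding and the positive and negative prototypes. The margin, the `human_weight` value, and all function names are hypothetical and not taken from the repository.

```python
import numpy as np


def l2_normalize(x):
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def fixed_prototypes(train_embeddings, train_concepts, n_concepts):
    """Mean of the normalised train embeddings for each concept id."""
    x = l2_normalize(train_embeddings)
    concepts = np.asarray(train_concepts)
    return np.stack([x[concepts == c].mean(axis=0) for c in range(n_concepts)])


def triplet_hinge(anchor_embeddings, prototypes, positives, negatives, margin=0.1):
    """Mean of max(0, margin - cos(anchor, positive) + cos(anchor, negative))."""
    a = l2_normalize(anchor_embeddings)
    p = l2_normalize(prototypes)
    sim_pos = np.sum(a * p[positives], axis=1)
    sim_neg = np.sum(a * p[negatives], axis=1)
    return float(np.maximum(0.0, margin - sim_pos + sim_neg).mean())


def total_loss(cross_entropy, triplet_loss, human_weight=0.05):
    """Classification loss plus a weakly weighted human triplet term (weight is hypothetical)."""
    return cross_entropy + human_weight * triplet_loss


# Toy data: three concepts, two train images each.
toy_train = [[1.0, 0.0], [1.0, 0.2], [0.0, 1.0], [0.2, 1.0], [1.0, 1.0], [0.9, 1.1]]
toy_concepts = [0, 0, 1, 1, 2, 2]
toy_protos = fixed_prototypes(toy_train, toy_concepts, n_concepts=3)

toy_anchor = [[1.0, 0.1]]  # embedding of one image of concept 0
loss_satisfied = triplet_hinge(toy_anchor, toy_protos, positives=[2], negatives=[1])
loss_violated = triplet_hinge(toy_anchor, toy_protos, positives=[1], negatives=[2])
```

A small `human_weight` keeps classification as the dominant objective; raising it while lowering the classification weight corresponds in spirit to the high-pressure variant.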

Fixed-prototype shuffled control:

python .\scripts\08_train_fixed_prototype_triplets.py `
  --triplets data\human_similarity\shuffled_train_triplets.csv `
  --output-dir outputs\human_informed_resnet50_shuffled
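
A shuffled control with preserved anchor frequency can be pictured as follows: every anchor appears exactly as often as in the real triplets, but its positive and negative concepts are drawn at random. How `shuffled_train_triplets.csv` is actually produced is not documented in this README, so the sketch below is an assumption, including the function name and the reuse of seed 7.

```python
import random


def shuffle_triplets_keep_anchors(triplets, concepts, seed=7):
    """Keep each anchor, replace positive and negative with random other concepts."""
    rng = random.Random(seed)
    shuffled = []
    for anchor, _positive, _negative in triplets:
        candidates = [c for c in concepts if c != anchor]
        positive, negative = rng.sample(candidates, 2)  # two distinct concepts
        shuffled.append((anchor, positive, negative))
    return shuffled


toy_concepts = ["dog", "cat", "wolf", "hammer", "spoon"]
real_triplets = [
    ("dog", "cat", "hammer"),
    ("dog", "wolf", "spoon"),
    ("cat", "dog", "spoon"),
]
control_triplets = shuffle_triplets_keep_anchors(real_triplets, toy_concepts)
```

Because the anchors and the number of triplets are unchanged, any gain that also appears with this control cannot be attributed to the human similarity structure itself.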

Batch-prototype triplets, CPU-capped:

python .\scripts\11_train_batch_prototype_triplets.py `
  --epochs 1 `
  --max-train-batches 1200 `
  --triplets-per-batch 8 `
  --images-per-concept 2 `
  --output-dir outputs\human_informed_resnet50_v2_1200

High-pressure triplets:

python .\scripts\12_train_high_pressure_triplets.py

Joint matrix alignment from ImageNet initialization:

python .\scripts\15_train_joint_matrix_alignment.py

Matrix shuffled control:

python .\scripts\15_train_joint_matrix_alignment.py `
  --shuffle-human-matrix `
  --output-dir outputs\joint_matrix_resnet50_shuffled
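
The matrix-alignment idea can be sketched as a loss that compares the model's concept-by-concept cosine similarities with the human similarity matrix, and the shuffled control as a permutation of concept identities in that matrix. Both functions below are illustrative assumptions written in NumPy; the exact loss and the behaviour of `--shuffle-human-matrix` are not specified in this README.

```python
import numpy as np


def matrix_alignment_loss(prototypes, human_sim):
    """Mean squared error between model and human similarities over concept pairs."""
    p = np.asarray(prototypes, dtype=float)
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    model_sim = p @ p.T
    upper = np.triu_indices(len(p), k=1)  # each unordered pair once, no diagonal
    return float(np.mean((model_sim[upper] - np.asarray(human_sim)[upper]) ** 2))


def shuffle_human_matrix(human_sim, seed=7):
    """Permute rows and columns together so values survive but concept identity does not."""
    human_sim = np.asarray(human_sim, dtype=float)
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(human_sim))
    return human_sim[np.ix_(perm, perm)]


toy_protos = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
root_half = np.sqrt(0.5)
toy_human = np.array(
    [[1.0, 0.0, root_half], [0.0, 1.0, root_half], [root_half, root_half, 1.0]]
)
aligned_loss = matrix_alignment_loss(toy_protos, toy_human)
toy_shuffled = shuffle_human_matrix(toy_human)
```

The permutation keeps the matrix symmetric and preserves the distribution of similarity values, so the control matches the real matrix in everything except which concepts the values belong to.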

6. Extract Embeddings for Each Model

Example:

python .\scripts\04_extract_resnet50_embeddings.py `
  --checkpoint outputs\human_informed_resnet50\best_model.pt `
  --output-dir outputs\human_informed_resnet50\embeddings `
  --batch-size 32 `
  --num-workers 0

Repeat for each model output directory.
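
If repeating the command by hand is tedious, a small driver can build the same command for every model directory. The directory names below are the ones passed to the benchmark step in this README; the assumption that each directory contains a `best_model.pt` checkpoint follows the example above but is not confirmed for every variant. The helper names are hypothetical.

```python
import subprocess
from pathlib import Path

# Output directories of the human-informed variants, as listed in the benchmark step.
MODEL_DIRS = [
    "outputs/human_informed_resnet50",
    "outputs/human_informed_resnet50_shuffled",
    "outputs/human_informed_resnet50_v2_1200",
    "outputs/human_informed_resnet50_v3",
    "outputs/joint_matrix_resnet50",
    "outputs/joint_matrix_resnet50_shuffled",
]


def extraction_command(model_dir, python="python"):
    """Build the embedding-extraction command for one model directory."""
    model_dir = Path(model_dir)
    return [
        python,
        str(Path("scripts") / "04_extract_resnet50_embeddings.py"),
        "--checkpoint", str(model_dir / "best_model.pt"),  # assumed checkpoint name
        "--output-dir", str(model_dir / "embeddings"),
        "--batch-size", "32",
        "--num-workers", "0",
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)


extraction_commands = [extraction_command(d) for d in MODEL_DIRS]
run_all(extraction_commands)  # dry run: prints the commands without executing them
```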

7. Benchmark and Compare

python .\scripts\09_benchmark_embeddings.py all `
  --model baseline=outputs\baseline_resnet50 `
  --model fixed_prototype_triplets=outputs\human_informed_resnet50 `
  --model fixed_prototype_control=outputs\human_informed_resnet50_shuffled `
  --model batch_prototype_triplets=outputs\human_informed_resnet50_v2_1200 `
  --model high_pressure_triplets=outputs\human_informed_resnet50_v3 `
  --model joint_matrix_alignment=outputs\joint_matrix_resnet50 `
  --model matrix_control=outputs\joint_matrix_resnet50_shuffled `
  --output-json outputs\reports\embedding_benchmark_report_with_joint_matrix.json `
  --output-csv outputs\tables\embedding_benchmark_summary_with_joint_matrix.csv

Use core, expanded, or all as the first argument. The expanded mode writes THINGSplus target-level and relational semantic-transfer outputs.

Compact comparison:

python .\scripts\10_compare_model_reports.py

Triplet satisfaction diagnostic:

python .\scripts\13_evaluate_triplet_satisfaction.py
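
A triplet is "satisfied" when the embedding places the anchor closer to the positive concept than to the negative one. The sketch below computes the satisfaction rate and the mean similarity margin from concept prototypes using cosine similarity. Working at prototype level with integer concept ids is an assumption about how `13_evaluate_triplet_satisfaction.py` operates, and the function name is hypothetical.

```python
import numpy as np


def triplet_satisfaction(prototypes, triplets):
    """Return (share of satisfied triplets, mean margin) for (anchor, pos, neg) id triplets."""
    p = np.asarray(prototypes, dtype=float)
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    anchors, positives, negatives = np.asarray(triplets).T
    sim_pos = np.sum(p[anchors] * p[positives], axis=1)
    sim_neg = np.sum(p[anchors] * p[negatives], axis=1)
    margins = sim_pos - sim_neg  # positive margin means the triplet is satisfied
    return float((margins > 0).mean()), float(margins.mean())


toy_protos = np.array([[1.0, 0.0], [0.9, 0.2], [0.0, 1.0]])
toy_triplets = [(0, 1, 2), (2, 0, 1), (1, 0, 2)]
toy_rate, toy_margin = triplet_satisfaction(toy_protos, toy_triplets)
```

Running the same diagnostic on real and shuffled triplets shows how much of the human structure a model already captures before any human-informed training.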

8. Generate Figures

python .\scripts\14_make_figures.py all

You can also generate a subset:

python .\scripts\14_make_figures.py paper
python .\scripts\14_make_figures.py improved combined
python .\scripts\14_make_figures.py client
python .\scripts\14_make_figures.py --list

Figures are written under outputs/figures/ by figure family.

scripts/14_make_figures.py is the single command-line entry point. The plotting implementations live in src/human_things/figure_generators/, including the IKEA client plot in client.py.

The editable draw.io workflow figures are:

outputs/figures/drawio/figure_datawrangling.drawio
outputs/figures/drawio/figure_modeling_evaluation_flow.drawio

Current Results Snapshot

From outputs/tables/embedding_benchmark_summary_with_joint_matrix.csv (the joint matrix alignment and matrix control rows are not reproduced in this snapshot):

Model	Test top-1	Retrieval@1	Human-pair rho	Object-properties rho
Image-only classifier	0.7274	0.7266	0.4173	0.5793
Fixed-prototype triplets	0.7430	0.7422	0.3897	0.5752
Fixed-prototype control	0.7430	0.7423	0.3880	0.5752
Batch-prototype triplets	0.7328	0.7334	0.4001	0.5747
High-pressure triplets	0.7330	0.7265	0.4478	0.5787

Interpretation:

Fixed-prototype triplets improved practical utility, but the matched shuffled control was nearly identical.
High-pressure triplets improved within-source human alignment, but not practical retrieval or THINGSplus transfer.
The image-only baseline already satisfied much of the real human triplet structure.
The strongest conclusion is about the importance of controls and injection strategy, not a broad claim that human similarity universally improves visual embeddings.
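
For readers unfamiliar with the human-pair rho column, it can be understood as a Spearman rank correlation between model similarities and human similarity scores over concept pairs. The sketch below implements that with NumPy only; the use of cosine similarity between concept prototypes and the function names are assumptions, not a description of the benchmark script.

```python
import numpy as np


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for value in np.unique(values):
        tied = values == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed inputs."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def human_pair_rho(prototypes, pairs, human_scores):
    """Correlate model cosine similarity with human similarity over concept pairs."""
    p = np.asarray(prototypes, dtype=float)
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    first, second = np.asarray(pairs).T
    model_scores = np.sum(p[first] * p[second], axis=1)
    return spearman_rho(model_scores, human_scores)


toy_protos = np.array([[1.0, 0.0], [0.9, 0.2], [0.0, 1.0]])
toy_pairs = [(0, 1), (0, 2), (1, 2)]
rho_agree = human_pair_rho(toy_protos, toy_pairs, human_scores=[0.9, 0.1, 0.3])
rho_disagree = human_pair_rho(toy_protos, toy_pairs, human_scores=[0.1, 0.9, 0.3])
```

Because the correlation uses ranks, it rewards getting the ordering of pair similarities right and is insensitive to the scale of the embedding similarities.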

See the full write-up:

docs/METHODS_AND_RESULTS.md

Generated Figures

The single figure script can generate six figure families:

python scripts/14_make_figures.py --list

Family	Output folder	Purpose
`paper`	`outputs/figures/paper/`	Main result-section figures.
`combined`	`outputs/figures/combined/`	Composite figures for the recommended results narrative.
`improved`	`outputs/figures/improved/`	Polished alternative result plots and backing CSV files.
`examples`	`outputs/figures/examples/`	Qualitative retrieval, triplet, and semantic-probe examples.
`exploratory`	`outputs/figures/exploratory/`	Broad diagnostic plots for filtering and appendix use.
`client`	`outputs/figures/client/`	Client-adjusted visual communication, including the IKEA relationship-discovery readout.

The main paper figure set includes:

figure_classification_top1
figure_retrieval_curves
figure_human_similarity_alignment
figure_strategy_delta_heatmap
figure_benchmark_rank_bump
figure_model_profile_radar
figure_model_similarity_map
figure_alignment_vs_utility_tradeoff
figure_thingsplus_transfer_focus
figure_fixed_prototype_triplets_vs_control
figure_joint_matrix_vs_shuffled_deltas
figure_triplet_satisfaction
figure_triplet_margin_intervals
figure_expanded_thingsplus_all_benchmarks

The figure inventory and interpretation notes are saved in:

outputs/figures/paper/paper_figure_notes.json

File Overview

`src/human_things/`

Small package namespace for shared metadata, paths, labels, and helpers.

File	Purpose
`metadata.py`	Model labels, colors, and figure styling constants.
`project.py`	Project name, version, research question, constants.
`paths.py`	Canonical repository paths.
`utils.py`	Small shared utilities used by scripts.
`__init__.py`	Package exports.

`scripts/`

Numbered runnable pipeline entrypoints. These are the main way to reproduce the project.

`docs/`

Long-form methods/results documentation.

`assets/`

Project branding and README assets.

`data/`

Processed and local data. Raw THINGS files are expected locally and are not normally committed.

`outputs/`

Model checkpoints, embeddings, reports, and figures. Large .pt and .npy files should be handled with Git LFS if tracked.

Reproducibility Notes

Seed used throughout the main scripts: 7.
Human similarity is concept-level supervision.
THINGSplus variables are reserved for evaluation and are not used to train the human-informed losses.
Human-pair Spearman is a within-source alignment diagnostic, not a fully independent semantic benchmark.
The batch-prototype triplet run in the current results is CPU-capped at 1,200 batches.
CPU training is possible but slow. GPU training is recommended for new full runs.

Git and Large Files

This project can produce large outputs:

.pt model checkpoints
.npy embedding arrays
extracted image archives

.gitattributes is configured for Git LFS tracking of .pt and .npy files. Raw data archives should usually stay local.

Documentation

Main detailed write-up:

docs/METHODS_AND_RESULTS.md

Source papers and context PDFs are kept under:

paper_context/

License

This project is licensed under the GNU General Public License, Version 3 (GPL-3.0).

Third-party datasets, source papers, and externally supplied assets remain subject to their original terms and are not relicensed by this repository.

Citation / Acknowledgements

This project builds on the THINGS image database, THINGS human similarity work, and THINGSplus annotations.
