sabszh/human-things

Human Things

Human Things is a research codebase for testing whether human similarity judgments improve visual embeddings beyond image-only training. The project uses THINGS object images, THINGS human odd-one-out similarity judgments, and THINGSplus metadata to compare an image-only ResNet-50 baseline against several human-informed fine-tuning strategies.

The core question is:

Does adding human similarity knowledge improve the semantic quality and practical usefulness of visual embeddings beyond visual-only training?

About

This repository was made by Sabrina Zaki Hansen as a data science / cognitive science research project. The work is organized as a controlled sequence of experiments rather than a single model leaderboard:

  1. Train a visual-only baseline on THINGS object classification.
  2. Add human similarity supervision.
  3. Add shuffled-similarity controls.
  4. Test alternative human-similarity injection strategies.
  5. Benchmark all embeddings on practical retrieval/classification tasks, THINGSplus transfer variables, and human-source diagnostics.

The project is called Human Things because it connects human similarity structure with the THINGS object image dataset.

Research Overview

The experiments use an ImageNet-pretrained ResNet-50 backbone. The image-only baseline is fine-tuned to classify each THINGS image into one of 1,854 object concepts. Human-informed variants start from the trained baseline checkpoint and add concept-level supervision from human odd-one-out similarity judgments.
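As a rough illustration of what concept-level supervision from odd-one-out judgments can look like, the sketch below adds a hinge-style triplet penalty on concept prototypes to a classification loss. The function names, the margin, and the weight `lam` are illustrative assumptions and are not taken from the training scripts.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def triplet_hinge(anchor, positive, negative, margin=0.1):
    """Zero when the anchor is closer to the positive than to the negative by at least `margin`."""
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))

def combined_loss(ce_loss, triplets, prototypes, lam=0.1, margin=0.1):
    """Classification loss plus a weighted mean triplet penalty (hypothetical weighting)."""
    penalties = [
        triplet_hinge(prototypes[a], prototypes[p], prototypes[n], margin)
        for a, p, n in triplets
    ]
    mean_penalty = sum(penalties) / len(penalties) if penalties else 0.0
    return ce_loss + lam * mean_penalty

# Toy concept prototypes (made-up 2-D vectors, not real embeddings).
prototypes = {"dog": [1.0, 0.0], "cat": [0.9, 0.1], "car": [0.0, 1.0]}

# In an odd-one-out trial (dog, cat, car) where "car" is the odd one out,
# "dog" and "cat" form the similar pair: anchor=dog, positive=cat, negative=car.
print(combined_loss(2.0, [("dog", "cat", "car")], prototypes))
```

A larger `lam` relative to the classification term corresponds loosely to the "high-pressure" idea described below; the actual weights used in the project are documented in the scripts, not here.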

The tested model variants are:

| Model | Description |
| --- | --- |
| Image-only classifier | baseline: ResNet-50 trained on THINGS concept classification. |
| Fixed-prototype triplets | fixed_prototype_triplets: continued fine-tuning with cross-entropy plus weak human triplet regularization against fixed train-image prototypes. |
| Fixed-prototype control | fixed_prototype_control: matched shuffled-triplet control with preserved anchor frequency. |
| Batch-prototype triplets | batch_prototype_triplets: current-batch concept-prototype triplet loss, capped at 1,200 batches to stay CPU-feasible. |
| High-pressure triplets | high_pressure_triplets: stronger human-similarity weighting with weaker classification loss. |
| Joint matrix alignment | joint_matrix_alignment: THINGS fine-tuning from ImageNet initialization with classification plus human similarity matrix loss from the first epoch. |
| Matrix control | matrix_control: matched shuffled-matrix control. |
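
One way a shuffled-triplet control with preserved anchor frequency could be built is sketched below: anchors stay in place and the (positive, negative) partners are permuted across triplets. This is an assumption about the general technique; the function name and the retry strategy are hypothetical and may differ from what 07_make_similarity_triplets.py actually does.

```python
import random
from collections import Counter

def shuffle_triplets(triplets, seed=7, max_tries=100):
    """Permute (positive, negative) partners across triplets while keeping anchors fixed.

    Anchor frequency is preserved exactly. Shuffles that pair an anchor with
    itself are rejected and retried (a simple, hypothetical safeguard).
    """
    rng = random.Random(seed)
    anchors = [t[0] for t in triplets]
    partners = [(t[1], t[2]) for t in triplets]
    for _ in range(max_tries):
        rng.shuffle(partners)
        shuffled = [(a, p, n) for a, (p, n) in zip(anchors, partners)]
        if all(a != p and a != n for a, p, n in shuffled):
            return shuffled
    raise RuntimeError("no collision-free shuffle found; increase max_tries")

# Toy triplets (anchor, positive, negative); concept names are made up.
triplets = [
    ("dog", "cat", "car"),
    ("cat", "dog", "bus"),
    ("car", "bus", "dog"),
    ("bus", "car", "cat"),
]
shuffled = shuffle_triplets(triplets)

# The control sees each anchor exactly as often as the real data does.
print(Counter(a for a, _, _ in shuffled) == Counter(a for a, _, _ in triplets))
```

Because only the partner assignment changes, any gain that survives in the control cannot be attributed to the specific human similarity structure.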

The main result is nuanced: weak human-informed training improved practical metrics, but the shuffled control improved almost identically. Stronger human weighting improved within-source human-similarity alignment, but did not robustly improve practical utility or external THINGSplus transfer. The current evidence supports the claim that how human similarity is injected matters, and that shuffled controls are essential.

Figure Preview

Workflow Diagrams

The editable draw.io sources live next to their exported PNG previews.

Data wrangling

[Figure: Data wrangling workflow]

Modeling and evaluation

[Figure: Modeling and evaluation flow]

Key Result Figures

Visual utility

[Figure: Visual utility composite]

Semantic structure

[Figure: Semantic structure composite]

Human-specific signal

[Figure: Specificity composite]

Qualitative evidence

[Figure: Qualitative evidence composite]

Client Communication Figure

For an IKEA-facing communication of the results, the practical question is not simply whether a model retrieves more furniture from the same category. Customers may seek products that feel related in use, atmosphere, or aesthetic association. The client figure therefore shows examples where human-matrix alignment changes the suggested relationship for home-product queries, alongside held-out human similarity scores.

[Figure: IKEA client readout: relationship discovery examples]

The selected cases illustrate potentially useful relational recommendations, while the query-level change panel makes the limitation explicit: the observed shifts are mixed across home products and should motivate targeted testing rather than a claim of overall recommender improvement.

Repository Layout

human-things/
├── assets/
│   └── logo.svg
├── data/
│   ├── baseline/
│   ├── human_similarity/
│   ├── processed/
│   └── raw/                         # local/raw data; not intended for normal Git tracking
├── docs/
│   └── METHODS_AND_RESULTS.md
├── outputs/
│   ├── baseline_resnet50/
│   ├── human_informed_resnet50*/
│   ├── docs/
│   ├── figures/
│   ├── reports/
│   ├── tables/
│   └── human_similarity/
├── scripts/
│   ├── 00_setup_things_data.py
│   ├── 01_make_metadata_csv.py
│   ├── ...
│   ├── 13_evaluate_triplet_satisfaction.py
│   ├── 14_make_figures.py
│   └── 15_train_joint_matrix_alignment.py
├── src/
│   └── human_things/
│       ├── __init__.py
│       ├── metadata.py
│       ├── paths.py
│       ├── project.py
│       └── utils.py
├── paper_context/
├── pyproject.toml
├── requirements.txt
└── README.md

Technical Requirements

Recommended:

  • Python 3.10 or newer
  • Windows, macOS, or Linux
  • Enough disk space for THINGS images, embeddings, and checkpoints
  • GPU optional, but strongly recommended for training

Python dependencies are listed in:

requirements.txt
pyproject.toml

Main libraries:

  • PyTorch / torchvision
  • pandas / numpy
  • scikit-learn
  • scipy
  • matplotlib
  • Pillow
  • tqdm
  • osfclient

Setup

Create and activate a virtual environment.

PowerShell:

python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install -e .

Bash:

python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install -e .

Check the environment:

python -c "import torch; print(torch.__version__); print('cuda:', torch.cuda.is_available())"
python -c "import human_things; print(human_things.PROJECT_NAME)"

Data Setup

The raw data is expected under:

data/raw/THINGS-database/osfstorage

The setup script can fetch the tabular data from OSF:

python .\scripts\00_setup_things_data.py

To also fetch/extract image archives:

python .\scripts\00_setup_things_data.py --download-images

Large raw archives and image files should generally remain local. The repository is configured to keep raw data out of normal Git tracking.

Pipeline Usage

Run the scripts in order.

1. Build Processed Tables

python .\scripts\00_setup_things_data.py
python .\scripts\01_make_metadata_csv.py
python .\scripts\02_make_image_splits.py

Expected outputs:

data/processed/concepts.csv
data/processed/images.csv
data/baseline/image_metadata.csv
data/baseline/image_splits.csv
outputs/reports/image_metadata_report.json
outputs/reports/image_splits_report.json
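
Before moving on to training, it can help to confirm that these files exist. The helper below is a small sketch for that check; `missing_outputs` is a hypothetical name and is not part of the `human_things` package.

```python
from pathlib import Path

# Expected outputs of step 1, as listed above.
EXPECTED_STEP1_OUTPUTS = [
    "data/processed/concepts.csv",
    "data/processed/images.csv",
    "data/baseline/image_metadata.csv",
    "data/baseline/image_splits.csv",
    "outputs/reports/image_metadata_report.json",
    "outputs/reports/image_splits_report.json",
]

def missing_outputs(repo_root, expected=EXPECTED_STEP1_OUTPUTS):
    """Return the expected relative paths that are not present under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).is_file()]

if __name__ == "__main__":
    missing = missing_outputs(".")
    if missing:
        print("Missing step 1 outputs:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All step 1 outputs are present.")
```

Run it from the repository root; the same pattern can be reused for the expected outputs of later steps.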

2. Train the Image-Only Baseline

python .\scripts\03_train_resnet50_image_only.py

Useful CPU-safe or smoke-test options:

python .\scripts\03_train_resnet50_image_only.py --dry-run
python .\scripts\03_train_resnet50_image_only.py --head-epochs 1 --layer4-epochs 1 --max-train-batches 50

The full CPU run can take a long time. In the executed project run, full baseline training on CPU took about 157,743 seconds (roughly 43.8 hours).

3. Extract and Evaluate Baseline Embeddings

python .\scripts\04_extract_resnet50_embeddings.py --batch-size 32 --num-workers 0
python .\scripts\05_evaluate_resnet50_embeddings.py
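
The results further down report a Retrieval@1 metric. A common leave-one-out definition is sketched below: each image embedding queries all other embeddings, and a hit is counted when the nearest neighbour shares the query's concept label. This definition is an assumption; 05_evaluate_resnet50_embeddings.py may instead use a separate query/gallery split or a different distance.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieval_at_1(embeddings, labels):
    """Fraction of queries whose nearest other embedding has the same label."""
    hits = 0
    for i, query in enumerate(embeddings):
        candidates = (j for j in range(len(embeddings)) if j != i)
        best = max(candidates, key=lambda j: cosine(query, embeddings[j]))
        hits += labels[best] == labels[i]
    return hits / len(embeddings)

# Toy data: two made-up concepts with two embeddings each.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = ["dog", "dog", "car", "car"]
print(retrieval_at_1(embeddings, labels))  # 1.0
```

The brute-force loop is quadratic in the number of images, so it is only meant to show the logic, not to run on the full THINGS image set.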

4. Prepare Human Similarity Data

python .\scripts\06_prepare_human_similarity.py
python .\scripts\07_make_similarity_triplets.py

Expected outputs:

data/human_similarity/train_similarity_pairs.csv
data/human_similarity/val_similarity_pairs.csv
data/human_similarity/test_similarity_pairs.csv
data/human_similarity/train_triplets.csv
data/human_similarity/shuffled_train_triplets.csv
outputs/human_similarity/similarity_audit_report.json
outputs/human_similarity/triplet_audit_report.json

5. Train Human-Informed Models

Fixed-prototype triplets:

python .\scripts\08_train_fixed_prototype_triplets.py

Fixed-prototype shuffled control:

python .\scripts\08_train_fixed_prototype_triplets.py `
  --triplets data\human_similarity\shuffled_train_triplets.csv `
  --output-dir outputs\human_informed_resnet50_shuffled

Batch-prototype triplets, CPU-capped:

python .\scripts\11_train_batch_prototype_triplets.py `
  --epochs 1 `
  --max-train-batches 1200 `
  --triplets-per-batch 8 `
  --images-per-concept 2 `
  --output-dir outputs\human_informed_resnet50_v2_1200

High-pressure triplets:

python .\scripts\12_train_high_pressure_triplets.py

Joint matrix alignment from ImageNet initialization:

python .\scripts\15_train_joint_matrix_alignment.py

Matrix shuffled control:

python .\scripts\15_train_joint_matrix_alignment.py `
  --shuffle-human-matrix `
  --output-dir outputs\joint_matrix_resnet50_shuffled

6. Extract Embeddings for Each Model

Example:

python .\scripts\04_extract_resnet50_embeddings.py `
  --checkpoint outputs\human_informed_resnet50\best_model.pt `
  --output-dir outputs\human_informed_resnet50\embeddings `
  --batch-size 32 `
  --num-workers 0

Repeat for each model output directory.
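
The repetition can be scripted. The sketch below builds one extraction command per model directory, reusing the flags from the example above and the output directories passed to the benchmark in the next step. It assumes every directory stores its checkpoint as best_model.pt, which the source only confirms for human_informed_resnet50; `extraction_command` and `run_all` are hypothetical helper names.

```python
import subprocess
import sys
from pathlib import Path

# Human-informed model output directories (the baseline is extracted in step 3).
MODEL_DIRS = [
    "outputs/human_informed_resnet50",
    "outputs/human_informed_resnet50_shuffled",
    "outputs/human_informed_resnet50_v2_1200",
    "outputs/human_informed_resnet50_v3",
    "outputs/joint_matrix_resnet50",
    "outputs/joint_matrix_resnet50_shuffled",
]

def extraction_command(model_dir, batch_size=32, num_workers=0):
    """Build the argument list for one embedding-extraction run."""
    model_dir = Path(model_dir)
    return [
        sys.executable,
        str(Path("scripts") / "04_extract_resnet50_embeddings.py"),
        "--checkpoint", str(model_dir / "best_model.pt"),  # assumed filename
        "--output-dir", str(model_dir / "embeddings"),
        "--batch-size", str(batch_size),
        "--num-workers", str(num_workers),
    ]

def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)

commands = [extraction_command(d) for d in MODEL_DIRS]
run_all(commands, dry_run=True)
```

Call `run_all(commands, dry_run=False)` from the repository root to actually launch the runs one after another.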

7. Benchmark and Compare

python .\scripts\09_benchmark_embeddings.py all `
  --model baseline=outputs\baseline_resnet50 `
  --model fixed_prototype_triplets=outputs\human_informed_resnet50 `
  --model fixed_prototype_control=outputs\human_informed_resnet50_shuffled `
  --model batch_prototype_triplets=outputs\human_informed_resnet50_v2_1200 `
  --model high_pressure_triplets=outputs\human_informed_resnet50_v3 `
  --model joint_matrix_alignment=outputs\joint_matrix_resnet50 `
  --model matrix_control=outputs\joint_matrix_resnet50_shuffled `
  --output-json outputs\reports\embedding_benchmark_report_with_joint_matrix.json `
  --output-csv outputs\tables\embedding_benchmark_summary_with_joint_matrix.csv

Use core, expanded, or all as the first argument. The expanded mode writes THINGSplus target-level and relational semantic-transfer outputs.

Compact comparison:

python .\scripts\10_compare_model_reports.py

Triplet satisfaction diagnostic:

python .\scripts\13_evaluate_triplet_satisfaction.py
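
A triplet satisfaction score is typically the share of human triplets in which the model places the anchor closer to the human-chosen positive than to the negative. The sketch below shows that idea on toy prototypes; it is an assumed definition, and the script may compute margins or confidence intervals in addition.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def triplet_satisfaction(triplets, prototypes):
    """Fraction of (anchor, positive, negative) triplets with sim(a, p) > sim(a, n)."""
    satisfied = sum(
        1
        for a, p, n in triplets
        if cosine(prototypes[a], prototypes[p]) > cosine(prototypes[a], prototypes[n])
    )
    return satisfied / len(triplets)

# Toy concept prototypes and triplets (made-up values).
prototypes = {"dog": [1.0, 0.0], "cat": [0.9, 0.1], "car": [0.0, 1.0]}
triplets = [("dog", "cat", "car"), ("dog", "car", "cat")]
print(triplet_satisfaction(triplets, prototypes))  # 0.5
```

Scoring the same embeddings on the real and the shuffled triplets gives a simple check of whether a model reflects human structure beyond chance.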

8. Generate Figures

python .\scripts\14_make_figures.py all

You can also generate a subset:

python .\scripts\14_make_figures.py paper
python .\scripts\14_make_figures.py improved combined
python .\scripts\14_make_figures.py client
python .\scripts\14_make_figures.py --list

Figures are written under outputs/figures/ by figure family.

scripts/14_make_figures.py is the single command-line entry point. The plotting implementations live in src/human_things/figure_generators/, including the IKEA client plot in client.py.

The editable Draw.io workflow figures are:

outputs/figures/drawio/figure_datawrangling.drawio
outputs/figures/drawio/figure_modeling_evaluation_flow.drawio

Current Results Snapshot

From outputs/tables/embedding_benchmark_summary_with_joint_matrix.csv:

| Model | Test top-1 | Retrieval@1 | Human-pair rho | Object-properties rho |
| --- | --- | --- | --- | --- |
| Image-only classifier | 0.7274 | 0.7266 | 0.4173 | 0.5793 |
| Fixed-prototype triplets | 0.7430 | 0.7422 | 0.3897 | 0.5752 |
| Fixed-prototype control | 0.7430 | 0.7423 | 0.3880 | 0.5752 |
| Batch-prototype triplets | 0.7328 | 0.7334 | 0.4001 | 0.5747 |
| High-pressure triplets | 0.7330 | 0.7265 | 0.4478 | 0.5787 |

Interpretation:

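  • Fixed-prototype triplets improved practical utility over the baseline (test top-1 0.7430 vs. 0.7274), but the matched shuffled control was nearly identical (0.7430).
  • High-pressure triplets improved within-source human alignment (human-pair rho 0.4478 vs. 0.4173), but not practical retrieval (Retrieval@1 0.7265 vs. 0.7266) or THINGSplus transfer.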
  • The image-only baseline already satisfied much of the real human triplet structure.
  • The strongest conclusion is about the importance of controls and injection strategy, not a broad claim that human similarity universally improves visual embeddings.
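
The comparisons behind these points are simple differences between rows of the table above. The sketch below recomputes them from the reported values; the dictionary keys and the `delta` helper are illustrative names, not identifiers from the benchmark scripts.

```python
# Values copied from the results table above.
RESULTS = {
    "Image-only classifier":    {"top1": 0.7274, "retrieval_at_1": 0.7266, "human_pair_rho": 0.4173, "object_properties_rho": 0.5793},
    "Fixed-prototype triplets": {"top1": 0.7430, "retrieval_at_1": 0.7422, "human_pair_rho": 0.3897, "object_properties_rho": 0.5752},
    "Fixed-prototype control":  {"top1": 0.7430, "retrieval_at_1": 0.7423, "human_pair_rho": 0.3880, "object_properties_rho": 0.5752},
    "Batch-prototype triplets": {"top1": 0.7328, "retrieval_at_1": 0.7334, "human_pair_rho": 0.4001, "object_properties_rho": 0.5747},
    "High-pressure triplets":   {"top1": 0.7330, "retrieval_at_1": 0.7265, "human_pair_rho": 0.4478, "object_properties_rho": 0.5787},
}

def delta(model, reference, metric):
    """Difference model - reference on one metric, rounded to the table's precision."""
    return round(RESULTS[model][metric] - RESULTS[reference][metric], 4)

baseline = "Image-only classifier"

# Gain over the baseline for the real triplets and for the shuffled control.
print(delta("Fixed-prototype triplets", baseline, "top1"))  # 0.0156
print(delta("Fixed-prototype control", baseline, "top1"))   # 0.0156

# Human-specific effect: real triplets minus shuffled control.
print(delta("Fixed-prototype triplets", "Fixed-prototype control", "top1"))  # 0.0

# High-pressure triplets: alignment goes up, retrieval does not.
print(delta("High-pressure triplets", baseline, "human_pair_rho"))  # 0.0305
print(delta("High-pressure triplets", baseline, "retrieval_at_1"))  # -0.0001
```

The treatment-minus-control difference, rather than the gain over the baseline, is the quantity that isolates the contribution of the human similarity signal.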

See the full write-up:

docs/METHODS_AND_RESULTS.md

Generated Figures

The single figure script can generate six figure families:

python scripts/14_make_figures.py --list

| Family | Output folder | Purpose |
| --- | --- | --- |
| paper | outputs/figures/paper/ | Main result-section figures. |
| combined | outputs/figures/combined/ | Composite figures for the recommended results narrative. |
| improved | outputs/figures/improved/ | Polished alternative result plots and backing CSV files. |
| examples | outputs/figures/examples/ | Qualitative retrieval, triplet, and semantic-probe examples. |
| exploratory | outputs/figures/exploratory/ | Broad diagnostic plots for filtering and appendix use. |
| client | outputs/figures/client/ | Client-adjusted visual communication, including the IKEA relationship-discovery readout. |

The main paper figure set includes:

  • figure_classification_top1
  • figure_retrieval_curves
  • figure_human_similarity_alignment
  • figure_strategy_delta_heatmap
  • figure_benchmark_rank_bump
  • figure_model_profile_radar
  • figure_model_similarity_map
  • figure_alignment_vs_utility_tradeoff
  • figure_thingsplus_transfer_focus
  • figure_fixed_prototype_triplets_vs_control
  • figure_joint_matrix_vs_shuffled_deltas
  • figure_triplet_satisfaction
  • figure_triplet_margin_intervals
  • figure_expanded_thingsplus_all_benchmarks

The figure inventory and interpretation notes are saved in:

outputs/figures/paper/paper_figure_notes.json

File Overview

src/human_things/

Small package namespace for shared metadata, paths, labels, and helpers.

| File | Purpose |
| --- | --- |
| metadata.py | Model labels, colors, and figure styling constants. |
| project.py | Project name, version, research question, constants. |
| paths.py | Canonical repository paths. |
| utils.py | Small shared utilities used by scripts. |
| __init__.py | Package exports. |

scripts/

Numbered runnable pipeline entrypoints. These are the main way to reproduce the project.

docs/

Long-form methods/results documentation.

assets/

Project branding and README assets.

data/

Processed and local data. Raw THINGS files are expected locally and are not normally committed.

outputs/

Model checkpoints, embeddings, reports, and figures. Large .pt and .npy files should be handled with Git LFS if tracked.

Reproducibility Notes

  • Seed used throughout the main scripts: 7.
  • Human similarity is concept-level supervision.
  • THINGSplus variables are reserved for evaluation and are not used to train the human-informed losses.
  • Human-pair Spearman is a within-source alignment diagnostic, not a fully independent semantic benchmark.
  • The batch-prototype triplet run in the current results is CPU-capped at 1,200 batches.
  • CPU training is possible but slow. GPU training is recommended for new full runs.
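
For readers unfamiliar with the human-pair Spearman diagnostic, the sketch below shows the general computation: rank the model's pairwise concept similarities and the human similarities for the same pairs, then correlate the ranks. It is a plain standard-library illustration with made-up numbers, not the project's evaluation code, which may rely on scipy instead.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def spearman_rho(x, y):
    """Spearman rank correlation as Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up similarities for four concept pairs.
model_sims = [0.9, 0.2, 0.5, 0.1]
human_sims = [0.8, 0.3, 0.4, 0.05]
print(spearman_rho(model_sims, human_sims))
```

Because the human similarities used for evaluation come from the same source as the training signal, a high rho here indicates alignment with that source rather than independent semantic quality.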

Git and Large Files

This project can produce large outputs:

  • .pt model checkpoints
  • .npy embedding arrays
  • extracted image archives

.gitattributes is configured for Git LFS tracking of .pt and .npy files. Raw data archives should usually stay local.

Documentation

Main detailed write-up:

docs/METHODS_AND_RESULTS.md

Source papers and context PDFs are kept under:

paper_context/

License

This project is licensed under the GNU General Public License, Version 3 (GPL-3.0).

Third-party datasets, source papers, and externally supplied assets remain subject to their original terms and are not relicensed by this repository.

Citation / Acknowledgements

This project builds on the THINGS image database, THINGS human similarity work, and THINGSplus annotations.
