Static Segmentation by Tracking: A Frustratingly Label-Efficient Approach to Fine-Grained Segmentation
Zhenyang Feng, Zihe Wang, Saul Ibaven Bueno, Tomasz Frelek, Advikaa Ramesh, Jingyan Bai, Lemeng Wang, Zanming Huang, Jianyang Gu, Jinsu Yoo, Tai-Yu Pan, Arpita Chowdhury, Michelle Ramirez, Elizabeth G Campolongo, Matthew J Thompson, Christopher G. Lawrence, Sydne Record, Neil Rosser, Anuj Karpatne, Daniel Rubenstein, Hilmar Lapp, Charles V. Stewart, Tanya Berger-Wolf, Yu Su, Wei-Lun Chao
- Release inference code
- Release beetle part segmentation dataset
- Release online demo
- Release Open-Close Cycle Consistency Loss (OC-CCL) fine-tuning code
- Release trait retrieval code
- Release butterfly trait segmentation dataset
Set CUDA_HOME to your CUDA path (this is required to build Grounding DINO's CUDA extensions).
For example:

```bash
export CUDA_HOME=/usr/local/cuda
```
Then sync the uv packages:

```bash
uv sync
```
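Optionally, sanity-check that PyTorch can see both the CUDA toolkit and a GPU before Grounding DINO builds its extensions. A minimal check using only standard PyTorch calls (nothing repo-specific is assumed):

```python
# Confirm the CUDA toolchain and a visible GPU before building extensions.
import os
import torch

print("CUDA_HOME:", os.environ.get("CUDA_HOME"))    # should match your export
print("GPU available:", torch.cuda.is_available())  # True if a device is visible
print("Torch CUDA version:", torch.version.cuda)    # toolkit torch was built for
```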
Download the weights into the checkpoints folder.

For wget:

```bash
cd checkpoints
wget https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_large.pt
wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
```
For curl (the `-L` flag is needed to follow the GitHub release redirect):

```bash
cd checkpoints
curl -L https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_large.pt --output sam2_hiera_large.pt
curl -L https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth --output groundingdino_swint_ogc.pth
```
Go to the SAM demo, upload a representative image (e.g., img001.png), click the portions you want to segment, and select "Cut out object" from the sidebar. Right-click and save the extraction (img001_extracted.png).
See the two example images[^1] below:

| img001.png | img001_extracted.png |
|---|---|
| ![]() | ![]() |
Then run the following two commands to generate the support mask, which guides the model toward the shape to segment (note: the final processed image will appear to be an all-black image):
```bash
uv run python src/sst/get_mask_from_crop.py \
    --image_path img001.png \
    --image_crop_path img001_extracted.png \
    --mask_image_path_out img001_extracted_processed.png
```
Example output:
| img001_extracted_processed.png |
|---|
| ![]() |
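Conceptually, this first step just turns the transparent cutout into a binary mask. A minimal sketch of the idea (not the script itself), assuming the cutout keeps the original canvas size and stores the background in its alpha channel; the real script also takes the original image, presumably to align cutouts that were cropped:

```python
# Rough idea behind get_mask_from_crop.py: the SAM demo cutout is a PNG whose
# background is transparent, so its alpha channel already encodes the mask.
import numpy as np
from PIL import Image

crop = np.array(Image.open("img001_extracted.png").convert("RGBA"))
alpha = crop[:, :, 3]                          # 0 = background, >0 = object
mask = (alpha > 0).astype(np.uint8)            # binary mask with values 0 and 1
Image.fromarray(mask).save("mask_sketch.png")  # looks all black: max value is 1
```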
```bash
uv run python src/sst/prepare_starter_mask.py \
    --mask_image_path img001_extracted_processed.png \
    --mask_image_path_out img001_extracted_processed.png
```
Example output (NOTE: the color is very faint):
| img001_extracted_processed.png |
|---|
| ![]() |
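The starter mask looks faint because part labels are stored as small greyscale integers rather than visible colors (the exact label values are repo-specific; 0 for background is an assumption here). A quick way to confirm the mask actually contains labels:

```python
# Inspect the raw label values in the starter mask instead of trusting your eyes.
import numpy as np
from PIL import Image

mask = np.array(Image.open("img001_extracted_processed.png").convert("L"))
values, counts = np.unique(mask, return_counts=True)
print(dict(zip(values.tolist(), counts.tolist())))  # e.g., {0: ..., 1: ...}
```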
Now that the support mask has been generated, run the following command to segment your remaining images:
```bash
uv run python src/sst/segment_and_crop.py \
    --support_image img001.png \
    --support_mask img001_extracted_processed.png \
    --query_images [PATH_TO_IMAGE_DIRECTORY] \
    --output [PATH_TO_SEGMENTED_OUTPUT_DIRECTORY]
```
The above script is RAM-intensive on large datasets. To process images one at a time instead, run the same command with src/sst/segment_and_crop_individual.py.
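If you want to drive the per-image script from Python, for instance to checkpoint progress across a large dataset, a minimal sketch is below. It assumes segment_and_crop_individual.py takes the same flags as the batch script but with a single query image; check its --help for the actual interface:

```python
# Hypothetical driver: invoke the per-image script once per query so only one
# image's state is resident in RAM at a time. Flag names are assumed to mirror
# segment_and_crop.py.
import subprocess
from pathlib import Path

for img in sorted(Path("queries").glob("*.png")):
    subprocess.run(
        ["uv", "run", "python", "src/sst/segment_and_crop_individual.py",
         "--support_image", "img001.png",
         "--support_mask", "img001_extracted_processed.png",
         "--query_images", str(img),
         "--output", "segmented_out"],
        check=True,  # stop at the first failure
    )
```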
For one-shot trait/part segmentation, please run the following demo code:
```bash
python src/sst/segment.py --support_image /path/to/sample/image.png \
    --support_mask /path/to/greyscale_mask.png \
    --query_images /path/to/query/images/folder \
    --output /path/to/output/folder \
    --output_format "png"  # png or gif, optional
```

OC-CCL (Open-Close Cycle Consistency Loss) fine-tunes SAM2 on a target species. The cycle opens with reference → query (predict the query mask) and closes with query → reference (predict the closing mask back on the reference), supervised against the reference's GT mask with BCE + Dice.
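As a rough illustration of that supervision, here is a minimal sketch of a combined BCE + Dice loss on the closing prediction. The helper name and the 1:1 weighting are assumptions (the actual plumbing lives in src/sst/oc_ccl.py, and the ablations sweep the BCE/Dice weighting):

```python
# Sketch of the cycle-closing supervision: the mask predicted back on the
# reference image is compared against the reference's ground-truth mask.
import torch
import torch.nn.functional as F

def oc_ccl_loss(closing_logits: torch.Tensor, ref_gt: torch.Tensor) -> torch.Tensor:
    """closing_logits: raw logits predicted on the reference at cycle close.
    ref_gt: binary (0/1) float ground-truth mask of the reference, same shape."""
    bce = F.binary_cross_entropy_with_logits(closing_logits, ref_gt)
    probs = torch.sigmoid(closing_logits)
    eps = 1e-6  # keeps the ratio finite for empty masks
    dice = 1 - (2 * (probs * ref_gt).sum() + eps) / (probs.sum() + ref_gt.sum() + eps)
    return bce + dice  # 1:1 weighting is an assumption; the ablations sweep it
```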
1. Get the butterfly images. Mask annotations are already tracked under data/cambridge_butterfly/DataSet_Butterfly/. The image manifest with Zenodo URLs and md5 checksums is committed at data/cambridge_butterfly/images.csv. Download with cautious-robot:
```bash
pip install cautious-robot

cautious-robot -i data/cambridge_butterfly/images.csv \
    -o data/cambridge_butterfly/images \
    --checksum-algorithm md5 --verifier-col md5
```

Images land at data/cambridge_butterfly/images/<image_id>.<ext>. cautious-robot skips existing files, retries 429/5xx responses, and verifies every download against the committed md5. The manifest can be regenerated from the per-species train_test_separate/*.json files via python data/cambridge_butterfly/build_download_csv.py (it queries the Zenodo API for fresh checksums).
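If you need to re-verify files after the fact, the same check is easy to reproduce. A minimal sketch; the md5 column name comes from the command above, but the filename column is an assumption (adjust it to the manifest's actual header):

```python
# Spot-check downloaded images against the committed manifest's md5 column.
import hashlib
from pathlib import Path

import pandas as pd

manifest = pd.read_csv("data/cambridge_butterfly/images.csv")
img_dir = Path("data/cambridge_butterfly/images")
for _, row in manifest.iterrows():
    path = img_dir / row["filename"]  # assumed column; match the real header
    if not path.exists():
        print("missing:", path)
    elif hashlib.md5(path.read_bytes()).hexdigest() != row["md5"]:
        print("checksum mismatch:", path)
```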
2. Train on one or more species:
```bash
python src/sst/oc_ccl.py \
    --checkpoint checkpoints/sam2_hiera_large.pt \
    --species "(malleti x plesseni) x malleti" \
    --epochs 10 --lr 1e-5 \
    --output_dir outputs/oc_ccl
```

The best checkpoint is written to <output_dir>/best_model.pt. Defaults: --lr 1e-5, --batch_size 1, --epochs 10.
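To sanity-check a finished run before evaluation, you can peek inside the saved checkpoint. A minimal sketch; whether best_model.pt stores a bare state_dict or a wrapper dict is an assumption, so the keys are printed rather than guessed:

```python
# Inspect the saved checkpoint to see what oc_ccl.py wrote out.
import torch

ckpt = torch.load("outputs/oc_ccl/best_model.pt", map_location="cpu")
print(type(ckpt))
if isinstance(ckpt, dict):
    # Either parameter names (a bare state_dict) or metadata fields (a wrapper).
    print(list(ckpt.keys())[:10])
```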
3. Reproduce the ablation grid. 16 runs across 8 GPUs sweeping learning rate, BCE/Dice weighting, LoRA rank, and memory reset:
```bash
bash experiments/launch_ablations.sh
python experiments/eval_all_ablations.py  # writes outputs/ablation/eval_results.json
```

4. Curriculum variant (top-n% by reconstruction quality). Precomputes per-sample cycle reconstruction IoU, then trains only on the highest-quality fraction (sketched after the command below):
```bash
python experiments/curriculum_oc_ccl.py --gpu 0 --epochs 10 --lr 1e-6
```
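The selection step amounts to ranking samples by their cycle-reconstruction IoU and keeping the best fraction. A minimal sketch, assuming the per-sample IoUs have already been computed (the function and variable names are illustrative, not the script's API):

```python
# Top-n% curriculum: keep only the samples whose cycle reconstruction is cleanest.
def select_top_fraction(sample_ids, ious, fraction=0.5):
    """Return the highest-IoU `fraction` of samples (at least one)."""
    ranked = sorted(zip(sample_ids, ious), key=lambda p: p[1], reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    return [sid for sid, _ in ranked[:keep]]

# With fraction=0.5, the two best of these four samples survive.
print(select_top_fraction(["a", "b", "c", "d"], [0.9, 0.4, 0.7, 0.2]))  # ['a', 'c']
```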
For trait-based retrieval, please refer to the demo code below:

```bash
python src/sst/trait_retrieval.py --support_image /path/to/sample/image.png \
    --support_mask /path/to/greyscale_mask.png \
    --trait_id 1 \
    --query_images /path/to/query/images/folder \
    --output /path/to/output/folder \
    --output_format "png" \
    --top_k 5
```

- --trait_id: the target trait to retrieve, denoted by its value in the support mask.
- --output_format: "png" or "gif" (optional).
- --top_k: the number of top retrievals to save as results.

In the gui/ directory, there is a low-code option for users. Follow the directions in that README to install and run the interface.
The beetle part segmentation dataset is available here.
The butterfly trait segmentation dataset can be accessed here.
The instructions and appropriate citations for these datasets are provided in the Citation section of their respective READMEs.
This project makes use of the SAM2 and GroundingDINO codebases. We are grateful to the developers and maintainers of these projects for their contributions to the open-source community. We also thank the authors of LoRA for their great work.
We also thank David Carlyn for his contributions to improving the repository's ease of setup, workflows, and overall usability; and Sam Stevens for developing a nice interactive tool for mask generation, selection, and visualization.
If you find our work helpful for your research, please consider citing using the following BibTeX entry:
```bibtex
@misc{feng2025staticsegmentationtrackingfrustratingly,
title={Static Segmentation by Tracking: A Frustratingly Label-Efficient Approach to Fine-Grained Segmentation},
author={Zhenyang Feng and Zihe Wang and Saul Ibaven Bueno and Tomasz Frelek and Advikaa Ramesh and Jingyan Bai and Lemeng Wang and Zanming Huang and Jianyang Gu and Jinsu Yoo and Tai-Yu Pan and Arpita Chowdhury and Michelle Ramirez and Elizabeth G. Campolongo and Matthew J. Thompson and Christopher G. Lawrence and Sydne Record and Neil Rosser and Anuj Karpatne and Daniel Rubenstein and Hilmar Lapp and Charles V. Stewart and Tanya Berger-Wolf and Yu Su and Wei-Lun Chao},
year={2025},
eprint={2501.06749},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2501.06749},
}
```

[^1]: Example images are from Santos, S. C. P. (2025). Wasp-Moth Mimicry. Hugging Face. https://huggingface.co/datasets/Sol-Carolina/Wasp_moth_mimicry.




