Refactor review_analysis + add 3-system and human-vs-AI overlap analyses #91

Draft
dangng2004 wants to merge 1 commit into main from feat/venn-analyses
@dangng2004
Contributor

Summary

Refactors the review_analysis venn/cluster plumbing into a shared helper module, then adds two new comparison axes on top of it.

Refactor

  • utils.py — shared load / para_set / regions_{2,3} / draw_venn{2,3} / save_fig; plots now written to plots/ in both PNG and PDF
  • analysis.py, analysis_gpt_claude.py — refactored to use utils; old top-level venn PNGs deleted (regenerated under plots/)
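
For reviewers less familiar with the region bookkeeping behind the Venn plots, here is a minimal sketch of what a three-set region helper could compute. It is not the code in utils.py: the signature, the dictionary keys, and the example paragraph indices are assumptions made for illustration only.

```python
def regions_3(a: set, b: set, c: set) -> dict:
    """Return the sizes of the seven disjoint regions of a 3-set Venn diagram.

    Hypothetical signature; the real helper in utils.py may differ.
    """
    return {
        "a_only": len(a - b - c),
        "b_only": len(b - a - c),
        "c_only": len(c - a - b),
        "ab_only": len((a & b) - c),
        "ac_only": len((a & c) - b),
        "bc_only": len((b & c) - a),
        "abc": len(a & b & c),
    }


# Made-up paragraph indices flagged by each system on one paper.
coarse = {1, 2, 3, 4}
openaireview = {3, 4, 5}
reviewer3 = {4, 5, 6, 7}

sizes = regions_3(coarse, openaireview, reviewer3)

# The seven regions partition the union, so their sizes must sum to its size.
assert sum(sizes.values()) == len(coarse | openaireview | reviewer3)
print(sizes)
```

Because the regions are disjoint, the sum check is a cheap sanity test to run before handing the counts to a plotting routine.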

New comparisons

  • analysis_three_systems.py — 3-way paragraph-index overlap of coarse / OpenAIReview / Reviewer 3 on their common ~70-paper cohort
  • analysis_with_humans.py — overlap between human OpenReview reviewers and the AI-system union; two-pass LLM concern-extraction + paragraph-mapping with on-disk .cache/
  • cluster_new.py — KMeans clustering for the two new comparisons
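
The two-pass flow with an on-disk cache in analysis_with_humans.py can be pictured roughly as follows. This is a sketch under stated assumptions, not the script's actual code: `cached_call`, `two_pass`, the cache file naming, and the callables standing in for the LLM are all hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cached_call(cache_dir: Path, stage: str, prompt: str,
                call: Callable[[str], object]) -> object:
    """Return a cached JSON result for (stage, prompt), calling `call` on a miss."""
    key = hashlib.sha256(f"{stage}\n{prompt}".encode("utf-8")).hexdigest()
    path = cache_dir / f"{stage}_{key}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    result = call(prompt)  # only reached on a cache miss
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result), encoding="utf-8")
    return result


def two_pass(cache_dir: Path, review_text: str,
             extract: Callable[[str], object],
             map_to_paragraphs: Callable[[str], object]) -> object:
    """Pass 1 extracts concerns; pass 2 maps them to paragraph indices."""
    concerns = cached_call(cache_dir, "extract", review_text, extract)
    # Serialize deterministically so the pass-2 cache key is stable across runs.
    mapping_prompt = json.dumps(concerns, sort_keys=True)
    return cached_call(cache_dir, "map", mapping_prompt, map_to_paragraphs)
```

Keying each pass separately means a change to the mapping prompt would not invalidate the cached concern extraction, which is one plausible reason to split the cache by stage.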

Other

  • .gitignore — cover plots/, .cache/, generated cluster_*.json / per_paper_*.json, and the local frontier_subset_progressive symlink
  • benchmarks/perturbation/_combine_gpt_claude.py — paper-table helper: combined (GPT-5.5 OR Claude-Opus-4.7) recall on the 24-paper frontier subset for tab:recall-overall in perturbation.tex
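
As a rough illustration of the "GPT-5.5 OR Claude-Opus-4.7" combination, the sketch below counts an item as detected when either model detected it. The input format (parallel lists of booleans, one entry per item) and the function name are assumptions; the helper in this PR may read its data differently, and the values shown are made up, not results from the paper.

```python
def combined_recall(gpt_hits: list, claude_hits: list) -> float:
    """Fraction of items detected by at least one of the two models."""
    if len(gpt_hits) != len(claude_hits):
        raise ValueError("both models must be scored on the same items")
    if not gpt_hits:
        return 0.0
    detected = sum(1 for g, c in zip(gpt_hits, claude_hits) if g or c)
    return detected / len(gpt_hits)


# Made-up detections for four items.
gpt = [True, False, False, True]
claude = [False, False, True, True]
print(combined_recall(gpt, claude))
```

By construction the combined value can never be lower than either model's individual recall, which is a useful sanity check against the paper table.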

Test plan

  • python analysis.py produces plots/venn_cp.{png,pdf} and plots/venn_all.{png,pdf} without errors
  • python analysis_three_systems.py produces the 3-way overlap plot
  • python analysis_with_humans.py produces the human-vs-AI venn (uses cached LLM outputs on rerun)
  • python _combine_gpt_claude.py from benchmarks/perturbation/ prints combined recall numbers matching the paper table
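
The first item above could be checked mechanically with something like the following. This is a hypothetical helper, not part of the PR; only the plots/ directory and the venn_cp / venn_all stems come from the test plan.

```python
from pathlib import Path


def missing_outputs(plots_dir: Path, stems: list, exts=("png", "pdf")) -> list:
    """List expected plot files that are absent from plots_dir."""
    return [
        f"{stem}.{ext}"
        for stem in stems
        for ext in exts
        if not (plots_dir / f"{stem}.{ext}").is_file()
    ]


if __name__ == "__main__":
    # Run from the directory that contains plots/ after `python analysis.py`.
    missing = missing_outputs(Path("plots"), ["venn_cp", "venn_all"])
    print("missing:", missing if missing else "none")
```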

🤖 Generated with Claude Code

* utils.py — extract shared load/para_set/regions/venn helpers; move plots
  under plots/ in both PNG and PDF
* analysis.py, analysis_gpt_claude.py — refactor to use utils; add docstrings;
  drop the old top-level venn PNGs (regenerated to plots/)
* analysis_three_systems.py — 3-way paragraph-index overlap of coarse /
  OpenAIReview / Reviewer 3 on their common 70-paper cohort
* analysis_with_humans.py — overlap between human OpenReview reviewers and
  the AI-system union; two-pass LLM concern-extraction + paragraph mapping
  with on-disk .cache/
* cluster_new.py — KMeans clustering for the two new comparisons
* .gitignore — cover plots/, .cache/, generated cluster/per-paper JSONs,
  and the local frontier_subset_progressive symlink
* _combine_gpt_claude.py — compute combined (GPT-5.5 OR Claude-Opus-4.7)
  recall on the 24-paper frontier subset for tab:recall-overall
dangng2004 marked this pull request as draft May 21, 2026 20:57