Refactor review_analysis + add 3-system and human-vs-AI overlap analyses#91
Draft
dangng2004 wants to merge 1 commit into
* utils.py: extract shared load/para_set/regions/venn helpers; move plots under plots/ in both PNG and PDF
* analysis.py, analysis_gpt_claude.py: refactor to use utils; add docstrings; drop the old top-level venn PNGs (regenerated to plots/)
* analysis_three_systems.py: 3-way paragraph-index overlap of coarse / OpenAIReview / Reviewer 3 on their common 70-paper cohort
* analysis_with_humans.py: overlap between human OpenReview reviewers and the AI-system union; two-pass LLM concern-extraction + paragraph mapping with on-disk .cache/
* cluster_new.py: KMeans clustering for the two new comparisons
* .gitignore: cover plots/, .cache/, generated cluster/per-paper JSONs, and the local frontier_subset_progressive symlink
* _combine_gpt_claude.py: compute combined (GPT-5.5 OR Claude-Opus-4.7) recall on the 24-paper frontier subset for tab:recall-overall
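For reviewers unfamiliar with the "combined (A OR B)" recall mentioned in the last item, here is a minimal sketch of one way such a number could be computed. It assumes recall means the fraction of injected perturbations that a system flags, and that a perturbation counts as found by the combination if either system flags it. The function name, the ID format, and the toy data are all hypothetical; this is not the code in _combine_gpt_claude.py and the numbers are not results from the paper.

```python
from typing import Set


def recall(detected: Set[str], injected: Set[str]) -> float:
    """Fraction of injected items that appear in `detected` (assumed definition)."""
    if not injected:
        return 0.0
    return len(detected & injected) / len(injected)


# Toy data only: hypothetical "paper:perturbation" IDs, not benchmark output.
injected = {"p1:e1", "p1:e2", "p2:e1", "p2:e2"}
gpt_hits = {"p1:e1", "p2:e1"}
claude_hits = {"p1:e1", "p1:e2"}

# "A OR B": an item is detected if either system flagged it (set union).
combined_hits = gpt_hits | claude_hits

gpt_recall = recall(gpt_hits, injected)
claude_recall = recall(claude_hits, injected)
combined_recall = recall(combined_hits, injected)
```

Because the union can only add detections, the combined recall under this definition is never lower than either individual recall.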
Summary
Refactors the review_analysis venn/cluster plumbing into a shared helper module, then adds two new comparison axes on top of it.
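As a rough illustration of the venn plumbing being shared, the sketch below splits three sets of paragraph indices into the seven disjoint regions of a 3-way Venn diagram. The function name, signature, region keys, and sample indices are assumptions made for illustration; the actual helpers in utils.py may be shaped differently.

```python
from typing import Dict, Set


def regions_3_sketch(a: Set[int], b: Set[int], c: Set[int]) -> Dict[str, Set[int]]:
    """Partition the union of three sets into the seven disjoint Venn regions."""
    return {
        "a_only": a - b - c,
        "b_only": b - a - c,
        "c_only": c - a - b,
        "ab_only": (a & b) - c,
        "ac_only": (a & c) - b,
        "bc_only": (b & c) - a,
        "abc": a & b & c,
    }


# Hypothetical paragraph indices flagged by each system on one paper.
coarse = {1, 2, 3, 5}
openaireview = {2, 3, 4}
reviewer3 = {3, 4, 5, 6}

regions = regions_3_sketch(coarse, openaireview, reviewer3)
sizes = {name: len(members) for name, members in regions.items()}
```

The region sizes are what a 3-way Venn plot would display; since the regions are disjoint, they sum to the size of the union.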
Refactor
* utils.py: shared load/para_set/regions_{2,3}/draw_venn{2,3}/save_fig; plots now written to plots/ in both PNG and PDF
* analysis.py, analysis_gpt_claude.py: refactored to use utils; old top-level venn PNGs deleted (regenerated under plots/)
New comparisons
* analysis_three_systems.py: 3-way paragraph-index overlap of coarse / OpenAIReview / Reviewer 3 on their common ~70-paper cohort
* analysis_with_humans.py: overlap between human OpenReview reviewers and the AI-system union; two-pass LLM concern-extraction + paragraph-mapping with on-disk .cache/
* cluster_new.py: KMeans clustering for the two new comparisons
Other
* .gitignore: cover plots/, .cache/, generated cluster_*.json / per_paper_*.json, and the local frontier_subset_progressive symlink
* benchmarks/perturbation/_combine_gpt_claude.py: paper-table helper that computes combined (GPT-5.5 OR Claude-Opus-4.7) recall on the 24-paper frontier subset for tab:recall-overall in perturbation.tex
Test plan
* python analysis.py produces plots/venn_cp.{png,pdf} and plots/venn_all.{png,pdf} without errors
* python analysis_three_systems.py produces the 3-way overlap plot
* python analysis_with_humans.py produces the human-vs-AI venn (uses cached LLM outputs on rerun)
* python _combine_gpt_claude.py from benchmarks/perturbation/ prints combined recall numbers matching the paper table

🤖 Generated with Claude Code
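To make the "uses cached LLM outputs on rerun" behaviour in the test plan concrete, here is a minimal sketch of an on-disk cache keyed by a hash of the prompt. The function name, the JSON-per-key layout, and the stand-in fake_llm are assumptions for illustration only; the cache used by analysis_with_humans.py may be organised differently.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, List


def cached_call(prompt: str, compute: Callable[[str], dict], cache_dir: Path) -> dict:
    """Return the cached result for `prompt` if present, else compute and store it."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    result = compute(prompt)
    path.write_text(json.dumps(result), encoding="utf-8")
    return result


calls: List[str] = []


def fake_llm(prompt: str) -> dict:
    """Stand-in for a real LLM call; records each invocation."""
    calls.append(prompt)
    return {"concerns": ["placeholder concern"]}


# A temporary directory stands in for the repository's .cache/ folder.
with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp) / ".cache"
    first = cached_call("extract concerns from review A", fake_llm, cache_dir)
    second = cached_call("extract concerns from review A", fake_llm, cache_dir)
```

In this sketch the second call is served from disk, so the stand-in model is invoked only once; a rerun of the analysis would behave the same way as long as the prompts are unchanged.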