- runner.py: set finished_at before BenchmarkReport snapshot
- metrics.py: token_recall("", "") now returns 0.0 with a warning, to mirror
  char_recall and avoid silently inflating aggregate F1
- legalbench_rag.py: precompute sort keys with an isolated random.Random
  instance so the sort no longer mutates global random state
- data_extract_tasks.py: document model_override trust assumption
- local.yml: comment API_KEY as local-only placeholder
- retrieval.py: elevate struct_set→doc cache to module level keyed by
  (corpus_id, struct_set_id) so a full benchmark run amortises lookups
- text_alignment.py: hoist doc_text.lower() out of per-query loop
- run_benchmark.py: replace magic 194 with PAPER_MAX_TESTS_PER_BENCHMARK
- constants/benchmarks.py: derive TRIM_LEN from MAX_LEN
- cross_encoder_reranker.py: rename comprehension var to avoid shadowing
- text_chunkers.py: tighten _INVISIBLE_CHARS_RE to format chars only,
  preserving en/em dash and thin space
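
Illustration of the legalbench_rag.py item above: a minimal sketch of sorting by precomputed keys drawn from a private random.Random instance. The function name shuffled_order and its seed parameter are hypothetical and are not taken from the repository; the actual code may differ.

```python
import random


def shuffled_order(items, seed):
    """Return items in a reproducible pseudo-random order.

    Hypothetical helper for illustration only. It draws every sort key
    from a private generator, so the module-level `random` state that
    other code may depend on is left untouched.
    """
    rng = random.Random(seed)  # isolated generator, independent of global state
    keys = [rng.random() for _ in items]  # one precomputed key per item
    order = sorted(range(len(items)), key=keys.__getitem__)
    return [items[i] for i in order]
```

Because the keys are computed once up front, the sort itself never calls into a random number generator, and two calls with the same seed give the same order.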