Update benchmark #187

Merged
dilithjay merged 4 commits into main from dj/bench on Jun 4, 2026

@dilithjay dilithjay commented Jun 1, 2026

Contributor
  • Update benchmark with latest Gemini and GPT models
  • Set default LLM model to gemini-3.5-flash
  • Update AUTO mode results with new default LLM (gemini-3.5-flash)
  • Add Claude models to benchmark

Comment thread tests/benchmark.py Dismissed
@dilithjay dilithjay requested a review from pramitchoudhary June 3, 2026 22:51
@pramitchoudhary pramitchoudhary added the enhancement New feature or request label Jun 3, 2026

Copilot AI left a comment

Pull request overview

Updates Lexoid’s published benchmark artifacts and runtime defaults to reflect newer LLM offerings (Gemini/GPT/Claude), while extending the benchmark harness to support resumable runs and optional per-parse process isolation.
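
As a rough illustration of what per-parse process isolation can look like, the sketch below runs each job in a fresh Python interpreter and exchanges JSON over stdin/stdout, so a crash or hang in one parse does not take down the whole benchmark run. This is not the code from tests/benchmark.py: `run_isolated` and `FAKE_PARSE` are hypothetical names, and the child script is a stand-in for a real parse call.

```python
import json
import subprocess
import sys
from typing import Any, Dict


def run_isolated(child_code: str, payload: Dict[str, Any],
                 timeout_s: float = 60.0) -> Dict[str, Any]:
    """Run `child_code` in a separate interpreter (hypothetical helper).

    The payload is sent as JSON on stdin; the child is expected to print
    a single JSON document on stdout.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", child_code],
            input=json.dumps(payload),
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # The child is killed on timeout; report it instead of raising.
        return {"ok": False, "error": "timeout"}
    if proc.returncode != 0:
        return {"ok": False, "error": f"exit code {proc.returncode}"}
    return {"ok": True, "result": json.loads(proc.stdout)}


# Stand-in for a real parse: echoes the path and its length (assumption).
FAKE_PARSE = (
    "import json, sys\n"
    "job = json.load(sys.stdin)\n"
    "print(json.dumps({'path': job['path'], 'chars': len(job['path'])}))\n"
)

if __name__ == "__main__":
    print(run_isolated(FAKE_PARSE, {"path": "a.pdf"}))
```

A failed or timed-out child yields an `{"ok": False, ...}` record, which a benchmark loop could log and skip rather than abort on.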

Changes:

  • Refresh benchmark datasets/docs (CSV, README, Sphinx docs) with newer model results and added Claude models.
  • Switch the library default LLM to gemini-3.5-flash and adjust Gemini/Claude request behavior accordingly.
  • Enhance tests/benchmark.py with progress caching, token-usage persistence, and optional subprocess-isolated parsing.
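
One way to picture the progress caching and token-usage persistence mentioned above is a JSON file keyed by (file, config) that is rewritten after every completed parse, so an interrupted run can resume without repeating paid API calls. The sketch below assumes that layout; the function names, the cache format, and the `token_usage` field are illustrative and not taken from the PR.

```python
import json
import os
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def cache_key(file_name: str, config: Dict) -> str:
    # Stable key: the same file and config always map to the same entry.
    return f"{file_name}::{json.dumps(config, sort_keys=True)}"


def load_cache(path: Path) -> Dict[str, dict]:
    return json.loads(path.read_text()) if path.exists() else {}


def save_cache(path: Path, cache: Dict[str, dict]) -> None:
    # Write to a temp file, then swap it in, so an interrupted write
    # cannot leave a half-written cache behind.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(cache))
    os.replace(tmp, path)


def run_resumable(files: Iterable[str], configs: List[Dict],
                  parse_fn: Callable[[str, Dict], dict],
                  cache_path: Path) -> Dict[str, dict]:
    """Run parse_fn for every (file, config) pair not already cached."""
    cache = load_cache(cache_path)
    for file_name in files:
        for config in configs:
            key = cache_key(file_name, config)
            if key in cache:
                continue  # finished in an earlier run
            # parse_fn is assumed to return a JSON-serializable dict,
            # e.g. {"score": ..., "token_usage": {...}}.
            cache[key] = parse_fn(file_name, config)
            save_cache(cache_path, cache)
    return cache
```

Saving after each pair costs some extra I/O but means at most one parse is lost if the run is interrupted.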

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/results.csv | Updates benchmark output CSV with newer model runs/results. |
| tests/benchmark.py | Adds parse isolation mode, caching of per-file/per-config results, and token usage capture. |
| tests/api_cost_mapping.json | Extends pricing map for newer Gemini/GPT/Claude models. |
| README.md | Updates benchmark leaderboard table to match new results. |
| lexoid/core/utils.py | Changes default LLM to gemini-3.5-flash. |
| lexoid/core/parse_type/llm_parser.py | Tweaks Gemini thinking config and adjusts Claude request parameters. |
| lexoid/core/conversion_utils.py | Preserves actual image MIME type in generated data URLs. |
| lexoid/api.py | Hardens bbox detection when segments may be missing/empty. |
| examples/example_notebook.ipynb | Updates example usage to gemini-3.5-flash. |
| docs/benchmark.rst | Updates benchmark documentation table with refreshed rankings/results. |
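
For readers unfamiliar with the MIME-type point in the conversion_utils.py row, a data URL carries its media type in the prefix, so labelling a JPEG as `image/png` can mislead the receiving API. The following is a minimal sketch of deriving the type from the file name using the standard library; `to_data_url` is a hypothetical helper, not the function changed in this PR, and guessing from the extension is an assumption about how the type is determined.

```python
import base64
import mimetypes


def to_data_url(data: bytes, file_name: str,
                default_mime: str = "image/png") -> str:
    """Build a base64 data URL whose MIME type matches the file (sketch)."""
    mime, _ = mimetypes.guess_type(file_name)
    # Fall back to the default when the type is unknown or not an image.
    if mime is None or not mime.startswith("image/"):
        mime = default_mime
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


if __name__ == "__main__":
    print(to_data_url(b"abc", "scan.jpg"))
```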

Comment thread tests/benchmark.py
Comment thread tests/benchmark.py
Comment thread tests/benchmark.py
@dilithjay dilithjay merged commit 1a4ed36 into main Jun 4, 2026
5 checks passed
@dilithjay dilithjay deleted the dj/bench branch June 4, 2026 16:08

4 participants