Update benchmark by dilithjay · Pull Request #187 · oidlabs-com/Lexoid

dilithjay · 2026-06-01T22:43:08Z

- Update benchmark with latest Gemini and GPT models
- Set default LLM model to gemini-3.5-flash
- Update AUTO mode results with new default LLM (gemini-3.5-flash)
- Add Claude models to benchmark

Copilot

Pull request overview

Updates Lexoid’s published benchmark artifacts and runtime defaults to reflect newer LLM offerings (Gemini/GPT/Claude), while extending the benchmark harness to support resumable runs and optional per-parse process isolation.

Changes:

- Refresh benchmark datasets/docs (CSV, README, Sphinx docs) with newer model results and added Claude models.
- Switch the library default LLM to gemini-3.5-flash and adjust Gemini/Claude request behavior accordingly.
- Enhance tests/benchmark.py with progress caching, token-usage persistence, and optional subprocess-isolated parsing.
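
The progress caching mentioned in the last item could look roughly like the sketch below. It is not the code from tests/benchmark.py: the function names (`cache_key`, `load_cache`, `save_cache`, `run_cached`), the JSON cache file, and the shape of the result dict (including the `token_usage` field) are all assumptions made for illustration. The idea is that each (file, config) pair is keyed once, and a rerun after an interruption skips any pair already on disk.

```python
import json
import os
import tempfile
from typing import Callable, Dict


def cache_key(file_name: str, config: Dict[str, object]) -> str:
    """Build a stable key from the file name and a sorted-JSON dump of the config."""
    return f"{file_name}::{json.dumps(config, sort_keys=True)}"


def load_cache(path: str) -> Dict[str, dict]:
    """Return previously saved results, or an empty dict on a first run."""
    if not os.path.exists(path):
        return {}
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def save_cache(path: str, cache: Dict[str, dict]) -> None:
    """Write to a temp file, then rename, so an interrupted run leaves no partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(cache, f)
    os.replace(tmp_path, path)


def run_cached(
    path: str,
    file_name: str,
    config: Dict[str, object],
    run_one: Callable[[str, Dict[str, object]], dict],
) -> dict:
    """Run one benchmark case unless its result is already cached (hypothetical helper)."""
    cache = load_cache(path)
    key = cache_key(file_name, config)
    if key not in cache:
        # Persist the whole result, including any token usage the runner reports.
        cache[key] = run_one(file_name, config)
        save_cache(path, cache)
    return cache[key]
```

Saving after every case costs extra I/O, but it means a crash or rate-limit failure loses at most the case that was in flight.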

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/results.csv | Updates benchmark output CSV with newer model runs/results. |
| tests/benchmark.py | Adds parse isolation mode, caching of per-file/per-config results, and token usage capture. |
| tests/api_cost_mapping.json | Extends pricing map for newer Gemini/GPT/Claude models. |
| README.md | Updates benchmark leaderboard table to match new results. |
| lexoid/core/utils.py | Changes default LLM to `gemini-3.5-flash`. |
| lexoid/core/parse_type/llm_parser.py | Tweaks Gemini thinking config and adjusts Claude request parameters. |
| lexoid/core/conversion_utils.py | Preserves actual image MIME type in generated data URLs. |
| lexoid/api.py | Hardens bbox detection when `segments` may be missing/empty. |
| examples/example_notebook.ipynb | Updates example usage to `gemini-3.5-flash`. |
| docs/benchmark.rst | Updates benchmark documentation table with refreshed rankings/results. |
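
To illustrate what "preserves actual image MIME type in generated data URLs" means in practice, here is a minimal sketch. It is not the implementation in lexoid/core/conversion_utils.py: the function name `image_to_data_url`, the choice to guess the type from the file extension, and the `image/png` fallback are assumptions for the example. The point is that a JPEG input should produce a `data:image/jpeg;...` URL and not a hard-coded type.

```python
import base64
import mimetypes


def image_to_data_url(image_bytes: bytes, file_name: str) -> str:
    """Encode image bytes as a data URL whose MIME type matches the file (hypothetical helper)."""
    # Guess the MIME type from the file extension using the standard library.
    mime_type, _ = mimetypes.guess_type(file_name)
    if mime_type is None or not mime_type.startswith("image/"):
        # Assumed fallback for unknown extensions; a real implementation may differ.
        mime_type = "image/png"
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

Guessing from the extension is the simplest option; inspecting the image header bytes would be more robust for mislabeled files.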

dilithjay added 2 commits June 1, 2026 19:40

Update benchmark

c4463c5

Set default LLM to gemini-3.5-flash

53ea754

github-advanced-security AI found potential problems Jun 1, 2026

Comment thread tests/benchmark.py Dismissed

pramitchoudhary assigned dilithjay Jun 2, 2026

dilithjay requested a review from pramitchoudhary June 3, 2026 22:51

pramitchoudhary added the enhancement label Jun 3, 2026

dilithjay and others added 2 commits June 4, 2026 12:45

Merge branch 'main' into dj/bench

87f5cf4

Update benchmark with claude models and AUTO mode results

473d14c

dilithjay requested a review from Copilot June 4, 2026 15:48

Copilot started reviewing on behalf of dilithjay June 4, 2026 15:48

Copilot AI reviewed Jun 4, 2026

Comment thread tests/benchmark.py

Comment thread tests/benchmark.py

Comment thread tests/benchmark.py

dilithjay merged commit 1a4ed36 into main Jun 4, 2026
5 checks passed

dilithjay deleted the dj/bench branch June 4, 2026 16:08

dilithjay merged 4 commits into main from dj/bench

4 participants
