
[Benchmark] Add support for Ref-L4_test benchmark#1525

Open
rshube wants to merge 1 commit into open-compass:main from rshube:benchmark/add_refl4_test_support

Conversation


@rshube rshube commented Apr 22, 2026

Summary

This PR adds support for the Ref-L4_test benchmark in VLMEvalKit.

Changes

  • adds a dedicated RefL4Dataset
  • registers Ref-L4_test in the dataset registry
  • keeps Ref-L4_test separate from RefCOCO
  • adds a dedicated Ref-L4 evaluator rather than reusing the plain RefCOCO summary unchanged
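
The registration pattern described above can be sketched roughly as follows. This is a hypothetical illustration only; the actual VLMEvalKit base classes, registry structure, and field names in the PR may differ.

```python
# Hypothetical sketch of the dataset-registration pattern; VLMEvalKit's
# real base class and registry are not reproduced here.

class RefL4Dataset:
    """Dedicated dataset class for Ref-L4_test, kept separate from RefCOCO."""

    # Placeholders; to be filled once the final TSV hosting location is settled.
    DATASET_URL = {'Ref-L4_test': ''}
    DATASET_MD5 = {'Ref-L4_test': ''}

    def __init__(self, dataset='Ref-L4_test'):
        self.dataset_name = dataset


# Registry mapping dataset names to their handler class.
DATASET_REGISTRY = {'Ref-L4_test': RefL4Dataset}

# Lookup by name, as a benchmark runner would do.
ds = DATASET_REGISTRY['Ref-L4_test']()
```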

Dataset format

Expected TSV columns:

  • index
  • image
  • question
  • answer
  • bbox_x1
  • bbox_y1
  • bbox_x2
  • bbox_y2
  • width
  • height
  • bbox_area
  • bbox_id
  • ori_category_id
  • image_id
  • file_name
  • is_rewrite
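
A minimal sketch of parsing one row of this TSV layout with the standard library. Only a subset of the columns listed above is shown, and the sample values are illustrative, not taken from the real dataset.

```python
import csv
import io

# Illustrative TSV fragment using a subset of the expected columns.
TSV = (
    "index\timage\tquestion\tanswer\tbbox_x1\tbbox_y1\tbbox_x2\tbbox_y2\twidth\theight\n"
    "0\t<base64>\tFind the red car.\t[10, 20, 110, 220]\t10\t20\t110\t220\t640\t480\n"
)

rows = list(csv.DictReader(io.StringIO(TSV), delimiter='\t'))

# Ground-truth box in xyxy order, read from the four bbox_* columns.
gt_box = [float(rows[0][k]) for k in ('bbox_x1', 'bbox_y1', 'bbox_x2', 'bbox_y2')]
```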

Evaluation

The evaluator reports Ref-L4-style grounding metrics, including:

  • annotation-level accuracy at IoU 0.5
  • annotation-level accuracy at IoU 0.75
  • annotation-level accuracy at IoU 0.9
  • annotation-level mAcc over 0.5:0.95
  • size-level metrics
  • class-level average metrics
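
The annotation-level metrics above can be computed along these lines. This is one common way to implement IoU-thresholded accuracy and mAcc over 0.5:0.95, not necessarily the PR's exact code.

```python
def iou(a, b):
    """IoU of two boxes in xyxy format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def acc_at(preds, gts, thr):
    """Fraction of predictions whose IoU with the ground truth meets thr."""
    return sum(iou(p, g) >= thr for p, g in zip(preds, gts)) / len(gts)

def macc(preds, gts):
    """Mean accuracy over IoU thresholds 0.5:0.95 in steps of 0.05."""
    thrs = [0.5 + 0.05 * i for i in range(10)]
    return sum(acc_at(preds, gts, t) for t in thrs) / len(thrs)
```

Size-level and class-level variants would apply the same accuracy computation to subsets of annotations bucketed by `bbox_area` or `ori_category_id`.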

It also supports bbox predictions in both:

  • xyxy
  • xywh
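
Supporting both formats usually means normalizing predictions to one convention before scoring. A minimal sketch of that conversion (function name is illustrative, not from the PR):

```python
def to_xyxy(box, fmt):
    """Normalize a predicted box to xyxy; fmt is 'xyxy' or 'xywh'."""
    x, y, a, b = box
    # In xywh, the last two values are width and height rather than corners.
    return [x, y, x + a, y + b] if fmt == 'xywh' else [x, y, a, b]
```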

Notes

  • this PR only adds benchmark support code
  • hosted DATASET_URL / DATASET_MD5 can be filled in once the final TSV hosting location is settled (email sent to the maintainers)

