[Benchmark] Add support for Ref-L4_test benchmark by rshube · Pull Request #1525 · open-compass/VLMEvalKit

rshube · 2026-04-22T18:34:38Z

Summary

This PR adds support for the Ref-L4_test benchmark in VLMEvalKit.

adds a dedicated RefL4Dataset
registers Ref-L4_test in the dataset registry
keeps Ref-L4_test separate from RefCOCO
adds a dedicated Ref-L4 evaluator rather than reusing the plain RefCOCO summary as-is
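To illustrate what "keeps Ref-L4_test separate from RefCOCO" means at the registry level, here is a minimal sketch. It assumes a simple name-to-class registry in which each dataset class declares the names it serves. VLMEvalKit's real registration mechanism is not shown in this description, so `GroundingDataset`, `RefCOCODataset`, `SUPPORTED` and `build_registry` are hypothetical. The `RefL4Dataset` below is a stand-in, not the class added by this PR.

```python
from typing import Dict, Tuple, Type


class GroundingDataset:
    """Hypothetical base class: each subclass declares the dataset names it serves."""

    SUPPORTED: Tuple[str, ...] = ()


class RefCOCODataset(GroundingDataset):
    SUPPORTED = ("RefCOCO",)


class RefL4Dataset(GroundingDataset):
    # Ref-L4_test gets its own class (and therefore its own evaluator)
    # instead of being appended to RefCOCODataset.SUPPORTED.
    SUPPORTED = ("Ref-L4_test",)


def build_registry(*classes: Type[GroundingDataset]) -> Dict[str, Type[GroundingDataset]]:
    """Map each dataset name to exactly one class and reject duplicate claims."""
    registry: Dict[str, Type[GroundingDataset]] = {}
    for cls in classes:
        for name in cls.SUPPORTED:
            if name in registry:
                raise ValueError(
                    f"{name!r} is claimed by both {registry[name].__name__} and {cls.__name__}"
                )
            registry[name] = cls
    return registry


REGISTRY = build_registry(RefCOCODataset, RefL4Dataset)
```

With this layout, `REGISTRY["Ref-L4_test"]` resolves to `RefL4Dataset`. A second class claiming the same name fails loudly, so Ref-L4 cannot silently fall back to RefCOCO handling.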

Expected TSV columns:

The evaluator reports Ref-L4-style grounding metrics, including:

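As a rough illustration of IoU-based grounding metrics, here is a short sketch. The exact metric set is not spelled out in this description. As an assumption, it computes quantities commonly reported for referring expression comprehension: accuracy at IoU thresholds 0.5, 0.75 and 0.9, plus a mean accuracy over thresholds 0.50 to 0.95 in steps of 0.05. The names `box_iou` and `grounding_metrics` are hypothetical. The PR's evaluator may report different or additional breakdowns.

```python
from typing import Dict, Sequence, Tuple

# Boxes are (x1, y1, x2, y2) in absolute pixel coordinates (assumption).
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def grounding_metrics(
    preds: Sequence[Box],
    gts: Sequence[Box],
    thresholds: Sequence[float] = (0.5, 0.75, 0.9),
) -> Dict[str, float]:
    """Accuracy at fixed IoU thresholds plus the mean over 0.50, 0.55, ..., 0.95."""
    if len(preds) != len(gts) or not preds:
        raise ValueError("need equal-length, non-empty prediction and ground-truth lists")
    ious = [box_iou(p, g) for p, g in zip(preds, gts)]

    def acc_at(t: float) -> float:
        # Fraction of samples whose IoU reaches threshold t.
        return sum(v >= t for v in ious) / len(ious)

    metrics = {f"Acc@{t}": acc_at(t) for t in thresholds}
    sweep = [round(0.5 + 0.05 * i, 2) for i in range(10)]  # 0.50 ... 0.95
    metrics["mAcc"] = sum(acc_at(t) for t in sweep) / len(sweep)
    return metrics
```

For one exact match and one box with IoU 0.5, this yields Acc@0.5 = 1.0, Acc@0.75 = 0.5 and mAcc = 0.55. The sweep-based mean rewards tight boxes in a way that a single 0.5 threshold does not.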
It also supports bbox predictions in both:

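The two accepted bbox formats are not listed in this description. Purely as an assumption, the sketch below handles absolute pixel coordinates and coordinates normalized to [0, 1]. It reads the first four numbers in a model's text answer as (x1, y1, x2, y2) and treats them as normalized when none exceeds 1.0. `parse_bbox` and this heuristic are hypothetical, not the PR's implementation.

```python
import re
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]

# Matches integers and decimals, optionally negative (e.g. "12", "0.35", "-3").
_NUM = re.compile(r"-?\d+(?:\.\d+)?")


def parse_bbox(text: str, width: int, height: int) -> Optional[Box]:
    """Extract an (x1, y1, x2, y2) box in absolute pixels from free-form model output."""
    nums = [float(x) for x in _NUM.findall(text)]
    if len(nums) < 4:
        return None  # no usable box in the answer
    x1, y1, x2, y2 = nums[:4]
    # Assumed heuristic: all values <= 1.0 means coordinates normalized to [0, 1].
    if max(x1, y1, x2, y2) <= 1.0:
        x1, x2 = x1 * width, x2 * width
        y1, y2 = y1 * height, y2 * height
    # Reorder corners so that x1 <= x2 and y1 <= y2.
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))
```

The magnitude heuristic is ambiguous for a tiny absolute box near the image origin. An evaluator that knows each model's output convention could pass the format explicitly instead of guessing.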
this PR only adds benchmark support code
the hosted DATASET_URL / DATASET_MD5 values can be filled in once the final TSV hosting location is settled (I am emailing the maintainers about this)
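Once the TSV is hosted, its checksum can be computed locally. The sketch below assumes that DATASET_URL / DATASET_MD5 are dicts keyed by dataset name. Both the dict shape and the helper names (`md5_hex`, `read_chunks`, `file_md5`) are hypothetical here. The file is hashed in chunks so a large TSV never has to fit in memory.

```python
import hashlib
from typing import Iterable, Iterator


def md5_hex(chunks: Iterable[bytes]) -> str:
    """MD5 hex digest of a stream of byte chunks."""
    h = hashlib.md5()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def read_chunks(path: str, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield a file in fixed-size chunks (1 MiB by default)."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


def file_md5(path: str) -> str:
    """MD5 of a file on disk, e.g. a local copy of the benchmark TSV."""
    return md5_hex(read_chunks(path))


# Hypothetical placeholders; the real hosting URL is not settled yet.
DATASET_URL = {"Ref-L4_test": None}  # fill in with the hosted TSV URL
DATASET_MD5 = {"Ref-L4_test": None}  # fill in with file_md5(<path to local TSV copy>)
```

Computing the digest from the exact file that gets uploaded ensures that a later download can be verified against the same value.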

[Benchmark] Add support for Ref-L4_test benchmark

33a1845