BFCL evaluation results inflated? #46

@yihaohu0118

I ran the multi-turn validation at step 0:
(TaskRunner pid=693237) ("Initial validation metrics: {'val-core/multi_turn_miss_param/reward/mean@1': "
(TaskRunner pid=693237) "0.19333333333333333, 'val-core/multi_turn_long_context/reward/mean@1': "
(TaskRunner pid=693237) "0.11333333333333333, 'val-core/multi_turn_miss_func/reward/mean@1': "
(TaskRunner pid=693237) "0.17333333333333334, 'val-core/multi_turn_base/reward/mean@1': "
(TaskRunner pid=693237) '0.26666666666666666}')
(TaskRunner pid=693237) step:0 - val-core/multi_turn_miss_param/reward/mean@1:0.193 - val-core/multi_turn_long_context/reward/mean@1:0.113 - val-core/multi_turn_miss_func/reward/mean@1:0.173 - val-core/multi_turn_base/reward/mean@1:0.267

These scores seem higher than they should be.
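
For comparing against a published multi-turn figure, it may help to collapse the four per-category scores into a single number. The sketch below parses the step-0 summary line from the log and takes an unweighted mean over the four categories. That aggregation is an assumption for illustration only; the log does not say how an overall score would be computed, and the helper names `parse_metrics` and `macro_average` are hypothetical.

```python
import re

# The step-0 summary line, copied from the log output in this issue.
LOG_LINE = (
    "step:0 - val-core/multi_turn_miss_param/reward/mean@1:0.193"
    " - val-core/multi_turn_long_context/reward/mean@1:0.113"
    " - val-core/multi_turn_miss_func/reward/mean@1:0.173"
    " - val-core/multi_turn_base/reward/mean@1:0.267"
)


def parse_metrics(line):
    """Return {category: score} for every val-core mean@1 entry in the line."""
    pattern = re.compile(r"val-core/(\w+)/reward/mean@1:([0-9.]+)")
    return {name: float(value) for name, value in pattern.findall(line)}


def macro_average(metrics):
    """Unweighted mean over categories (an assumed aggregation, not confirmed)."""
    return sum(metrics.values()) / len(metrics)


metrics = parse_metrics(LOG_LINE)
for name, score in sorted(metrics.items()):
    print(f"{name}: {score:.3f}")
print(f"macro average: {macro_average(metrics):.4f}")
```

If the reference numbers being compared against use a different aggregation or a different sample count per category, the comparison would need to be adjusted accordingly.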
