Created GRPOTrainerWithEval subclass for different evaluation reward functions #9
jamesbraza wants to merge 3 commits
PR Overview
This PR introduces a new subclass, GRPOTrainerWithEval, which extends the GRPOTrainer functionality to support evaluation reward functions while maintaining backward compatibility.
- New GRPOTrainerWithEval subclass accepts separate evaluation reward functions and processing classes.
- Configuration handling is unified through the use of an instance attribute (_model_init_kwargs) and a dedicated helper method (_make_reward_processing_classes).
- The diff adds strict checking in zip calls to enforce matching lengths of reward functions and processing classes.
Reviewed Changes
| File | Description |
|---|---|
| trl/trainer/grpo_trainer.py | Introduces GRPOTrainerWithEval and refactors reward processing and model init kwargs |
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
Have you tested this against the new multi-task reward_func setup?
Hi @shirinyamani, thanks for the comment. No, we stopped rebasing atop Perhaps if we rebased for newer features in
This PR creates a GRPOTrainer subclass GRPOTrainerWithEval that adds support for optional eval_reward_processing_classes. It should be backwards compatible with GRPOTrainer. The only caveat here is I didn't comprehensively think about args.reward_weights.
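As a rough illustration of the idea (not the code in this PR), a subclass can keep a separate set of evaluation reward functions and fall back to the training ones when none are given, which is what keeps it backwards compatible. Everything below is a toy sketch: `ToyTrainer`, `ToyTrainerWithEval`, the `training` flag, and `active_reward_funcs` are hypothetical names, and nothing here uses the real TRL API.

```python
from typing import Callable, List, Optional, Sequence

RewardFunc = Callable[[str], float]


class ToyTrainer:
    """Hypothetical stand-in for a trainer that owns one list of reward functions."""

    def __init__(self, reward_funcs: Sequence[RewardFunc]):
        self.reward_funcs: List[RewardFunc] = list(reward_funcs)
        self.training = True  # set to False while evaluating

    def active_reward_funcs(self) -> List[RewardFunc]:
        return self.reward_funcs

    def score(self, completion: str) -> float:
        # Unweighted sum; reward weighting is deliberately left out of this sketch.
        return sum(func(completion) for func in self.active_reward_funcs())


class ToyTrainerWithEval(ToyTrainer):
    """Adds optional evaluation-only reward functions."""

    def __init__(
        self,
        reward_funcs: Sequence[RewardFunc],
        eval_reward_funcs: Optional[Sequence[RewardFunc]] = None,
    ):
        super().__init__(reward_funcs)
        # Omitting eval_reward_funcs reuses the training ones, so existing
        # callers see the same behavior as the base class.
        self.eval_reward_funcs: List[RewardFunc] = (
            list(eval_reward_funcs) if eval_reward_funcs is not None else self.reward_funcs
        )

    def active_reward_funcs(self) -> List[RewardFunc]:
        return self.reward_funcs if self.training else self.eval_reward_funcs


def length_reward(completion: str) -> float:
    return float(len(completion))


def exact_match(completion: str) -> float:
    return 1.0 if completion == "42" else 0.0


trainer = ToyTrainerWithEval([length_reward], eval_reward_funcs=[exact_match])
train_score = trainer.score("42")  # uses length_reward
trainer.training = False
eval_score = trainer.score("42")  # uses exact_match
```

The sketch sums rewards without weights, so it says nothing about how `args.reward_weights` should interact with a different number of evaluation reward functions.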