Created `GRPOTrainerWithEval` subclass for different evaluation reward functions by jamesbraza · Pull Request #8 · Future-House/trl

jamesbraza · 2025-03-09T01:21:25Z

This PR creates a `GRPOTrainer` subclass, `GRPOTrainerWithEval`, that adds support for optional `eval_reward_processing_classes`.

It should be backwards compatible with `GRPOTrainer`.

The only caveat is that I didn't comprehensively think through `args.reward_weights`.

Copilot

PR Overview

This PR introduces a subclass GRPOTrainerWithEval that extends GRPOTrainer to support evaluation reward functions and their associated processing classes while maintaining backward compatibility.

Refactors model initialization to use an instance attribute (_model_init_kwargs) for improved consistency.
Extracts reward processing class creation into a new helper method (_make_reward_processing_classes).
Implements GRPOTrainerWithEval for handling separate training and evaluation reward functions and processing.

Reviewed Changes

File	Description
trl/trainer/grpo_trainer.py	Added logging, refactored model initialization kwargs, created a helper for reward processing classes, and introduced a new subclass for evaluation reward support

Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (1)

trl/trainer/grpo_trainer.py:1257

[nitpick] Consider handling cases where reward_func may not have a __name__ attribute (e.g., when using functools.partial objects or callable class instances) to avoid potential AttributeErrors. A possible solution is to use getattr(reward_func, '__name__', repr(reward_func)).

reward_func_name = reward_func.__name__
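
A minimal sketch of the suggested fallback, assuming reward functions are plain callables. The helper name `get_reward_func_name` and the example `length_reward` function are hypothetical and not part of the PR. Note that lambdas do carry a `__name__` (`"<lambda>"`), whereas `functools.partial` objects do not, so the partial case is where the fallback matters.

```python
import functools
from typing import Any, Callable


def get_reward_func_name(reward_func: Callable[..., Any]) -> str:
    """Return a display name, falling back to repr() when __name__ is missing."""
    return getattr(reward_func, "__name__", repr(reward_func))


def length_reward(completions, scale=1.0):
    """Hypothetical reward: scaled completion length."""
    return [scale * len(c) for c in completions]


# functools.partial objects have no __name__, so direct attribute access
# would raise AttributeError; the helper returns their repr instead.
scaled_reward = functools.partial(length_reward, scale=0.5)

plain_name = get_reward_func_name(length_reward)
partial_name = get_reward_func_name(scaled_reward)
```

The `repr` of a partial includes a memory address, so it is usable as a log label but is not stable across runs.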

jamesbraza · 2025-03-10T20:33:38Z

Closed in favor of #9 after a rebase onto working-grpo-2025-03-10

jamesbraza added 3 commits March 8, 2025 17:02

Added strict flag to zip

3472f85

Decomposed _make_reward_processing_classes method

4fdcc77

Created GRPOTrainerWithEval subclasses for adding eval functions

cab3c7e

jamesbraza added the enhancement (New feature or request) label Mar 9, 2025

jamesbraza requested review from albertbou92, Copilot, sidnarayanan and whitead March 9, 2025 01:21

jamesbraza self-assigned this Mar 9, 2025

Copilot AI reviewed Mar 9, 2025

jamesbraza deleted the branch working-grpo-2025-03-08 March 10, 2025 20:18

jamesbraza closed this Mar 10, 2025

jamesbraza wants to merge 3 commits into working-grpo-2025-03-08 from grpo-with-eval

2 participants
