
Commit e1a2927

Remove validation frequency from example.
The tutorial wrongly implied that the validation frequency sets how often the reward is evaluated. Timestep preprocessors are evaluated at every step; the validation frequency controls how often the specs are validated.
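The distinction the commit draws can be illustrated with a minimal, self-contained sketch. `ToyComputeReward` and its `ValidationFrequency` enum are hypothetical stand-ins written for this example, not the library's `rewards.ComputeReward` or `timestep_preprocessor.ValidationFrequency`. The assumption is simply that the reward callable runs on every step, while a spec check runs only as often as the validation setting allows.

```python
import enum
from typing import Callable, Dict

import numpy as np

Observation = Dict[str, np.ndarray]


class ValidationFrequency(enum.Enum):
    """Hypothetical enum for this sketch only."""
    ONCE = enum.auto()    # validate the observation spec on the first step only
    ALWAYS = enum.auto()  # validate the observation spec on every step


class ToyComputeReward:
    """Hypothetical preprocessor: reward on every step, spec checks per frequency."""

    def __init__(self, reward_fn: Callable[[Observation], float],
                 validation_frequency: ValidationFrequency = ValidationFrequency.ONCE):
        self._reward_fn = reward_fn
        self._validation_frequency = validation_frequency
        self._validated = False
        self.num_evaluations = 0
        self.num_validations = 0

    def _validate(self, observation: Observation) -> None:
        # Stand-in spec check: every observation value must be a NumPy array.
        for key, value in observation.items():
            if not isinstance(value, np.ndarray):
                raise TypeError(f"observation[{key!r}] is not an ndarray")
        self.num_validations += 1

    def process(self, observation: Observation, reward: float) -> float:
        # Spec validation is gated by the frequency setting...
        if (self._validation_frequency is ValidationFrequency.ALWAYS
                or not self._validated):
            self._validate(observation)
            self._validated = True
        # ...but the reward is evaluated and added on every call.
        self.num_evaluations += 1
        return reward + float(self._reward_fn(observation))


preprocessor = ToyComputeReward(lambda obs: float(obs["x"].sum()))
for _ in range(5):
    step_reward = preprocessor.process({"x": np.ones(2)}, reward=0.0)

print(preprocessor.num_evaluations, preprocessor.num_validations)  # 5 1
```

With `ValidationFrequency.ALWAYS` both counters would read 5: the setting changes how much spec checking is done, not how often the reward is computed.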
1 parent 0adab39 commit e1a2927

2 files changed

Lines changed: 3 additions & 9 deletions


doc/tutorial.rst

Lines changed: 2 additions & 5 deletions
@@ -450,17 +450,14 @@ predefined timestep preprocessors to add a reward.
         observation['panda_tcp_pos'])
     return np.clip(1.0 - goal_distance, 0, 1)
 
-reward = rewards.ComputeReward(
-    goal_reward,
-    validation_frequency=timestep_preprocessor.ValidationFrequency.ALWAYS)
+reward = rewards.ComputeReward(goal_reward)
 
 
 panda_env.add_timestep_preprocessors([reward])
 
 
 ``ComputeReward`` is a timestep preprocessor that computes a reward based on a callable that takes
 an observation and returns a scalar which is added to the timestep. The callable ``goal_reward``
 computes a reward based on the distance between the robot's end-effector and the ball's pose
-observation which we added above. This reward is computed for every timestep. Alternatively rewards
-may also be computed only at the end of an epiode.
+observation which we added above.
 
 
 Domain Randomization
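The reward callable described in the tutorial can be exercised on its own. The sketch below rebuilds a `goal_reward` in the shape suggested by the diff context, ending in `np.clip(1.0 - goal_distance, 0, 1)`. Two assumptions are made: the distance is Euclidean (`np.linalg.norm`), and the ball observation is stored under a hypothetical `'ball_pos'` key, since the diff only shows `'panda_tcp_pos'`.

```python
import numpy as np


def goal_reward(observation):
    # Assumption: Euclidean distance between the hypothetical 'ball_pos'
    # observation and the end-effector position 'panda_tcp_pos'.
    goal_distance = np.linalg.norm(
        observation['ball_pos'] - observation['panda_tcp_pos'])
    # 1.0 at the ball, decreasing linearly, 0.0 at one unit away or farther.
    return np.clip(1.0 - goal_distance, 0, 1)


tcp = np.array([0.0, 0.0, 0.0])
print(goal_reward({'panda_tcp_pos': tcp, 'ball_pos': np.array([0.0, 0.0, 0.5])}))  # 0.5
print(goal_reward({'panda_tcp_pos': tcp, 'ball_pos': np.array([3.0, 4.0, 0.0])}))  # 0.0
```

Clipping to [0, 1] keeps the per-step reward bounded however far the arm is from the ball.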

examples/rl_environment.py

Lines changed: 1 addition & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -92,10 +92,7 @@ def goal_reward(observation: spec_utils.ObservationValue):
 
 # ComputeReward is a timestep preprocessor that accepts a callable which computes
 # a scalar reward based on the observation and adds it to the timestep.
-# We configure the validation frequency so this reward is computed for every timestep.
-reward = rewards.ComputeReward(
-    goal_reward,
-    validation_frequency=timestep_preprocessor.ValidationFrequency.ALWAYS)
+reward = rewards.ComputeReward(goal_reward)
 
 # Instantiate props
 ball = Ball()
