Question on the soft q learning implementation #143

@YuxuanSong


Hi Haarnoja,

Thanks a lot for maintaining the amazing repo!
I am a little confused about the implementation of SVGD in soft Q-learning.
At the line

log_probs = svgd_target_values + squash_correction

the log probs are computed as svgd_target_values + squash_correction, which gives the log probs in the $u$ (raw_action) space, where $a = \tanh(u)$.
However, the SVGD step that follows uses these $u$-space log probs to compute the update directions for $a$, which does not seem consistent.
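To make the change of variables concrete, here is a minimal NumPy sketch (not code from the repo). It assumes a hypothetical one-dimensional target log_target_a standing in for svgd_target_values, and checks two things: adding the squash correction yields a density over $u$ with the same total mass as the density over $a$, and the gradient of that sum is a gradient with respect to $u$.

```python
import numpy as np

def log_target_a(a):
    # Hypothetical unnormalized log-density over the squashed action a in (-1, 1);
    # it plays the role of svgd_target_values in this toy example.
    return -2.0 * (a - 0.3) ** 2

def squash_correction(u):
    # log |da/du| for a = tanh(u), i.e. log(1 - tanh(u)^2).
    return np.log(1.0 - np.tanh(u) ** 2)

def log_prob_u(u):
    # Log-density over the raw action u, obtained by change of variables.
    return log_target_a(np.tanh(u)) + squash_correction(u)

def trapezoid(y, x):
    # Plain trapezoidal rule on a 1-D grid.
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

# The mass over a and the mass over u should agree up to discretization error.
a_grid = np.linspace(-1.0, 1.0, 20001)
u_grid = np.linspace(-6.0, 6.0, 20001)
mass_a = trapezoid(np.exp(log_target_a(a_grid)), a_grid)
mass_u = trapezoid(np.exp(log_prob_u(u_grid)), u_grid)

# Gradient of the u-space log-density with respect to u:
# central finite difference versus the chain rule q'(a) * (1 - a^2) - 2a.
u0, h = 0.5, 1e-5
a0 = np.tanh(u0)
grad_fd = (log_prob_u(u0 + h) - log_prob_u(u0 - h)) / (2 * h)
grad_chain = -4.0 * (a0 - 0.3) * (1.0 - a0 ** 2) - 2.0 * a0

print(mass_a, mass_u)
print(grad_fd, grad_chain)
```

If this reading is right, log_probs is a log-density over $u$, so the particles moved along its gradient would be the raw actions $u$ rather than the squashed actions $a$.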

I think the line

actions = self._policy.actions(expanded_observations)

should instead be actions = self._policy.raw_actions(expanded_observations). (The policy class could add this property.)

Best,
Yuxuan
