src/algorithms/dqns/rainbow.jl (2 additions & 1 deletion)
@@ -9,7 +9,7 @@ See paper: [Rainbow: Combining Improvements in Deep Reinforcement Learning](http
- `approximator`::[`AbstractApproximator`](@ref): used to get Q-values of a state.
- `target_approximator`::[`AbstractApproximator`](@ref): similar to `approximator`, but used to estimate the target (the next state).
- - `loss_func`: the loss function.
+ - `loss_func`: the loss function. It is recommended to use `Flux.Losses.logitcrossentropy`; `Flux.Losses.crossentropy` runs into problems with negative numbers.
- `Vₘₐₓ::Float32`: the maximum value of the distribution.
- `Vₘᵢₙ::Float32`: the minimum value of the distribution.
- `n_actions::Int`: number of possible actions.
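For context on the `loss_func` recommendation above, here is a minimal, hypothetical sketch (not part of this diff) contrasting the two Flux losses on a distributional-RL style target. The shapes and the `target_distribution` name are illustration-only assumptions.

```julia
using Flux

# Toy shapes: 51 atoms per distribution, a batch of 4 (illustrative only).
n_atoms, batch_size = 51, 4
raw_logits = randn(Float32, n_atoms, batch_size)                          # unnormalized network output, may be negative
target_distribution = Flux.softmax(randn(Float32, n_atoms, batch_size))  # columns are valid probability distributions

# Recommended: logitcrossentropy applies log-softmax internally, so it accepts
# raw (possibly negative) logits and stays numerically stable.
loss_ok = Flux.Losses.logitcrossentropy(raw_logits, target_distribution)

# Problematic: crossentropy expects probabilities. Feeding raw logits would mean
# taking the log of negative numbers; even normalizing with softmax first is less
# stable than the fused log-softmax inside logitcrossentropy.
loss_naive = Flux.Losses.crossentropy(Flux.softmax(raw_logits), target_distribution)
```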
@@ -176,6 +176,7 @@ function RLBase.update!(learner::RainbowLearner, batch::NamedTuple)
gs = gradient(Flux.params(Q)) do
logits = reshape(Q(states), n_atoms, n_actions, :)
select_logits = logits[:, actions]
+ # The original paper normalizes the logits, but normalization combined with Flux.Losses.crossentropy is not as stable as using Flux.Losses.logitcrossentropy.
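As a rough illustration of how the selected logits feed the loss inside this gradient block, here is a self-contained, simplified sketch. The toy network, the `target_distribution` stand-in, and the action indexing are assumptions for illustration, not the package's actual `update!` implementation.

```julia
using Flux

# Illustrative dimensions and a toy approximator (not the real RainbowLearner setup).
n_atoms, n_actions, batch_size = 51, 3, 8
Q = Chain(Dense(4, 32, relu), Dense(32, n_atoms * n_actions))
states = randn(Float32, 4, batch_size)
actions = CartesianIndex.(rand(1:n_actions, batch_size), 1:batch_size)   # (action, batch) index pairs
target_distribution = Flux.softmax(randn(Float32, n_atoms, batch_size))  # stand-in for the projected Bellman target

gs = gradient(Flux.params(Q)) do
    logits = reshape(Q(states), n_atoms, n_actions, :)
    select_logits = logits[:, actions]  # n_atoms × batch_size logits of the taken actions
    # Passing raw logits to logitcrossentropy avoids explicitly normalizing them,
    # which is the stability point made in the comment above.
    Flux.Losses.logitcrossentropy(select_logits, target_distribution)
end
```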