Rein.Agents.SAC (rein v0.1.0)
Soft Actor-Critic implementation.
This implementation assumes that the Actor network outputs a tensor of shape `{nil, num_actions, 2}`, where for each action it outputs the $\mu$ and $\sigma$ of a normal distribution, and that the Critic network accepts an `"actions"` input of shape `{nil, num_actions}`, where each action is obtained by sampling from that distribution. Actions are assumed to lie in a continuous space of type `:f32`.
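
As a rough illustration, networks compatible with those shapes could be defined with Axon as sketched below; the state size, layer widths, and the `"state"` input name are assumptions made only for this example, not part of the library's API.

```elixir
# Hypothetical sizes, chosen only for illustration.
state_size = 8
num_actions = 2

# Actor: outputs {batch, num_actions, 2}, where the last axis holds mu and sigma.
actor =
  Axon.input("state", shape: {nil, state_size})
  |> Axon.dense(64, activation: :relu)
  |> Axon.dense(num_actions * 2)
  |> Axon.reshape({num_actions, 2})

# Critic: takes the state together with an "actions" input of shape {nil, num_actions}
# and outputs a single Q-value per batch entry.
critic =
  Axon.input("state", shape: {nil, state_size})
  |> Axon.concatenate(Axon.input("actions", shape: {nil, num_actions}))
  |> Axon.dense(64, activation: :relu)
  |> Axon.dense(1)
```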
The dual-Q implementation uses two copies of the Critic network, `critic1` and `critic2`, each with its own target network.
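
For intuition, the dual-Q trick takes the element-wise minimum of the two target critics when building the soft Bellman target, which curbs Q-value overestimation. A minimal sketch, assuming `predict_fn` is the Critic's prediction function and that `next_state`, `next_actions`, `next_log_prob`, `reward`, `gamma`, and `alpha` are illustrative names already in scope:

```elixir
# Evaluate both target critics on the next state-action pair.
q1 = predict_fn.(critic1_target_params, %{"state" => next_state, "actions" => next_actions})
q2 = predict_fn.(critic2_target_params, %{"state" => next_state, "actions" => next_actions})

# Take the element-wise minimum of the two estimates, then form the
# entropy-regularized target (terminal-state masking omitted for brevity).
min_q = Nx.min(q1, q2)

target =
  next_log_prob
  |> Nx.multiply(alpha)
  |> then(&Nx.subtract(min_q, &1))
  |> Nx.multiply(gamma)
  |> Nx.add(reward)
```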
Vectorized axes from `:random_key` are propagated normally throughout the agent state to support parallel simulations, but all samples are stored in the same circular buffer. After all simulations have run, the optimization steps are performed on a sample space consisting of all previous experiences, including those from the parallel simulations that have just finished executing.
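
As a hedged sketch of how such a vectorized key might be constructed, the axis name `:simulations` and the counts below are assumptions for illustration only:

```elixir
num_simulations = 4
num_actions = 2

# Split one base key into one key per simulation and vectorize the leading axis,
# so downstream sampling runs once per parallel simulation.
random_key =
  Nx.Random.key(42)
  |> Nx.Random.split(parts: num_simulations)
  |> Nx.vectorize(:simulations)

# Sampling with the vectorized key yields an independent draw per simulation;
# the vectorized axis is carried along into the returned key as well.
{noise, _new_key} = Nx.Random.normal(random_key, 0.0, 1.0, shape: {num_actions})
```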