Rein.Agents.SAC (rein v0.1.0)

Soft Actor-Critic implementation.

This assumes that the Actor network outputs a tensor of shape {nil, num_actions, 2}, where, for each action, it outputs the $\mu$ and $\sigma$ values of a normal distribution, and that the Critic network accepts an "actions" input with shape {nil, num_actions}, where each action is calculated by sampling from that distribution.
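For illustration, a minimal sketch of networks matching that contract is shown below. These are not the library's actual network definitions: `state_features`, `num_actions`, and the layer sizes are hypothetical, and it is assumed that `Axon.reshape/2` takes the non-batch dimensions of the target shape.

```elixir
state_features = 8
num_actions = 2

# Actor: outputs {nil, num_actions, 2}, i.e. [mu, sigma] per action
actor =
  Axon.input("state", shape: {nil, state_features})
  |> Axon.dense(64, activation: :relu)
  |> Axon.dense(num_actions * 2)
  |> Axon.reshape({num_actions, 2})

# Critic: takes the state together with an "actions" input of shape {nil, num_actions}
state_input = Axon.input("state", shape: {nil, state_features})
action_input = Axon.input("actions", shape: {nil, num_actions})

critic =
  Axon.concatenate(state_input, action_input)
  |> Axon.dense(64, activation: :relu)
  |> Axon.dense(1)
```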

Actions are assumed to lie in a continuous space of type :f32.
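As a hedged sketch of how an action could be drawn from the Actor output: split the {nil, num_actions, 2} tensor into $\mu$ and $\sigma$, sample a standard normal with the random key, and rescale. Any squashing or clipping of the result depends on the actual implementation and is omitted here.

```elixir
sample_action = fn actor_output, random_key ->
  # Split the last axis into mu and sigma, each of shape {nil, num_actions}
  mu = actor_output |> Nx.slice_along_axis(0, 1, axis: 2) |> Nx.squeeze(axes: [2])
  sigma = actor_output |> Nx.slice_along_axis(1, 1, axis: 2) |> Nx.squeeze(axes: [2])

  # Sample a standard normal and rescale: action = mu + sigma * z
  {z, random_key} = Nx.Random.normal(random_key, shape: Nx.shape(mu), type: :f32)
  {Nx.add(mu, Nx.multiply(sigma, z)), random_key}
end
```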

The Dual Q implementation uses two copies of the critic network, critic1 and critic2, each with its own separate target network.
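The sketch below shows the standard clipped double-Q idea this setup supports: the TD target uses the elementwise minimum of the two target critics' predictions. The parameter names (`target_params1`, `target_params2`) and input tensors are illustrative, not the library's, and the exact target computation may differ in the implementation.

```elixir
# Q-value estimates from both target critics for the next state/action
q1 = Axon.predict(critic, target_params1, %{"state" => next_state, "actions" => next_action})
q2 = Axon.predict(critic, target_params2, %{"state" => next_state, "actions" => next_action})

# Clipped double-Q: take the elementwise minimum to reduce overestimation
min_q = Nx.min(q1, q2)

# Bellman target; `reward`, `gamma`, and `done` are assumed tensors/scalars
target = Nx.add(reward, Nx.multiply(gamma, Nx.multiply(Nx.subtract(1, done), min_q)))
```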

Vectorized axes from :random_key are propagated throughout the agent state, enabling parallel simulations, but all samples are stored in the same circular buffer. After all simulations have run, the optimization steps are performed on a sample space consisting of all previous experiences, including those from the parallel simulations that have just finished executing.
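A hedged sketch of how such a vectorized random key can be produced with Nx: split one key into a key per simulation and vectorize that axis, so per-simulation sampling runs in parallel while downstream code treats the key as a single tensor. The axis name :simulations and the count are illustrative.

```elixir
num_simulations = 4

vectorized_key =
  Nx.Random.key(42)
  # One PRNG key per parallel simulation
  |> Nx.Random.split(parts: num_simulations)
  # Turn the leading axis into a vectorized axis
  |> Nx.vectorize(:simulations)
```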