gyx v0.1.25 Gyx.Agents.SARSA

This agent implements SARSA, an on-policy temporal-difference method. Each update uses the current state, action, and reward (s_t, a_t, r_t) together with the next state s_{t+1} and the next action a_{t+1} selected by the agent's own policy.

The Q update is given by:

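$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]$$

Here \alpha corresponds to the struct's learning_rate and \gamma to its gamma.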

The Q table process must be referenced by the struct's Q key, and it must implement the Gyx.Qstorage behaviour.
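As a rough illustration of the SARSA update described above, the following is a minimal sketch and not the library's implementation. It assumes a plain map keyed by `{state, action}` in place of a Gyx.Qstorage process; the module name `SarsaSketch`, the function names, and the shape of the transition map are all hypothetical.

```elixir
defmodule SarsaSketch do
  # Hypothetical stand-in for the Q table: a plain map keyed by {state, action}.
  # The real agent delegates storage to a process implementing Gyx.Qstorage.

  # Unseen state/action pairs default to 0.0 (an assumption of this sketch).
  def q_value(q, state, action), do: Map.get(q, {state, action}, 0.0)

  # One SARSA step:
  #   Q(s, a) <- Q(s, a) + learning_rate * (r + gamma * Q(s_next, a_next) - Q(s, a))
  def td_update(q, %{s: s, a: a, r: r, s_next: s_next, a_next: a_next}, learning_rate, gamma) do
    current = q_value(q, s, a)
    target = r + gamma * q_value(q, s_next, a_next)
    Map.put(q, {s, a}, current + learning_rate * (target - current))
  end
end

# Usage: a single transition applied to an empty table.
transition = %{s: :s0, a: :right, r: 1.0, s_next: :s1, a_next: :left}
q = SarsaSketch.td_update(%{}, transition, 0.5, 0.9)
```

Because a_next is the action the policy actually selected, rather than the maximising action, the update stays on-policy.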

Summary

Types

t()

Functions

act_epsilon_greedy(agent, environment_state)

act_greedy(agent, environment_state)

handle_call(arg, from, state)

init(process_q)

start_link(opts)

start_link(process_q, opts)

td_learn(agent, sarsa)

Types

t()

t() :: %Gyx.Agents.SARSA{
  Q: any(),
  epsilon: float(),
  epsilon_min: float(),
  gamma: float(),
  learning_rate: float()
}
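To make the role of the epsilon field concrete, here is a hedged sketch of greedy and epsilon-greedy action selection. It is not the code behind act_greedy/2 or act_epsilon_greedy/2: the module `EpsilonGreedySketch` is hypothetical, and it assumes the action values for the current state are available as a plain map of action to estimated value.

```elixir
defmodule EpsilonGreedySketch do
  # q_values: hypothetical map of action => estimated value for the current state.

  # Greedy choice: the action with the highest estimated value.
  def act_greedy(q_values) do
    {action, _value} = Enum.max_by(q_values, fn {_action, value} -> value end)
    action
  end

  # With probability epsilon pick a uniformly random action (explore),
  # otherwise fall back to the greedy action (exploit).
  def act_epsilon_greedy(q_values, epsilon) do
    if :rand.uniform() < epsilon do
      q_values |> Map.keys() |> Enum.random()
    else
      act_greedy(q_values)
    end
  end
end

# Usage: with epsilon 0.0 the choice is always greedy.
action = EpsilonGreedySketch.act_epsilon_greedy(%{left: 0.1, right: 0.7}, 0.0)
```

The epsilon_min field suggests that epsilon is lowered over time down to a floor, but the decay schedule is not documented here, so the sketch leaves it out.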

Functions

act_epsilon_greedy(agent, environment_state)

act_greedy(agent, environment_state)

handle_call(arg, from, state)

init(process_q)

start_link(opts)

start_link(process_q, opts)

td_learn(agent, sarsa)