Rein.Agent behaviour (rein v0.1.0)

The behaviour that should be implemented by a Rein agent module.

Summary

Types

rl_state()

The full state of the current Reinforcement Learning process, as stored in the Rein struct.

t()

An arbitrary Nx.Container that holds metadata for the agent.

Callbacks

init(random_key, opts)

Initializes the agent state with the given agent-specific options.

optimize_model(rl_state)

record_observation(rl_state, action, reward, is_terminal, next_rl_state)

Can be used to record the observation in an experience replay buffer.

reset(random_key, rl_state)

Resets any values in the agent state that vary between sessions (episodes, for episodic tasks).

select_action(rl_state, iteration)

Selects the action to be taken.

Types

rl_state()

@type rl_state() :: Rein.t()

The full state of the current Reinforcement Learning process, as stored in the Rein struct.

t()

@type t() :: Nx.Container.t()

An arbitrary Nx.Container that holds metadata for the agent.

Callbacks

init(random_key, opts)

@callback init(random_key :: Nx.t(), opts :: keyword()) :: {t(), random_key :: Nx.t()}

Initializes the agent state with the given agent-specific options.

Implement this callback so that its result is semantically the same as if reset/2 had been called at the end of the function.

A suggested approach: initialize only the fixed values here, that is, values that don't change between sessions (epochs for non-episodic tasks, episodes for episodic tasks). Then call reset/2 internally to initialize the remaining, variable values.
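The init/reset split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Rein's own code: the module name HypotheticalAgent and the fields learning_rate, exploration_noise, and steps_in_session are invented for the example, the agent state is a plain map of tensors (maps are valid Nx containers), and reset/2 is assumed to receive the agent state directly, as the `rl_state :: t()` typespec suggests. A real agent may receive the full Rein struct instead, so check the library source before relying on this shape.

```elixir
Mix.install([{:nx, "~> 0.7"}])

defmodule HypotheticalAgent do
  # Hypothetical agent, for illustration only; not part of Rein.
  # `@behaviour Rein.Agent` is omitted so the sketch runs with only Nx installed.

  # Sets the fixed values, then delegates to reset/2 for the variable ones,
  # so the result is the same as if reset/2 ran at the end of init/2.
  def init(random_key, opts) do
    fixed = %{
      # Fixed value: does not change between sessions.
      learning_rate: Nx.tensor(Keyword.get(opts, :learning_rate, 0.1)),
      # Placeholders for per-session values; reset/2 overwrites them.
      exploration_noise: Nx.tensor(0.0),
      steps_in_session: Nx.tensor(0)
    }

    reset(random_key, fixed)
  end

  # Re-initializes only the values that vary between sessions and
  # returns the updated random key alongside the new state.
  def reset(random_key, agent_state) do
    {noise, random_key} = Nx.Random.uniform(random_key)

    agent_state = %{
      agent_state
      | exploration_noise: noise,
        steps_in_session: Nx.tensor(0)
    }

    {agent_state, random_key}
  end
end

{agent_state, _random_key} =
  HypotheticalAgent.init(Nx.Random.key(42), learning_rate: 0.05)

IO.inspect(agent_state.steps_in_session)
```

Both functions return the state first and the updated random key second, matching the `{t(), random_key :: Nx.t()}` return type of the callbacks.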

optimize_model(rl_state)

@callback optimize_model(rl_state()) :: rl_state()

record_observation(rl_state, action, reward, is_terminal, next_rl_state)

@callback record_observation(
  rl_state(),
  action :: Nx.t(),
  reward :: Nx.t(),
  is_terminal :: Nx.t(),
  next_rl_state :: rl_state()
) :: rl_state()

Can be used to record the observation in an experience replay buffer.

If this is not desired, just make this function return the first argument unchanged.
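For an agent without a replay buffer, the pass-through described above might look like the sketch below. The module name HypotheticalNoBufferAgent is made up for the example, and `@behaviour Rein.Agent` is left out so the snippet runs without any dependencies.

```elixir
defmodule HypotheticalNoBufferAgent do
  # Hypothetical agent, for illustration only; not part of Rein.
  # It keeps no experience replay buffer, so the observation is ignored
  # and the first argument is returned unchanged.
  def record_observation(rl_state, _action, _reward, _is_terminal, _next_rl_state) do
    rl_state
  end
end
```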

reset(random_key, rl_state)

@callback reset(random_key :: Nx.t(), rl_state :: t()) :: {t(), random_key :: Nx.t()}

Resets any values in the agent state that vary between sessions (episodes, for episodic tasks).

select_action(rl_state, iteration)

@callback select_action(rl_state(), iteration :: Nx.t()) :: {action :: Nx.t(), rl_state()}

Selects the action to be taken.
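To illustrate the return shape only, here is a minimal sketch of a select_action/2 implementation. The module HypotheticalAlternatingAgent and its policy (alternating between actions 0 and 1 based on the iteration counter) are assumptions made for the example and say nothing about how real Rein agents choose actions. The rl_state is passed through untouched, and `@behaviour Rein.Agent` is omitted so the snippet runs with only Nx installed.

```elixir
Mix.install([{:nx, "~> 0.7"}])

defmodule HypotheticalAlternatingAgent do
  # Hypothetical agent, for illustration only; not part of Rein.
  # Picks action 0 on even iterations and action 1 on odd ones.
  def select_action(rl_state, iteration) do
    action = Nx.remainder(iteration, 2)

    # The action comes first and the (here unchanged) state second.
    {action, rl_state}
  end
end

{action, _rl_state} =
  HypotheticalAlternatingAgent.select_action(%{}, Nx.tensor(3))

IO.inspect(Nx.to_number(action))
```

Note that the tuple order differs from init/2 and reset/2: here the action is the first element and the state the second.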
