brain_learner (macula_tweann v0.18.1)


Brain learner process for weight adaptation via plasticity.

This GenServer manages the learning aspects of a brain system:

- Applies plasticity rules to update weights based on neural activity
- Maintains an experience buffer for batch learning
- Handles reward signals for reinforcement-style learning

Online Learning

When online learning is enabled, the learner receives activation data after each inference and applies plasticity rules:

Inference → Activations → Learner → Weight Updates → Back to Inference

Batch Learning

For delayed rewards (e.g., end of game), the learner buffers experiences and applies learning when a reward is received:

1. Record experiences during the episode
2. Receive the final reward
3. Apply learning with reward propagation (eligibility traces)
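A typical episode can be sketched with this module's API (the `Learner` PID, input/activation values, and the reward of `1.0` are illustrative):

```
%% During the episode: buffer each step's inputs and per-layer activations.
ok = brain_learner:record_experience(Learner, Inputs, Activations),

%% At the end of the episode: apply the final reward to all buffered steps,
%% then clear the buffer before the next episode.
{ok, NumExperiences} = brain_learner:learn_from_experience(Learner, 1.0),
ok = brain_learner:clear_experience(Learner).
```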

Theory

This module implements reward-modulated Hebbian learning, where weight changes depend on:

- Pre-synaptic activity (input to the connection)
- Post-synaptic activity (output from the connection)
- Global reward signal (from the environment)

The basic rule: delta_w = learning_rate * pre * post * reward

For delayed rewards, eligibility traces track which synapses were recently active, allowing credit assignment across time.
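The rule and an eligibility trace can be sketched in plain Erlang. These helper functions are illustrative, not this module's internals; the trace decay constant in particular is an assumption:

```
%% Basic reward-modulated Hebbian update for one connection:
%% delta_w = learning_rate * pre * post * reward.
delta_w(LearningRate, Pre, Post, Reward) ->
    LearningRate * Pre * Post * Reward.

%% Eligibility trace: decay the previous trace and accumulate the
%% current pre * post coincidence. A reward arriving later scales
%% the trace, crediting synapses that were recently active.
update_trace(Trace, Pre, Post, Decay) when Decay >= 0.0, Decay =< 1.0 ->
    Decay * Trace + Pre * Post.
```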

See also: plasticity, plasticity_modulated.

Summary

Functions

Clear the experience buffer.

Disable learning.

Enable learning.

Check if automatic experience recording is enabled.

Get the number of buffered experiences.

Get the current learning rate.

Get the current plasticity rule.

Get accumulated weight deltas from last learning step.

Check if learning is enabled.

Learn from buffered experiences using current reward.

Learn from buffered experiences with a specific reward.

Record an experience for batch learning.

Provide a reward signal.

Enable or disable automatic experience recording.

Set the baseline reward for comparison.

Set the learning rate.

Set the plasticity rule.

Start a brain learner process.

Stop the learner process.

Types

experience/0

-type experience() ::
          #{inputs := [float()],
            activations := [[float()]],
            outputs := [float()],
            timestamp := integer()}.

Functions

clear_experience(Pid)

-spec clear_experience(pid()) -> ok.

Clear the experience buffer.

disable(Pid)

-spec disable(pid()) -> ok.

Disable learning.

enable(Pid)

-spec enable(pid()) -> ok.

Enable learning.

get_auto_record(Pid)

-spec get_auto_record(pid()) -> boolean().

Check if automatic experience recording is enabled.

get_experience_count(Pid)

-spec get_experience_count(pid()) -> non_neg_integer().

Get the number of buffered experiences.

get_learning_rate(Pid)

-spec get_learning_rate(pid()) -> float().

Get the current learning rate.

get_plasticity_rule(Pid)

-spec get_plasticity_rule(pid()) -> atom().

Get the current plasticity rule.

get_weight_deltas(Pid)

-spec get_weight_deltas(pid()) -> [float()].

Get accumulated weight deltas from last learning step.

Useful for debugging and visualization.

handle_call(Request, From, State)

handle_cast(Msg, State)

handle_info(Info, State)

init(Opts)

is_enabled(Pid)

-spec is_enabled(pid()) -> boolean().

Check if learning is enabled.

learn_from_experience(Pid)

-spec learn_from_experience(pid()) -> {ok, non_neg_integer()}.

Learn from buffered experiences using current reward.

learn_from_experience(Pid, FinalReward)

-spec learn_from_experience(pid(), float()) -> {ok, non_neg_integer()}.

Learn from buffered experiences with a specific reward.

record_experience(Pid, Inputs, Activations)

-spec record_experience(pid(), [float()], [[float()]]) -> ok.

Record an experience for batch learning.

reward(Pid, Reward)

-spec reward(pid(), float()) -> ok.

Provide a reward signal.

For online learning, this affects the next weight update. Positive rewards strengthen active connections; negative rewards weaken them.
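For example, signaling a moderately positive outcome after an inference step (the reward value is illustrative):

```
ok = brain_learner:reward(Learner, 0.5).
```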

set_auto_record(Pid, Enabled)

-spec set_auto_record(pid(), boolean()) -> ok.

Enable or disable automatic experience recording.

When enabled, the learner automatically records experiences from 'evaluated' events published by the brain via pubsub.

set_baseline_reward(Pid, Baseline)

-spec set_baseline_reward(pid(), float()) -> ok.

Set the baseline reward for comparison.

Effective reward = actual_reward - baseline_reward. This helps with reward normalization.

set_learning_rate(Pid, Rate)

-spec set_learning_rate(pid(), float()) -> ok.

Set the learning rate.

set_plasticity_rule(Pid, Rule)

-spec set_plasticity_rule(pid(), atom()) -> ok.

Set the plasticity rule.

Available rules: none, hebbian, modulated

start_link(Opts)

-spec start_link(map()) -> {ok, pid()} | {error, term()}.

Start a brain learner process.

Options:

- inference_pid - PID of the brain inference process (required for weight updates)
- enabled - Whether learning is enabled (default: true)
- plasticity_rule - Atom identifying the rule (default: modulated)
- learning_rate - Learning rate (default: 0.01)
- baseline_reward - Baseline to subtract from rewards (default: 0.0)
- max_buffer_size - Max experiences to buffer (default: 1000)
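A minimal start-up sketch, assuming an already-running inference process bound to `InferencePid` (the option values simply restate the defaults above):

```
{ok, Learner} = brain_learner:start_link(#{
    inference_pid   => InferencePid,   %% required for weight updates
    plasticity_rule => modulated,
    learning_rate   => 0.01,
    max_buffer_size => 1000
}).
```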

stop(Pid)

-spec stop(pid()) -> ok.

Stop the learner process.

terminate(Reason, State)