OORL.PolicyLearning (object v0.1.2)

Policy learning implementation for the OORL framework with social and collective learning.

Provides policy optimization algorithms including:

  • Individual policy gradient methods
  • Social learning with peer influence
  • Collective policy optimization
  • Meta-learning for strategy adaptation

Summary

Functions

collective_policy_optimization(object_policies, collective_experiences, optimization_config \\ %{})

Performs collective policy optimization across multiple objects.

evaluate_policy(policy_learner, test_scenarios, evaluation_metrics \\ [:return, :success_rate])

Evaluates policy performance on a set of test scenarios.

interaction_dyad_learning(object_id, dyad_experiences)

Processes learning from interaction dyad experiences.

new(opts \\ [])

Creates a new policy learning configuration.

select_action(policy_learner, state, exploration_config \\ %{})

Selects action based on current policy and exploration strategy.

social_imitation_learning(object_id, peer_policies, performance_rankings)

Performs social imitation learning based on peer policies.

update_policy(policy_learner, experiences, options \\ %{})

Updates policy parameters using gradient-based optimization.
Types

learning_config()

@type learning_config() :: %{
  learning_rate: float(),
  batch_size: integer(),
  exploration_rate: float(),
  social_influence: float()
}
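
For illustration, a map satisfying this spec might look like the following (the values are arbitrary):

    %{
      learning_rate: 0.001,
      batch_size: 32,
      exploration_rate: 0.1,
      social_influence: 0.5
    }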

policy_type()

@type policy_type() :: :neural | :tabular | :linear | :tree_based

t()

@type t() :: %OORL.PolicyLearning{
  collective_optimization: boolean() | nil,
  experience_buffer: list() | nil,
  exploration_strategy: atom() | nil,
  learning_rate: float() | nil,
  meta_learning_config: map() | nil,
  peer_policies: map() | nil,
  performance_history: list() | nil,
  policy_network: map() | nil,
  policy_type: policy_type() | nil,
  social_learning_enabled: boolean() | nil
}

Functions

collective_policy_optimization(object_policies, collective_experiences, optimization_config \\ %{})

Performs collective policy optimization across multiple objects.

Parameters

  • object_policies: Map of object_id -> policy_learner
  • collective_experiences: Shared experiences across objects
  • optimization_config: Collective optimization settings

Returns

Updated map of object policies with collective improvements
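
Example

A usage sketch: two learners created with new/1 share experiences for a joint update. The experience tuple layout is an assumption, mirrored from update_policy/3:

    learner_a = OORL.PolicyLearning.new(policy_type: :tabular)
    learner_b = OORL.PolicyLearning.new(policy_type: :tabular)

    object_policies = %{"obj_a" => learner_a, "obj_b" => learner_b}

    # Shared experiences across objects (illustrative tuple layout)
    collective_experiences = [
      {%{resource: 1}, :cooperate, 1.0, %{resource: 2}}
    ]

    updated_policies =
      OORL.PolicyLearning.collective_policy_optimization(object_policies, collective_experiences)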

evaluate_policy(policy_learner, test_scenarios, evaluation_metrics \\ [:return, :success_rate])

Evaluates policy performance on a set of test scenarios.

Parameters

  • policy_learner: Policy to evaluate
  • test_scenarios: List of test state-action sequences
  • evaluation_metrics: Metrics to compute

Returns

{:ok, evaluation_results} with performance metrics
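
Example

A sketch under assumptions; each scenario is written as a list of {state, action} pairs, which is one reading of "state-action sequences":

    learner = OORL.PolicyLearning.new(policy_type: :tabular)

    test_scenarios = [
      [{%{position: 0}, :move_right}, {%{position: 1}, :move_right}],
      [{%{position: 0}, :stay}]
    ]

    {:ok, results} =
      OORL.PolicyLearning.evaluate_policy(learner, test_scenarios, [:return, :success_rate])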

interaction_dyad_learning(object_id, dyad_experiences)

Processes learning from interaction dyad experiences.

Parameters

  • object_id: ID of the learning object
  • dyad_experiences: List of dyadic interaction experiences

Returns

Learning updates based on dyadic interactions
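
Example

Illustrative only; the shape of a dyad experience map is an assumption, not part of the documented contract:

    dyad_experiences = [
      # Hypothetical dyadic interaction records
      %{partner_id: "obj_2", action: :share_resource, reward: 0.8},
      %{partner_id: "obj_2", action: :defect, reward: -0.2}
    ]

    learning_updates =
      OORL.PolicyLearning.interaction_dyad_learning("obj_1", dyad_experiences)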

new(opts \\ [])

Creates a new policy learning configuration.

Parameters

  • opts: Configuration options including policy type, learning rate, and social learning settings

Returns

%OORL.PolicyLearning{} struct
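
Example

A construction sketch; the option keys mirror fields of t() and are assumed to be accepted as options:

    learner =
      OORL.PolicyLearning.new(
        policy_type: :neural,
        learning_rate: 0.001,
        social_learning_enabled: true
      )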

select_action(policy_learner, state, exploration_config \\ %{})

Selects action based on current policy and exploration strategy.

Parameters

  • policy_learner: Current policy learning state
  • state: Current environment state
  • exploration_config: Exploration parameters

Returns

{:ok, action} with selected action
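
Example

A usage sketch; the exploration_rate key is borrowed from learning_config() and assumed to be a valid exploration parameter:

    learner = OORL.PolicyLearning.new(policy_type: :tabular)
    state = %{position: 3, energy: 0.7}

    {:ok, action} =
      OORL.PolicyLearning.select_action(learner, state, %{exploration_rate: 0.1})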

social_imitation_learning(object_id, peer_policies, performance_rankings)

Performs social imitation learning based on peer policies.

Parameters

  • object_id: ID of the learning object
  • peer_policies: Map of peer IDs to their policy configurations
  • performance_rankings: List of {peer_id, performance_score} tuples

Returns

Imitation weights indicating influence of each peer policy
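
Example

A sketch under assumptions; the contents of each peer policy configuration are illustrative:

    peer_policies = %{
      "peer_a" => %{policy_type: :tabular},
      "peer_b" => %{policy_type: :linear}
    }

    performance_rankings = [{"peer_a", 0.92}, {"peer_b", 0.75}]

    imitation_weights =
      OORL.PolicyLearning.social_imitation_learning("obj_1", peer_policies, performance_rankings)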

update_policy(policy_learner, experiences, options \\ %{})

Updates policy parameters using gradient-based optimization.

Parameters

  • policy_learner: Current policy learning state
  • experiences: List of {state, action, reward, next_state} experience tuples
  • options: Update options including batch size, social influence

Returns

Updated %OORL.PolicyLearning{} struct with improved policy
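
Example

A usage sketch; experiences follow the {state, action, reward, next_state} layout described above, and batch_size is assumed to be a recognized option key:

    learner = OORL.PolicyLearning.new(learning_rate: 0.01)

    experiences = [
      {%{position: 1}, :move_right, 1.0, %{position: 2}},
      {%{position: 2}, :move_right, 0.5, %{position: 3}}
    ]

    updated_learner =
      OORL.PolicyLearning.update_policy(learner, experiences, %{batch_size: 2})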