OORL.PolicyLearning (object v0.1.2)
Policy learning implementation for the OORL framework with social and collective learning.
Provides policy optimization algorithms including:
- Individual policy gradient methods
- Social learning with peer influence
- Collective policy optimization
- Meta-learning for strategy adaptation
Summary
Functions
Performs collective policy optimization across multiple objects.
Evaluates policy performance on a set of test scenarios.
Processes learning from interaction dyad experiences.
Creates a new policy learning configuration.
Selects action based on current policy and exploration strategy.
Performs social imitation learning by learning from peer policies.
Updates policy parameters using gradient-based optimization.
Types
@type policy_type() :: :neural | :tabular | :linear | :tree_based
@type t() :: %OORL.PolicyLearning{
  collective_optimization: boolean() | nil,
  experience_buffer: list() | nil,
  exploration_strategy: atom() | nil,
  learning_rate: float() | nil,
  meta_learning_config: map() | nil,
  peer_policies: map() | nil,
  performance_history: list() | nil,
  policy_network: map() | nil,
  policy_type: policy_type() | nil,
  social_learning_enabled: boolean() | nil
}
Functions
Performs collective policy optimization across multiple objects.
Parameters
- object_policies: Map of object_id -> policy_learner
- collective_experiences: Shared experiences across objects
- optimization_config: Collective optimization settings
Returns
Updated map of object policies with collective improvements
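A minimal usage sketch; the function name `collective_learning/3`, the constructor call, and the map/option shapes are assumptions for illustration, since only the argument roles above are documented here:

    # Function name, option keys, and experience shapes are assumptions
    object_policies = %{
      "agent_1" => OORL.PolicyLearning.new(policy_type: :neural),
      "agent_2" => OORL.PolicyLearning.new(policy_type: :neural)
    }

    collective_experiences = [
      %{object_id: "agent_1", state: %{x: 0}, action: :move_right, reward: 1.0},
      %{object_id: "agent_2", state: %{x: 1}, action: :stay, reward: 0.5}
    ]

    updated_policies =
      OORL.PolicyLearning.collective_learning(
        object_policies,
        collective_experiences,
        %{aggregation: :weighted_average}
      )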
Evaluates policy performance on a set of test scenarios.
Parameters
- policy_learner: Policy to evaluate
- test_scenarios: List of test state-action sequences
- evaluation_metrics: Metrics to compute
Returns
{:ok, evaluation_results} with performance metrics
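A hedged sketch, assuming the function is exposed as `evaluate_policy/3` and that each scenario is a list of state-action-reward maps; names and metric atoms are illustrative only:

    # evaluate_policy/3 is an assumed name; metrics list is illustrative
    policy = OORL.PolicyLearning.new(policy_type: :tabular, learning_rate: 0.05)

    test_scenarios = [
      [
        %{state: :s0, action: :a1, reward: 1.0},
        %{state: :s1, action: :a2, reward: 0.0}
      ]
    ]

    {:ok, results} =
      OORL.PolicyLearning.evaluate_policy(policy, test_scenarios, [:average_reward, :success_rate])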
Processes learning from interaction dyad experiences.
Parameters
- object_id: ID of the learning object
- dyad_experiences: List of dyadic interaction experiences
Returns
Learning updates based on dyadic interactions
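A sketch of a possible call, assuming the function is named `learn_from_dyad/2`; the experience map keys are illustrative only:

    # learn_from_dyad/2 is an assumed name; experience keys are illustrative
    dyad_experiences = [
      %{partner_id: "agent_2", interaction: :cooperation, reward: 1.0},
      %{partner_id: "agent_2", interaction: :negotiation, reward: 0.3}
    ]

    updates = OORL.PolicyLearning.learn_from_dyad("agent_1", dyad_experiences)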
Creates a new policy learning configuration.
Parameters
- opts: Configuration options including policy type, learning rate, and social learning settings
Returns
%OORL.PolicyLearning{} struct
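A minimal sketch of constructing a configuration, assuming a `new/1` constructor whose option keys mirror the struct fields in t() above:

    # Assumed new/1 constructor; option keys mirror the struct fields above
    policy = OORL.PolicyLearning.new(
      policy_type: :neural,
      learning_rate: 0.01,
      exploration_strategy: :epsilon_greedy,
      social_learning_enabled: true
    )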
Selects action based on current policy and exploration strategy.
Parameters
- policy_learner: Current policy learning state
- state: Current environment state
- exploration_config: Exploration parameters
Returns
{:ok, action} with selected action
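A hedged example, assuming the function is `select_action/3` and that the exploration config accepts an epsilon value; the state shape is illustrative:

    # select_action/3 is an assumed name; state and exploration keys are illustrative
    state = %{position: {2, 3}, energy: 0.8}

    {:ok, action} =
      OORL.PolicyLearning.select_action(policy, state, %{epsilon: 0.1})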
Updates policy parameters using gradient-based optimization.
Parameters
- policy_learner: Current policy learning state
- experiences: List of experience tuples (state, action, reward, next_state)
- options: Update options including batch size, social influence
Returns
Updated %OORL.PolicyLearning{} struct with improved policy
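A sketch of a gradient update call, assuming the function is `update_policy/3`; the experience tuples follow the (state, action, reward, next_state) shape documented above, and the options keyword list is illustrative:

    # update_policy/3 is an assumed name; options keyword list is illustrative
    experiences = [
      {%{x: 0}, :move_right, 1.0, %{x: 1}},
      {%{x: 1}, :move_right, 0.5, %{x: 2}}
    ]

    updated_policy =
      OORL.PolicyLearning.update_policy(policy, experiences, batch_size: 2)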
social_imitation_learning(object_id, peer_policies, performance_rankings)
Performs social imitation learning by learning from peer policies.
Parameters
- object_id: ID of the learning object
- peer_policies: Map of peer IDs to their policy configurations
- performance_rankings: List of {peer_id, performance_score} tuples
Returns
Imitation weights indicating influence of each peer policy
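A usage sketch for `social_imitation_learning/3` (signature shown above); the peer policy map contents and performance scores are illustrative:

    # Signature from the heading above; map contents and scores are illustrative
    peer_policies = %{
      "agent_2" => %{policy_type: :neural, parameters: %{}},
      "agent_3" => %{policy_type: :tabular, parameters: %{}}
    }

    performance_rankings = [{"agent_2", 0.92}, {"agent_3", 0.61}]

    imitation_weights =
      OORL.PolicyLearning.social_imitation_learning("agent_1", peer_policies, performance_rankings)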