OORL.PolicyLearning (object v0.1.2)
Policy learning implementation for the OORL framework with social and collective learning.
Provides policy optimization algorithms including:
- Individual policy gradient methods
- Social learning with peer influence
- Collective policy optimization
- Meta-learning for strategy adaptation
Summary
Functions
Performs collective policy optimization across multiple objects.
Evaluates policy performance on a set of test scenarios.
Processes learning from interaction dyad experiences.
Creates a new policy learning configuration.
Selects action based on current policy and exploration strategy.
Performs social imitation learning by learning from peer policies.
Updates policy parameters using gradient-based optimization.
Types
@type policy_type() :: :neural | :tabular | :linear | :tree_based
@type t() :: %OORL.PolicyLearning{
        collective_optimization: boolean() | nil,
        experience_buffer: list() | nil,
        exploration_strategy: atom() | nil,
        learning_rate: float() | nil,
        meta_learning_config: map() | nil,
        peer_policies: map() | nil,
        performance_history: list() | nil,
        policy_network: map() | nil,
        policy_type: policy_type() | nil,
        social_learning_enabled: boolean() | nil
      }
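Example
To make the struct shape concrete, here is a hand-written literal; the field values below are illustrative assumptions, not documented defaults.
# Illustrative values only; defaults are not documented here.
%OORL.PolicyLearning{
  policy_type: :neural,
  learning_rate: 0.01,
  exploration_strategy: :epsilon_greedy,
  experience_buffer: [],
  peer_policies: %{},
  performance_history: [],
  policy_network: %{},
  meta_learning_config: %{},
  social_learning_enabled: true,
  collective_optimization: false
}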
Functions
Performs collective policy optimization across multiple objects.
Parameters
object_policies
: Map of object_id -> policy_learner
collective_experiences
: Shared experiences across objects
optimization_config
: Collective optimization settings
Returns
Updated map of object policies with collective improvements
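Example
A minimal usage sketch. The function name collective_policy_optimization/3 and the optimization_config keys are assumptions inferred from this description; only the argument roles above are documented.
# Hypothetical call shape; names and option keys are assumed, not confirmed API.
object_policies = %{
  "agent_1" => OORL.PolicyLearning.new(policy_type: :neural),
  "agent_2" => OORL.PolicyLearning.new(policy_type: :neural)
}

collective_experiences = [
  %{object_id: "agent_1", state: %{task: 1}, action: :explore, reward: 1.0},
  %{object_id: "agent_2", state: %{task: 1}, action: :exploit, reward: 0.5}
]

updated_policies =
  OORL.PolicyLearning.collective_policy_optimization(
    object_policies,
    collective_experiences,
    %{aggregation: :mean, rounds: 5}
  )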
Evaluates policy performance on a set of test scenarios.
Parameters
policy_learner
: Policy to evaluate
test_scenarios
: List of test state-action sequences
evaluation_metrics
: Metrics to compute
Returns
{:ok, evaluation_results} with performance metrics
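Example
A sketch only; evaluate_policy/3 is an assumed name for this function, and the scenario maps and metric atoms are illustrative.
# Assumed name and illustrative data; only the parameter roles are documented.
policy_learner = OORL.PolicyLearning.new(policy_type: :tabular)

test_scenarios = [
  %{states: [%{pos: 0}, %{pos: 1}], actions: [:forward, :forward]},
  %{states: [%{pos: 1}, %{pos: 0}], actions: [:back, :back]}
]

{:ok, evaluation_results} =
  OORL.PolicyLearning.evaluate_policy(
    policy_learner,
    test_scenarios,
    [:average_reward, :action_accuracy]
  )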
Processes learning from interaction dyad experiences.
Parameters
object_id
: ID of the learning object
dyad_experiences
: List of dyadic interaction experiences
Returns
Learning updates based on dyadic interactions
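Example
A hypothetical call; interaction_dyad_learning/2 is an assumed name, and the experience map keys are illustrative rather than a documented schema.
# Assumed name and experience shape; only object_id and dyad_experiences
# are documented parameters.
dyad_experiences = [
  %{partner_id: "agent_2", state: %{task: :negotiate}, action: :concede,
    partner_action: :accept, reward: 0.8}
]

learning_updates =
  OORL.PolicyLearning.interaction_dyad_learning("agent_1", dyad_experiences)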
Creates a new policy learning configuration.
Parameters
opts
: Configuration options including policy type, learning rate, social learning settings
Returns
%OORL.PolicyLearning{} struct
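Example
An assumed constructor shape: new/1 taking a keyword list whose keys mirror the struct fields in t() above; the specific values are illustrative.
# Assumed new/1 signature; option keys follow the struct fields listed in t().
policy_learner =
  OORL.PolicyLearning.new(
    policy_type: :neural,
    learning_rate: 0.01,
    exploration_strategy: :epsilon_greedy,
    social_learning_enabled: true,
    collective_optimization: false
  )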
Selects action based on current policy and exploration strategy.
Parameters
policy_learner
: Current policy learning state
state
: Current environment state
exploration_config
: Exploration parameters
Returns
{:ok, action} with the selected action
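Example
A sketch; select_action/3 is an assumed name, and the exploration_config keys show one plausible epsilon-greedy parameterization.
# Assumed name; exploration keys are illustrative for an epsilon-greedy strategy.
policy_learner = OORL.PolicyLearning.new(exploration_strategy: :epsilon_greedy)
state = %{position: {2, 3}, energy: 0.7}

{:ok, action} =
  OORL.PolicyLearning.select_action(
    policy_learner,
    state,
    %{epsilon: 0.1, epsilon_decay: 0.995}
  )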
Updates policy parameters using gradient-based optimization.
Parameters
policy_learner
: Current policy learning state
experiences
: List of experience tuples (state, action, reward, next_state)
options
: Update options including batch size and social influence
Returns
Updated %OORL.PolicyLearning{} struct with improved policy
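Example
A sketch; update_policy/3 is an assumed name. The experiences follow the (state, action, reward, next_state) tuple shape described above, and the option keys are illustrative.
# Assumed name and option keys; tuple shape matches the description above.
policy_learner = OORL.PolicyLearning.new(policy_type: :neural, learning_rate: 0.01)

experiences = [
  {%{pos: 0}, :forward, 1.0, %{pos: 1}},
  {%{pos: 1}, :forward, 0.5, %{pos: 2}}
]

policy_learner =
  OORL.PolicyLearning.update_policy(
    policy_learner,
    experiences,
    batch_size: 32,
    social_influence: 0.2
  )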
social_imitation_learning(object_id, peer_policies, performance_rankings)
Performs social imitation learning by learning from peer policies.
Parameters
object_id
: ID of the learning object
peer_policies
: Map of peer IDs to their policy configurations
performance_rankings
: List of {peer_id, performance_score} tuples
Returns
Imitation weights indicating influence of each peer policy
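Example
A usage sketch for social_imitation_learning/3; the peer policy maps and ranking scores are illustrative data, not a documented schema.
# Illustrative peer data; only the three parameter roles are documented.
peer_policies = %{
  "agent_2" => %{policy_type: :neural, policy_network: %{}},
  "agent_3" => %{policy_type: :tabular, policy_network: %{}}
}

performance_rankings = [{"agent_2", 0.92}, {"agent_3", 0.75}]

imitation_weights =
  OORL.PolicyLearning.social_imitation_learning(
    "agent_1",
    peer_policies,
    performance_rankings
  )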