OORL.PolicyLearningFramework (object v0.1.2)
Individual policy learning with social awareness based on AAOS interaction dyads
Summary
Functions
Learns from interaction dyad experiences.
Performs selective imitation learning from high-performing peers.
Updates an object's policy based on experiences and social context.
Functions
Learns from interaction dyad experiences.
Processes learning specifically from dyadic interactions, which often provide higher-quality learning signals due to sustained cooperation.
Parameters
object_id
- ID of the learning object
dyad_experiences
- Experiences from interaction dyads
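A minimal sketch of a dyad_experiences value, assuming experiences are grouped by dyad ID; the dyad ID and field values are illustrative only, reusing the experience shape shown in the update_policy example below:
iex> dyad_experiences = %{
...>   "dyad_1" => [
...>     %{state: %{x: 0}, action: :right, reward: 1.0, next_state: %{x: 1}},
...>     %{state: %{x: 1}, action: :up, reward: 0.5, next_state: %{x: 1, y: 1}}
...>   ]
...> }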
Returns
- Aggregated learning updates from all active dyads
update_policy(object_id, experiences, social_context)
@spec update_policy(Object.object_id(), [OORL.experience()], OORL.social_context()) ::
        {:ok, %{parameter_deltas: map(), learning_rate_adjustment: float(), exploration_modification: atom()}}
        | {:error, atom()}
Updates an object's policy based on experiences and social context.
Performs multi-objective policy gradient updates with social regularization and interaction dyad awareness. This function integrates individual learning with social learning signals to improve policy performance.
Parameters
object_id
- ID of the object updating its policy
experiences
- List of recent experiences to learn from:
  - Each experience contains state, action, reward, next_state
  - Experiences are weighted by interaction dyad strength
  - Recent experiences have higher learning weight
social_context
- Social learning context with peer information:
  - Peer rewards for imitation learning
  - Observed actions for behavioral copying
  - Interaction dyad information for weighting
Returns
{:ok, policy_updates}
- Successful policy updates containing:
  - :parameter_deltas - Changes to policy parameters
  - :learning_rate_adjustment - Adaptive learning rate modification
  - :exploration_modification - Exploration strategy updates
{:error, reason}
- Update failed due to:
  - :insufficient_data - Not enough experiences for reliable update
  - :invalid_experiences - Malformed experience data
  - :ai_reasoning_failed - AI enhancement failed, using fallback
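A brief sketch of dispatching on the documented return values, using experiences and social_context as constructed in the Examples section below; the handler atoms (:skip_update, :retry_later) are illustrative, not part of the API:
iex> case OORL.PolicyLearning.update_policy("agent_1", experiences, social_context) do
...>   {:ok, %{parameter_deltas: deltas, learning_rate_adjustment: lr_adj, exploration_modification: explore}} ->
...>     {:apply, deltas, lr_adj, explore}
...>   {:error, :insufficient_data} ->
...>     :skip_update
...>   {:error, reason} ->
...>     {:retry_later, reason}
...> end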
Learning Algorithm
The policy update process (the experience-weighting step is sketched after this list):
- Experience Weighting: Weight experiences by dyad strength
- AI Enhancement: Use AI reasoning for optimization (if available)
- Fallback Learning: Traditional gradient methods if AI fails
- Social Regularization: Incorporate peer behavior signals
- Parameter Updates: Apply computed parameter changes
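A minimal sketch of the experience-weighting step, assuming experiences are ordered oldest-first and carry a hypothetical :dyad_id field; the 0.9 recency decay and the default strength of 1.0 are illustrative assumptions, not the library's implementation:
defmodule ExperienceWeighting do
  # Illustrative sketch: weight each experience by the strength of its
  # originating dyad and by recency (later entries are assumed more recent).
  def weight(experiences, dyad_strengths) do
    count = length(experiences)

    experiences
    |> Enum.with_index()
    |> Enum.map(fn {exp, index} ->
      age = count - 1 - index
      dyad_strength = Map.get(dyad_strengths, exp[:dyad_id], 1.0)
      Map.put(exp, :weight, dyad_strength * :math.pow(0.9, age))
    end)
  end
end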
AI-Enhanced Learning
When AI reasoning is available, the system (see the fallback sketch after this list):
- Analyzes experience patterns for optimal learning
- Considers social compatibility and interaction dynamics
- Optimizes for multiple objectives simultaneously
- Provides interpretable learning recommendations
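A minimal sketch of the AI-with-fallback control flow described above; ai_policy_update/2 and gradient_policy_update/2 are hypothetical helpers (stubbed here), not functions of this module:
defmodule PolicyUpdateFlow do
  # Try the AI-enhanced update first; fall back to a traditional gradient
  # update when AI reasoning is unavailable or fails.
  def update(weighted_experiences, social_context) do
    case ai_policy_update(weighted_experiences, social_context) do
      {:ok, updates} -> {:ok, updates}
      {:error, _reason} -> gradient_policy_update(weighted_experiences, social_context)
    end
  end

  # Stubs standing in for the real learning paths.
  defp ai_policy_update(_experiences, _context), do: {:error, :ai_reasoning_failed}
  defp gradient_policy_update(_experiences, _context), do: {:ok, %{parameter_deltas: %{}}}
end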
Examples
# Update policy with experiences and social context
iex> experiences = [
...> %{state: %{x: 0}, action: :right, reward: 1.0, next_state: %{x: 1}},
...> %{state: %{x: 1}, action: :up, reward: 0.5, next_state: %{x: 1, y: 1}}
...> ]
iex> social_context = %{
...> peer_rewards: [{"agent_2", 0.8}],
...> interaction_dyads: ["dyad_1"]
...> }
iex> {:ok, updates} = OORL.PolicyLearning.update_policy(
...> "agent_1", experiences, social_context
...> )
iex> updates.learning_rate_adjustment
1.05
Social Learning Integration
Social context enhances learning through the following mechanisms (a reward-blending sketch follows this list):
- Peer Imitation: Higher-performing peers influence policy updates
- Dyad Weighting: Stronger dyads provide more learning signal
- Behavioral Alignment: Policy updates consider social coordination
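A minimal sketch of blending an object's own reward signal with peer rewards taken from the social context's peer_rewards list; the 0.8/0.2 mixing weights are illustrative assumptions:
defmodule SocialSignal do
  # Blend the object's own mean reward with the mean reward of its peers,
  # weighting the individual signal more heavily. Mixing weights are illustrative.
  def blended_reward(experiences, %{peer_rewards: peer_rewards}) do
    own = mean(Enum.map(experiences, & &1.reward))
    peers = mean(Enum.map(peer_rewards, fn {_peer_id, reward} -> reward end))
    0.8 * own + 0.2 * peers
  end

  defp mean([]), do: 0.0
  defp mean(values), do: Enum.sum(values) / length(values)
end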
Performance Characteristics
- Update time: 2-15ms depending on experience count and AI usage
- Convergence: Typically 20-50% faster with social learning
- Stability: Social regularization improves learning stability
- Scalability: Linear with number of experiences and peer count
social_imitation_learning(object_id, peer_policies, performance_rankings)
Performs selective imitation learning from high-performing peers.
Analyzes peer performance and compatibility to selectively imitate successful behaviors while maintaining object individuality. This prevents naive copying and ensures beneficial social learning.
Parameters
object_id
- ID of the learning object
peer_policies
- Map of peer object IDs to their policy specifications
performance_rankings
- List of {peer_id, performance_score} tuples sorted by performance (highest first)
Returns
Map of peer IDs to imitation weights in the range 0.0-1.0.
Selection Criteria
Peers are selected for imitation based on:
- Performance Threshold
- Compatibility Assessment
- Interaction Dyad Strength
Examples
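A usage sketch in the style of the update_policy example above; the peer policy maps and performance scores are illustrative, and the result is the documented map of peer IDs to imitation weights:
iex> peer_policies = %{
...>   "agent_2" => %{type: :epsilon_greedy, epsilon: 0.1},
...>   "agent_3" => %{type: :softmax, temperature: 0.5}
...> }
iex> performance_rankings = [{"agent_2", 0.9}, {"agent_3", 0.6}]
iex> weights = OORL.PolicyLearning.social_imitation_learning(
...>   "agent_1", peer_policies, performance_rankings
...> )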
Imitation Weight Calculation
The weight for each peer is computed from the following factors (one possible combination is sketched after this list):
- compatibility ∈ [0.0, 1.0] - based on behavioral similarity
- performance ∈ [0.0, 1.0] - normalized performance score
- dyad_strength ∈ [0.0, 1.0] - interaction dyad effectiveness
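The exact combination rule is not reproduced here; the following is a minimal sketch assuming a simple product of the three documented factors, which is an assumption rather than the library's formula:
defmodule ImitationWeight do
  # Illustrative only: combine the three documented factors multiplicatively,
  # so the result stays in [0.0, 1.0] and drops to zero if any factor is zero.
  def compute(compatibility, performance, dyad_strength)
      when compatibility >= 0.0 and compatibility <= 1.0 and
             performance >= 0.0 and performance <= 1.0 and
             dyad_strength >= 0.0 and dyad_strength <= 1.0 do
    compatibility * performance * dyad_strength
  end
end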
Compatibility Factors
Compatibility assessment includes:
Benefits of Selective Imitation
Safeguards