OORL (object v0.1.2)
Object-Oriented Reinforcement Learning Framework
OORL extends traditional reinforcement learning by treating each learning agent as a fully autonomous object with encapsulated state, polymorphic behavior, and sophisticated social learning capabilities. The framework enables complex multi-agent learning scenarios that go well beyond flat, single-agent RL approaches.
Core Principles
OORL objects exhibit several advanced capabilities:
- Behavioral Inheritance: Objects can inherit and override learning strategies from parent classes, enabling sophisticated policy hierarchies
- Dynamic Coalition Formation: Objects form temporary alliances for collective learning and problem solving
- Reward Function Evolution: Objects evolve their own intrinsic reward functions through meta-learning processes
- Multi-Objective Optimization: Objects balance multiple competing goals through hierarchical objective structures
- Distributed Policy Learning: Objects share knowledge and learn collectively across object networks through social learning mechanisms
Framework Architecture
Learning Levels
OORL operates at multiple levels of learning:
- Individual Learning: Traditional RL with policy and value function updates
- Social Learning: Learning from peer objects through observation and imitation
- Collective Learning: Distributed optimization across object coalitions
- Meta-Learning: Learning to learn - adaptation of learning strategies themselves
Key Components
- OORL.PolicyLearning: Individual and social policy learning algorithms
- OORL.CollectiveLearning: Coalition formation and distributed optimization
- OORL.MetaLearning: Meta-learning and strategy evolution
Performance Characteristics
- Learning Speed: 2-5x faster convergence through social learning
- Scalability: Linear scaling with number of objects in coalition
- Robustness: Graceful degradation with partial coalition failures
- Adaptation: Dynamic strategy adjustment based on environment changes
Example Usage
# Initialize OORL learning for an object
{:ok, oorl_state} = OORL.initialize_oorl_object("agent_1", %{
policy_type: :neural,
social_learning_enabled: true,
meta_learning_enabled: true
})
# Perform learning step with social context
social_context = %{
peer_rewards: [{"agent_2", 0.8}, {"agent_3", 0.6}],
interaction_dyads: ["dyad_1", "dyad_2"]
}
{:ok, results} = OORL.learning_step(
"agent_1", current_state, action, reward, next_state, social_context
)
# Form learning coalition
{:ok, coalition} = OORL.CollectiveLearning.form_learning_coalition(
["agent_1", "agent_2", "agent_3"],
%{task_type: :coordination, difficulty: :high}
)
Summary
Types
Learning experience containing state transition and social context.
Exploration strategy configuration
Unique goal identifier
Goal specification with success criteria
Hierarchical goal structure with priorities
Social learning graph representing object relationships
Learning strategy configuration
Meta-learning state for strategy adaptation
Neural network architecture specification
Complete OORL state for an object with all learning capabilities.
Performance metric for meta-learning
Policy specification defining the learning agent's decision-making strategy.
Reward component for multi-objective optimization
Reward function specification with components
Social learning context containing peer information and interaction history.
Trigger condition for strategy adaptation
Value function specification and parameters
Functions
Initializes an OORL object with learning capabilities.
Performs a single learning step for an OORL object.
Types
@type action_observation() :: %{ object_id: object_id(), action: any(), outcome: any(), timestamp: DateTime.t() }
@type coalition_id() :: String.t()
@type dyad_id() :: String.t()
@type experience() :: %{ state: any(), action: any(), reward: float(), next_state: any(), social_context: social_context(), meta_features: %{ state_complexity: float(), action_confidence: float(), reward_surprise: float(), learning_opportunity: float() }, timestamp: DateTime.t(), interaction_dyad: dyad_id() | nil, learning_signal: float() }
Learning experience containing state transition and social context.
Fields
- state - Environment state before action
- action - Action taken by the object
- reward - Numerical reward received
- next_state - Environment state after action
- social_context - Social learning context at time of experience
- meta_features - Meta-learning features (complexity, novelty, etc.)
- timestamp - When the experience occurred
- interaction_dyad - Dyad involved in the experience (if any)
- learning_signal - Strength of learning signal for this experience
Learning Integration
Experiences are used for:
- Policy gradient updates
- Value function learning
- Social learning integration
- Meta-learning strategy adaptation
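For illustration, a single experience record conforming to this type might look like the following; all values are hypothetical, and the social_context shown is truncated to two of its fields:

experience = %{
  state: %{position: {0, 0}},
  action: :move_right,
  reward: 1.0,
  next_state: %{position: {1, 0}},
  # truncated social context; see social_context() for the full shape
  social_context: %{peer_rewards: [{"agent_2", 0.8}], interaction_dyads: ["dyad_1"]},
  meta_features: %{
    state_complexity: 0.4,
    action_confidence: 0.7,
    reward_surprise: 0.2,
    learning_opportunity: 0.6
  },
  timestamp: DateTime.utc_now(),
  interaction_dyad: "dyad_1",
  learning_signal: 0.35
}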
@type exploration_spec() :: %{ type: :epsilon_greedy | :ucb | :thompson_sampling | :curiosity_driven, parameters: map(), adaptation_enabled: boolean(), social_influence: float() }
Exploration strategy configuration
@type goal_id() :: String.t()
Unique goal identifier
@type goal_spec() :: %{ id: goal_id(), description: String.t(), success_threshold: float(), priority: float(), time_horizon: pos_integer() }
Goal specification with success criteria
@type goal_tree() :: %{ primary_goals: [goal_spec()], sub_goals: %{required(goal_id()) => [goal_spec()]}, goal_weights: %{required(goal_id()) => float()}, goal_dependencies: %{required(goal_id()) => [goal_id()]} }
Hierarchical goal structure with priorities
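As a hedged illustration (the goal names, thresholds, and weights below are made up, not defaults), a small two-level goal tree can be built from goal_spec() entries like this:

goal_tree = %{
  primary_goals: [
    %{id: "collect_resources", description: "Gather resources in the arena",
      success_threshold: 0.9, priority: 1.0, time_horizon: 1000}
  ],
  sub_goals: %{
    "collect_resources" => [
      %{id: "explore_map", description: "Explore unknown regions first",
        success_threshold: 0.7, priority: 0.5, time_horizon: 200}
    ]
  },
  goal_weights: %{"collect_resources" => 0.8, "explore_map" => 0.2},
  goal_dependencies: %{"collect_resources" => ["explore_map"]}
}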
@type graph() :: %{ nodes: [object_id()], edges: [{object_id(), object_id(), float()}], centrality_scores: %{required(object_id()) => float()}, clustering_coefficient: float() }
Social learning graph representing object relationships
@type learning_strategy() :: %{ algorithm: :q_learning | :policy_gradient | :actor_critic, hyperparameters: map(), social_weight: float(), exploration_strategy: exploration_spec() }
Learning strategy configuration
@type message() :: %{ sender: object_id(), content: any(), recipients: [object_id()], role: :prompt | :response, timestamp: DateTime.t(), dyad_id: dyad_id() | nil }
@type meta_state() :: %{ learning_history: [performance_metric()], adaptation_triggers: [trigger_condition()], strategy_variants: [learning_strategy()], performance_baseline: float() }
Meta-learning state for strategy adaptation
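A meta-learning state combining the learning_strategy(), exploration_spec(), and trigger_condition() types might be assembled as follows (thresholds and hyperparameters are illustrative, not library defaults):

meta_state = %{
  learning_history: [],
  adaptation_triggers: [
    # e.g. adapt the strategy if average reward stays below 0.1 over the last 100 steps
    %{metric: :reward, threshold: 0.1, comparison: :less_than, window_size: 100}
  ],
  strategy_variants: [
    %{algorithm: :q_learning,
      hyperparameters: %{learning_rate: 0.05},
      social_weight: 0.3,
      exploration_strategy: %{type: :epsilon_greedy, parameters: %{epsilon: 0.1},
                              adaptation_enabled: true, social_influence: 0.2}}
  ],
  performance_baseline: 0.0
}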
@type network_spec() :: %{ layers: [pos_integer()], activation: :relu | :tanh | :sigmoid, dropout_rate: float(), batch_normalization: boolean() }
Neural network architecture specification
@type object_id() :: String.t()
@type oorl_state() :: %{ policy_network: policy_spec(), value_function: value_spec(), experience_buffer: [experience()], social_learning_graph: graph(), meta_learning_state: meta_state(), goal_hierarchy: goal_tree(), reward_function: reward_spec(), exploration_strategy: exploration_spec() }
Complete OORL state for an object with all learning capabilities.
Fields
- policy_network - Decision-making policy (neural, tabular, or hybrid)
- value_function - State value estimation function
- experience_buffer - Replay buffer for learning experiences
- social_learning_graph - Network of social connections and trust
- meta_learning_state - Strategy adaptation and meta-learning
- goal_hierarchy - Multi-objective goal structure with priorities
- reward_function - Multi-component reward specification
- exploration_strategy - Exploration/exploitation strategy
Integration
All components work together to provide:
- Individual reinforcement learning
- Social learning from peers
- Collective learning in coalitions
- Meta-learning for strategy adaptation
@type performance_metric() :: %{ timestamp: DateTime.t(), reward: float(), learning_rate: float(), convergence_speed: float(), social_benefit: float() }
Performance metric for meta-learning
@type policy_spec() :: %{ type: :neural | :tabular | :hybrid | :evolved, parameters: %{required(atom()) => any()}, architecture: network_spec(), update_rule: :gradient_ascent | :natural_gradient | :proximal_policy, social_influence_weight: float() }
Policy specification defining the learning agent's decision-making strategy.
Fields
- type - Policy representation type
- parameters - Policy-specific parameters
- architecture - Network structure for neural policies
- update_rule - Algorithm for policy updates
- social_influence_weight - Weighting for social learning integration
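For example, a neural policy specification matching this type could look like the following sketch (layer sizes and weights are illustrative):

policy_spec = %{
  type: :neural,
  parameters: %{learning_rate: 0.001},
  architecture: %{layers: [64, 64], activation: :relu,
                  dropout_rate: 0.1, batch_normalization: false},
  update_rule: :proximal_policy,
  social_influence_weight: 0.3
}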
@type reward_component() ::
:task_reward | :social_reward | :curiosity_reward | :intrinsic_reward
Reward component for multi-objective optimization
@type reward_spec() :: %{ components: [reward_component()], weights: %{required(atom()) => float()}, adaptation_rate: float(), intrinsic_motivation: float() }
Reward function specification with components
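An illustrative multi-objective reward specification, weighting the task reward against social and curiosity bonuses (the weights are hypothetical):

reward_spec = %{
  components: [:task_reward, :social_reward, :curiosity_reward],
  weights: %{task_reward: 0.7, social_reward: 0.2, curiosity_reward: 0.1},
  adaptation_rate: 0.01,
  intrinsic_motivation: 0.1
}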
@type social_context() :: %{ observed_actions: [action_observation()], peer_rewards: [{object_id(), float()}], coalition_membership: [coalition_id()], reputation_scores: %{required(object_id()) => float()}, interaction_dyads: [dyad_id()], message_history: [message()] }
Social learning context containing peer information and interaction history.
Fields
- observed_actions - Actions observed from peer objects with outcomes
- peer_rewards - Recent reward signals from peer objects
- coalition_membership - List of coalitions this object belongs to
- reputation_scores - Trust and reliability scores for peer objects
- interaction_dyads - Active interaction dyads with other objects
- message_history - Recent communication history for context
Usage in Learning
Social context enables:
- Imitation learning from successful peers
- Coordination with coalition members
- Trust-based learning partner selection
- Communication-informed decision making
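A fully populated social context conforming to this type might look like the following; every ID, score, and timestamp is hypothetical:

social_context = %{
  observed_actions: [
    %{object_id: "agent_2", action: :explore, outcome: :success, timestamp: DateTime.utc_now()}
  ],
  peer_rewards: [{"agent_2", 0.8}, {"agent_3", 0.6}],
  coalition_membership: ["coalition_alpha"],
  reputation_scores: %{"agent_2" => 0.9, "agent_3" => 0.4},
  interaction_dyads: ["dyad_1"],
  message_history: []
}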
@type trigger_condition() :: %{ metric: atom(), threshold: float(), comparison: :greater_than | :less_than | :equal_to, window_size: pos_integer() }
Trigger condition for strategy adaptation
@type value_spec() :: %{ type: :neural | :tabular | :linear, architecture: network_spec(), learning_rate: float(), discount_factor: float() }
Value function specification and parameters
Functions
@spec initialize_oorl_object(Object.object_id(), map()) :: {:ok, oorl_state()}
Initializes an OORL object with learning capabilities.
Sets up a complete OORL learning system for an object including policy networks, value functions, social learning capabilities, and meta-learning features. This is the entry point for enabling advanced learning capabilities on any AAOS object.
Parameters
- object_id - Unique identifier for the learning object
- learning_config - Configuration options map with the following keys:
  - :policy_type - Policy representation (:neural, :tabular, default: :neural)
  - :social_learning_enabled - Enable social learning (default: true)
  - :meta_learning_enabled - Enable meta-learning (default: true)
  - :curiosity_driven - Enable curiosity-driven exploration (default: true)
  - :coalition_participation - Allow coalition membership (default: true)
  - :learning_rate - Base learning rate (default: 0.01)
  - :exploration_rate - Initial exploration rate (default: 0.1)
  - :discount_factor - Future reward discount (default: 0.95)
Returns
{:ok, oorl_state} - Successfully initialized OORL state structure
OORL State Structure
The returned state includes:
- Policy Network: Decision-making policy (neural or tabular)
- Value Function: State value estimation function
- Experience Buffer: Replay buffer for learning
- Social Learning Graph: Network of social connections
- Meta-Learning State: Strategy adaptation mechanisms
- Goal Hierarchy: Multi-objective goal structure
- Reward Function: Multi-component reward specification
- Exploration Strategy: Exploration/exploitation balance
Examples
# Initialize with neural policy
iex> {:ok, state} = OORL.initialize_oorl_object("agent_1", %{
...> policy_type: :neural,
...> learning_rate: 0.001,
...> social_learning_enabled: true
...> })
iex> state.policy_network.type
:neural
# Initialize tabular policy for discrete environments
iex> {:ok, state} = OORL.initialize_oorl_object("discrete_agent", %{
...> policy_type: :tabular,
...> exploration_rate: 0.2
...> })
iex> state.policy_network.type
:tabular
# Initialize with meta-learning disabled
iex> {:ok, state} = OORL.initialize_oorl_object("simple_agent", %{
...> meta_learning_enabled: false,
...> curiosity_driven: false
...> })
iex> state.exploration_strategy.type
:epsilon_greedy
Configuration Guidelines
Policy Type Selection
- Neural: Continuous state/action spaces, complex patterns
- Tabular: Discrete spaces, interpretable policies
- Hybrid: Mixed discrete/continuous environments
Learning Rates
- High (0.1-0.5): Fast changing environments
- Medium (0.01-0.1): Typical applications
- Low (0.001-0.01): Stable environments, fine-tuning
Social Learning
- Enable for multi-agent environments
- Disable for single-agent optimization
- Consider computational overhead
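Putting these guidelines together, a typical multi-agent setup for a continuous environment might be configured as below; only the documented option keys are used, and the chosen values are a sketch rather than recommended defaults:

config = %{
  policy_type: :neural,           # continuous state/action space
  learning_rate: 0.01,            # medium rate for a typical application
  exploration_rate: 0.1,
  discount_factor: 0.95,
  social_learning_enabled: true,  # multi-agent environment
  meta_learning_enabled: true,
  coalition_participation: true
}

{:ok, oorl_state} = OORL.initialize_oorl_object("field_agent", config)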
Performance Impact
- Initialization time: ~5-10ms
- Memory usage: ~5-50KB depending on configuration
- Neural networks: Higher memory, better generalization
- Tabular policies: Lower memory, exact solutions
Error Conditions
Initialization may fail due to:
- Invalid configuration parameters
- Insufficient system resources
- Conflicting option combinations
@spec learning_step( Object.object_id(), any(), any(), float(), any(), social_context() ) :: {:ok, %{ policy_update: map(), social_updates: map(), meta_updates: map(), total_learning_signal: float() }} | {:error, atom()}
Performs a single learning step for an OORL object.
Processes a complete learning experience including individual policy updates, social learning integration, and meta-learning adaptation. This is the core learning function that integrates multiple levels of learning in a single operation.
Parameters
- object_id - ID of the learning object (must be OORL-enabled)
- state - Current environment state (any serializable term)
- action - Action taken by the object
- reward - Numerical reward signal received
- next_state - Resulting environment state after action
- social_context - Social learning context containing:
  - :observed_actions - Actions observed from peer objects
  - :peer_rewards - Reward signals from peer objects
  - :coalition_membership - Active coalition memberships
  - :interaction_dyads - Active interaction dyads
  - :message_history - Recent communication history
Returns
{:ok, learning_results} - Successful learning with detailed results:
- :policy_update - Individual policy learning results
- :social_updates - Social learning integration results
- :meta_updates - Meta-learning adaptation results
- :total_learning_signal - Aggregate learning signal strength
{:error, reason} - Learning step failed due to:
- :object_not_found - Object not registered
- :invalid_state - State format invalid
- :learning_disabled - OORL not enabled for object
- :resource_exhausted - Insufficient computational resources
Learning Process
Each learning step involves:
- Experience Creation: Package the (state, action, reward, next_state) transition with its social context
- Individual Learning: Update policy using RL algorithm
- Social Learning: Integrate peer observations and rewards
- Meta-Learning: Adapt learning strategy based on performance
- Result Aggregation: Combine learning signals from all levels
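As a sketch of how these steps are driven in practice, an agent process can call OORL.learning_step/6 once per environment transition. The Env module and its functions below are hypothetical placeholders for whatever environment API the object interacts with:

defmodule TrainingLoop do
  # Runs `steps` transitions for one OORL-enabled object.
  def run(object_id, steps) do
    Enum.reduce(1..steps, Env.observe(object_id), fn _step, state ->
      action = Env.select_action(object_id, state)        # hypothetical action selection
      {reward, next_state} = Env.act(object_id, action)   # hypothetical environment step
      social_context = %{peer_rewards: [], interaction_dyads: []}

      {:ok, _results} =
        OORL.learning_step(object_id, state, action, reward, next_state, social_context)

      next_state
    end)
  end
end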
Examples
# Basic learning step
iex> social_context = %{
...> peer_rewards: [{"agent_2", 0.8}],
...> interaction_dyads: ["dyad_1"]
...> }
iex> {:ok, results} = OORL.learning_step(
...> "agent_1",
...> %{position: {0, 0}},
...> :move_right,
...> 1.0,
...> %{position: {1, 0}},
...> social_context
...> )
iex> results.total_learning_signal
0.35
# Learning with rich social context
iex> rich_context = %{
...> observed_actions: [
...> %{object_id: "agent_2", action: :explore, outcome: :success},
...> %{object_id: "agent_3", action: :exploit, outcome: :failure}
...> ],
...> peer_rewards: [{"agent_2", 1.2}, {"agent_3", -0.5}],
...> coalition_membership: ["coalition_alpha"],
...> interaction_dyads: ["dyad_2", "dyad_3"]
...> }
iex> {:ok, results} = OORL.learning_step(
...> "social_agent", current_state, action, reward, next_state, rich_context
...> )
iex> results.social_updates.peer_influence
0.25
Learning Algorithms
The learning step uses different algorithms based on policy type:
Neural Policies
- Policy gradient with social regularization
- Experience replay with peer experiences
- Neural network parameter updates
Tabular Policies
- Q-learning with social Q-value sharing
- Direct state-action value updates
- Exploration bonus from peer actions
Social Learning Integration
Social learning enhances individual learning through:
- Imitation: Copy successful actions from high-performing peers
- Advice Taking: Weight peer rewards in policy updates
- Coordination: Align actions with coalition objectives
- Knowledge Transfer: Share learned policies across similar states
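To make the advice-taking idea concrete, the following self-contained sketch shows one simple way peer rewards could be blended into a learning signal via a social weight. It is purely illustrative and does not reflect OORL's internal implementation:

defmodule SocialBlend do
  # Blend the object's own reward with the mean peer reward (illustrative only).
  def blended_signal(own_reward, [], _social_weight), do: own_reward

  def blended_signal(own_reward, peer_rewards, social_weight) do
    peer_mean =
      peer_rewards
      |> Enum.map(fn {_peer_id, reward} -> reward end)
      |> then(&(Enum.sum(&1) / length(&1)))

    (1.0 - social_weight) * own_reward + social_weight * peer_mean
  end
end

# SocialBlend.blended_signal(1.0, [{"agent_2", 0.8}, {"agent_3", 0.6}], 0.3)
# #=> 0.91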
Performance Characteristics
- Learning step time: 1-10ms depending on complexity
- Memory usage: Temporary allocations for experience processing
- Convergence: 2-5x faster with effective social learning
- Scalability: Linear with number of peer objects in context
Meta-Learning Adaptation
Meta-learning continuously adapts:
- Learning rates based on convergence speed
- Exploration strategies based on environment dynamics
- Social weights based on peer performance
- Reward function components based on goal achievement