OORL (object v0.1.2)

Object-Oriented Reinforcement Learning Framework

OORL extends traditional reinforcement learning by treating each learning agent as a fully autonomous object with encapsulated state, behavioral polymorphism, and sophisticated social learning capabilities. This framework enables complex multi-agent learning scenarios that go well beyond flat, single-agent RL approaches.

Core Principles

OORL objects exhibit several advanced capabilities:

  1. Behavioral Inheritance: Objects can inherit and override learning strategies from parent classes, enabling sophisticated policy hierarchies
  2. Dynamic Coalition Formation: Objects form temporary alliances for collective learning and problem solving
  3. Reward Function Evolution: Objects evolve their own intrinsic reward functions through meta-learning processes
  4. Multi-Objective Optimization: Objects balance multiple competing goals through hierarchical objective structures
  5. Distributed Policy Learning: Objects share knowledge and learn collectively across object networks through social learning mechanisms

Framework Architecture

Learning Levels

OORL operates at multiple levels of learning:

  • Individual Learning: Traditional RL with policy and value function updates
  • Social Learning: Learning from peer objects through observation and imitation
  • Collective Learning: Distributed optimization across object coalitions
  • Meta-Learning: Learning to learn - adaptation of learning strategies themselves

Performance Characteristics

  • Learning Speed: 2-5x faster convergence through social learning
  • Scalability: Linear scaling with number of objects in coalition
  • Robustness: Graceful degradation with partial coalition failures
  • Adaptation: Dynamic strategy adjustment based on environment changes

Example Usage

# Initialize OORL learning for an object
{:ok, oorl_state} = OORL.initialize_oorl_object("agent_1", %{
  policy_type: :neural,
  social_learning_enabled: true,
  meta_learning_enabled: true
})

# Perform learning step with social context
social_context = %{
  peer_rewards: [{"agent_2", 0.8}, {"agent_3", 0.6}],
  interaction_dyads: ["dyad_1", "dyad_2"]
}

{:ok, results} = OORL.learning_step(
  "agent_1", current_state, action, reward, next_state, social_context
)

# Form learning coalition
{:ok, coalition} = OORL.CollectiveLearning.form_learning_coalition(
  ["agent_1", "agent_2", "agent_3"],
  %{task_type: :coordination, difficulty: :high}
)

Summary

Types

experience() - Learning experience containing state transition and social context.

exploration_spec() - Exploration strategy configuration

goal_id() - Unique goal identifier

goal_spec() - Goal specification with success criteria

goal_tree() - Hierarchical goal structure with priorities

graph() - Social learning graph representing object relationships

learning_strategy() - Learning strategy configuration

meta_state() - Meta-learning state for strategy adaptation

network_spec() - Neural network architecture specification

oorl_state() - Complete OORL state for an object with all learning capabilities.

performance_metric() - Performance metric for meta-learning

policy_spec() - Policy specification defining the learning agent's decision-making strategy.

reward_component() - Reward component for multi-objective optimization

reward_spec() - Reward function specification with components

social_context() - Social learning context containing peer information and interaction history.

trigger_condition() - Trigger condition for strategy adaptation

value_spec() - Value function specification and parameters

Functions

initialize_oorl_object(object_id, learning_config \\ %{}) - Initializes an OORL object with learning capabilities.

learning_step(object_id, state, action, reward, next_state, social_context) - Performs a single learning step for an OORL object.

Types

action_observation()

@type action_observation() :: %{
  object_id: object_id(),
  action: any(),
  outcome: any(),
  timestamp: DateTime.t()
}

coalition_id()

@type coalition_id() :: String.t()

dyad_id()

@type dyad_id() :: String.t()

experience()

@type experience() :: %{
  state: any(),
  action: any(),
  reward: float(),
  next_state: any(),
  social_context: social_context(),
  meta_features: %{
    state_complexity: float(),
    action_confidence: float(),
    reward_surprise: float(),
    learning_opportunity: float()
  },
  timestamp: DateTime.t(),
  interaction_dyad: dyad_id() | nil,
  learning_signal: float()
}

Learning experience containing state transition and social context.

Fields

  • state - Environment state before action
  • action - Action taken by the object
  • reward - Numerical reward received
  • next_state - Environment state after action
  • social_context - Social learning context at time of experience
  • meta_features - Meta-learning features (complexity, novelty, etc.)
  • timestamp - When the experience occurred
  • interaction_dyad - Dyad involved in the experience (if any)
  • learning_signal - Strength of learning signal for this experience

Learning Integration

Experiences are used for:

  • Policy gradient updates
  • Value function learning
  • Social learning integration
  • Meta-learning strategy adaptation
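
For orientation, a hand-built experience map conforming to this type might look as follows; all values are illustrative and not produced by the library.

# Illustrative experience map (values are made up for the example)
experience = %{
  state: %{position: {0, 0}},
  action: :move_right,
  reward: 1.0,
  next_state: %{position: {1, 0}},
  social_context: %{
    observed_actions: [],
    peer_rewards: [{"agent_2", 0.8}],
    coalition_membership: [],
    reputation_scores: %{},
    interaction_dyads: ["dyad_1"],
    message_history: []
  },
  meta_features: %{
    state_complexity: 0.2,
    action_confidence: 0.9,
    reward_surprise: 0.4,
    learning_opportunity: 0.6
  },
  timestamp: DateTime.utc_now(),
  interaction_dyad: "dyad_1",
  learning_signal: 0.35
}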

exploration_spec()

@type exploration_spec() :: %{
  type: :epsilon_greedy | :ucb | :thompson_sampling | :curiosity_driven,
  parameters: map(),
  adaptation_enabled: boolean(),
  social_influence: float()
}

Exploration strategy configuration

goal_id()

@type goal_id() :: String.t()

Unique goal identifier

goal_spec()

@type goal_spec() :: %{
  id: goal_id(),
  description: String.t(),
  success_threshold: float(),
  priority: float(),
  time_horizon: pos_integer()
}

Goal specification with success criteria

goal_tree()

@type goal_tree() :: %{
  primary_goals: [goal_spec()],
  sub_goals: %{required(goal_id()) => [goal_spec()]},
  goal_weights: %{required(goal_id()) => float()},
  goal_dependencies: %{required(goal_id()) => [goal_id()]}
}

Hierarchical goal structure with priorities
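
A small, hand-written goal tree is shown below purely to illustrate the shape of the structure; the goals themselves are invented for the example.

# Illustrative goal tree: one primary goal with a single sub-goal
goal_tree = %{
  primary_goals: [
    %{id: "reach_target", description: "Reach the target cell",
      success_threshold: 0.9, priority: 1.0, time_horizon: 100}
  ],
  sub_goals: %{
    "reach_target" => [
      %{id: "avoid_obstacles", description: "Avoid obstacle cells",
        success_threshold: 0.95, priority: 0.7, time_horizon: 50}
    ]
  },
  goal_weights: %{"reach_target" => 1.0, "avoid_obstacles" => 0.5},
  goal_dependencies: %{"reach_target" => ["avoid_obstacles"]}
}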

graph()

@type graph() :: %{
  nodes: [object_id()],
  edges: [{object_id(), object_id(), float()}],
  centrality_scores: %{required(object_id()) => float()},
  clustering_coefficient: float()
}

Social learning graph representing object relationships

learning_strategy()

@type learning_strategy() :: %{
  algorithm: :q_learning | :policy_gradient | :actor_critic,
  hyperparameters: map(),
  social_weight: float(),
  exploration_strategy: exploration_spec()
}

Learning strategy configuration

message()

@type message() :: %{
  sender: object_id(),
  content: any(),
  recipients: [object_id()],
  role: :prompt | :response,
  timestamp: DateTime.t(),
  dyad_id: dyad_id() | nil
}

meta_state()

@type meta_state() :: %{
  learning_history: [performance_metric()],
  adaptation_triggers: [trigger_condition()],
  strategy_variants: [learning_strategy()],
  performance_baseline: float()
}

Meta-learning state for strategy adaptation

network_spec()

@type network_spec() :: %{
  layers: [pos_integer()],
  activation: :relu | :tanh | :sigmoid,
  dropout_rate: float(),
  batch_normalization: boolean()
}

Neural network architecture specification

object_id()

@type object_id() :: String.t()

oorl_state()

@type oorl_state() :: %{
  policy_network: policy_spec(),
  value_function: value_spec(),
  experience_buffer: [experience()],
  social_learning_graph: graph(),
  meta_learning_state: meta_state(),
  goal_hierarchy: goal_tree(),
  reward_function: reward_spec(),
  exploration_strategy: exploration_spec()
}

Complete OORL state for an object with all learning capabilities.

Fields

  • policy_network - Decision-making policy (neural, tabular, or hybrid)
  • value_function - State value estimation function
  • experience_buffer - Replay buffer for learning experiences
  • social_learning_graph - Network of social connections and trust
  • meta_learning_state - Strategy adaptation and meta-learning
  • goal_hierarchy - Multi-objective goal structure with priorities
  • reward_function - Multi-component reward specification
  • exploration_strategy - Exploration/exploitation strategy

Integration

All components work together to provide:

  • Individual reinforcement learning
  • Social learning from peers
  • Collective learning in coalitions
  • Meta-learning for strategy adaptation

performance_metric()

@type performance_metric() :: %{
  timestamp: DateTime.t(),
  reward: float(),
  learning_rate: float(),
  convergence_speed: float(),
  social_benefit: float()
}

Performance metric for meta-learning

policy_spec()

@type policy_spec() :: %{
  type: :neural | :tabular | :hybrid | :evolved,
  parameters: %{required(atom()) => any()},
  architecture: network_spec(),
  update_rule: :gradient_ascent | :natural_gradient | :proximal_policy,
  social_influence_weight: float()
}

Policy specification defining the learning agent's decision-making strategy.

Fields

  • type - Policy representation type
  • parameters - Policy-specific parameters
  • architecture - Network structure for neural policies
  • update_rule - Algorithm for policy updates
  • social_influence_weight - Weighting for social learning integration
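
For illustration, a neural policy specification might be assembled like this (hyperparameter values are arbitrary, not library defaults):

# Illustrative policy specification for a small neural policy
policy = %{
  type: :neural,
  parameters: %{learning_rate: 0.001, entropy_bonus: 0.01},
  architecture: %{layers: [64, 64], activation: :relu,
                  dropout_rate: 0.1, batch_normalization: true},
  update_rule: :proximal_policy,
  social_influence_weight: 0.3
}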

reward_component()

@type reward_component() ::
  :task_reward | :social_reward | :curiosity_reward | :intrinsic_reward

Reward component for multi-objective optimization

reward_spec()

@type reward_spec() :: %{
  components: [reward_component()],
  weights: %{required(atom()) => float()},
  adaptation_rate: float(),
  intrinsic_motivation: float()
}

Reward function specification with components
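
A sketch of how the components and weights could combine into a single scalar reward; the weighted sum shown here is an assumption made for illustration, not necessarily the library's internal formula.

# Illustrative reward specification and a hypothetical weighted combination
reward_spec = %{
  components: [:task_reward, :social_reward, :curiosity_reward],
  weights: %{task_reward: 0.7, social_reward: 0.2, curiosity_reward: 0.1},
  adaptation_rate: 0.05,
  intrinsic_motivation: 0.3
}

component_values = %{task_reward: 1.0, social_reward: 0.4, curiosity_reward: 0.8}

total_reward =
  reward_spec.components
  |> Enum.map(fn c ->
    Map.get(reward_spec.weights, c, 0.0) * Map.get(component_values, c, 0.0)
  end)
  |> Enum.sum()
# => 0.7 * 1.0 + 0.2 * 0.4 + 0.1 * 0.8 = 0.86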

social_context()

@type social_context() :: %{
  observed_actions: [action_observation()],
  peer_rewards: [{object_id(), float()}],
  coalition_membership: [coalition_id()],
  reputation_scores: %{required(object_id()) => float()},
  interaction_dyads: [dyad_id()],
  message_history: [message()]
}

Social learning context containing peer information and interaction history.

Fields

  • observed_actions - Actions observed from peer objects with outcomes
  • peer_rewards - Recent reward signals from peer objects
  • coalition_membership - List of coalitions this object belongs to
  • reputation_scores - Trust and reliability scores for peer objects
  • interaction_dyads - Active interaction dyads with other objects
  • message_history - Recent communication history for context

Usage in Learning

Social context enables:

  • Imitation learning from successful peers
  • Coordination with coalition members
  • Trust-based learning partner selection
  • Communication-informed decision making
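
A minimal social_context map, built by hand for illustration; in practice these fields would be assembled from the object's recent interactions and coalition state.

# Illustrative social context with one observed peer action
social_context = %{
  observed_actions: [
    %{object_id: "agent_2", action: :explore, outcome: :success,
      timestamp: DateTime.utc_now()}
  ],
  peer_rewards: [{"agent_2", 0.8}, {"agent_3", 0.6}],
  coalition_membership: ["coalition_alpha"],
  reputation_scores: %{"agent_2" => 0.9, "agent_3" => 0.7},
  interaction_dyads: ["dyad_1"],
  message_history: []
}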

trigger_condition()

@type trigger_condition() :: %{
  metric: atom(),
  threshold: float(),
  comparison: :greater_than | :less_than | :equal_to,
  window_size: pos_integer()
}

Trigger condition for strategy adaptation
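
One way such a condition could be checked against a window of performance metrics is sketched below; the evaluate_trigger function is hypothetical and only illustrates the intended semantics of the fields.

# Hypothetical trigger evaluation over the most recent window of metrics
trigger = %{metric: :reward, threshold: 0.2, comparison: :less_than, window_size: 20}

learning_history = [
  %{timestamp: DateTime.utc_now(), reward: 0.10, learning_rate: 0.01,
    convergence_speed: 0.3, social_benefit: 0.2},
  %{timestamp: DateTime.utc_now(), reward: 0.15, learning_rate: 0.01,
    convergence_speed: 0.3, social_benefit: 0.2}
]

evaluate_trigger = fn %{metric: m, threshold: t, comparison: cmp, window_size: n}, history ->
  window = history |> Enum.take(n) |> Enum.map(&Map.fetch!(&1, m))
  avg = Enum.sum(window) / max(length(window), 1)

  case cmp do
    :greater_than -> avg > t
    :less_than -> avg < t
    :equal_to -> avg == t
  end
end

evaluate_trigger.(trigger, learning_history)
# => true, so a strategy adaptation would be triggered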

value_spec()

@type value_spec() :: %{
  type: :neural | :tabular | :linear,
  architecture: network_spec(),
  learning_rate: float(),
  discount_factor: float()
}

Value function specification and parameters

Functions

initialize_oorl_object(object_id, learning_config \\ %{})

@spec initialize_oorl_object(Object.object_id(), map()) :: {:ok, oorl_state()}

Initializes an OORL object with learning capabilities.

Sets up a complete OORL learning system for an object, including policy networks, value functions, social learning, and meta-learning features. This is the entry point for enabling advanced learning on any AAOS object.

Parameters

  • object_id - Unique identifier for the learning object
  • learning_config - Configuration options map with the following keys:
    • :policy_type - Policy representation (:neural, :tabular, default: :neural)
    • :social_learning_enabled - Enable social learning (default: true)
    • :meta_learning_enabled - Enable meta-learning (default: true)
    • :curiosity_driven - Enable curiosity-driven exploration (default: true)
    • :coalition_participation - Allow coalition membership (default: true)
    • :learning_rate - Base learning rate (default: 0.01)
    • :exploration_rate - Initial exploration rate (default: 0.1)
    • :discount_factor - Future reward discount (default: 0.95)

Returns

  • {:ok, oorl_state} - Successfully initialized OORL state structure

OORL State Structure

The returned state includes:

  • Policy Network: Decision-making policy (neural or tabular)
  • Value Function: State value estimation function
  • Experience Buffer: Replay buffer for learning
  • Social Learning Graph: Network of social connections
  • Meta-Learning State: Strategy adaptation mechanisms
  • Goal Hierarchy: Multi-objective goal structure
  • Reward Function: Multi-component reward specification
  • Exploration Strategy: Exploration/exploitation balance

Examples

# Initialize with neural policy
iex> {:ok, state} = OORL.initialize_oorl_object("agent_1", %{
...>   policy_type: :neural,
...>   learning_rate: 0.001,
...>   social_learning_enabled: true
...> })
iex> state.policy_network.type
:neural

# Initialize tabular policy for discrete environments
iex> {:ok, state} = OORL.initialize_oorl_object("discrete_agent", %{
...>   policy_type: :tabular,
...>   exploration_rate: 0.2
...> })
iex> state.policy_network.type
:tabular

# Initialize with meta-learning disabled
iex> {:ok, state} = OORL.initialize_oorl_object("simple_agent", %{
...>   meta_learning_enabled: false,
...>   curiosity_driven: false
...> })
iex> state.exploration_strategy.type
:epsilon_greedy

Configuration Guidelines

Policy Type Selection

  • Neural: Continuous state/action spaces, complex patterns
  • Tabular: Discrete spaces, interpretable policies
  • Hybrid: Mixed discrete/continuous environments

Learning Rates

  • High (0.1-0.5): Fast-changing environments
  • Medium (0.01-0.1): Typical applications
  • Low (0.001-0.01): Stable environments, fine-tuning

Social Learning

  • Enable for multi-agent environments
  • Disable for single-agent optimization
  • Consider computational overhead
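
Putting the guidelines together, a configuration for a fast-changing multi-agent environment might look like the following; the values are illustrative choices, not recommended defaults.

# High learning rate for a volatile environment, social learning enabled for many peers
{:ok, state} = OORL.initialize_oorl_object("swarm_agent_7", %{
  policy_type: :neural,            # continuous observations, complex patterns
  learning_rate: 0.1,              # high end of the range: environment changes quickly
  exploration_rate: 0.2,
  discount_factor: 0.95,
  social_learning_enabled: true,   # many peers are available to learn from
  coalition_participation: true,
  meta_learning_enabled: true
})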

Performance Impact

  • Initialization time: ~5-10ms
  • Memory usage: ~5-50KB depending on configuration
  • Neural networks: Higher memory, better generalization
  • Tabular policies: Lower memory, exact solutions

Error Conditions

Initialization may fail due to:

  • Invalid configuration parameters
  • Insufficient system resources
  • Conflicting option combinations

learning_step(object_id, state, action, reward, next_state, social_context)

@spec learning_step(
  Object.object_id(),
  any(),
  any(),
  float(),
  any(),
  social_context()
) ::
  {:ok,
   %{
     policy_update: map(),
     social_updates: map(),
     meta_updates: map(),
     total_learning_signal: float()
   }}
  | {:error, atom()}

Performs a single learning step for an OORL object.

Processes a complete learning experience, including individual policy updates, social learning integration, and meta-learning adaptation. This is the core learning function, combining all levels of learning in a single operation.

Parameters

  • object_id - ID of the learning object (must be OORL-enabled)
  • state - Current environment state (any serializable term)
  • action - Action taken by the object
  • reward - Numerical reward signal received
  • next_state - Resulting environment state after action
  • social_context - Social learning context containing:
    • :observed_actions - Actions observed from peer objects
    • :peer_rewards - Reward signals from peer objects
    • :coalition_membership - Active coalition memberships
    • :interaction_dyads - Active interaction dyads
    • :message_history - Recent communication history

Returns

  • {:ok, learning_results} - Successful learning with detailed results:
    • :policy_update - Individual policy learning results
    • :social_updates - Social learning integration results
    • :meta_updates - Meta-learning adaptation results
    • :total_learning_signal - Aggregate learning signal strength
  • {:error, reason} - Learning step failed due to:
    • :object_not_found - Object not registered
    • :invalid_state - State format invalid
    • :learning_disabled - OORL not enabled for object
    • :resource_exhausted - Insufficient computational resources

Learning Process

Each learning step involves the following stages (a conceptual sketch in code follows the list):

  1. Experience Creation: Package (state, action, reward, next_state)
  2. Individual Learning: Update policy using RL algorithm
  3. Social Learning: Integrate peer observations and rewards
  4. Meta-Learning: Adapt learning strategy based on performance
  5. Result Aggregation: Combine learning signals from all levels
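
The sketch below walks through the same five stages with simple stand-in computations; the module name and the formulas are illustrative only and do not mirror OORL's internal implementation.

defmodule LearningStepSketch do
  # Conceptual walkthrough only; the real work happens inside OORL.learning_step/6.
  def run(state, action, reward, next_state, social_context) do
    # 1. Experience creation
    experience = %{state: state, action: action, reward: reward,
                   next_state: next_state, social_context: social_context}

    # 2. Individual learning: the raw reward stands in for a policy-update signal
    individual_signal = experience.reward

    # 3. Social learning: average peer reward, down-weighted relative to own experience
    peer_rewards = Map.get(social_context, :peer_rewards, [])

    social_signal =
      case peer_rewards do
        [] -> 0.0
        rs -> 0.3 * (rs |> Enum.map(fn {_id, r} -> r end) |> Enum.sum()) / length(rs)
      end

    # 4. Meta-learning: a stand-in adjustment based on individual/social disagreement
    meta_signal = 0.1 * abs(individual_signal - social_signal)

    # 5. Result aggregation across all levels
    {:ok,
     %{policy_update: %{signal: individual_signal},
       social_updates: %{signal: social_signal},
       meta_updates: %{signal: meta_signal},
       total_learning_signal: individual_signal + social_signal + meta_signal}}
  end
end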

Examples

# Basic learning step
iex> social_context = %{
...>   peer_rewards: [{"agent_2", 0.8}],
...>   interaction_dyads: ["dyad_1"]
...> }
iex> {:ok, results} = OORL.learning_step(
...>   "agent_1", 
...>   %{position: {0, 0}}, 
...>   :move_right, 
...>   1.0, 
...>   %{position: {1, 0}},
...>   social_context
...> )
iex> results.total_learning_signal
0.35

# Learning with rich social context
iex> rich_context = %{
...>   observed_actions: [
...>     %{object_id: "agent_2", action: :explore, outcome: :success},
...>     %{object_id: "agent_3", action: :exploit, outcome: :failure}
...>   ],
...>   peer_rewards: [{"agent_2", 1.2}, {"agent_3", -0.5}],
...>   coalition_membership: ["coalition_alpha"],
...>   interaction_dyads: ["dyad_2", "dyad_3"]
...> }
iex> {:ok, results} = OORL.learning_step(
...>   "social_agent", current_state, action, reward, next_state, rich_context
...> )
iex> results.social_updates.peer_influence
0.25

Learning Algorithms

The learning step uses different algorithms based on policy type:

Neural Policies

  • Policy gradient with social regularization
  • Experience replay with peer experiences
  • Neural network parameter updates

Tabular Policies

  • Q-learning with social Q-value sharing
  • Direct state-action value updates
  • Exploration bonus from peer actions
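
A minimal sketch of such an update is shown below; the way the best peer reward is folded into the target is an assumption made for illustration, not the library's exact update rule.

# Tabular Q-update with a social term; q maps {state, action} -> value
q_update = fn q, {s, a, r, s_next}, peer_rewards, %{alpha: alpha, gamma: gamma, social_w: w} ->
  # Best own estimate over actions recorded for the next state
  next_values =
    q
    |> Enum.filter(fn {{state, _action}, _value} -> state == s_next end)
    |> Enum.map(fn {_key, value} -> value end)

  max_next = if next_values == [], do: 0.0, else: Enum.max(next_values)

  # Social term: best recent peer reward, scaled by the social weight
  peer_best =
    case peer_rewards do
      [] -> 0.0
      rs -> rs |> Enum.map(fn {_id, pr} -> pr end) |> Enum.max()
    end

  target = r + w * peer_best + gamma * max_next
  old = Map.get(q, {s, a}, 0.0)
  Map.put(q, {s, a}, old + alpha * (target - old))
end

q_update.(%{}, {:s0, :right, 1.0, :s1}, [{"agent_2", 0.8}],
          %{alpha: 0.1, gamma: 0.95, social_w: 0.2})
# => %{{:s0, :right} => 0.116} (modulo floating-point rounding)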

Social Learning Integration

Social learning enhances individual learning through:

  • Imitation: Copy successful actions from high-performing peers
  • Advice Taking: Weight peer rewards in policy updates
  • Coordination: Align actions with coalition objectives
  • Knowledge Transfer: Share learned policies across similar states

Performance Characteristics

  • Learning step time: 1-10ms depending on complexity
  • Memory usage: Temporary allocations for experience processing
  • Convergence: 2-5x faster with effective social learning
  • Scalability: Linear with number of peer objects in context

Meta-Learning Adaptation

Meta-learning continuously adapts:

  • Learning rates based on convergence speed
  • Exploration strategies based on environment dynamics
  • Social weights based on peer performance
  • Reward function components based on goal achievement
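
As a rough illustration of the first point, one possible learning-rate rule is sketched below; the thresholds and multipliers are assumptions chosen to stay within the ranges from the configuration guidelines, not OORL's actual meta-learning policy.

# Raise the learning rate when convergence stalls, lower it when learning is fast
adapt_learning_rate = fn current_lr, recent_metrics ->
  speeds = Enum.map(recent_metrics, & &1.convergence_speed)
  avg_convergence = Enum.sum(speeds) / max(length(speeds), 1)

  cond do
    avg_convergence < 0.2 -> min(current_lr * 1.5, 0.5)    # stalled: take larger steps
    avg_convergence > 0.8 -> max(current_lr * 0.5, 0.001)  # converging fast: fine-tune
    true -> current_lr
  end
end

adapt_learning_rate.(0.01, [%{convergence_speed: 0.1}, %{convergence_speed: 0.15}])
# => 0.015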