OORL (object v0.1.2)
Object-Oriented Reinforcement Learning Framework
OORL extends traditional reinforcement learning by treating each learning agent as a fully autonomous object with encapsulated state, polymorphic behavior, and sophisticated social learning capabilities. The framework enables complex multi-agent learning scenarios that go well beyond flat, single-agent RL approaches.
Core Principles
OORL objects exhibit several advanced capabilities:
- Behavioral Inheritance: Objects can inherit and override learning strategies from parent classes, enabling sophisticated policy hierarchies
- Dynamic Coalition Formation: Objects form temporary alliances for collective learning and problem solving
- Reward Function Evolution: Objects evolve their own intrinsic reward functions through meta-learning processes
- Multi-Objective Optimization: Objects balance multiple competing goals through hierarchical objective structures
- Distributed Policy Learning: Objects share knowledge and learn collectively across object networks through social learning mechanisms
Framework Architecture
Learning Levels
OORL operates at multiple levels of learning:
- Individual Learning: Traditional RL with policy and value function updates
- Social Learning: Learning from peer objects through observation and imitation
- Collective Learning: Distributed optimization across object coalitions
- Meta-Learning: Learning to learn - adaptation of learning strategies themselves
Key Components
- OORL.PolicyLearning: Individual and social policy learning algorithms
- OORL.CollectiveLearning: Coalition formation and distributed optimization
- OORL.MetaLearning: Meta-learning and strategy evolution
Performance Characteristics
- Learning Speed: 2-5x faster convergence through social learning
- Scalability: Linear scaling with number of objects in coalition
- Robustness: Graceful degradation with partial coalition failures
- Adaptation: Dynamic strategy adjustment based on environment changes
Example Usage
# Initialize OORL learning for an object
{:ok, oorl_state} = OORL.initialize_oorl_object("agent_1", %{
policy_type: :neural,
social_learning_enabled: true,
meta_learning_enabled: true
})
# Perform learning step with social context
social_context = %{
peer_rewards: [{"agent_2", 0.8}, {"agent_3", 0.6}],
interaction_dyads: ["dyad_1", "dyad_2"]
}
{:ok, results} = OORL.learning_step(
"agent_1", current_state, action, reward, next_state, social_context
)
# Form learning coalition
{:ok, coalition} = OORL.CollectiveLearning.form_learning_coalition(
["agent_1", "agent_2", "agent_3"],
%{task_type: :coordination, difficulty: :high}
)
Summary
Types
Learning experience containing state transition and social context.
Exploration strategy configuration
Unique goal identifier
Goal specification with success criteria
Hierarchical goal structure with priorities
Social learning graph representing object relationships
Learning strategy configuration
Meta-learning state for strategy adaptation
Neural network architecture specification
Complete OORL state for an object with all learning capabilities.
Performance metric for meta-learning
Policy specification defining the learning agent's decision-making strategy.
Reward component for multi-objective optimization
Reward function specification with components
Social learning context containing peer information and interaction history.
Trigger condition for strategy adaptation
Value function specification and parameters
Functions
Initializes an OORL object with learning capabilities.
Performs a single learning step for an OORL object.
Types
@type action_observation() :: %{ object_id: object_id(), action: any(), outcome: any(), timestamp: DateTime.t() }
@type coalition_id() :: String.t()
@type dyad_id() :: String.t()
@type experience() :: %{ state: any(), action: any(), reward: float(), next_state: any(), social_context: social_context(), meta_features: %{ state_complexity: float(), action_confidence: float(), reward_surprise: float(), learning_opportunity: float() }, timestamp: DateTime.t(), interaction_dyad: dyad_id() | nil, learning_signal: float() }
Learning experience containing state transition and social context.
Fields
- state - Environment state before action
- action - Action taken by the object
- reward - Numerical reward received
- next_state - Environment state after action
- social_context - Social learning context at time of experience
- meta_features - Meta-learning features (complexity, novelty, etc.)
- timestamp - When the experience occurred
- interaction_dyad - Dyad involved in the experience (if any)
- learning_signal - Strength of learning signal for this experience
Learning Integration
Experiences are used for:
- Policy gradient updates
- Value function learning
- Social learning integration
- Meta-learning strategy adaptation
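For illustration, a single experience record conforming to this type might look like the following; all values are hypothetical, and the social_context shown is truncated to two of its fields:

experience = %{
  state: %{position: {0, 0}},
  action: :move_right,
  reward: 1.0,
  next_state: %{position: {1, 0}},
  # truncated social context; see social_context() for the full shape
  social_context: %{peer_rewards: [{"agent_2", 0.8}], interaction_dyads: ["dyad_1"]},
  meta_features: %{
    state_complexity: 0.4,
    action_confidence: 0.7,
    reward_surprise: 0.2,
    learning_opportunity: 0.6
  },
  timestamp: DateTime.utc_now(),
  interaction_dyad: "dyad_1",
  learning_signal: 0.35
}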
@type exploration_spec() :: %{ type: :epsilon_greedy | :ucb | :thompson_sampling | :curiosity_driven, parameters: map(), adaptation_enabled: boolean(), social_influence: float() }
Exploration strategy configuration
@type goal_id() :: String.t()
Unique goal identifier
@type goal_spec() :: %{ id: goal_id(), description: String.t(), success_threshold: float(), priority: float(), time_horizon: pos_integer() }
Goal specification with success criteria
@type goal_tree() :: %{ primary_goals: [goal_spec()], sub_goals: %{required(goal_id()) => [goal_spec()]}, goal_weights: %{required(goal_id()) => float()}, goal_dependencies: %{required(goal_id()) => [goal_id()]} }
Hierarchical goal structure with priorities
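As a hedged illustration (the goal names, thresholds, and weights below are made up, not defaults), a small two-level goal tree can be built from goal_spec() entries like this:

goal_tree = %{
  primary_goals: [
    %{id: "collect_resources", description: "Gather resources in the arena",
      success_threshold: 0.9, priority: 1.0, time_horizon: 1000}
  ],
  sub_goals: %{
    "collect_resources" => [
      %{id: "explore_map", description: "Explore unknown regions first",
        success_threshold: 0.7, priority: 0.5, time_horizon: 200}
    ]
  },
  goal_weights: %{"collect_resources" => 0.8, "explore_map" => 0.2},
  goal_dependencies: %{"collect_resources" => ["explore_map"]}
}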
@type graph() :: %{ nodes: [object_id()], edges: [{object_id(), object_id(), float()}], centrality_scores: %{required(object_id()) => float()}, clustering_coefficient: float() }
Social learning graph representing object relationships
@type learning_strategy() :: %{ algorithm: :q_learning | :policy_gradient | :actor_critic, hyperparameters: map(), social_weight: float(), exploration_strategy: exploration_spec() }
Learning strategy configuration
@type message() :: %{ sender: object_id(), content: any(), recipients: [object_id()], role: :prompt | :response, timestamp: DateTime.t(), dyad_id: dyad_id() | nil }
@type meta_state() :: %{ learning_history: [performance_metric()], adaptation_triggers: [trigger_condition()], strategy_variants: [learning_strategy()], performance_baseline: float() }
Meta-learning state for strategy adaptation
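A meta-learning state combining the learning_strategy(), exploration_spec(), and trigger_condition() types might be assembled as follows (thresholds and hyperparameters are illustrative, not library defaults):

meta_state = %{
  learning_history: [],
  adaptation_triggers: [
    # e.g. adapt the strategy if average reward stays below 0.1 over the last 100 steps
    %{metric: :reward, threshold: 0.1, comparison: :less_than, window_size: 100}
  ],
  strategy_variants: [
    %{algorithm: :q_learning,
      hyperparameters: %{learning_rate: 0.05},
      social_weight: 0.3,
      exploration_strategy: %{type: :epsilon_greedy, parameters: %{epsilon: 0.1},
                              adaptation_enabled: true, social_influence: 0.2}}
  ],
  performance_baseline: 0.0
}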
@type network_spec() :: %{ layers: [pos_integer()], activation: :relu | :tanh | :sigmoid, dropout_rate: float(), batch_normalization: boolean() }
Neural network architecture specification
@type object_id() :: String.t()
@type oorl_state() :: %{ policy_network: policy_spec(), value_function: value_spec(), experience_buffer: [experience()], social_learning_graph: graph(), meta_learning_state: meta_state(), goal_hierarchy: goal_tree(), reward_function: reward_spec(), exploration_strategy: exploration_spec() }
Complete OORL state for an object with all learning capabilities.
Fields
- policy_network - Decision-making policy (neural, tabular, or hybrid)
- value_function - State value estimation function
- experience_buffer - Replay buffer for learning experiences
- social_learning_graph - Network of social connections and trust
- meta_learning_state - Strategy adaptation and meta-learning
- goal_hierarchy - Multi-objective goal structure with priorities
- reward_function - Multi-component reward specification
- exploration_strategy - Exploration/exploitation strategy
Integration
All components work together to provide:
- Individual reinforcement learning
- Social learning from peers
- Collective learning in coalitions
- Meta-learning for strategy adaptation
@type performance_metric() :: %{ timestamp: DateTime.t(), reward: float(), learning_rate: float(), convergence_speed: float(), social_benefit: float() }
Performance metric for meta-learning
@type policy_spec() :: %{ type: :neural | :tabular | :hybrid | :evolved, parameters: %{required(atom()) => any()}, architecture: network_spec(), update_rule: :gradient_ascent | :natural_gradient | :proximal_policy, social_influence_weight: float() }
Policy specification defining the learning agent's decision-making strategy.
Fields
- type - Policy representation type
- parameters - Policy-specific parameters
- architecture - Network structure for neural policies
- update_rule - Algorithm for policy updates
- social_influence_weight - Weighting for social learning integration
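For example, a neural policy specification matching this type could look like the following sketch (layer sizes and weights are illustrative):

policy_spec = %{
  type: :neural,
  parameters: %{learning_rate: 0.001},
  architecture: %{layers: [64, 64], activation: :relu,
                  dropout_rate: 0.1, batch_normalization: false},
  update_rule: :proximal_policy,
  social_influence_weight: 0.3
}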
@type reward_component() ::
:task_reward | :social_reward | :curiosity_reward | :intrinsic_reward
Reward component for multi-objective optimization
@type reward_spec() :: %{ components: [reward_component()], weights: %{required(atom()) => float()}, adaptation_rate: float(), intrinsic_motivation: float() }
Reward function specification with components
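An illustrative multi-objective reward specification, weighting the task reward against social and curiosity bonuses (the weights are hypothetical):

reward_spec = %{
  components: [:task_reward, :social_reward, :curiosity_reward],
  weights: %{task_reward: 0.7, social_reward: 0.2, curiosity_reward: 0.1},
  adaptation_rate: 0.01,
  intrinsic_motivation: 0.1
}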
@type social_context() :: %{ observed_actions: [action_observation()], peer_rewards: [{object_id(), float()}], coalition_membership: [coalition_id()], reputation_scores: %{required(object_id()) => float()}, interaction_dyads: [dyad_id()], message_history: [message()] }
Social learning context containing peer information and interaction history.
Fields
- observed_actions - Actions observed from peer objects with outcomes
- peer_rewards - Recent reward signals from peer objects
- coalition_membership - List of coalitions this object belongs to
- reputation_scores - Trust and reliability scores for peer objects
- interaction_dyads - Active interaction dyads with other objects
- message_history - Recent communication history for context
Usage in Learning
Social context enables:
- Imitation learning from successful peers
- Coordination with coalition members
- Trust-based learning partner selection
- Communication-informed decision making
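A fully populated social context conforming to this type might look like the following; every ID, score, and timestamp is hypothetical:

social_context = %{
  observed_actions: [
    %{object_id: "agent_2", action: :explore, outcome: :success, timestamp: DateTime.utc_now()}
  ],
  peer_rewards: [{"agent_2", 0.8}, {"agent_3", 0.6}],
  coalition_membership: ["coalition_alpha"],
  reputation_scores: %{"agent_2" => 0.9, "agent_3" => 0.4},
  interaction_dyads: ["dyad_1"],
  message_history: []
}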
@type trigger_condition() :: %{ metric: atom(), threshold: float(), comparison: :greater_than | :less_than | :equal_to, window_size: pos_integer() }
Trigger condition for strategy adaptation
@type value_spec() :: %{ type: :neural | :tabular | :linear, architecture: network_spec(), learning_rate: float(), discount_factor: float() }
Value function specification and parameters
Functions
@spec initialize_oorl_object(Object.object_id(), map()) :: {:ok, oorl_state()}
Initializes an OORL object with learning capabilities.
Sets up a complete OORL learning system for an object including policy networks, value functions, social learning capabilities, and meta-learning features. This is the entry point for enabling advanced learning capabilities on any AAOS object.
Parameters
- object_id - Unique identifier for the learning object
- learning_config - Configuration options map with the following keys:
  - :policy_type - Policy representation (:neural, :tabular, default: :neural)
  - :social_learning_enabled - Enable social learning (default: true)
  - :meta_learning_enabled - Enable meta-learning (default: true)
  - :curiosity_driven - Enable curiosity-driven exploration (default: true)
  - :coalition_participation - Allow coalition membership (default: true)
  - :learning_rate - Base learning rate (default: 0.01)
  - :exploration_rate - Initial exploration rate (default: 0.1)
  - :discount_factor - Future reward discount (default: 0.95)
Returns
{:ok, oorl_state} - Successfully initialized OORL state structure
OORL State Structure
The returned state includes:
- Policy Network: Decision-making policy (neural or tabular)
- Value Function: State value estimation function
- Experience Buffer: Replay buffer for learning
- Social Learning Graph: Network of social connections
- Meta-Learning State: Strategy adaptation mechanisms
- Goal Hierarchy: Multi-objective goal structure
- Reward Function: Multi-component reward specification
- Exploration Strategy: Exploration/exploitation balance
Examples
# Initialize with neural policy
iex> {:ok, state} = OORL.initialize_oorl_object("agent_1", %{
...> policy_type: :neural,
...> learning_rate: 0.001,
...> social_learning_enabled: true
...> })
iex> state.policy_network.type
:neural
# Initialize tabular policy for discrete environments
iex> {:ok, state} = OORL.initialize_oorl_object("discrete_agent", %{
...> policy_type: :tabular,
...> exploration_rate: 0.2
...> })
iex> state.policy_network.type
:tabular
# Initialize with meta-learning disabled
iex> {:ok, state} = OORL.initialize_oorl_object("simple_agent", %{
...> meta_learning_enabled: false,
...> curiosity_driven: false
...> })
iex> state.exploration_strategy.type
:epsilon_greedy
Configuration Guidelines
Policy Type Selection
- Neural: Continuous state/action spaces, complex patterns
- Tabular: Discrete spaces, interpretable policies
- Hybrid: Mixed discrete/continuous environments
Learning Rates
- High (0.1-0.5): Fast changing environments
- Medium (0.01-0.1): Typical applications
- Low (0.001-0.01): Stable environments, fine-tuning
Social Learning
- Enable for multi-agent environments
- Disable for single-agent optimization
- Consider computational overhead
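Putting these guidelines together, a typical multi-agent setup for a continuous environment might be configured as below; only the documented option keys are used, and the chosen values are a sketch rather than recommended defaults:

config = %{
  policy_type: :neural,           # continuous state/action space
  learning_rate: 0.01,            # medium rate for a typical application
  exploration_rate: 0.1,
  discount_factor: 0.95,
  social_learning_enabled: true,  # multi-agent environment
  meta_learning_enabled: true,
  coalition_participation: true
}

{:ok, oorl_state} = OORL.initialize_oorl_object("field_agent", config)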
Performance Impact
- Initialization time: ~5-10ms
- Memory usage: ~5-50KB depending on configuration
- Neural networks: Higher memory, better generalization
- Tabular policies: Lower memory, exact solutions
Error Conditions
Initialization may fail due to:
- Invalid configuration parameters
- Insufficient system resources
- Conflicting option combinations
@spec learning_step( Object.object_id(), any(), any(), float(), any(), social_context() ) :: {:ok, %{ policy_update: map(), social_updates: map(), meta_updates: map(), total_learning_signal: float() }} | {:error, atom()}
Performs a single learning step for an OORL object.
Processes a complete learning experience including individual policy updates, social learning integration, and meta-learning adaptation. This is the core learning function that integrates multiple levels of learning in a single operation.
Parameters
- object_id - ID of the learning object (must be OORL-enabled)
- state - Current environment state (any serializable term)
- action - Action taken by the object
- reward - Numerical reward signal received
- next_state - Resulting environment state after action
- social_context - Social learning context containing:
  - :observed_actions - Actions observed from peer objects
  - :peer_rewards - Reward signals from peer objects
  - :coalition_membership - Active coalition memberships
  - :interaction_dyads - Active interaction dyads
  - :message_history - Recent communication history
Returns
{:ok, learning_results} - Successful learning with detailed results:
- :policy_update - Individual policy learning results
- :social_updates - Social learning integration results
- :meta_updates - Meta-learning adaptation results
- :total_learning_signal - Aggregate learning signal strength
{:error, reason} - Learning step failed due to:
- :object_not_found - Object not registered
- :invalid_state - State format invalid
- :learning_disabled - OORL not enabled for object
- :resource_exhausted - Insufficient computational resources
Learning Process
Each learning step involves:
- Experience Creation: Package the (state, action, reward, next_state) transition with its social context
- Individual Learning: Update policy using RL algorithm
- Social Learning: Integrate peer observations and rewards
- Meta-Learning: Adapt learning strategy based on performance
- Result Aggregation: Combine learning signals from all levels
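As a sketch of how these steps are driven in practice, an agent process can call OORL.learning_step/6 once per environment transition. The Env module and its functions below are hypothetical placeholders for whatever environment API the object interacts with:

defmodule TrainingLoop do
  # Runs `steps` transitions for one OORL-enabled object.
  def run(object_id, steps) do
    Enum.reduce(1..steps, Env.observe(object_id), fn _step, state ->
      action = Env.select_action(object_id, state)        # hypothetical action selection
      {reward, next_state} = Env.act(object_id, action)   # hypothetical environment step
      social_context = %{peer_rewards: [], interaction_dyads: []}

      {:ok, _results} =
        OORL.learning_step(object_id, state, action, reward, next_state, social_context)

      next_state
    end)
  end
end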
Examples
# Basic learning step
iex> social_context = %{
...> peer_rewards: [{"agent_2", 0.8}],
...> interaction_dyads: ["dyad_1"]
...> }
iex> {:ok, results} = OORL.learning_step(
...> "agent_1",
...> %{position: {0, 0}},
...> :move_right,
...> 1.0,
...> %{position: {1, 0}},
...> social_context
...> )
iex> results.total_learning_signal
0.35
# Learning with rich social context
iex> rich_context = %{
...> observed_actions: [
...> %{object_id: "agent_2", action: :explore, outcome: :success},
...> %{object_id: "agent_3", action: :exploit, outcome: :failure}
...> ],
...> peer_rewards: [{"agent_2", 1.2}, {"agent_3", -0.5}],
...> coalition_membership: ["coalition_alpha"],
...> interaction_dyads: ["dyad_2", "dyad_3"]
...> }
iex> {:ok, results} = OORL.learning_step(
...> "social_agent", current_state, action, reward, next_state, rich_context
...> )
iex> results.social_updates.peer_influence
0.25
Learning Algorithms
The learning step uses different algorithms based on policy type:
Neural Policies
- Policy gradient with social regularization
- Experience replay with peer experiences
- Neural network parameter updates
Tabular Policies
- Q-learning with social Q-value sharing
- Direct state-action value updates
- Exploration bonus from peer actions
Social Learning Integration
Social learning enhances individual learning through:
- Imitation: Copy successful actions from high-performing peers
- Advice Taking: Weight peer rewards in policy updates
- Coordination: Align actions with coalition objectives
- Knowledge Transfer: Share learned policies across similar states
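To make the advice-taking idea concrete, the following self-contained sketch shows one simple way peer rewards could be blended into a learning signal via a social weight. It is purely illustrative and does not reflect OORL's internal implementation:

defmodule SocialBlend do
  # Blend the object's own reward with the mean peer reward (illustrative only).
  def blended_signal(own_reward, [], _social_weight), do: own_reward

  def blended_signal(own_reward, peer_rewards, social_weight) do
    peer_mean =
      peer_rewards
      |> Enum.map(fn {_peer_id, reward} -> reward end)
      |> then(&(Enum.sum(&1) / length(&1)))

    (1.0 - social_weight) * own_reward + social_weight * peer_mean
  end
end

# SocialBlend.blended_signal(1.0, [{"agent_2", 0.8}, {"agent_3", 0.6}], 0.3)
# #=> 0.91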
Performance Characteristics
- Learning step time: 1-10ms depending on complexity
- Memory usage: Temporary allocations for experience processing
- Convergence: 2-5x faster with effective social learning
- Scalability: Linear with number of peer objects in context
Meta-Learning Adaptation
Meta-learning continuously adapts:
- Learning rates based on convergence speed
- Exploration strategies based on environment dynamics
- Social weights based on peer performance
- Reward function components based on goal achievement