OORL.RewardLearning (object v0.1.2)

Implements the mathematical reward combination algorithms specified in AAOS Section 6.

Provides multiple reward combination strategies:

  • Linear combination
  • Weighted combination
  • Adaptive combination
  • Hierarchical combination

Maintains mathematical properties including Lipschitz continuity and bounded learning.

Summary

Functions

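adapt_weights(reward_learner, performance_metrics)

Adapts reward weights based on performance feedback.

combine_rewards(extrinsic_rewards, intrinsic_rewards, strategy \\ :linear)

Combines multiple reward components using the specified strategy.

new(opts \\ [])

Creates a new reward learning configuration.

validate_mathematical_properties(reward_function, test_points)

Validates that a reward function maintains the expected mathematical properties.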

Types

reward_combination_strategy()

@type reward_combination_strategy() :: :linear | :weighted | :adaptive | :hierarchical

reward_component()

@type reward_component() :: %{
  type: :task_reward | :social_reward | :curiosity_reward | :intrinsic_reward,
  value: float(),
  confidence: float(),
  source: String.t()
}
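As an illustration of the reward_component() shape, the sketch below builds one component and checks it against the four keys in the type. The ComponentSketch module and the "novelty_estimator" source string are hypothetical and not part of OORL.RewardLearning. The documentation does not state a valid range for confidence, so none is enforced here.

```elixir
defmodule ComponentSketch do
  # Hypothetical helper for illustration; not part of OORL.RewardLearning.
  @types [:task_reward, :social_reward, :curiosity_reward, :intrinsic_reward]

  # True when the map carries the four keys from reward_component()
  # with values of the expected kinds and a known type atom.
  def valid?(%{type: type, value: value, confidence: confidence, source: source})
      when is_float(value) and is_float(confidence) and is_binary(source) do
    type in @types
  end

  # Anything else (missing keys, wrong value kinds) is rejected.
  def valid?(_other), do: false
end

component = %{
  type: :curiosity_reward,
  value: 0.3,
  confidence: 0.9,
  source: "novelty_estimator"
}

IO.inspect(ComponentSketch.valid?(component))
```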

Functions

adapt_weights(reward_learner, performance_metrics)

Adapts reward weights based on performance feedback.

Parameters

  • reward_learner: Current reward learning configuration
  • performance_metrics: Performance feedback data

Returns

Updated %OORL.RewardLearning{} struct
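The update rule used by adapt_weights/2 is not documented here. The sketch below shows one plausible form under stated assumptions: each weight moves a fraction adaptation_rate of the way toward a per-type performance signal and is then clamped to [0.0, 1.0] so that updates stay bounded. The AdaptSketch module, the plain-map learner, and the shape of the metrics map are all hypothetical.

```elixir
defmodule AdaptSketch do
  # Hypothetical sketch. The keys :weights and :adaptation_rate follow the
  # option names listed for new/1, but the update rule itself is assumed.

  def adapt(%{weights: weights, adaptation_rate: rate} = learner, metrics) do
    updated =
      Map.new(weights, fn {type, w} ->
        # Types without a metric keep their current weight as the target.
        target = Map.get(metrics, type, w)
        # Step toward the target, then clamp so the weight stays bounded.
        {type, clamp(w + rate * (target - w))}
      end)

    %{learner | weights: updated}
  end

  defp clamp(x), do: x |> max(0.0) |> min(1.0)
end

learner = %{weights: %{task_reward: 0.5, curiosity_reward: 0.5}, adaptation_rate: 0.1}
IO.inspect(AdaptSketch.adapt(learner, %{task_reward: 1.0}))
```

Clamping is one simple way to keep weights in a fixed interval; a real implementation might instead renormalize the weights or decay the adaptation rate.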

combine_rewards(extrinsic_rewards, intrinsic_rewards, strategy \\ :linear)

Combines multiple reward components using the specified strategy.

Parameters

  • extrinsic_rewards: List of external reward components
  • intrinsic_rewards: List of internal reward components
  • strategy: Combination strategy to use

Returns

{:ok, combined_reward} or {:error, reason}

Examples

iex> OORL.RewardLearning.combine_rewards([%{type: :task_reward, value: 0.8}], 
...>   [%{type: :curiosity_reward, value: 0.3}], :linear)
{:ok, 1.1}
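The example above is consistent with the :linear strategy being a plain sum of component values. The sketch below reproduces that behavior and adds an assumed form of weighted combination, in which each value is scaled by a per-type weight. The CombineSketch module, its function names, and the weights map are hypothetical; the actual :weighted, :adaptive, and :hierarchical strategies may be defined differently.

```elixir
defmodule CombineSketch do
  # Hypothetical stand-in for illustration; not the OORL implementation.

  # Linear: plain sum of every component's value.
  def linear(extrinsic, intrinsic) do
    total =
      (extrinsic ++ intrinsic)
      |> Enum.map(fn component -> component.value end)
      |> Enum.sum()

    {:ok, total}
  end

  # Weighted (assumed form): scale each value by the weight for its type,
  # using 1.0 for any type missing from the weights map.
  def weighted(extrinsic, intrinsic, weights) when is_map(weights) do
    total =
      (extrinsic ++ intrinsic)
      |> Enum.map(fn c -> Map.get(weights, c.type, 1.0) * c.value end)
      |> Enum.sum()

    {:ok, total}
  end

  def weighted(_extrinsic, _intrinsic, other), do: {:error, {:invalid_weights, other}}
end

extrinsic = [%{type: :task_reward, value: 0.8}]
intrinsic = [%{type: :curiosity_reward, value: 0.3}]

IO.inspect(CombineSketch.linear(extrinsic, intrinsic))
IO.inspect(CombineSketch.weighted(extrinsic, intrinsic, %{task_reward: 0.5, curiosity_reward: 2.0}))
```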

new(opts \\ [])

Creates a new reward learning configuration.

Parameters

  • opts: Configuration options, including strategy, weights, and adaptation_rate

Returns

%OORL.RewardLearning{} struct
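To show how a constructor of this kind typically handles its options, the sketch below defines a stand-in struct and builds it from a keyword list. The LearnerSketch module is hypothetical, and every default value in it is made up for illustration; the real defaults of %OORL.RewardLearning{} are not given in this documentation.

```elixir
defmodule LearnerSketch do
  # Hypothetical struct. Field names mirror the option names in the text;
  # all default values are assumptions, not the library's defaults.
  defstruct strategy: :linear, weights: %{}, adaptation_rate: 0.01

  @strategies [:linear, :weighted, :adaptive, :hierarchical]

  def new(opts \\ []) do
    # struct!/2 raises KeyError if opts contains a key the struct lacks.
    learner = struct!(__MODULE__, opts)

    if learner.strategy in @strategies do
      learner
    else
      raise ArgumentError, "unknown strategy: #{inspect(learner.strategy)}"
    end
  end
end

IO.inspect(LearnerSketch.new(strategy: :weighted, weights: %{task_reward: 0.7}))
```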

validate_mathematical_properties(reward_function, test_points)

Validates that a reward function maintains the expected mathematical properties.

Parameters

  • reward_function: Function to validate
  • test_points: Sample points for validation

Returns

{:ok, validation_results}, where validation_results reports the checked properties, such as Lipschitz continuity
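One common way to check Lipschitz continuity on sample points is to compute the largest ratio |f(x) - f(y)| / |x - y| over all distinct pairs. The sketch below does this for a scalar function of one float argument. The LipschitzSketch module, the max_constant parameter, and the keys of the result map are hypothetical; the library's own checks and result format may differ.

```elixir
defmodule LipschitzSketch do
  # Hypothetical check for scalar functions of a single float argument.

  # Largest observed |f(x) - f(y)| / |x - y| over all pairs with x < y.
  def estimate(fun, points) do
    ratios =
      for x <- points, y <- points, x < y do
        abs(fun.(x) - fun.(y)) / abs(x - y)
      end

    case ratios do
      # Fewer than two distinct points: nothing to compare.
      [] -> 0.0
      _ -> Enum.max(ratios)
    end
  end

  # Reports the estimate and whether it stays within max_constant.
  def validate(fun, points, max_constant) do
    constant = estimate(fun, points)

    {:ok, %{lipschitz_constant: constant, lipschitz_continuous: constant <= max_constant}}
  end
end

IO.inspect(LipschitzSketch.validate(fn x -> 2.0 * x end, [0.0, 1.0, 3.0], 2.5))
```

An estimate from finitely many points is only a lower bound on the true Lipschitz constant, so passing this check does not prove continuity; it can only reveal violations at the sampled points.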