OORL.RewardLearning (object v0.1.2)
OORL Reward Learning module implementing mathematical reward combination algorithms as specified in AAOS Section 6.
Provides multiple reward combination strategies:
- Linear combination
- Weighted combination
- Adaptive combination
- Hierarchical combination
Maintains mathematical properties including Lipschitz continuity and bounded learning.
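As a rough illustration of how the first two strategies differ, the sketch below sums component values directly for :linear and scales each component by a per-type weight for :weighted. It is a standalone example under assumed semantics (the weight-map shape is an assumption), not this module's implementation.

defmodule RewardCombinationSketch do
  # Illustrative only; mirrors the :linear and :weighted strategies under
  # assumed semantics, not OORL.RewardLearning's internals.

  # :linear: unweighted sum of every component's value.
  def combine(components, :linear) do
    Enum.reduce(components, 0.0, fn %{value: v}, acc -> acc + v end)
  end

  # :weighted: each component scaled by a weight keyed on its :type
  # (weight-map shape assumed), defaulting to 1.0 when no weight is given.
  def combine(components, {:weighted, weights}) do
    Enum.reduce(components, 0.0, fn %{type: t, value: v}, acc ->
      acc + Map.get(weights, t, 1.0) * v
    end)
  end
end

RewardCombinationSketch.combine(
  [%{type: :task_reward, value: 0.8}, %{type: :curiosity_reward, value: 0.3}],
  {:weighted, %{task_reward: 0.9, curiosity_reward: 0.5}}
)
# 0.9 * 0.8 + 0.5 * 0.3, i.e. roughly 0.87

The :adaptive strategy additionally adjusts such weights from performance feedback, as described for the weight-adaptation function below.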
Summary
Functions
Adapts reward weights based on performance feedback.
Combines multiple reward components using the specified strategy.
Creates a new reward learning configuration.
Validates that reward function maintains mathematical properties.
Types
@type reward_combination_strategy() :: :linear | :weighted | :adaptive | :hierarchical
Functions
Adapts reward weights based on performance feedback.
Parameters
reward_learner
: Current reward learning configuration
performance_metrics
: Performance feedback data
Returns
Updated %OORL.RewardLearning{} struct
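A hypothetical call is sketched below; the function names new/1 and adapt_weights/2 and the metric keys are assumptions, since this page does not show them explicitly.

iex> learner = OORL.RewardLearning.new(strategy: :adaptive, adaptation_rate: 0.05)
iex> metrics = %{task_success_rate: 0.9, exploration_coverage: 0.4}
iex> %OORL.RewardLearning{} = OORL.RewardLearning.adapt_weights(learner, metrics)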
Combines multiple reward components using the specified strategy.
Parameters
extrinsic_rewards
: List of external reward components
intrinsic_rewards
: List of internal reward components
strategy
: Combination strategy to use
Returns
{:ok, combined_reward} or {:error, reason}
Examples
iex> OORL.RewardLearning.combine_rewards([%{type: :task_reward, value: 0.8}],
...> [%{type: :curiosity_reward, value: 0.3}], :linear)
{:ok, 1.1}
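With :linear the combined value is the plain sum of the extrinsic and intrinsic components, here 0.8 + 0.3 = 1.1. The other strategies depend on configured weights, so the result is not a fixed literal; a hedged sketch of a weighted call (whether combine_rewards/3 falls back to default weights when none are configured is an assumption):

iex> {:ok, weighted_reward} =
...>   OORL.RewardLearning.combine_rewards(
...>     [%{type: :task_reward, value: 0.8}],
...>     [%{type: :curiosity_reward, value: 0.3}],
...>     :weighted)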
Creates a new reward learning configuration.
Parameters
opts
: Configuration options including :strategy, :weights, and :adaptation_rate
Returns
%OORL.RewardLearning{} struct
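For instance (the constructor name new/1 and the weight-map keys are assumptions; the option names come from the parameter description above):

iex> learner =
...>   OORL.RewardLearning.new(
...>     strategy: :weighted,
...>     weights: %{task_reward: 0.7, curiosity_reward: 0.3},
...>     adaptation_rate: 0.05)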
Validates that reward function maintains mathematical properties.
Parameters
reward_function
: Function to validate
test_points
: Sample points for validation
Returns
{:ok, validation_results} with properties such as Lipschitz continuity
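The Lipschitz property referenced here requires |f(x) - f(y)| <= L * |x - y| for some constant L across the test points. A minimal standalone sketch of such a check, assuming a 1-arity numeric reward function (this is not the module's implementation):

estimate_lipschitz = fn reward_function, test_points ->
  # Largest observed |f(x) - f(y)| / |x - y| over all distinct sample pairs;
  # a bounded result suggests the function is Lipschitz on the sampled region.
  for x <- test_points, y <- test_points, x != y, reduce: 0.0 do
    acc ->
      ratio = abs(reward_function.(x) - reward_function.(y)) / abs(x - y)
      max(acc, ratio)
  end
end

estimate_lipschitz.(fn x -> 0.5 * x end, [0.0, 0.25, 0.5, 0.75, 1.0])
# returns 0.5, the slope of the sampled linear reward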