v0.7.0 - Hardened Core

Overview

This release begins the Robustness Phase by fixing all identified bugs, updating deprecated APIs, and improving error handling. The focus is on making the core modules production-ready.

Phase: Robustness
Duration: 2 weeks
Prerequisites: v0.6.0 (population_monitor refactored, Structural Phase complete)

Objectives

  1. Fix all identified bugs with regression tests
  2. Update deprecated APIs (now(), random module)
  3. Add structured error handling
  4. Improve logging and debugging support
  5. Document error cases and recovery

Bugs to Fix

1. cortex.erl - Typo Bug (line 65)

Analysis Reference: Section 8.1 of DXNN2_CODEBASE_ANALYSIS.md (line 65)

Issue: Message atom misspelled "termiante"

Fix:

%% Before
{self(), termiante}

%% After
{self(), terminate}

Regression Test:

terminate_message_format_test() ->
    %% Verify the cortex shuts down on a correctly spelled terminate message.
    %% The controlling pid passed to cortex:gen/2 is this test process.
    State = test_helpers:create_cortex_state(),
    Pid = cortex:gen(self(), State),

    %% Send terminate in the {Sender, terminate} form shown in the fix
    Pid ! {self(), terminate},
    timer:sleep(50),
    ?assertNot(is_process_alive(Pid)).

2. neuron.erl - Return Value Bug (line 299)

Analysis Reference: Section 8.1 of DXNN2_CODEBASE_ANALYSIS.md (line 299)

Issue: Returns original parameters instead of perturbed

Fix:

%% Before
perturb_PF(Spread, {PFName, PFParameters}) ->
    U_PFParameters = [perturb_param(P, Spread) || P <- PFParameters],
    {PFName, PFParameters}.  % BUG: returns original

%% After
perturb_plasticity_function(Spread, {PFName, PFParameters}) ->
    UpdatedParameters = [perturb_param(P, Spread) || P <- PFParameters],
    {PFName, UpdatedParameters}.  % Returns perturbed

Regression Test:

perturb_plasticity_function_test() ->
    Original = {hebbian, [0.1, 0.2, 0.3]},
    Spread = 1.0,

    {PFName, Perturbed} = neuron:perturb_plasticity_function(Spread, Original),

    ?assertEqual(hebbian, PFName),
    ?assertEqual(3, length(Perturbed)),
    %% At least one parameter should change
    ?assertNotEqual([0.1, 0.2, 0.3], Perturbed).

3. genome_mutator.erl - Typo in Constant (line 26)

Analysis Reference: Section 8.1 of DXNN2_CODEBASE_ANALYSIS.md (line 26)

Issue: Constant name has typo "PARAMTERS"

Fix:

%% Before
-define(SEARCH_PARAMTERS_MUTATION_PROBABILITY, 0).

%% After
-define(SEARCH_PARAMETERS_MUTATION_PROBABILITY, 0).

Verification:

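%% Ensure the code compiles and the constant is accessible.
%% Assumes the macro is visible to the test module (for example via a shared
%% header); a -define inside genome_mutator.erl alone is not visible here.
constant_name_test() ->
    %% Fails to compile if the macro name is misspelled or undefined
    Value = ?SEARCH_PARAMETERS_MUTATION_PROBABILITY,
    ?assertEqual(0, Value).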

API Updates

1. Replace now() with Modern Time API

Analysis Reference: Section 8.2 of DXNN2_CODEBASE_ANALYSIS.md

Affected Modules:

  • cortex.erl
  • neuron.erl
  • exoself.erl
  • genome_mutator.erl

Changes:

%% Before: deprecated now()
random:seed(now())
StartTime = now()
timer:now_diff(now(), StartTime)

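%% After: modern API
rand:seed(exsss, erlang:timestamp())
StartTime = erlang:monotonic_time()
%% Note: timer:now_diff/2 returned microseconds; this returns milliseconds,
%% so callers that depended on the old unit must be adjusted
erlang:convert_time_unit(
    erlang:monotonic_time() - StartTime,
    native,
    millisecond
)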

Helper Module:

-module(time_utils).
-export([timestamp/0, elapsed_ms/1, seed_random/0]).

%% @doc Get current timestamp for timing
-spec timestamp() -> integer().
timestamp() ->
    erlang:monotonic_time().

%% @doc Calculate elapsed time in milliseconds
-spec elapsed_ms(integer()) -> integer().
elapsed_ms(StartTime) ->
    erlang:convert_time_unit(
        erlang:monotonic_time() - StartTime,
        native,
        millisecond
    ).

%% @doc Seed random number generator
-spec seed_random() -> ok.
seed_random() ->
    rand:seed(exsss),
    ok.
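
As a usage illustration, the sketch below times an arbitrary fun with the same monotonic-clock calls that time_utils wraps. The module name time_utils_usage_sketch and the function timed/1 are hypothetical and not part of the planned API; they only show how an interval measurement might look once now() is gone.

```erlang
-module(time_utils_usage_sketch).
-export([timed/1]).

%% Hypothetical helper: runs Fun and returns {ElapsedMs, Result}.
%% Uses the monotonic clock, which is unaffected by wall-clock adjustments
%% and is therefore suited to measuring intervals.
-spec timed(fun(() -> Result)) -> {integer(), Result}.
timed(Fun) when is_function(Fun, 0) ->
    Start = erlang:monotonic_time(),
    Result = Fun(),
    ElapsedMs = erlang:convert_time_unit(
        erlang:monotonic_time() - Start,
        native,
        millisecond
    ),
    {ElapsedMs, Result}.
```

The monotonic clock has no meaningful absolute value, so it should only be used for differences; code that needs a wall-clock timestamp would use erlang:system_time/1 instead.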

2. Replace random Module with rand

Analysis Reference: Section 8.2 of DXNN2_CODEBASE_ANALYSIS.md

Affected Modules: All modules using random

Changes:

%% Before: deprecated random module
random:uniform()
random:uniform(N)
random:seed(A, B, C)

%% After: modern rand module
rand:uniform()
rand:uniform(N)
rand:seed(exsss, {A, B, C})
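
One point worth keeping in mind during this migration is that rand, like random, keeps its state per process, so every spawned process that needs a reproducible sequence must seed itself. The sketch below illustrates this with two separate processes that use the same seed; the module name rand_migration_sketch and both function names are hypothetical.

```erlang
-module(rand_migration_sketch).
-export([seeded_sequence/2, in_fresh_process/1]).

%% Seeds the calling process with a fixed 3-tuple and draws N floats.
-spec seeded_sequence({integer(), integer(), integer()}, non_neg_integer()) ->
    [float()].
seeded_sequence({_, _, _} = Seed, N) ->
    _ = rand:seed(exsss, Seed),
    [rand:uniform() || _ <- lists:seq(1, N)].

%% Runs Fun in a newly spawned process and returns its result,
%% or the atom timeout if no reply arrives within five seconds.
-spec in_fresh_process(fun(() -> Result)) -> Result | timeout.
in_fresh_process(Fun) ->
    Parent = self(),
    Ref = make_ref(),
    spawn(fun() -> Parent ! {Ref, Fun()} end),
    receive
        {Ref, Result} -> Result
    after 5000 ->
        timeout
    end.
```

Two processes seeded with the same tuple produce the same sequence, while a process that never calls rand:seed is seeded automatically on first use and is not reproducible.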

Error Handling Improvements

1. Replace exit() with Structured Errors

Analysis Reference: Section 5.4 of DXNN2_CODEBASE_ANALYSIS.md (line 153)

Before:

exit("********ERROR:select_random_MO:: reached []...")

After:

-type mutation_error() :: {error, {mutation_type(), reason(), context()}}.
-type mutation_type() :: select_operator | add_neuron | add_link | ...
-type reason() :: atom().
-type context() :: map().

%% Structured error
error({selection_failed, no_operators, #{
    agent_id => AgentId,
    constraint => Constraint
}})

%% Or return error tuple
{error, {selection_failed, no_operators}}
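
To make the error-tuple variant concrete, here is a minimal sketch of an operator selection that returns a structured error instead of calling exit/1 with a string. The module select_operator_sketch and the functions select_random_operator/1 and next_action/1 are hypothetical; the real selection logic in genome_mutator is more involved.

```erlang
-module(select_operator_sketch).
-export([select_random_operator/1, next_action/1]).

%% Hypothetical selection: picks one operator uniformly at random,
%% or reports a structured error when the list is empty.
-spec select_random_operator([atom()]) ->
    {ok, atom()} | {error, {selection_failed, no_operators}}.
select_random_operator([]) ->
    {error, {selection_failed, no_operators}};
select_random_operator(Operators) ->
    Index = rand:uniform(length(Operators)),
    {ok, lists:nth(Index, Operators)}.

%% Callers can pattern match on the result instead of trapping an exit.
-spec next_action({ok, atom()} | {error, term()}) ->
    {apply, atom()} | skip_mutation.
next_action({ok, Operator}) ->
    {apply, Operator};
next_action({error, {selection_failed, no_operators}}) ->
    skip_mutation.
```

Returning a tuple keeps the failure local to the caller, whereas raising with error/1 is more appropriate when the condition indicates a programming mistake that should not be handled silently.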

2. Add Error Handling to Mutation Operators

%% Wrap mutations with error handling
apply_mutation(AgentId, Operator) ->
    try
        genome_mutator:Operator(AgentId)
    catch
        error:{mutation_failed, Reason} ->
            log_mutation_failure(AgentId, Operator, Reason),
            {error, Reason};
        error:Reason:Stacktrace ->
            log_unexpected_error(AgentId, Operator, Reason, Stacktrace),
            {error, {unexpected, Reason}}
    end.

3. Add Error Return Types to Specs

-spec add_neuron(agent_id()) -> ok | {error, add_neuron_error()}.
-type add_neuron_error() :: no_splittable_link | database_error | invalid_agent.

-spec mutate_weights(agent_id()) -> ok | {error, mutate_error()}.
-type mutate_error() :: no_neurons | invalid_weight_spec.

Logging and Debugging

1. Add Logging Infrastructure

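-module(tweann_logger).
-export([debug/2, info/2, warning/2, error/2]).

-spec debug(Format, Args) -> ok when
    Format :: string(),
    Args :: [term()].
debug(Format, Args) ->
    log(debug, Format, Args).

%% info/2 and warning/2 are exported, so they must be defined as well
-spec info(string(), [term()]) -> ok.
info(Format, Args) ->
    log(info, Format, Args).

-spec warning(string(), [term()]) -> ok.
warning(Format, Args) ->
    log(warning, Format, Args).

-spec error(Format, Args) -> ok when
    Format :: string(),
    Args :: [term()].
error(Format, Args) ->
    log(error, Format, Args).

log(Level, Format, Args) ->
    %% Use OTP logger
    logger:log(Level, Format, Args).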
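
Because the OTP logger's default primary level is notice, debug and info messages sent through tweann_logger are dropped unless the level is lowered. The following sketch shows one way to do that temporarily; the module tweann_logger_setup_sketch and its functions are hypothetical, while logger:get_primary_config/0 and logger:set_primary_config/2 are the standard OTP calls.

```erlang
-module(tweann_logger_setup_sketch).
-export([enable_debug/0, restore/1]).

%% Lowers the primary log level to debug and returns the previous level
%% so that it can be restored later.
-spec enable_debug() -> logger:level() | all | none.
enable_debug() ->
    #{level := Previous} = logger:get_primary_config(),
    ok = logger:set_primary_config(level, debug),
    Previous.

%% Restores a level previously returned by enable_debug/0.
-spec restore(logger:level() | all | none) -> ok | {error, term()}.
restore(Level) ->
    logger:set_primary_config(level, Level).
```

In a release the same effect is usually achieved through the kernel application's logger_level setting rather than at runtime.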

2. Add Debug Points to Critical Operations

%% In genome_mutator.erl
add_neuron(AgentId) ->
    tweann_logger:debug("add_neuron: starting for agent ~p", [AgentId]),

    case find_splittable_link(AgentId) of
        {error, Reason} = Error ->
            tweann_logger:debug("add_neuron: failed ~p", [Reason]),
            Error;
        {FromId, ToId, Weight} ->
            tweann_logger:debug("add_neuron: splitting ~p -> ~p", [FromId, ToId]),
            %% ... mutation logic
            tweann_logger:debug("add_neuron: success", []),
            ok
    end.

3. Add Tracing Support

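-module(tweann_trace).
-export([enable/1, disable/0, trace_module/1]).

%% Enable tracing for debugging: start the tracer, trace calls in all
%% processes, then install the trace pattern for Module
enable(Module) ->
    dbg:tracer(),
    dbg:p(all, c),
    trace_module(Module).

%% Trace all functions (local and exported) in Module, including return values.
%% Defined because trace_module/1 is exported above.
trace_module(Module) ->
    dbg:tpl(Module, [{'_', [], [{return_trace}]}]).

disable() ->
    dbg:stop_clear().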

Tests to Write

bug_fixes_test.erl

-module(bug_fixes_test).
-include_lib("eunit/include/eunit.hrl").

%% ============================================================================
%% Regression tests for identified bugs
%% ============================================================================

%% Bug #1: cortex terminate message
cortex_terminate_spelling_test() ->
    %% Message should be 'terminate' not 'termiante'
    {CortexPid, _} = test_helpers:create_test_cortex(),

    %% Send terminate message
    CortexPid ! terminate,
    timer:sleep(50),

    %% Process should have terminated cleanly
    ?assertNot(is_process_alive(CortexPid)).

%% Bug #2: neuron plasticity perturbation
neuron_perturb_returns_updated_test() ->
    Original = {hebbian, [0.1]},
    Spread = 10.0,  % Large spread to ensure change

    {_, Updated} = neuron:perturb_plasticity_function(Spread, Original),

    %% Must return different values (with very high probability)
    ?assertNotEqual([0.1], Updated).

neuron_perturb_structure_preserved_test() ->
    Original = {ojas, [0.1, 0.2, 0.3]},
    Spread = 1.0,

    {Name, Updated} = neuron:perturb_plasticity_function(Spread, Original),

    ?assertEqual(ojas, Name),
    ?assertEqual(3, length(Updated)).

%% Bug #3: constant spelling
genome_mutator_constant_name_test() ->
    %% Placeholder: a macro defined in genome_mutator.erl is not visible from
    %% this module, so the renamed constant is verified when genome_mutator.erl
    %% itself compiles, not by this assertion
    ?assert(true).

deprecated_api_test.erl

-module(deprecated_api_test).
-include_lib("eunit/include/eunit.hrl").

%% ============================================================================
%% Tests for modern API usage
%% ============================================================================

time_utils_timestamp_test() ->
    T1 = time_utils:timestamp(),
    timer:sleep(10),
    T2 = time_utils:timestamp(),
    ?assert(T2 > T1).

time_utils_elapsed_test() ->
    Start = time_utils:timestamp(),
    timer:sleep(100),
    Elapsed = time_utils:elapsed_ms(Start),
    ?assert(Elapsed >= 90),  % Allow some variance
    ?assert(Elapsed < 200).

rand_seed_test() ->
    %% Verify rand is seeded (no crash)
    time_utils:seed_random(),
    Value = rand:uniform(),
    ?assert(Value >= 0.0),
    ?assert(Value < 1.0).

rand_reproducibility_test() ->
    %% Same seed should produce same sequence
    rand:seed(exsss, {1, 2, 3}),
    Seq1 = [rand:uniform() || _ <- lists:seq(1, 5)],

    rand:seed(exsss, {1, 2, 3}),
    Seq2 = [rand:uniform() || _ <- lists:seq(1, 5)],

    ?assertEqual(Seq1, Seq2).

error_handling_test.erl

-module(error_handling_test).
-include_lib("eunit/include/eunit.hrl").

%% ============================================================================
%% Structured error tests
%% ============================================================================

mutation_error_structure_test() ->
    %% Errors should be structured tuples
    Error = {error, {selection_failed, no_operators}},
    {error, {Type, Reason}} = Error,
    ?assertEqual(selection_failed, Type),
    ?assertEqual(no_operators, Reason).

add_neuron_error_handling_test() ->
    %% Create agent with no splittable links
    {ok, AgentId} = test_helpers:create_minimal_agent(),

    Result = genome_mutator:add_neuron(AgentId),

    case Result of
        ok -> ok;
        {error, no_splittable_link} -> ok;
        Other -> ?assertEqual(expected_ok_or_error, Other)
    end,

    test_helpers:cleanup_test_agent(AgentId).

%% ============================================================================
%% Error recovery tests
%% ============================================================================

mutation_continues_after_error_test() ->
    %% Mutations should not crash the system
    {ok, AgentId} = test_helpers:create_test_agent_in_db(),

    %% Apply many mutations, some may fail
    Results = [
        genome_mutator:apply_mutation_safe(AgentId, add_neuron)
        || _ <- lists:seq(1, 20)
    ],

    %% Some may fail, but should get results for all
    ?assertEqual(20, length(Results)),

    test_helpers:cleanup_test_agent(AgentId).

Documentation Requirements

Required Documentation

  1. Error types and handling

    • All error types defined
    • Error handling patterns
    • Recovery strategies
  2. Logging usage

    • Log levels and when to use
    • Debug output format
  3. API migration guide

    • now() to timestamp
    • random to rand

Documentation Checklist

  • [ ] All error types defined with @type
  • [ ] Error handling guide written
  • [ ] Logging documentation complete
  • [ ] API migration documented
  • [ ] All bug fixes have tests

Quality Gates

v0.7.0 Acceptance Criteria

  1. Bug Fixes

    • [ ] All 3 identified bugs fixed
    • [ ] Each bug has regression test
    • [ ] Regression tests fail before the fix and pass after it
  2. API Updates

    • [ ] No usage of now()
    • [ ] No usage of random module
    • [ ] time_utils module created
  3. Error Handling

    • [ ] No exit() with strings
    • [ ] Structured error returns
    • [ ] Error types documented
  4. Code Quality

    • [ ] Logging in critical paths
    • [ ] Error handling specs complete
    • [ ] All tests pass
  5. Static Analysis

    • [ ] Zero dialyzer warnings
    • [ ] No deprecated API warnings

Known Limitations

  • Process crash recovery not complete (v0.8.0)
  • Timeout handling not complete (v0.8.0)
  • Advanced error recovery deferred

Next Steps

After v0.7.0 completion:

  1. Core bugs are eliminated
  2. Modern APIs are in use
  3. v0.8.0 will add process safety (spawn_link, timeouts)

Implementation Notes

Error Handling Pattern

%% Pattern for operations that may fail
do_operation(Args) ->
    case validate_args(Args) of
        {error, _} = Error ->
            Error;
        ok ->
            case perform_operation(Args) of
                {ok, Result} ->
                    {ok, Result};
                {error, Reason} ->
                    tweann_logger:warning("Operation failed: ~p", [Reason]),
                    {error, Reason}
            end
    end.

Time Migration Checklist

For each module:

  1. Search for now()
  2. Replace with time_utils:timestamp()
  3. Update time difference calculations
  4. Search for random:
  5. Replace with rand:
  6. Update seed operations

Dependencies

External Dependencies

  • OTP logger
  • OTP dbg (for tracing)

Internal Dependencies

  • All previously refactored modules
  • v0.1.0: types

Effort Estimate

Task                       Estimate
Bug fixes with tests       2 days
API migration              2 days
Error handling             2 days
Logging infrastructure     1 day
Documentation              1.5 days
Verification               1.5 days
Total                      10 days

Risks

Risk                           Mitigation
Bug fixes cause regressions    Tests before fixing
API changes break behavior     Thorough testing
Error handling overhead        Profile critical paths

Version: 0.7.0
Phase: Robustness
Status: Planned