Memory Management

This guide covers Python memory monitoring and garbage collection from Erlang.

Memory Statistics

Get current Python memory statistics:

{ok, Stats} = py:memory_stats().

The returned map contains:

  • gc_stats - List of per-generation statistics (collected, collections, uncollectable)
  • gc_count - Tuple of object counts per generation {gen0, gen1, gen2}
  • gc_threshold - Collection thresholds per generation

Example output:

#{gc_stats =>
    [#{<<"collected">> => 0, <<"collections">> => 0, <<"uncollectable">> => 0},
     #{<<"collected">> => 0, <<"collections">> => 0, <<"uncollectable">> => 0},
     #{<<"collected">> => 145, <<"collections">> => 1, <<"uncollectable">> => 0}],
  gc_count => {1837, 0, 0},
  gc_threshold => {2000, 10, 0}}

Garbage Collection

Manual Collection

Force Python garbage collection:

%% Full collection (all generations)
{ok, Collected} = py:gc().

%% Collection by generation
{ok, _} = py:gc(0).   %% Youngest objects only
{ok, _} = py:gc(1).   %% Generations 0 and 1
{ok, _} = py:gc(2).   %% Full collection

When to Force GC

  • After processing large datasets
  • Before measuring memory usage (see the example below)
  • When memory pressure is detected
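
For example, collecting right before a measurement keeps transient garbage out of the numbers:

%% Collect first, then measure
{ok, _} = py:gc().
{ok, Stats} = py:memory_stats().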

Memory Tracing

For detailed memory debugging, use tracemalloc:

%% Start tracing
ok = py:tracemalloc_start().

%% Do some work
{ok, _} = py:eval(<<"[x**2 for x in range(100000)]">>).

%% Check memory
{ok, Stats} = py:memory_stats().
%% Stats now includes:
%%   traced_memory_current => 1234567,  %% Current bytes
%%   traced_memory_peak => 2345678      %% Peak bytes

%% Stop tracing
ok = py:tracemalloc_stop().
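
The start/measure/stop steps can be wrapped in a helper. A minimal sketch, assuming a zero-arity fun that drives the Python work and using the traced_memory_peak key shown above:

measure_peak(Fun) ->
    ok = py:tracemalloc_start(),
    Result = Fun(),
    {ok, Stats} = py:memory_stats(),
    ok = py:tracemalloc_stop(),
    {Result, maps:get(traced_memory_peak, Stats)}.

%% Usage
{_, Peak} = measure_peak(fun() -> py:eval(<<"[x**2 for x in range(100000)]">>) end).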

Frame Depth

For more detailed tracebacks, specify frame depth:

ok = py:tracemalloc_start(10).  %% Store 10 frames per allocation

Higher frame counts provide more detail but use more memory.

Memory Best Practices

1. Use Streaming for Large Data

Instead of loading everything into memory:

%% Bad - loads entire list
{ok, Huge} = py:eval(<<"list(range(10000000))">>).

%% Good - processes incrementally
{ok, Chunks} = py:stream_eval(<<"(x for x in range(10000000))">>).
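
If you need finer control than stream_eval provides, a similar effect is possible with just exec and eval: park the generator in the worker namespace and pull fixed-size chunks. A sketch (the name gen is illustrative; it relies on the persistent worker namespace described under Worker Memory Growth below):

py:exec(<<"import itertools">>).
py:exec(<<"gen = (x for x in range(10000000))">>).
{ok, Chunk} = py:eval(<<"list(itertools.islice(gen, 1000))">>).  %% Next 1000 items; [] when exhausted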

2. Clear Large Objects

Python objects are freed as soon as their last reference is released, but objects caught in reference cycles wait for the garbage collector. Force cleanup with an explicit collection:

process_large_data(Data) ->
    Result = py:call(processor, handle, [Data]),
    {ok, _} = py:gc(),  %% Clean up Python side
    Result.

3. Monitor Pool Memory

Track memory across workers:

monitor_memory() ->
    {ok, Stats} = py:memory_stats(),
    Count = element(1, maps:get(gc_count, Stats)),
    Threshold = element(1, maps:get(gc_threshold, Stats)),
    if Count > Threshold * 0.8 ->
        logger:warning("Python memory pressure: ~p/~p", [Count, Threshold]),
        py:gc();
       true ->
        ok
    end.
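
To run the check periodically, one option is OTP's timer:apply_interval/4 (pool_monitor is a hypothetical module name exporting monitor_memory/0):

{ok, _TRef} = timer:apply_interval(60000, pool_monitor, monitor_memory, []).  %% Every 60 seconds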

Understanding GC Stats

Generations

Python uses generational garbage collection:

  • Generation 0: Newly created objects. Collected frequently.
  • Generation 1: Objects that survived one collection. Collected less often.
  • Generation 2: Long-lived objects. Collected rarely.
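
The three positions in gc_count and gc_threshold line up with these generations:

{ok, #{gc_count := {Gen0, _, _}, gc_threshold := {Max0, _, _}}} = py:memory_stats().
%% A Gen 0 collection is due once Gen0 approaches Max0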

Thresholds

Default thresholds are {700, 10, 10} on CPython 3.12 and earlier; newer interpreters may differ (the example output above reports {2000, 10, 0}):

  • Gen 0 collects after 700 new allocations
  • Gen 1 collects after 10 Gen 0 collections
  • Gen 2 collects after 10 Gen 1 collections
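
Thresholds can be tuned from Erlang by calling Python's standard gc.set_threshold; a sketch with illustrative values:

py:exec(<<"import gc; gc.set_threshold(2000, 15, 15)">>).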

Uncollectable Objects

Historically, objects with both circular references and __del__ methods could not be collected; since Python 3.4 (PEP 442) such cycles are normally handled, but uncollectable objects can still appear (for example, from misbehaving C extensions or with gc.DEBUG_SAVEALL set). Monitor the uncollectable count in gc_stats.
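
A small helper that sums the uncollectable counts across generations, using the binary keys from the example output above:

uncollectable_total() ->
    {ok, #{gc_stats := GenStats}} = py:memory_stats(),
    lists:sum([maps:get(<<"uncollectable">>, G) || G <- GenStats]).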

Troubleshooting

High Memory Usage

  1. Enable tracemalloc to identify allocations
  2. Check for large objects not being released
  3. Force GC and re-measure (see the sketch after this list)
  4. Consider streaming large datasets
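
A sketch of step 3, assuming tracing from step 1 is already enabled so the traced_memory_current key is present:

recheck_after_gc() ->
    {ok, #{traced_memory_current := Before}} = py:memory_stats(),
    {ok, Collected} = py:gc(),
    {ok, #{traced_memory_current := After}} = py:memory_stats(),
    #{collected => Collected, bytes_before => Before, bytes_after => After}.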

Memory Leaks

  1. Check uncollectable count in gc_stats
  2. Look for circular references in Python code
  3. Ensure generators are fully consumed or explicitly closed (see below)
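
For the third point, a generator parked in the worker namespace (like gen from the streaming sketch above) can be closed explicitly and its memory collected:

py:exec(<<"gen.close()">>).
{ok, _} = py:gc().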

Worker Memory Growth

Each worker maintains its own namespace. Objects defined via exec persist:

%% This grows worker memory over time (note integer_to_binary/1, needed to
%% splice the digits of N into the source string)
[py:exec(<<"x", (integer_to_binary(N))/binary, " = [0] * 1000000">>) || N <- lists:seq(1, 100)].

%% Consider using eval with locals instead
[py:eval(<<"len(data)">>, #{data => LargeList}) || _ <- lists:seq(1, 100)].
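
When a namespace has accumulated bindings like x1..x100 above, they can be dropped and the memory reclaimed. A sketch (the names are the illustrative ones from the first snippet; it assumes exec runs with the worker namespace as its globals):

py:exec(<<"for i in range(1, 101):\n    del globals()[f'x{i}']">>).
{ok, _} = py:gc().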