Memory Management
This guide covers Python memory monitoring and garbage collection from Erlang.
Memory Statistics
Get current Python memory statistics:
{ok, Stats} = py:memory_stats().

The returned map contains:
- gc_stats - List of per-generation statistics (collected, collections, uncollectable)
- gc_count - Tuple of object counts per generation: {gen0, gen1, gen2}
- gc_threshold - Collection thresholds per generation
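For example, a minimal sketch of pulling individual fields out of the map (field names and tuple shapes as documented above):

%% Destructure the stats map
{ok, Stats} = py:memory_stats(),
#{gc_count := {Gen0Count, _, _}, gc_threshold := {Gen0Limit, _, _}} = Stats,
io:format("Gen 0: ~p tracked objects (threshold ~p)~n", [Gen0Count, Gen0Limit]).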
Example output:
#{gc_stats =>
[#{<<"collected">> => 0, <<"collections">> => 0, <<"uncollectable">> => 0},
#{<<"collected">> => 0, <<"collections">> => 0, <<"uncollectable">> => 0},
#{<<"collected">> => 145, <<"collections">> => 1, <<"uncollectable">> => 0}],
gc_count => {1837, 0, 0},
gc_threshold => {2000, 10, 0}}

Garbage Collection
Manual Collection
Force Python garbage collection:
%% Full collection (all generations)
{ok, Collected} = py:gc().
%% Collection by generation
{ok, _} = py:gc(0). %% Youngest objects only
{ok, _} = py:gc(1). %% Generations 0 and 1
{ok, _} = py:gc(2). %% Full collection

When to Force GC
- After processing large datasets (see the sketch below)
- Before measuring memory usage
- When memory pressure is detected
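For instance, a sketch of the first case; the worker module and its process/1 function are hypothetical:

%% Collect after a large batch, before the next phase starts
run_batch(Items) ->
    Results = [py:call(worker, process, [Item]) || Item <- Items],
    {ok, _Collected} = py:gc(),
    Results.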
Memory Tracing
For detailed memory debugging, use tracemalloc:
%% Start tracing
ok = py:tracemalloc_start().
%% Do some work
{ok, _} = py:eval(<<"[x**2 for x in range(100000)]">>).
%% Check memory
{ok, Stats} = py:memory_stats().
%% Stats now includes:
%% traced_memory_current => 1234567, %% Current bytes
%% traced_memory_peak => 2345678 %% Peak bytes
%% Stop tracing
ok = py:tracemalloc_stop().

Frame Depth
For more detailed tracebacks, specify frame depth:
ok = py:tracemalloc_start(10). %% Store 10 frames per allocation

Higher frame counts provide more detail but use more memory.
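Putting the tracing calls together, a sketch that measures the peak usage of a single operation (measure_peak/1 is a hypothetical helper; traced_memory_peak is only present while tracing is active):

measure_peak(Fun) ->
    ok = py:tracemalloc_start(),
    Result = Fun(),
    {ok, Stats} = py:memory_stats(),
    ok = py:tracemalloc_stop(),
    {Result, maps:get(traced_memory_peak, Stats)}.

%% Usage
{_, PeakBytes} = measure_peak(fun() -> py:eval(<<"[x**2 for x in range(100000)]">>) end).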
Memory Best Practices
1. Use Streaming for Large Data
Instead of loading everything into memory:
%% Bad - loads entire list
{ok, Huge} = py:eval(<<"list(range(10000000))">>).
%% Good - processes incrementally
{ok, Chunks} = py:stream_eval(<<"(x for x in range(10000000))">>).

2. Clear Large Objects
Python objects are cleaned up when their references are released. Force cleanup with explicit GC:
process_large_data(Data) ->
    Result = py:call(processor, handle, [Data]),
    {ok, _} = py:gc(), %% Clean up Python side
    Result.

3. Monitor Pool Memory
Track memory across workers:
monitor_memory() ->
    {ok, Stats} = py:memory_stats(),
    Count = element(1, maps:get(gc_count, Stats)),
    Threshold = element(1, maps:get(gc_threshold, Stats)),
    if
        Count > Threshold * 0.8 ->
            logger:warning("Python memory pressure: ~p/~p", [Count, Threshold]),
            py:gc();
        true ->
            ok
    end.

Understanding GC Stats
Generations
Python uses generational garbage collection:
- Generation 0: Newly created objects. Collected frequently.
- Generation 1: Objects that survived one collection. Collected less often.
- Generation 2: Long-lived objects. Collected rarely.
Thresholds
Default thresholds are {700, 10, 10}:
- Gen 0 collects when allocations minus deallocations exceed 700
- Gen 1 collects after 10 Gen 0 collections
- Gen 2 collects after 10 Gen 1 collections
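If the defaults don't suit a workload, they can be changed from the Python side; a sketch using py:exec (this assumes the worker can import the standard gc module):

%% Raise the Gen 0 threshold so collections run less often
py:exec(<<"import gc; gc.set_threshold(5000, 10, 10)">>).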
Uncollectable Objects
Before Python 3.4, objects with circular references and __del__ methods were uncollectable; modern Python can collect most such cycles, but uncollectable objects may still appear (for example, from C extensions).
Monitor the uncollectable count in gc_stats.
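A sketch of summing the uncollectable counts across generations, using the gc_stats shape from the example output above:

uncollectable_total() ->
    {ok, Stats} = py:memory_stats(),
    lists:sum([maps:get(<<"uncollectable">>, Gen) || Gen <- maps:get(gc_stats, Stats)]).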
Troubleshooting
High Memory Usage
- Enable tracemalloc to identify allocations
- Check for large objects not being released
- Force GC and re-measure (see the sketch below)
- Consider streaming large datasets
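A sketch of the force-and-re-measure step (traced_memory_current is only present while tracemalloc is running, hence the undefined default):

remeasure() ->
    {ok, Before} = py:memory_stats(),
    {ok, Collected} = py:gc(),
    {ok, After} = py:memory_stats(),
    #{collected => Collected,
      before_bytes => maps:get(traced_memory_current, Before, undefined),
      after_bytes => maps:get(traced_memory_current, After, undefined)}.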
Memory Leaks
- Check the uncollectable count in gc_stats
- Look for circular references in Python code
- Ensure generators are fully consumed or explicitly closed (see the sketch below)
Worker Memory Growth
Each worker maintains its own namespace. Objects defined via exec persist:
%% This grows worker memory over time
[py:exec(<<"x", (integer_to_binary(N))/binary, " = [0] * 1000000">>) || N <- lists:seq(1, 100)].
%% Consider using eval with locals instead
[py:eval(<<"len(data)">>, #{data => LargeList}) || _ <- lists:seq(1, 100)].
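To reclaim space in a long-lived worker, the accumulated names can be deleted from its namespace and a collection forced; a sketch (assuming py:exec runs in the same namespace as the definitions above):

%% Delete the x1..x100 bindings, then collect
py:exec(<<"for k in [k for k in list(globals()) if k.startswith('x')]: del globals()[k]">>).
{ok, _} = py:gc().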