Performance Optimization Guide
Complete guide to optimizing PcapFileEx performance for different file sizes and query patterns.
Decision Matrix: Choosing the Right Approach
| File Size | Query Type | Best Approach | Memory Usage | Speed |
|---|---|---|---|---|
| < 10MB | Read all | read_all/1 | High (loads all) | Fastest |
| < 10MB | Selective | read_all/1 + Filter | High | Fast |
| 10-100MB | Read all | stream/1 | Low (constant) | Fast |
| 10-100MB | Selective | stream/1 + Filter | Low | Medium |
| 100MB-1GB | Read all | stream/1 | Low | Medium |
| 100MB-1GB | Selective (<10%) | PreFilter + stream | Low | Fast |
| > 1GB | Read all | stream/1 | Low | Slow |
| > 1GB | Selective (<10%) | PreFilter + stream | Low | Fast |
| > 1GB | Selective (>10%) | stream/1 + Filter | Low | Slow |
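The matrix can also be read as a small decision function. The following is a minimal sketch, not part of PcapFileEx: the `ApproachChooser` module, its return atoms, and the `selectivity` parameter are hypothetical names for illustration. The matrix has no row for a 100MB-1GB file with more than 10% of packets selected, so the sketch assumes it behaves like the > 1GB row.

```elixir
defmodule ApproachChooser do
  @moduledoc false
  @mb 1024 * 1024

  # size_bytes:  file size in bytes (for example File.stat!(path).size)
  # selectivity: expected fraction of packets needed, 1.0 meaning "read all"
  def choose(size_bytes, selectivity) when size_bytes < 10 * @mb do
    if selectivity >= 1.0, do: :read_all, else: :read_all_then_filter
  end

  def choose(size_bytes, selectivity) when size_bytes <= 100 * @mb do
    if selectivity >= 1.0, do: :stream, else: :stream_then_filter
  end

  # Everything above 100MB. Assumption: the 100MB-1GB range with more than
  # 10% selectivity is treated like the > 1GB row of the matrix.
  def choose(_size_bytes, selectivity) do
    cond do
      selectivity >= 1.0 -> :stream
      selectivity < 0.10 -> :prefilter_then_stream
      true -> :stream_then_filter
    end
  end
end

IO.inspect(ApproachChooser.choose(5 * 1024 * 1024, 1.0))
IO.inspect(ApproachChooser.choose(2 * 1024 * 1024 * 1024, 0.01))
```

The returned atom only names a row of the matrix; mapping it to actual calls is left to the caller.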
PreFilter Performance
Benchmark Results
Real-world benchmarks on a 10GB PCAP file with 50M packets:
Task: Find first 100 packets to port 443
Method 1 - Elixir Filter:
PcapFileEx.stream!("10gb.pcap")
|> Stream.filter(fn p -> p.dst.port == 443 end)
|> Enum.take(100)
Time: ~120 seconds
Memory: 50MB (constant)
Method 2 - PreFilter:
{:ok, r} = PcapFileEx.open("10gb.pcap")
:ok = PcapFileEx.Pcap.set_filter(r, [PreFilter.port_dest(443)])
packets = PcapFileEx.Stream.from_reader!(r) |> Enum.take(100)
PcapFileEx.Pcap.close(r)
Time: ~1.2 seconds (100x faster!)
Memory: 50MB (constant)
When PreFilter Gives Maximum Speedup
✅ Best speedup scenarios:
- Large files (>100MB)
- Selective queries (<10% of packets)
- Simple criteria (IP, port, protocol)
- Early termination (take/1, find/1)
❌ Minimal speedup scenarios:
- Small files (<10MB) - overhead not worth it
- Reading most packets (>50%)
- Complex application logic needed
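Early termination helps because a lazy pipeline stops pulling from its source as soon as enough matches have been collected. The sketch below shows this with a synthetic stream and the standard library only. It makes no PcapFileEx calls, and the packet maps with their `dst_port` field are hypothetical.

```elixir
# Counts how many synthetic "packets" the source actually produces.
produced = :counters.new(1, [])

# Hypothetical packet shape: every 10th packet goes to port 443.
source =
  Stream.map(1..1_000_000, fn id ->
    :counters.add(produced, 1, 1)
    %{id: id, dst_port: if(rem(id, 10) == 0, do: 443, else: 80)}
  end)

matches =
  source
  |> Stream.filter(fn p -> p.dst_port == 443 end)
  |> Enum.take(3)

ids = Enum.map(matches, & &1.id)
pulled = :counters.get(produced, 1)

IO.inspect(ids)
# The source is only asked for items up to the third match (30 here),
# not for all 1_000_000.
IO.inspect(pulled)
```

The sparser the matches, the further the source has to be read before `take/2` or `find/2` can stop, which is why selectivity and early termination are listed together.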
Streaming vs Eager Loading
Eager Loading (read_all/1)
{:ok, packets} = PcapFileEx.read_all("capture.pcap")
Pros:
- Fastest for small files
- Simple API
- Can use Enum functions freely
- Random access to packets
Cons:
- Loads entire file into memory
- OOM risk for large files
- Slower startup for large files
Use when:
- File < 100MB
- Need random access
- Will process all packets
- Memory is not constrained
Streaming (stream/1)
PcapFileEx.stream!("capture.pcap")
|> Stream.filter(...)
|> Enum.to_list()
Pros:
- Constant memory usage
- Works with files larger than RAM
- Can use Stream functions
- Automatic resource cleanup
Cons:
- Sequential access only
- Slightly higher per-packet overhead
- Must use Stream-aware functions
Use when:
- File > 100MB
- Only need subset of packets
- Memory is constrained
- Processing pipeline works with streams
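Automatic cleanup on a stream is typically achieved with `Stream.resource/3`, whose third function runs when enumeration finishes or is halted early. The following is a minimal sketch of that general mechanism using a plain file handle. It is not PcapFileEx's implementation, and the temporary file name and 4-byte chunk size are arbitrary choices for the demo.

```elixir
path = Path.join(System.tmp_dir!(), "stream_cleanup_demo.bin")
File.write!(path, <<1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12>>)

chunks =
  Stream.resource(
    # start: open the file only when enumeration begins
    fn -> File.open!(path, [:read, :binary]) end,
    # next: emit one 4-byte chunk per step, halt at end of file or on error
    fn io ->
      case IO.binread(io, 4) do
        data when is_binary(data) -> {[data], io}
        _eof_or_error -> {:halt, io}
      end
    end,
    # after: runs on normal completion and on early termination
    fn io ->
      File.close(io)
      send(self(), :closed)
    end
  )

# Take only part of the stream; the handle is still closed.
first_two = Enum.take(chunks, 2)

closed? =
  receive do
    :closed -> true
  after
    0 -> false
  end

File.rm!(path)
IO.inspect(first_two)
IO.inspect(closed?)
```

The `send/2` call exists only so the demo can observe that the cleanup function ran.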
Memory Management
Memory Usage Patterns
# HIGH memory - loads all
{:ok, packets} = PcapFileEx.read_all("10gb.pcap") # 10GB in RAM!
# LOW memory - constant usage
PcapFileEx.stream!("10gb.pcap")
|> Enum.each(fn packet -> process(packet) end) # ~50MB constant
# MEDIUM memory - accumulation
PcapFileEx.stream!("10gb.pcap")
|> Enum.to_list() # Eventually loads all, but gradually
# LOW memory - early termination
PcapFileEx.stream!("10gb.pcap")
|> Enum.take(1000) # Stops after 1000 packets
Resource Cleanup
# ✅ AUTOMATIC cleanup (recommended)
PcapFileEx.stream!("file.pcap") |> Enum.to_list()
# ✅ MANUAL cleanup (advanced)
{:ok, reader} = PcapFileEx.open("file.pcap")
try do
packets = PcapFileEx.Stream.from_reader!(reader) |> Enum.take(100)
after
PcapFileEx.Pcap.close(reader) # Always executes
end
# ❌ LEAK - reader never closed!
{:ok, reader} = PcapFileEx.open("file.pcap")
packets = PcapFileEx.Stream.from_reader!(reader) |> Enum.to_list()
# Missing close!
Decode Performance
When to Disable Decoding
Decoding adds CPU overhead. Disable when you don't need protocol information:
# ✅ Disable decode for raw metrics
packet_count = PcapFileEx.stream!("large.pcap", decode: false)
|> Enum.count()
total_bytes = PcapFileEx.stream!("large.pcap", decode: false)
|> Stream.map(&byte_size(&1.data))
|> Enum.sum()
# Find timestamp range
{first_ts, last_ts} = PcapFileEx.stream!("large.pcap", decode: false)
|> Enum.reduce({nil, nil}, fn p, {first, _last} ->
{first || p.timestamp, p.timestamp}
end)
# ❌ Keep decode enabled when you need protocol info
http_packets = PcapFileEx.stream!("large.pcap") # decode: true (default)
|> Stream.filter(fn p -> :http in p.protocols end)
|> Enum.to_list()
Decode Performance Impact
Benchmark: Processing 1M packets
With decode: true (default)
Time: 45 seconds
Provides: protocols, decoded payloads, endpoints
With decode: false
Time: 12 seconds (3.75x faster)
Provides: timestamp, data (raw bytes)
Statistics Performance
Eager vs Streaming Statistics
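Streaming statistics can run in constant memory because each packet is folded into a small accumulator and then discarded. The following is a minimal sketch of that idea over synthetic packets. It is not the implementation of `PcapFileEx.Stats`, and the packet maps are hypothetical stand-ins that reuse the `timestamp`, `data`, and `protocols` field names from this guide.

```elixir
# Hypothetical packets: a timestamp, raw data, and a list of protocol atoms.
packets =
  Stream.map(1..5, fn i ->
    %{
      timestamp: 1_000 + i,
      data: :binary.copy(<<0>>, i * 10),
      protocols: if(rem(i, 2) == 0, do: [:ip, :udp], else: [:ip, :tcp])
    }
  end)

initial = %{count: 0, bytes: 0, first_ts: nil, last_ts: nil, protocols: %{}}

# One pass: only the accumulator is kept, never the packet list.
stats =
  Enum.reduce(packets, initial, fn p, acc ->
    protocols =
      Enum.reduce(p.protocols, acc.protocols, fn proto, counts ->
        Map.update(counts, proto, 1, &(&1 + 1))
      end)

    %{
      acc
      | count: acc.count + 1,
        bytes: acc.bytes + byte_size(p.data),
        first_ts: acc.first_ts || p.timestamp,
        last_ts: p.timestamp,
        protocols: protocols
    }
  end)

IO.inspect(stats)
```

The accumulator grows with the number of distinct protocols, not with the number of packets.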
# Small files (<100MB) - eager is faster
{:ok, stats} = PcapFileEx.Stats.compute("small.pcap")
# Memory: Loads all packets
# Speed: Fast startup, fast computation
# Large files (>100MB) - streaming is better
{:ok, stats} = PcapFileEx.Stats.compute_streaming("large.pcap")
# Memory: Constant (streaming)
# Speed: Slower per-packet, but works on huge files
# From existing stream
stats = PcapFileEx.stream!("file.pcap")
|> PcapFileEx.Filter.by_protocol(:tcp)
|> PcapFileEx.Stats.compute_from_stream()
PreFilter Optimization Techniques
Combining Filters for Maximum Performance
# ✅ GOOD: Specific filters reduce packets early
:ok = PcapFileEx.Pcap.set_filter(reader, [
PreFilter.protocol("tcp"), # Eliminates UDP, ICMP, etc.
PreFilter.port_dest(443), # Only port 443
PreFilter.ip_source_cidr("10.0.0.0/8") # Only internal IPs
])
# Result: Very few packets pass all filters
# ⚠️ OKAY: Broad filters
:ok = PcapFileEx.Pcap.set_filter(reader, [
PreFilter.protocol("tcp") # Still many packets
])
# ❌ INEFFICIENT: Too many matches (use Elixir Filter instead)
:ok = PcapFileEx.Pcap.set_filter(reader, [
PreFilter.any([
PreFilter.protocol("tcp"),
PreFilter.protocol("udp"),
PreFilter.protocol("icmp")
])
])
# Most packets match! PreFilter overhead not worth it.
OR vs AND Semantics
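One way to reason about combined filters is to model each one as a function from packet to boolean, with AND and OR as combinators over lists of such functions. The sketch below is a conceptual model only. `PredicateModel` and its packet maps are hypothetical, it does not call PreFilter, and it says nothing about how PreFilter is implemented.

```elixir
defmodule PredicateModel do
  # Each "filter" here is just a function from packet to boolean.
  def protocol(proto), do: fn p -> p.protocol == proto end
  def port_dest(port), do: fn p -> p.dst_port == port end

  # AND: every predicate must accept the packet.
  def all(preds), do: fn p -> Enum.all?(preds, fn pred -> pred.(p) end) end

  # OR: at least one predicate must accept the packet.
  def any(preds), do: fn p -> Enum.any?(preds, fn pred -> pred.(p) end) end
end

# TCP AND (destination port 80 OR destination port 443)
web =
  PredicateModel.all([
    PredicateModel.protocol("tcp"),
    PredicateModel.any([
      PredicateModel.port_dest(80),
      PredicateModel.port_dest(443)
    ])
  ])

packets = [
  %{protocol: "tcp", dst_port: 443},
  %{protocol: "udp", dst_port: 443},
  %{protocol: "tcp", dst_port: 22}
]

matched = Enum.filter(packets, web)
IO.inspect(matched)
```

Only the first packet passes: the second fails the protocol test and the third fails both port tests.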
# AND semantics (all must match)
PreFilter.all([
PreFilter.protocol("tcp"),
PreFilter.port_dest(80)
])
# Packet must be TCP AND destination port 80
# OR semantics (any can match)
PreFilter.any([
PreFilter.port_dest(80),
PreFilter.port_dest(443),
PreFilter.port_dest(8080)
])
# Packet can have ANY of these destination ports
Clearing Filters
# Set filter
:ok = PcapFileEx.Pcap.set_filter(reader, [...])
# Clear filter (back to all packets)
:ok = PcapFileEx.Pcap.clear_filter(reader)
Common Performance Anti-Patterns
❌ Anti-Pattern 1: Loading Large Files Eagerly
# DON'T: Load 10GB file into memory
{:ok, packets} = PcapFileEx.read_all("huge_10gb.pcap")
tcp_packets = Enum.filter(packets, fn p -> :tcp in p.protocols end)
# DO: Stream instead
tcp_packets = PcapFileEx.stream!("huge_10gb.pcap")
|> Stream.filter(fn p -> :tcp in p.protocols end)
|> Enum.to_list()
# BETTER: Use PreFilter if selective
{:ok, reader} = PcapFileEx.open("huge_10gb.pcap")
:ok = PcapFileEx.Pcap.set_filter(reader, [PreFilter.protocol("tcp")])
tcp_packets = PcapFileEx.Stream.from_reader!(reader) |> Enum.to_list()
PcapFileEx.Pcap.close(reader)
❌ Anti-Pattern 2: Multiple Passes Over Large Files
# DON'T: Read file multiple times
tcp_count = PcapFileEx.stream!("huge.pcap")
|> Stream.filter(fn p -> :tcp in p.protocols end)
|> Enum.count()
udp_count = PcapFileEx.stream!("huge.pcap") # Re-reads entire file!
|> Stream.filter(fn p -> :udp in p.protocols end)
|> Enum.count()
# DO: Single pass with accumulator
{tcp_count, udp_count} = PcapFileEx.stream!("huge.pcap")
|> Enum.reduce({0, 0}, fn packet, {tcp, udp} ->
cond do
:tcp in packet.protocols -> {tcp + 1, udp}
:udp in packet.protocols -> {tcp, udp + 1}
true -> {tcp, udp}
end
end)
❌ Anti-Pattern 3: Unnecessary Decoding
# DON'T: Decode when you only need size
sizes = PcapFileEx.stream!("large.pcap") # decode: true (default)
|> Stream.map(&byte_size(&1.data))
|> Enum.to_list()
# DO: Disable decode
sizes = PcapFileEx.stream!("large.pcap", decode: false)
|> Stream.map(&byte_size(&1.data))
|> Enum.to_list()
❌ Anti-Pattern 4: Converting Stream to List Too Early
# DON'T: Lose streaming benefits
packets = PcapFileEx.stream!("huge.pcap") |> Enum.to_list() # Loads all!
first_http = Enum.find(packets, fn p -> :http in p.protocols end)
# DO: Keep streaming
first_http = PcapFileEx.stream!("huge.pcap")
|> Enum.find(fn p -> :http in p.protocols end) # Stops at first match
Performance Checklist
Before processing a PCAP file, ask:
How large is the file?
- < 100MB → Consider read_all/1
- > 100MB → Use stream/1
Do I need all packets?
- Yes → Stream or read_all
- No (<10%) → Use PreFilter
Do I need protocol information?
- Yes → Keep decode: true (default)
- No → Use decode: false
Is my filter simple?
- Yes (IP/port/protocol) → Use PreFilter
- No (complex logic) → Use Elixir Filter
Will I process packets once or multiple times?
- Once → Streaming is fine
- Multiple times → Consider read_all (if file is small)
Do I need resource cleanup?
- Automatic → Use stream/1
- Manual → Use open/close with try/after
Real-World Performance Examples
Example 1: Finding Specific HTTP Requests
# Task: Find first 10 GET requests to /api/* in 5GB file
# ❌ SLOW (150 seconds)
PcapFileEx.stream!("5gb.pcap")
|> Stream.filter(fn p ->
:http in p.protocols and
p.decoded[:http].method == "GET" and
String.starts_with?(p.decoded[:http].path || "", "/api/")
end)
|> Enum.take(10)
# ✅ FAST (5 seconds)
{:ok, reader} = PcapFileEx.open("5gb.pcap")
:ok = PcapFileEx.Pcap.set_filter(reader, [
PreFilter.protocol("tcp"),
PreFilter.port_dest(80)
])
packets = PcapFileEx.Stream.from_reader!(reader)
|> Stream.filter(fn p ->
:http in p.protocols and
p.decoded[:http].method == "GET" and
String.starts_with?(p.decoded[:http].path || "", "/api/")
end)
|> Enum.take(10)
PcapFileEx.Pcap.close(reader)
Example 2: Computing Statistics on Large File
# Task: Get protocol breakdown of 20GB file
# ❌ MEMORY ERROR
{:ok, packets} = PcapFileEx.read_all("20gb.pcap") # OOM!
# ✅ WORKS (constant memory)
{:ok, stats} = PcapFileEx.Stats.compute_streaming("20gb.pcap")
IO.inspect(stats.protocols)
Example 3: Extracting Subset of Packets
# Task: Extract all HTTPS traffic from 10GB file to new file
# ❌ SLOW (uses Elixir filtering)
PcapFileEx.stream!("10gb.pcap")
|> Stream.filter(fn p -> :tcp in p.protocols and p.dst.port == 443 end)
|> Stream.map(& &1.data)
# ... write to new file ...
# ✅ FAST (uses PreFilter - 50x faster)
{:ok, reader} = PcapFileEx.open("10gb.pcap")
:ok = PcapFileEx.Pcap.set_filter(reader, [
PreFilter.protocol("tcp"),
PreFilter.port_dest(443)
])
PcapFileEx.Stream.from_reader!(reader)
|> Stream.map(& &1.data)
# ... write to new file ...
PcapFileEx.Pcap.close(reader)
Summary: Performance Best Practices
- ✅ Use auto-detection (PcapFileEx.open/1)
- ✅ Use PreFilter for large files + selective queries (10-100x speedup)
- ✅ Use streaming for files > 100MB
- ✅ Disable decode when you don't need protocol info (3-4x speedup)
- ✅ Use streaming statistics for large files
- ✅ Single-pass processing when possible
- ✅ Automatic resource cleanup with stream/1
- ❌ Don't load huge files with read_all/1
- ❌ Don't use Elixir filtering on large files for simple criteria
- ❌ Don't convert streams to lists unnecessarily