Performance Optimization Guide

Complete guide to optimizing PcapFileEx performance for different file sizes and query patterns.

Decision Matrix: Choosing the Right Approach

File Size  | Query Type       | Best Approach       | Memory Usage     | Speed
< 10MB     | Read all         | read_all/1          | High (loads all) | Fastest
< 10MB     | Selective        | read_all/1 + Filter | High             | Fast
10-100MB   | Read all         | stream/1            | Low (constant)   | Fast
10-100MB   | Selective        | stream/1 + Filter   | Low              | Medium
100MB-1GB  | Read all         | stream/1            | Low              | Medium
100MB-1GB  | Selective (<10%) | PreFilter + stream  | Low              | Fast
> 1GB      | Read all         | stream/1            | Low              | Slow
> 1GB      | Selective (<10%) | PreFilter + stream  | Low              | Fast
> 1GB      | Selective (>10%) | stream/1 + Filter   | Low              | Slow
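
The matrix can be read as a small lookup function. The sketch below is a hypothetical helper, not part of PcapFileEx: the module name, the returned atoms, and the `fraction` parameter (expected share of packets the query keeps) are all assumptions made for illustration. The matrix does not cover every combination (for example 100MB-1GB with more than 10% of packets selected), so the sketch falls back to streaming plus an Elixir filter in those cases.

```elixir
defmodule ApproachSketch do
  @moduledoc "Hypothetical helper encoding the decision matrix; not a PcapFileEx API."

  @mb 1024 * 1024

  # size_bytes: file size in bytes
  # fraction:   expected share of packets kept by the query (1.0 = read everything)
  def choose(size_bytes, fraction \\ 1.0)
      when is_integer(size_bytes) and size_bytes >= 0 and is_number(fraction) do
    cond do
      # Small files: eager loading, optionally followed by an Elixir filter
      size_bytes < 10 * @mb and fraction < 1.0 -> :read_all_then_filter
      size_bytes < 10 * @mb -> :read_all
      # Large files with a highly selective query: filter before streaming
      size_bytes >= 100 * @mb and fraction < 0.10 -> :prefilter_then_stream
      # Any other selective query (assumed fallback for rows the matrix omits)
      fraction < 1.0 -> :stream_then_filter
      # Reading everything from a file of 10MB or more
      true -> :stream
    end
  end
end

IO.inspect(ApproachSketch.choose(5 * 1024 * 1024))
IO.inspect(ApproachSketch.choose(2 * 1024 * 1024 * 1024, 0.02))
```

The thresholds mirror the table rows; adjust them if your own measurements suggest different break-even points.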

PreFilter Performance

Benchmark Results

Real-world benchmarks on a 10GB PCAP file with 50M packets:

Task: Find first 100 packets to port 443

Method 1 - Elixir Filter:
  PcapFileEx.stream!("10gb.pcap")
  |> Stream.filter(fn p -> p.dst.port == 443 end)
  |> Enum.take(100)

  Time: ~120 seconds
  Memory: 50MB (constant)

Method 2 - PreFilter:
  {:ok, r} = PcapFileEx.open("10gb.pcap")
  :ok = PcapFileEx.Pcap.set_filter(r, [PreFilter.port_dest(443)])
  packets = PcapFileEx.Stream.from_reader!(r) |> Enum.take(100)
  PcapFileEx.Pcap.close(r)

  Time: ~1.2 seconds (100x faster!)
  Memory: 50MB (constant)

When PreFilter Gives Maximum Speedup

Best speedup scenarios:

  • Large files (>100MB)
  • Selective queries (<10% of packets)
  • Simple criteria (IP, port, protocol)
  • Early termination (Enum.take/2, Enum.find/2)

Minimal speedup scenarios:

  • Small files (<10MB) - overhead not worth it
  • Reading most packets (>50%)
  • Complex application logic needed

Streaming vs Eager Loading

Eager Loading (read_all/1)

{:ok, packets} = PcapFileEx.read_all("capture.pcap")

Pros:

  • Fastest for small files
  • Simple API
  • Can use Enum functions freely
  • Random access to packets

Cons:

  • Loads entire file into memory
  • OOM risk for large files
  • Slower startup for large files

Use when:

  • File < 100MB
  • Need random access
  • Will process all packets
  • Memory is not constrained

Streaming (stream/1)

PcapFileEx.stream!("capture.pcap")
|> Stream.filter(...)
|> Enum.to_list()

Pros:

  • Constant memory usage
  • Works with files larger than RAM
  • Can use Stream functions
  • Automatic resource cleanup

Cons:

  • Sequential access only
  • Slightly higher per-packet overhead
  • Must use Stream-aware functions

Use when:

  • File > 100MB
  • Only need subset of packets
  • Memory is constrained
  • Processing pipeline works with streams

Memory Management

Memory Usage Patterns

# HIGH memory - loads all
{:ok, packets} = PcapFileEx.read_all("10gb.pcap")  # 10GB in RAM!

# LOW memory - constant usage
PcapFileEx.stream!("10gb.pcap")
|> Enum.each(fn packet -> process(packet) end)  # ~50MB constant

# HIGH memory eventually - Enum.to_list/1 accumulates every packet
PcapFileEx.stream!("10gb.pcap")
|> Enum.to_list()  # Grows gradually until the whole file is in memory

# LOW memory - early termination
PcapFileEx.stream!("10gb.pcap")
|> Enum.take(1000)  # Stops after 1000 packets
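
Early termination works because Elixir streams are lazy: elements are pulled from the source only as the consumer asks for them. The sketch below illustrates this with a plain range standing in for a packet stream (no PcapFileEx calls are involved); the counter records how many elements were actually pulled.

```elixir
# Stand-in for a packet stream: the range plays the role of 1M packets.
# The :counters reference tracks how many elements leave the source.
counter = :counters.new(1, [])

first_three =
  1..1_000_000
  |> Stream.each(fn _ -> :counters.add(counter, 1, 1) end)
  |> Stream.filter(fn n -> rem(n, 100) == 0 end)
  |> Enum.take(3)

pulled = :counters.get(counter, 1)

IO.inspect(first_three)  # [100, 200, 300]
IO.inspect(pulled)       # 300, not 1_000_000
```

Only the first 300 elements are ever produced, which is why `Enum.take/2` and `Enum.find/2` on a stream stay cheap when matches occur early in the file.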

Resource Cleanup

# ✅ AUTOMATIC cleanup (recommended)
PcapFileEx.stream!("file.pcap") |> Enum.to_list()

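# ✅ MANUAL cleanup (advanced)
{:ok, reader} = PcapFileEx.open("file.pcap")

# Bind the result of the whole try expression: variables assigned
# inside `try` are not visible after it.
packets =
  try do
    PcapFileEx.Stream.from_reader!(reader) |> Enum.take(100)
  after
    PcapFileEx.Pcap.close(reader)  # Always executes
  end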

# ❌ LEAK - reader never closed!
{:ok, reader} = PcapFileEx.open("file.pcap")
packets = PcapFileEx.Stream.from_reader!(reader) |> Enum.to_list()
# Missing close!
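
For intuition on how a stream can clean up after itself, the sketch below uses `Stream.resource/3` from the Elixir standard library. It is a generic illustration and does not claim to be how PcapFileEx implements `stream/1`; the module name, the list standing in for a reader, and the `on_close` callback are all hypothetical.

```elixir
defmodule CleanupSketch do
  # Builds a stream over `packets` and calls `on_close` when the stream ends,
  # whether it was fully consumed or halted early.
  def stream(packets, on_close) when is_list(packets) and is_function(on_close, 0) do
    Stream.resource(
      # start: "open" the resource (here, just the remaining packets)
      fn -> packets end,
      # next: emit one packet at a time, halt when none are left
      fn
        [] -> {:halt, []}
        [head | tail] -> {[head], tail}
      end,
      # after: runs on completion, early termination, or an error in the consumer
      fn _remaining -> on_close.() end
    )
  end
end

parent = self()

taken =
  CleanupSketch.stream([:a, :b, :c, :d], fn -> send(parent, :closed) end)
  |> Enum.take(2)

closed? =
  receive do
    :closed -> true
  after
    0 -> false
  end

IO.inspect({taken, closed?})  # {[:a, :b], true}
```

The cleanup function fires even though only two of the four elements were consumed, which is the behavior the automatic-cleanup option relies on.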

Decode Performance

When to Disable Decoding

Decoding adds CPU overhead. Disable when you don't need protocol information:

# ✅ Disable decode for raw metrics
packet_count = PcapFileEx.stream!("large.pcap", decode: false)
|> Enum.count()

total_bytes = PcapFileEx.stream!("large.pcap", decode: false)
|> Stream.map(&byte_size(&1.data))
|> Enum.sum()

# Find timestamp range
{first_ts, last_ts} = PcapFileEx.stream!("large.pcap", decode: false)
|> Enum.reduce({nil, nil}, fn p, {first, _last} ->
  {first || p.timestamp, p.timestamp}
end)

# ✅ Keep decode enabled when you need protocol info
http_packets = PcapFileEx.stream!("large.pcap")  # decode: true (default)
|> Stream.filter(fn p -> :http in p.protocols end)
|> Enum.to_list()

Decode Performance Impact

Benchmark: Processing 1M packets

With decode: true (default)
  Time: 45 seconds
  Provides: protocols, decoded payloads, endpoints

With decode: false
  Time: 12 seconds (3.75x faster)
  Provides: timestamp, data (raw bytes)
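
Numbers like these depend on hardware and capture contents, so it is worth measuring on your own files. The sketch below is a minimal timing helper built on Erlang's `:timer.tc/1`; the module and function names are hypothetical, and a simple sum stands in for the packet-processing workload.

```elixir
defmodule TimingSketch do
  # Runs `fun` once and returns {elapsed_milliseconds, result}.
  def measure(fun) when is_function(fun, 0) do
    {micros, result} = :timer.tc(fun)
    {micros / 1000, result}
  end

  # Ratio of two timings, e.g. decode: true versus decode: false.
  def speedup(slow, fast) when fast > 0, do: slow / fast
end

# Stand-in workload; replace the function body with your own pipeline.
{ms, sum} = TimingSketch.measure(fn -> Enum.sum(1..1_000_000) end)
IO.puts("stand-in workload took #{ms} ms, result #{sum}")

# The speedup quoted above: 45 seconds versus 12 seconds.
IO.inspect(TimingSketch.speedup(45, 12))  # 3.75
```

A single run is noisy; repeating the measurement a few times and comparing the fastest runs gives a steadier picture.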

Statistics Performance

Eager vs Streaming Statistics

# Small files (<100MB) - eager is faster
{:ok, stats} = PcapFileEx.Stats.compute("small.pcap")
# Memory: Loads all packets
# Speed: Fast startup, fast computation

# Large files (>100MB) - streaming is better
{:ok, stats} = PcapFileEx.Stats.compute_streaming("large.pcap")
# Memory: Constant (streaming)
# Speed: Slower per-packet, but works on huge files

# From existing stream
stats = PcapFileEx.stream!("file.pcap")
|> PcapFileEx.Filter.by_protocol(:tcp)
|> PcapFileEx.Stats.compute_from_stream()

PreFilter Optimization Techniques

Combining Filters for Maximum Performance

# ✅ GOOD: Specific filters reduce packets early
:ok = PcapFileEx.Pcap.set_filter(reader, [
  PreFilter.protocol("tcp"),      # Eliminates UDP, ICMP, etc.
  PreFilter.port_dest(443),       # Only port 443
  PreFilter.ip_source_cidr("10.0.0.0/8")  # Only internal IPs
])
# Result: Very few packets pass all filters

# ⚠️ OKAY: Broad filters
:ok = PcapFileEx.Pcap.set_filter(reader, [
  PreFilter.protocol("tcp")  # Still many packets
])

# ❌ INEFFICIENT: Too many matches (use Elixir Filter instead)
:ok = PcapFileEx.Pcap.set_filter(reader, [
  PreFilter.any([
    PreFilter.protocol("tcp"),
    PreFilter.protocol("udp"),
    PreFilter.protocol("icmp")
  ])
])
# Most packets match! PreFilter overhead not worth it.

OR vs AND Semantics

# AND semantics (all must match)
PreFilter.all([
  PreFilter.protocol("tcp"),
  PreFilter.port_dest(80)
])
# Packet must be TCP AND destination port 80

# OR semantics (any can match)
PreFilter.any([
  PreFilter.port_dest(80),
  PreFilter.port_dest(443),
  PreFilter.port_dest(8080)
])
# Packet can have ANY of these destination ports
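
The AND/OR semantics can be modeled in plain Elixir with predicate functions. The sketch below is only a mental model: it does not reproduce how PreFilter evaluates filters internally, and the module name, the map-shaped packets, and their `protocol` and `dst_port` keys are hypothetical.

```elixir
defmodule CombinatorSketch do
  # Each "filter" is a one-argument function returning true or false.
  def protocol(name), do: fn packet -> packet.protocol == name end
  def port_dest(port), do: fn packet -> packet.dst_port == port end

  # AND: every predicate must accept the packet
  def all(preds), do: fn packet -> Enum.all?(preds, & &1.(packet)) end

  # OR: at least one predicate must accept the packet
  def any(preds), do: fn packet -> Enum.any?(preds, & &1.(packet)) end
end

packets = [
  %{protocol: "tcp", dst_port: 80},
  %{protocol: "udp", dst_port: 80},
  %{protocol: "tcp", dst_port: 443},
  %{protocol: "tcp", dst_port: 22}
]

tcp_80 =
  CombinatorSketch.all([
    CombinatorSketch.protocol("tcp"),
    CombinatorSketch.port_dest(80)
  ])

web_ports =
  CombinatorSketch.any([
    CombinatorSketch.port_dest(80),
    CombinatorSketch.port_dest(443),
    CombinatorSketch.port_dest(8080)
  ])

IO.inspect(Enum.count(packets, tcp_80))     # 1
IO.inspect(Enum.count(packets, web_ports))  # 3
```

The AND combination narrows the result set while the OR combination widens it, so AND-style filters are the ones that tend to make a query selective.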

Clearing Filters

# Set filter
:ok = PcapFileEx.Pcap.set_filter(reader, [...])

# Clear filter (back to all packets)
:ok = PcapFileEx.Pcap.clear_filter(reader)

Common Performance Anti-Patterns

❌ Anti-Pattern 1: Loading Large Files Eagerly

# DON'T: Load 10GB file into memory
{:ok, packets} = PcapFileEx.read_all("huge_10gb.pcap")
tcp_packets = Enum.filter(packets, fn p -> :tcp in p.protocols end)

# DO: Stream instead
tcp_packets = PcapFileEx.stream!("huge_10gb.pcap")
|> Stream.filter(fn p -> :tcp in p.protocols end)
|> Enum.to_list()

# BETTER: Use PreFilter if selective
{:ok, reader} = PcapFileEx.open("huge_10gb.pcap")

tcp_packets =
  try do
    :ok = PcapFileEx.Pcap.set_filter(reader, [PreFilter.protocol("tcp")])
    PcapFileEx.Stream.from_reader!(reader) |> Enum.to_list()
  after
    PcapFileEx.Pcap.close(reader)  # Runs even if filtering or reading raises
  end

❌ Anti-Pattern 2: Multiple Passes Over Large Files

# DON'T: Read file multiple times
tcp_count = PcapFileEx.stream!("huge.pcap")
|> Stream.filter(fn p -> :tcp in p.protocols end)
|> Enum.count()

udp_count = PcapFileEx.stream!("huge.pcap")  # Re-reads entire file!
|> Stream.filter(fn p -> :udp in p.protocols end)
|> Enum.count()

# DO: Single pass with accumulator
{tcp_count, udp_count} = PcapFileEx.stream!("huge.pcap")
|> Enum.reduce({0, 0}, fn packet, {tcp, udp} ->
  cond do
    :tcp in packet.protocols -> {tcp + 1, udp}
    :udp in packet.protocols -> {tcp, udp + 1}
    true -> {tcp, udp}
  end
end)
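
The same single-pass idea extends to any number of protocols by accumulating into a map. The sketch below uses plain maps with a `protocols` list as stand-ins for decoded packets (an assumed shape, chosen to match the `:tcp in p.protocols` checks above) and counts every protocol in one traversal.

```elixir
# Stand-in packets; with PcapFileEx these would come from a stream.
packets = [
  %{protocols: [:ether, :ipv4, :tcp]},
  %{protocols: [:ether, :ipv4, :udp, :dns]},
  %{protocols: [:ether, :ipv4, :tcp, :http]}
]

# One pass over the packets, one map entry per protocol seen.
counts =
  Enum.reduce(packets, %{}, fn packet, acc ->
    Enum.reduce(packet.protocols, acc, fn proto, inner ->
      Map.update(inner, proto, 1, &(&1 + 1))
    end)
  end)

IO.inspect(counts[:tcp])  # 2
IO.inspect(counts[:udp])  # 1
```

Unlike the `cond` version, this counts a packet under every protocol it carries, so a packet tagged both `:tcp` and `:http` increments both counters.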

❌ Anti-Pattern 3: Unnecessary Decoding

# DON'T: Decode when you only need size
sizes = PcapFileEx.stream!("large.pcap")  # decode: true (default)
|> Stream.map(&byte_size(&1.data))
|> Enum.to_list()

# DO: Disable decode
sizes = PcapFileEx.stream!("large.pcap", decode: false)
|> Stream.map(&byte_size(&1.data))
|> Enum.to_list()

❌ Anti-Pattern 4: Converting Stream to List Too Early

# DON'T: Lose streaming benefits
packets = PcapFileEx.stream!("huge.pcap") |> Enum.to_list()  # Loads all!
first_http = Enum.find(packets, fn p -> :http in p.protocols end)

# DO: Keep streaming
first_http = PcapFileEx.stream!("huge.pcap")
|> Enum.find(fn p -> :http in p.protocols end)  # Stops at first match

Performance Checklist

Before processing a PCAP file, ask:

  1. How large is the file?

    • < 100MB → Consider read_all/1
    • > 100MB → Use stream/1
  2. Do I need all packets?

    • Yes → Stream or read_all
    • No (<10%) → Use PreFilter
  3. Do I need protocol information?

    • Yes → Keep decode: true (default)
    • No → Use decode: false
  4. Is my filter simple?

    • Yes (IP/port/protocol) → Use PreFilter
    • No (complex logic) → Use Elixir Filter
  5. Will I process packets once or multiple times?

    • Once → Streaming is fine
    • Multiple times → Consider read_all (if file is small)
  6. Do I need resource cleanup?

    • Automatic → Use stream/1
    • Manual → Use open/close with try/after

Real-World Performance Examples

Example 1: Finding Specific HTTP Requests

# Task: Find first 10 GET requests to /api/* in 5GB file

# ❌ SLOW (150 seconds)
PcapFileEx.stream!("5gb.pcap")
|> Stream.filter(fn p ->
  :http in p.protocols and
  p.decoded[:http].method == "GET" and
  String.starts_with?(p.decoded[:http].path || "", "/api/")
end)
|> Enum.take(10)

# ✅ FAST (5 seconds)
{:ok, reader} = PcapFileEx.open("5gb.pcap")
:ok = PcapFileEx.Pcap.set_filter(reader, [
  PreFilter.protocol("tcp"),
  PreFilter.port_dest(80)
])
packets = PcapFileEx.Stream.from_reader!(reader)
|> Stream.filter(fn p ->
  :http in p.protocols and
  p.decoded[:http].method == "GET" and
  String.starts_with?(p.decoded[:http].path || "", "/api/")
end)
|> Enum.take(10)
PcapFileEx.Pcap.close(reader)

Example 2: Computing Statistics on Large File

# Task: Get protocol breakdown of 20GB file

# ❌ MEMORY ERROR
{:ok, packets} = PcapFileEx.read_all("20gb.pcap")  # OOM!

# ✅ WORKS (constant memory)
{:ok, stats} = PcapFileEx.Stats.compute_streaming("20gb.pcap")
IO.inspect(stats.protocols)

Example 3: Extracting Subset of Packets

# Task: Extract all HTTPS traffic from 10GB file to new file

# ❌ SLOW (uses Elixir filtering)
PcapFileEx.stream!("10gb.pcap")
|> Stream.filter(fn p -> :tcp in p.protocols and p.dst.port == 443 end)
|> Stream.map(& &1.data)
# ... write to new file ...

# ✅ FAST (uses PreFilter - 50x faster)
{:ok, reader} = PcapFileEx.open("10gb.pcap")
:ok = PcapFileEx.Pcap.set_filter(reader, [
  PreFilter.protocol("tcp"),
  PreFilter.port_dest(443)
])
PcapFileEx.Stream.from_reader!(reader)
|> Stream.map(& &1.data)
# ... write to new file ...
PcapFileEx.Pcap.close(reader)

Summary: Performance Best Practices

  1. ✅ Use auto-detection (PcapFileEx.open/1)
  2. ✅ Use PreFilter for large files + selective queries (10-100x speedup)
  3. ✅ Use streaming for files > 100MB
  4. ✅ Disable decode when you don't need protocol info (3-4x speedup)
  5. ✅ Use streaming statistics for large files
  6. ✅ Single-pass processing when possible
  7. ✅ Automatic resource cleanup with stream/1
  8. ❌ Don't load huge files with read_all/1
  9. ❌ Don't use Elixir filtering on large files for simple criteria
  10. ❌ Don't convert streams to lists unnecessarily