# Batch Processing Guide
This guide covers the batch API for encoding and decoding multiple items in a single NIF call.
## When to Use Batch Operations
Batch operations are useful when you need to process many separate strings or binaries:
- Decoding/encoding rows from a database
- Processing lists of filenames or paths
- Converting multiple user inputs
- Data migration tasks
For streaming a single large file, use `EncodingRs.Decoder` instead (see the Streaming Guide).
## The Problem
Each NIF call has overhead: scheduler context switching, argument marshalling, and result conversion. When processing many small items, this overhead can dominate:
```elixir
# Inefficient: 1000 NIF calls
items
|> Enum.map(fn {data, encoding} ->
  EncodingRs.decode(data, encoding)
end)
```

## The Solution
Batch operations process all items in a single NIF call, amortizing the dispatch overhead:
```elixir
# Efficient: 1 NIF call
EncodingRs.decode_batch(items)
```

## Usage
### Decoding Multiple Binaries
```elixir
items = [
  {<<72, 101, 108, 108, 111>>, "windows-1252"},
  {<<0x82, 0xA0>>, "shift_jis"},
  {<<0xC4, 0xE3, 0xBA, 0xC3>>, "gbk"}
]

results = EncodingRs.decode_batch(items)
# => [{:ok, "Hello"}, {:ok, "あ"}, {:ok, "你好"}]
```

### Encoding Multiple Strings
```elixir
items = [
  {"Hello", "windows-1252"},
  {"あ", "shift_jis"},
  {"你好", "gbk"}
]

results = EncodingRs.encode_batch(items)
# => [{:ok, <<72, 101, 108, 108, 111>>}, {:ok, <<130, 160>>}, {:ok, <<196, 227, 186, 195>>}]
```

### Handling Errors
Results are returned in the same order as input. Check each result individually:
```elixir
items = [
  {"Hello", "windows-1252"},
  {"Test", "invalid-encoding"},
  {"World", "utf-8"}
]

results = EncodingRs.encode_batch(items)
# => [{:ok, "Hello"}, {:error, :unknown_encoding}, {:ok, "World"}]

# Process results
Enum.zip(items, results)
|> Enum.each(fn {{input, encoding}, result} ->
  case result do
    {:ok, _encoded} ->
      IO.puts("Encoded #{inspect(input)} to #{encoding}")

    {:error, reason} ->
      IO.puts("Failed to encode #{inspect(input)}: #{reason}")
  end
end)
```

### Mixed Encodings
Batch operations support different encodings per item:
```elixir
# Database rows with encoding metadata
rows = [
  %{content: <<...>>, encoding: "shift_jis", id: 1},
  %{content: <<...>>, encoding: "gbk", id: 2},
  %{content: <<...>>, encoding: "windows-1252", id: 3}
]

items = Enum.map(rows, &{&1.content, &1.encoding})
results = EncodingRs.decode_batch(items)

# Combine results back with original data
Enum.zip(rows, results)
|> Enum.map(fn {row, {:ok, decoded}} ->
  Map.put(row, :content_utf8, decoded)
end)
```
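Note that the pattern match above assumes every row decodes successfully and will raise on the first `{:error, _}` result. A more defensive variant (a sketch; the `:decode_error` field name is just an illustration) keeps failed rows and records the reason:

```elixir
Enum.zip(rows, results)
|> Enum.map(fn
  # Successful decode: attach the UTF-8 content
  {row, {:ok, decoded}} ->
    Map.put(row, :content_utf8, decoded)

  # Failed decode: keep the row and record why it failed
  {row, {:error, reason}} ->
    Map.put(row, :decode_error, reason)
end)
```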
## Dirty Scheduler Behavior

Batch operations always use dirty CPU schedulers, regardless of input size or item count.
### Rationale
Batch operations are typically used for throughput-focused workloads where:

- **Total work is significant** - Even if individual items are small, processing many items adds up
- **Predictability matters** - Consistent dirty scheduler usage avoids variable latency
- **Simplicity** - No threshold logic to tune or understand
### Trade-offs
| Aspect | Batch (always dirty) | Single-item (threshold-based) |
|---|---|---|
| Small workloads | Slight overhead from dirty scheduler | Uses normal scheduler |
| Large workloads | Optimal | Optimal |
| Latency | Consistent | Variable based on size |
| Complexity | Simple | Requires threshold tuning |
### When This Matters
For most use cases, always using dirty schedulers is the right choice. The overhead is minimal and the behavior is predictable.
If you have a latency-sensitive application processing very small batches (< 10 items, each < 1KB), you may see slightly better latency using individual `decode/2` or `encode/2` calls, which respect the configured dirty threshold.
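If that applies to you, a small wrapper can do the routing itself. A minimal sketch (the `MyApp.Codec` module name and the threshold values are hypothetical, not part of the library):

```elixir
defmodule MyApp.Codec do
  @max_items 10
  @max_item_bytes 1_024

  # Route tiny batches to individual decode/2 calls (which respect
  # the configured dirty threshold); everything else goes through
  # the batch API on a dirty scheduler.
  def decode_all(items) do
    tiny? =
      length(items) < @max_items and
        Enum.all?(items, fn {data, _enc} -> byte_size(data) < @max_item_bytes end)

    if tiny? do
      Enum.map(items, fn {data, enc} -> EncodingRs.decode(data, enc) end)
    else
      EncodingRs.decode_batch(items)
    end
  end
end
```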
## Known Limitations
### No Batch Streaming
The batch API is for one-shot processing of complete binaries only. It does not support stateful streaming decoding where characters may be split across chunk boundaries.
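For example, a multi-byte character split across a chunk boundary cannot be recovered by one-shot calls on the individual chunks (a sketch; the exact return values for malformed input depend on the library's error handling):

```elixir
# あ is <<0x82, 0xA0>> in Shift_JIS. If a chunk boundary falls
# between the two bytes, each fragment is malformed on its own:
EncodingRs.decode(<<0x82>>, "shift_jis")  # truncated lead byte
EncodingRs.decode(<<0xA0>>, "shift_jis")  # orphaned trail byte

# A stateful decoder carries the pending lead byte across the
# boundary and emits あ once the trail byte arrives.
```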
For streaming use cases, use `EncodingRs.Decoder`, which maintains state between chunks. However, each decoder handles a single stream - there is currently no way to batch process chunks from multiple streams in a single NIF call.
If you need to process multiple streams concurrently, create separate `EncodingRs.Decoder` instances for each stream.
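For instance, to decode several files concurrently with one decoder-backed stream each (a sketch assuming `EncodingRs.Decoder.stream/2` accepts a chunk stream plus an encoding name, per the comparison table below; check the Streaming Guide for the exact signature):

```elixir
# files is a list of {path, encoding} tuples (hypothetical input)
files
|> Task.async_stream(fn {path, encoding} ->
  path
  |> File.stream!([], 64 * 1024)         # read in 64 KiB chunks
  |> EncodingRs.Decoder.stream(encoding)  # one stateful decoder per stream
  |> Enum.join()
end)
|> Enum.map(fn {:ok, decoded} -> decoded end)
```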
### Future Options
The following options may be added in future versions based on user feedback:
- **Batch streaming** - Process chunks from multiple decoders in a single NIF call
- **Threshold-based routing** - Check total bytes and route to normal/dirty scheduler
- **Item count threshold** - Use dirty scheduler only above N items
- **Explicit scheduler choice** - `decode_batch/2` with options like `[scheduler: :normal]`
If you have a use case that would benefit from these options, please open an issue.
## Performance Tips
**Batch similar-sized items** - Helps with memory allocation efficiency.

**Reasonable batch sizes** - Batches of 100-10,000 items work well. Extremely large batches (100K+) may cause memory pressure. Consider chunking very large lists:
```elixir
large_list
|> Enum.chunk_every(1000)
|> Enum.flat_map(&EncodingRs.decode_batch/1)
```

**Parallel batches** - For very large workloads, split across processes:
```elixir
items
|> Enum.chunk_every(1000)
# async_stream preserves input order by default, so the flattened
# results still line up with the original items
|> Task.async_stream(&EncodingRs.decode_batch/1, max_concurrency: 4)
|> Enum.flat_map(fn {:ok, results} -> results end)
```
## Comparison: Batch vs Streaming vs One-Shot
| Scenario | Best Approach |
|---|---|
| Single small binary | `EncodingRs.decode/2` |
| Single large file | `EncodingRs.Decoder.stream/2` |
| Many separate items | `EncodingRs.decode_batch/1` |
| Network stream | `EncodingRs.Decoder` |
| Database rows | `EncodingRs.decode_batch/1` |