Batch extraction operations for processing multiple documents efficiently.
This module provides functions for extracting content from multiple files or binary inputs in batch operations, which can be more efficient than processing files individually when dealing with large numbers of documents.
Summary
Functions
batch_extract_bytes/3 - Extract content from multiple binary inputs in a batch operation.
batch_extract_bytes!/3 - Extract content from multiple binary inputs, raising on error.
batch_extract_files/3 - Extract content from multiple files in a batch operation.
batch_extract_files!/3 - Extract content from multiple files, raising on error.
Functions
@spec batch_extract_bytes( [binary()], String.t() | [String.t()], Kreuzberg.ExtractionConfig.t() | map() | keyword() | nil ) :: {:ok, [Kreuzberg.ExtractionResult.t()]} | {:error, String.t()}
Extract content from multiple binary inputs in a batch operation.
Parameters
data_list - List of binary data inputs
mime_types - List of MIME types (one per input), or a single MIME type string applied to all inputs
config - ExtractionConfig struct, map, or keyword list of extraction options (optional)
Returns
{:ok, results} - List of ExtractionResult structs
{:error, reason} - Error message if batch extraction fails
Examples
# Extract multiple PDFs from binary data
data_list = [pdf_binary1, pdf_binary2, pdf_binary3]
mime_types = ["application/pdf", "application/pdf", "application/pdf"]
{:ok, results} = Kreuzberg.BatchAPI.batch_extract_bytes(data_list, mime_types)
# Use single MIME type for all inputs
{:ok, results} = Kreuzberg.BatchAPI.batch_extract_bytes(data_list, "application/pdf")
# With config
config = %Kreuzberg.ExtractionConfig{ocr: %{"enabled" => true}}
{:ok, results} = Kreuzberg.BatchAPI.batch_extract_bytes(data_list, mime_types, config)
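Because mime_types can be a per-input list, callers that read files into memory often need one MIME type per binary, in the same order. The sketch below builds that list from file extensions. The module name MimeHelper, its extension table, and the "application/octet-stream" fallback are hypothetical choices for illustration, not part of Kreuzberg, whose own MIME detection may behave differently.

```elixir
defmodule MimeHelper do
  # Hypothetical extension-to-MIME table (not from Kreuzberg); extend as needed.
  @by_extension %{
    ".pdf" => "application/pdf",
    ".docx" => "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".png" => "image/png",
    ".txt" => "text/plain"
  }

  # Returns one MIME type per path, in input order, so the list lines up
  # with binaries read from the same paths.
  @spec mime_types_for([Path.t()]) :: [String.t()]
  def mime_types_for(paths) do
    Enum.map(paths, fn path ->
      ext = path |> Path.extname() |> String.downcase()
      # The fallback for unknown extensions is an assumption; adjust as needed.
      Map.get(@by_extension, ext, "application/octet-stream")
    end)
  end

  # Reads every file into memory and returns {binaries, mime_types},
  # both in the same order as `paths`.
  @spec load([Path.t()]) :: {[binary()], [String.t()]}
  def load(paths) do
    {Enum.map(paths, &File.read!/1), mime_types_for(paths)}
  end
end
```

Usage: {data_list, mime_types} = MimeHelper.load(["a.pdf", "scan.png"]) produces arguments in the shape batch_extract_bytes/3 accepts, e.g. Kreuzberg.BatchAPI.batch_extract_bytes(data_list, mime_types).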
@spec batch_extract_bytes!( [binary()], String.t() | [String.t()], Kreuzberg.ExtractionConfig.t() | map() | keyword() | nil ) :: [Kreuzberg.ExtractionResult.t()]
Extract content from multiple binary inputs, raising on error.
Same as batch_extract_bytes/3, but raises a Kreuzberg.Error exception if extraction fails.
Examples
data_list = [pdf_binary1, pdf_binary2, pdf_binary3]
results = Kreuzberg.BatchAPI.batch_extract_bytes!(data_list, "application/pdf")
@spec batch_extract_files( [String.t() | Path.t()], String.t() | nil, Kreuzberg.ExtractionConfig.t() | map() | keyword() | nil ) :: {:ok, [Kreuzberg.ExtractionResult.t()]} | {:error, String.t()}
Extract content from multiple files in a batch operation.
Parameters
paths - List of file paths (strings or Path.t())
mime_type - MIME type applied to all files (optional; defaults to nil, which enables auto-detection)
config - ExtractionConfig struct, map, or keyword list of extraction options (optional)
Returns
{:ok, results} - List of ExtractionResult structs
{:error, reason} - Error message if batch extraction fails
Examples
# Extract multiple PDFs
paths = ["doc1.pdf", "doc2.pdf", "doc3.pdf"]
{:ok, results} = Kreuzberg.BatchAPI.batch_extract_files(paths, "application/pdf")
# Extract with config
config = %Kreuzberg.ExtractionConfig{images: %{"enabled" => true}}
{:ok, results} = Kreuzberg.BatchAPI.batch_extract_files(paths, "application/pdf", config)
# Auto-detect MIME types
{:ok, results} = Kreuzberg.BatchAPI.batch_extract_files(paths)
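For very long path lists, one option is to split the work into fixed-size chunks so that each batch call handles a bounded number of documents, stopping at the first chunk that fails. The sketch below injects the extraction function so it can be exercised without Kreuzberg, and it assumes results come back in input order. The module ChunkedBatch and any chunk size you pick are illustrative, not part of the library.

```elixir
defmodule ChunkedBatch do
  # Runs `extract_fun` over `paths` in chunks of `chunk_size`.
  # `extract_fun` must take a list of paths and return {:ok, results}
  # or {:error, reason}, like batch_extract_files/1.
  # Halts on the first failing chunk and returns its error.
  def run(paths, chunk_size, extract_fun)
      when is_integer(chunk_size) and chunk_size > 0 do
    result =
      paths
      |> Enum.chunk_every(chunk_size)
      |> Enum.reduce_while({:ok, []}, fn chunk, {:ok, acc} ->
        case extract_fun.(chunk) do
          # Prepend this chunk's results in reverse; a final reverse restores order.
          {:ok, results} -> {:cont, {:ok, Enum.reverse(results, acc)}}
          {:error, reason} -> {:halt, {:error, reason}}
        end
      end)

    case result do
      {:ok, acc} -> {:ok, Enum.reverse(acc)}
      error -> error
    end
  end
end
```

Usage: ChunkedBatch.run(paths, 50, &Kreuzberg.BatchAPI.batch_extract_files/1) auto-detects MIME types, while fn chunk -> Kreuzberg.BatchAPI.batch_extract_files(chunk, "application/pdf") end fixes one type for every chunk. Whether chunking actually helps depends on how the library batches internally, so it is worth measuring before adopting.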
@spec batch_extract_files!( [String.t() | Path.t()], String.t() | nil, Kreuzberg.ExtractionConfig.t() | map() | keyword() | nil ) :: [Kreuzberg.ExtractionResult.t()]
Extract content from multiple files, raising on error.
Same as batch_extract_files/3, but raises a Kreuzberg.Error exception if extraction fails.
Examples
paths = ["doc1.pdf", "doc2.pdf", "doc3.pdf"]
results = Kreuzberg.BatchAPI.batch_extract_files!(paths, "application/pdf")
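To trace each result back to its source file, the results can be zipped with the input paths. This assumes, without confirmation from the documentation above, that the returned list preserves input order and has exactly one entry per path; the length check guards the second assumption. BatchIndex is a hypothetical helper name, and duplicate paths would collapse to a single map entry.

```elixir
defmodule BatchIndex do
  # Pairs each input path with its result, assuming results arrive in the
  # same order as the paths (an assumption, not documented behavior).
  def by_path(paths, results) when length(paths) == length(results) do
    paths |> Enum.zip(results) |> Map.new()
  end

  # Raise instead of letting Enum.zip/2 silently drop unmatched entries.
  def by_path(paths, results) do
    raise ArgumentError,
          "expected #{length(paths)} results, got #{length(results)}"
  end
end
```

Usage: after results = Kreuzberg.BatchAPI.batch_extract_files!(paths, "application/pdf"), BatchIndex.by_path(paths, results)["doc2.pdf"] looks up the result for a single file.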