Kreuzberg.BatchAPI (kreuzberg v4.4.2)


Batch extraction operations for processing multiple documents efficiently.

This module provides functions for extracting content from multiple files or binary inputs in a single call. Batching can be more efficient than extracting documents one at a time when there are many of them.

Summary

Functions

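batch_extract_bytes(data_list, mime_types, config \\ nil)

Extract content from multiple binary inputs in a batch operation.

batch_extract_bytes!(data_list, mime_types, config \\ nil)

Extract content from multiple binary inputs, raising on error.

batch_extract_files(paths, config_or_mime, third_arg \\ nil)

Extract content from multiple files in a batch operation.

batch_extract_files!(paths, config_or_mime, third_arg \\ nil)

Extract content from multiple files, raising on error.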

Functions

batch_extract_bytes(data_list, mime_types, config \\ nil)

@spec batch_extract_bytes(
  [binary()],
  String.t() | [String.t()],
  Kreuzberg.ExtractionConfig.t() | map() | keyword() | nil
) :: {:ok, [Kreuzberg.ExtractionResult.t()]} | {:error, String.t()}

Extract content from multiple binary inputs in a batch operation.

Parameters

  • data_list - List of binary data inputs
  • mime_types - List of MIME types (one per input), or a single MIME type applied to all inputs
  • config - ExtractionConfig struct, map, or keyword list with extraction options (optional, defaults to nil)

Returns

  • {:ok, results} - List of ExtractionResult structs
  • {:error, reason} - Error message if batch extraction fails

Examples

# Extract multiple PDFs from binary data
data_list = [pdf_binary1, pdf_binary2, pdf_binary3]
mime_types = ["application/pdf", "application/pdf", "application/pdf"]
{:ok, results} = Kreuzberg.BatchAPI.batch_extract_bytes(data_list, mime_types)

# Use single MIME type for all inputs
{:ok, results} = Kreuzberg.BatchAPI.batch_extract_bytes(data_list, "application/pdf")

# With config
config = %Kreuzberg.ExtractionConfig{ocr: %{"enabled" => true}}
{:ok, results} = Kreuzberg.BatchAPI.batch_extract_bytes(data_list, mime_types, config)
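
The mime_types argument is either one MIME type per input or a single string applied to every input. The sketch below spells out that contract as a pre-flight check. MimeArgs.expand/2 is a hypothetical helper written for illustration; it is not part of Kreuzberg, and the library may validate its arguments differently.

```elixir
defmodule MimeArgs do
  # Hypothetical helper (not part of Kreuzberg): expands the mime_types
  # argument into one MIME type per input and checks that the lengths agree.

  @spec expand([binary()], String.t() | [String.t()]) ::
          {:ok, [String.t()]} | {:error, String.t()}
  def expand(data_list, mime_type) when is_list(data_list) and is_binary(mime_type) do
    # A single MIME type applies to every input.
    {:ok, List.duplicate(mime_type, length(data_list))}
  end

  def expand(data_list, mime_types) when is_list(data_list) and is_list(mime_types) do
    # A list must supply exactly one MIME type per input.
    if length(mime_types) == length(data_list) do
      {:ok, mime_types}
    else
      {:error, "expected #{length(data_list)} MIME types, got #{length(mime_types)}"}
    end
  end
end
```

The expanded list can then be passed as the second argument to batch_extract_bytes/3, so a length mismatch is caught before the batch call is made.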

batch_extract_bytes!(data_list, mime_types, config \\ nil)

@spec batch_extract_bytes!(
  [binary()],
  String.t() | [String.t()],
  Kreuzberg.ExtractionConfig.t() | map() | keyword() | nil
) :: [Kreuzberg.ExtractionResult.t()]

Extract content from multiple binary inputs, raising on error.

Same as batch_extract_bytes/3 but raises a Kreuzberg.Error exception if extraction fails.

Examples

data_list = [pdf_binary1, pdf_binary2, pdf_binary3]
results = Kreuzberg.BatchAPI.batch_extract_bytes!(data_list, "application/pdf")
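
Because the bang variant raises instead of returning a tuple, a caller that still wants to recover has to rescue the exception. The sketch below relies only on what is stated above, namely that failures raise Kreuzberg.Error. BangExample.extract_or_empty/2 is a hypothetical wrapper, and the extractor is passed in as a function so the snippet can be tried without the library installed.

```elixir
defmodule BangExample do
  require Logger

  # Hypothetical wrapper (not part of Kreuzberg). `extract_fun` is expected to
  # behave like Kreuzberg.BatchAPI.batch_extract_bytes!/2: it returns a list
  # of results or raises.
  def extract_or_empty(data_list, extract_fun) do
    extract_fun.(data_list, "application/pdf")
  rescue
    error in Kreuzberg.Error ->
      # Log the failure and fall back to an empty result list.
      Logger.warning("batch extraction failed: " <> Exception.message(error))
      []
  end
end

# With the library installed, the call would look like:
# BangExample.extract_or_empty(data_list, &Kreuzberg.BatchAPI.batch_extract_bytes!/2)
```

Only Kreuzberg.Error is rescued here; any other exception still propagates to the caller.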

batch_extract_files(paths, config_or_mime, third_arg \\ nil)

@spec batch_extract_files(
  [String.t() | Path.t()],
  String.t() | nil,
  Kreuzberg.ExtractionConfig.t() | map() | keyword() | nil
) :: {:ok, [Kreuzberg.ExtractionResult.t()]} | {:error, String.t()}

Extract content from multiple files in a batch operation.

Parameters

  • paths - List of file paths (strings or Path.t())
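  • mime_type - MIME type applied to all files, or nil to auto-detect the type of each file (optional; shown as config_or_mime in the signature)
  • config - ExtractionConfig struct, map, or keyword list with extraction options (optional; shown as third_arg in the signature)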

Returns

  • {:ok, results} - List of ExtractionResult structs
  • {:error, reason} - Error message if batch extraction fails

Examples

# Extract multiple PDFs
paths = ["doc1.pdf", "doc2.pdf", "doc3.pdf"]
{:ok, results} = Kreuzberg.BatchAPI.batch_extract_files(paths, "application/pdf")

# Extract with config
config = %Kreuzberg.ExtractionConfig{images: %{"enabled" => true}}
{:ok, results} = Kreuzberg.BatchAPI.batch_extract_files(paths, "application/pdf", config)

# Auto-detect MIME types
{:ok, results} = Kreuzberg.BatchAPI.batch_extract_files(paths)
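
When the path list is very large, a caller may prefer to submit it in several smaller batches, for example to report progress between batches or to stop at the first failure. The following is a rough sketch of that pattern and makes no assumption about how Kreuzberg schedules work internally. ChunkedBatch.run/3 is a hypothetical helper, not part of Kreuzberg, and the extractor is injected as a function so the example runs on its own.

```elixir
defmodule ChunkedBatch do
  # Hypothetical helper (not part of Kreuzberg). Splits `paths` into chunks of
  # `chunk_size` and calls `extract_fun` once per chunk. `extract_fun` must
  # return {:ok, results} or {:error, reason}, like batch_extract_files/2.
  def run(paths, chunk_size, extract_fun)
      when is_list(paths) and is_integer(chunk_size) and chunk_size > 0 do
    paths
    |> Enum.chunk_every(chunk_size)
    |> Enum.reduce_while({:ok, []}, fn chunk, {:ok, acc} ->
      case extract_fun.(chunk) do
        # Append this chunk's results and continue with the next chunk.
        {:ok, results} -> {:cont, {:ok, acc ++ results}}
        # Stop at the first failing chunk and return its error.
        {:error, reason} -> {:halt, {:error, reason}}
      end
    end)
  end
end

# With the library installed, the call would look like:
# ChunkedBatch.run(paths, 50, &Kreuzberg.BatchAPI.batch_extract_files(&1, "application/pdf"))
```

Note that this version discards the results of earlier chunks when a later chunk fails; a caller that wants partial results would need to return the accumulator alongside the error.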

batch_extract_files!(paths, config_or_mime, third_arg \\ nil)

@spec batch_extract_files!(
  [String.t() | Path.t()],
  String.t() | nil,
  Kreuzberg.ExtractionConfig.t() | map() | keyword() | nil
) :: [Kreuzberg.ExtractionResult.t()]

Extract content from multiple files, raising on error.

Same as batch_extract_files/3 but raises a Kreuzberg.Error exception if extraction fails.

Examples

paths = ["doc1.pdf", "doc2.pdf", "doc3.pdf"]
results = Kreuzberg.BatchAPI.batch_extract_files!(paths, "application/pdf")