Kreuzberg.UtilityAPI (kreuzberg v4.4.2)

Utility functions for Kreuzberg extraction operations.

This module provides helper functions for MIME type detection and validation, extension mapping, embedding preset management, and error classification. Use them to validate input before extraction and to analyze results and failures afterward.

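MIME Type Operations

  • detect_mime_type/1 - Detect the MIME type of binary data
  • detect_mime_type_from_path/1 - Detect the MIME type of a file from its path
  • validate_mime_type/1 - Check that a MIME type is supported
  • get_extensions_for_mime/1 - List the file extensions for a MIME type

Embedding Presets

  • list_embedding_presets/0 - List all available presets
  • get_embedding_preset/1 - Get details for a specific preset

Error Handling

  • classify_error/1 - Classify an error message into a category
  • get_error_details/0 - Describe all error categories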

Examples

# MIME type detection
{:ok, mime_type} = Kreuzberg.UtilityAPI.detect_mime_type(pdf_binary)
{:ok, mime_type} = Kreuzberg.UtilityAPI.detect_mime_type_from_path("document.pdf")

# MIME type validation
{:ok, _} = Kreuzberg.UtilityAPI.validate_mime_type("application/pdf")
{:error, _} = Kreuzberg.UtilityAPI.validate_mime_type("invalid/type")

# Extension mapping
{:ok, extensions} = Kreuzberg.UtilityAPI.get_extensions_for_mime("application/pdf")

# Embedding presets
{:ok, presets} = Kreuzberg.UtilityAPI.list_embedding_presets()
{:ok, preset} = Kreuzberg.UtilityAPI.get_embedding_preset("balanced")

# Error classification
atom = Kreuzberg.UtilityAPI.classify_error("File not found")

Summary

Functions

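classify_error(error_message)

Classify an error message into a semantic error category.

detect_mime_type(data)

Detect the MIME type of binary data using content inspection.

detect_mime_type_from_path(path)

Detect the MIME type of a file using its path and extension.

get_embedding_preset(preset_name)

Get detailed information about a specific embedding preset.

get_error_details()

Get information about all error categories.

get_extensions_for_mime(mime_type)

Get all file extensions associated with a given MIME type.

list_embedding_presets()

List all available embedding model presets.

validate_mime_type(mime_type)

Validate that a MIME type string is supported by Kreuzberg.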

Functions

classify_error(error_message)

@spec classify_error(String.t()) :: atom()

Classify an error message into a semantic error category.

Analyzes error messages using pattern matching and heuristics to categorize them into predefined error types, useful for error handling and user feedback.

Parameters

  • error_message - Error message string to classify

Returns

  • error_atom - Atom representing the error category

Error Categories

  • :io_error - File I/O related errors (file not found, permission denied, etc.)
  • :invalid_format - File format errors (corrupted files, unsupported formats, etc.)
  • :invalid_config - Configuration or parameter errors
  • :ocr_error - OCR engine or processing errors
  • :extraction_error - General extraction failures
  • :unknown_error - Errors that don't match other categories

Examples

iex> Kreuzberg.UtilityAPI.classify_error("File not found: /path/to/file.pdf")
:io_error

iex> Kreuzberg.UtilityAPI.classify_error("Invalid PDF format")
:invalid_format

iex> Kreuzberg.UtilityAPI.classify_error("OCR engine failed")
:ocr_error

iex> Kreuzberg.UtilityAPI.classify_error("Unknown error occurred")
:unknown_error
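
One way to act on the returned category is to map each atom to a handling policy. The sketch below is illustrative only: the ErrorPolicy module and its :retry, :reject, :fix_config, and :report actions are hypothetical names, not part of Kreuzberg. Only the category atoms come from the list above.

```elixir
defmodule ErrorPolicy do
  @moduledoc "Hypothetical helper: maps Kreuzberg error categories to actions."

  # The category atoms are the ones documented for classify_error/1.
  # The action atoms on the right are assumptions made for this sketch.
  def action_for(:io_error), do: :retry
  def action_for(:invalid_format), do: :reject
  def action_for(:invalid_config), do: :fix_config
  def action_for(:ocr_error), do: :retry
  def action_for(:extraction_error), do: :report
  def action_for(_other), do: :report

  # Classifies a raw message, then looks up the action.
  # Needs the kreuzberg dependency at runtime.
  def handle(message) when is_binary(message) do
    message
    |> Kreuzberg.UtilityAPI.classify_error()
    |> action_for()
  end
end
```

Keeping the mapping in one function means a new category only needs one new clause, and the catch-all clause covers :unknown_error.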

detect_mime_type(data)

@spec detect_mime_type(binary()) :: {:ok, String.t()} | {:error, String.t()}

Detect the MIME type of binary data using content inspection.

Analyzes the binary content to determine the file format, supporting a wide range of document and image formats. This is more reliable than extension-based detection for files that may have incorrect extensions.

Parameters

  • data - Binary data to analyze (any document or image format)

Returns

  • {:ok, mime_type} - Detected MIME type as a string (e.g., "application/pdf")
  • {:error, reason} - Error if detection fails

Examples

iex> pdf_binary = File.read!("document.pdf")
iex> {:ok, mime} = Kreuzberg.UtilityAPI.detect_mime_type(pdf_binary)
iex> mime
"application/pdf"

iex> image_binary = File.read!("photo.jpg")
iex> {:ok, mime} = Kreuzberg.UtilityAPI.detect_mime_type(image_binary)
iex> mime
"image/jpeg"

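Content detection pairs naturally with validation as a check before extraction. The following is a minimal sketch, not a Kreuzberg API: the Preflight module is a hypothetical name, and the two functions are passed in as arguments (defaulting to the documented ones) only so the flow can be tried with stubs.

```elixir
defmodule Preflight do
  @moduledoc "Hypothetical helper: detect, then validate, before extracting."

  # `detect` and `validate` default to the documented Kreuzberg functions.
  # Both are expected to return {:ok, mime} or {:error, reason}.
  def check(
        data,
        detect \\ &Kreuzberg.UtilityAPI.detect_mime_type/1,
        validate \\ &Kreuzberg.UtilityAPI.validate_mime_type/1
      )
      when is_binary(data) do
    # If detection fails, `with` returns that {:error, reason} unchanged;
    # otherwise the result of validation is returned.
    with {:ok, mime} <- detect.(data) do
      validate.(mime)
    end
  end
end
```

With the defaults, Preflight.check(File.read!("document.pdf")) would return the detected MIME type only if Kreuzberg also reports it as supported.
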
detect_mime_type_from_path(path)

@spec detect_mime_type_from_path(String.t() | Path.t()) ::
  {:ok, String.t()} | {:error, String.t()}

Detect the MIME type of a file using its path and extension.

Uses the file extension, optionally combined with content inspection, to determine the file format. It is faster than binary content analysis but may be less reliable for files with incorrect extensions.

Parameters

  • path - File path as a string or Path.t()

Returns

  • {:ok, mime_type} - Detected MIME type as a string
  • {:error, reason} - Error if detection fails

Examples

iex> {:ok, mime} = Kreuzberg.UtilityAPI.detect_mime_type_from_path("document.pdf")
iex> mime
"application/pdf"

iex> {:ok, mime} = Kreuzberg.UtilityAPI.detect_mime_type_from_path("spreadsheet.xlsx")
iex> mime
"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"

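Because path-based detection can be misled by a wrong extension, one option is to compare it against content-based detection and flag disagreements. This is a sketch under assumptions: MimeCrossCheck and the {:mismatch, ...} return shape are hypothetical, and the detector functions are injectable only so the logic can be exercised without the library.

```elixir
defmodule MimeCrossCheck do
  @moduledoc "Hypothetical helper: compare path-based and content-based detection."

  def compare(
        path,
        data,
        from_path \\ &Kreuzberg.UtilityAPI.detect_mime_type_from_path/1,
        from_data \\ &Kreuzberg.UtilityAPI.detect_mime_type/1
      )
      when is_binary(data) do
    # Any {:error, reason} from either detector is returned unchanged.
    with {:ok, by_path} <- from_path.(path),
         {:ok, by_content} <- from_data.(data) do
      if by_path == by_content do
        {:ok, by_content}
      else
        # The {:mismatch, map} shape is an assumption of this sketch.
        {:mismatch, %{path: by_path, content: by_content}}
      end
    end
  end
end
```
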
get_embedding_preset(preset_name)

@spec get_embedding_preset(String.t()) :: {:ok, map()} | {:error, String.t()}

Get detailed information about a specific embedding preset.

Retrieves comprehensive details about a named embedding preset, including model information, chunk configuration, and dimensionality.

Parameters

  • preset_name - Name of the embedding preset (e.g., "fast", "balanced", "quality", "multilingual")

Returns

  • {:ok, preset_info} - Map containing preset details with keys:
    • "name" - Preset name
    • "chunk_size" - Chunk size in tokens for processing
    • "overlap" - Chunk overlap in tokens
    • "dimensions" - Embedding vector dimension
    • "description" - Human-readable description
  • {:error, reason} - Error if preset not found

Examples

iex> {:ok, preset} = Kreuzberg.UtilityAPI.get_embedding_preset("fast")
iex> preset["name"]
"fast"
iex> preset["dimensions"]
384

iex> {:ok, preset} = Kreuzberg.UtilityAPI.get_embedding_preset("quality")
iex> is_map(preset)
true
iex> preset["chunk_size"]
512

iex> {:error, _} = Kreuzberg.UtilityAPI.get_embedding_preset("nonexistent")
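
The "chunk_size" and "overlap" values can be used to estimate how many chunks a document will produce. The sketch below assumes a sliding window that advances by chunk_size - overlap tokens per chunk; that chunking model, the ChunkEstimate module, and the overlap value of 50 are assumptions for illustration, not documented Kreuzberg behavior.

```elixir
defmodule ChunkEstimate do
  @moduledoc "Hypothetical helper: estimate chunk counts from a preset map."

  # Takes a preset map with the documented "chunk_size" and "overlap" keys.
  def count(%{"chunk_size" => size, "overlap" => overlap}, total_tokens)
      when is_integer(size) and is_integer(overlap) and overlap >= 0 and
             overlap < size and is_integer(total_tokens) and total_tokens >= 0 do
    cond do
      total_tokens == 0 ->
        0

      total_tokens <= size ->
        1

      true ->
        stride = size - overlap
        # One full first chunk, then ceil((total - size) / stride) more.
        1 + div(total_tokens - size + stride - 1, stride)
    end
  end
end

# chunk_size 512 matches the "quality" example above; overlap 50 is assumed.
preset = %{"chunk_size" => 512, "overlap" => 50}
ChunkEstimate.count(preset, 1000)
# => 3
```

For 1000 tokens the windows start at tokens 0, 462, and 924, which gives three chunks.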

get_error_details()

@spec get_error_details() :: {:ok, map()} | {:error, String.t()}

Get information about all error categories.

Returns a structured map describing all error classification categories that can be returned by the error classification system.

Returns

  • {:ok, error_details} - Map keyed by error category atom; each value holds a description and example patterns for that category
  • {:error, reason} - Error if retrieval fails

Examples

iex> {:ok, details} = Kreuzberg.UtilityAPI.get_error_details()
iex> is_map(details)
true
iex> Map.has_key?(details, :io_error)
true
iex> details[:io_error]["examples"]
["File not found", "Permission denied", "No such file or directory"]
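
The returned map can be turned into a short reference listing, for example for logs or a help screen. This sketch relies only on the shape shown in the example above (atom keys, values with an "examples" list); the ErrorCatalog module is a hypothetical name.

```elixir
defmodule ErrorCatalog do
  @moduledoc "Hypothetical helper: one summary line per error category."

  def lines(details) when is_map(details) do
    details
    # Sort by category so the output order is stable.
    |> Enum.sort_by(fn {category, _info} -> category end)
    |> Enum.map(fn {category, info} ->
      # Fall back to an empty list if a category has no "examples" key.
      examples = info |> Map.get("examples", []) |> Enum.join(", ")
      "#{category}: #{examples}"
    end)
  end
end
```

Passing the details map from get_error_details/0 to ErrorCatalog.lines/1 yields one string per category.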

get_extensions_for_mime(mime_type)

@spec get_extensions_for_mime(String.t()) ::
  {:ok, [String.t()]} | {:error, String.t()}

Get all file extensions associated with a given MIME type.

Maps a MIME type to its commonly used file extensions, which can be useful for file naming, validation, or user interface purposes.

Parameters

  • mime_type - MIME type string (e.g., "application/pdf")

Returns

  • {:ok, extensions} - List of file extensions (without dot, e.g., ["pdf"])
  • {:error, reason} - Error if MIME type is not found

Examples

iex> {:ok, exts} = Kreuzberg.UtilityAPI.get_extensions_for_mime("application/pdf")
iex> exts
["pdf"]

iex> {:ok, exts} = Kreuzberg.UtilityAPI.get_extensions_for_mime("image/jpeg")
iex> exts
["jpg", "jpeg"]

iex> {:ok, exts} = Kreuzberg.UtilityAPI.get_extensions_for_mime("text/plain")
iex> exts
["txt"]
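
For the file naming use case, the extension list can be used to check or fix a filename. A minimal sketch, assuming the list comes from get_extensions_for_mime/1 and that its first entry is an acceptable default; OutputName is a hypothetical module.

```elixir
defmodule OutputName do
  @moduledoc "Hypothetical helper: make a filename agree with a MIME type."

  # `extensions` is a non-empty list without dots, e.g. ["jpg", "jpeg"].
  def ensure_extension(filename, extensions)
      when is_binary(filename) and is_list(extensions) and extensions != [] do
    current =
      filename
      |> Path.extname()
      |> String.trim_leading(".")
      |> String.downcase()

    if current in extensions do
      # The existing extension is already valid for this MIME type.
      filename
    else
      # Using the first extension as the default is an assumption.
      filename <> "." <> hd(extensions)
    end
  end
end
```

A name with a mismatched extension keeps it and gains the new one (for example "scan.tmp" becomes "scan.tmp.pdf"), so the original name is never lost.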

list_embedding_presets()

@spec list_embedding_presets() :: {:ok, [String.t()]} | {:error, String.t()}

List all available embedding model presets.

Returns the names of all embedding presets configured in Kreuzberg, which can be used with the embedding configuration options during extraction.

Returns

  • {:ok, presets} - List of preset names as strings
  • {:error, reason} - Error if retrieval fails

Examples

iex> {:ok, presets} = Kreuzberg.UtilityAPI.list_embedding_presets()
iex> presets
["balanced", "fast", "quality", "multilingual"]

iex> Enum.member?(presets, "balanced")
true
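
When a preset name comes from user input or configuration, the list can be used to fall back to a known preset. The sketch below is an assumption-laden illustration: PresetChoice is a hypothetical module, and choosing "balanced" as the default is a choice made here, not a documented Kreuzberg default.

```elixir
defmodule PresetChoice do
  @moduledoc "Hypothetical helper: pick a preset name with a fallback."

  # `available` is the list returned by list_embedding_presets/0.
  def pick(requested, available, default \\ "balanced") when is_list(available) do
    cond do
      requested in available ->
        {:ok, requested}

      default in available ->
        {:ok, default}

      true ->
        {:error, "no usable preset; available: " <> Enum.join(available, ", ")}
    end
  end
end
```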

validate_mime_type(mime_type)

@spec validate_mime_type(String.t()) :: {:ok, String.t()} | {:error, String.t()}

Validate that a MIME type string is supported by Kreuzberg.

Checks if the provided MIME type is in the list of supported formats that can be processed by Kreuzberg extractors.

Parameters

  • mime_type - MIME type string to validate (e.g., "application/pdf")

Returns

  • {:ok, mime_type} - Returns the MIME type if valid
  • {:error, reason} - Error if MIME type is not supported

Examples

iex> {:ok, _} = Kreuzberg.UtilityAPI.validate_mime_type("application/pdf")

iex> {:error, _} = Kreuzberg.UtilityAPI.validate_mime_type("application/invalid")

iex> {:ok, _} = Kreuzberg.UtilityAPI.validate_mime_type("image/jpeg")
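
For batches, validation can split inputs into supported and unsupported groups before any extraction starts. A minimal sketch: MimeFilter is a hypothetical module, and the validator is a parameter (defaulting to validate_mime_type/1) only so the example can be tried with a stub.

```elixir
defmodule MimeFilter do
  @moduledoc "Hypothetical helper: split MIME types by validation result."

  # Returns {supported, unsupported}, preserving input order in each list.
  def partition(mime_types, validate \\ &Kreuzberg.UtilityAPI.validate_mime_type/1)
      when is_list(mime_types) do
    Enum.split_with(mime_types, fn mime ->
      match?({:ok, _}, validate.(mime))
    end)
  end
end
```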