API Reference kreuzberg v4.4.5

Modules

High-performance document extraction for Elixir.

OTP Application callback for Kreuzberg.

Asynchronous extraction operations using Elixir Tasks.

Batch extraction operations for processing multiple documents efficiently.

Bounding box coordinates for element positioning in documents.

Cache management operations for the Kreuzberg extraction library.

Structure representing a text chunk with embedding for semantic search.

Metadata for a text chunk, tracking byte positions, indices, and page range.

Element attributes in Djot ({.class #id key="value"} syntax).

Comprehensive Djot document structure with semantic preservation.

Footnote in a Djot document.

Block-level element in a Djot document (paragraph, heading, list, etc.).

Image element in a Djot document.

Inline element within a Djot block (text, emphasis, link, etc.).

Link element in a Djot document.

A single node in the document tree.

Structured document representation with hierarchical node tree.

Inline text annotation with byte-range formatting and links.

Semantic element extracted from a document.

Metadata for a semantic element extracted from a document.

Exception module for Kreuzberg extraction errors.

Error metadata when extraction partially failed.

Configuration structure for document extraction operations.

Structure representing the result of a document extraction operation.

Shared helper functions for Kreuzberg extraction modules.

A hierarchical block within a page, representing heading-level structure.

Structure representing an extracted image from a document.

Metadata about image preprocessing applied before OCR.

Structure representing an extracted keyword with score and algorithm info.

Legacy API functions using deprecated patterns.

Structure representing document metadata extracted from files.

Bounding geometry for OCR-extracted text elements.

Confidence scores for OCR text detection and recognition.

OCR-extracted text element with detailed positioning and confidence information.

Rotation information for OCR-detected text.

Structure representing a single page extracted from a multi-page document.

Byte offset boundary for a page.

Hierarchy information for a page, containing heading-level blocks.

Metadata for an individual page/slide/sheet.

Page structure information for a document.

Structure representing a PDF annotation extracted from a document page.

Public Plugin API facade for registering and managing Kreuzberg plugins.

Behaviour module for OCR backends in the Kreuzberg plugin system.

Behaviour module for post-processor plugins in the Kreuzberg plugin system.

GenServer for managing Kreuzberg plugins.

OTP Supervisor for the Kreuzberg plugin system.

Behaviour module for Kreuzberg document extraction validators.

Structure representing a warning generated during document processing.

Structure representing an extracted table from a document.

Utility functions for Kreuzberg extraction operations.

Configuration validators for Kreuzberg extraction options.