Kreuzberg.ExtractionResult (kreuzberg v4.0.4)
Structure representing the result of a document extraction operation.
Contains all extracted data from a processed document, including content, metadata, tables, detected languages, chunks with embeddings, images with OCR results, per-page information, and keywords.
Fields
:content - The main extracted text content as a UTF-8 string
- Contains the primary textual output from document analysis
- Cleaned and normalized from the original document
- May include line breaks and structural markers
:mime_type - The MIME type of the processed document (e.g., "application/pdf")
- Used to identify document type and format
- Common types: "application/pdf", "text/plain", "image/png", etc.
- Helps downstream processors know how to handle the content
:metadata - Document-specific metadata
- A Kreuzberg.Metadata struct with typed fields
- Contains title, author, created_date, page_count, etc.
- Can be an empty struct if no metadata is available
:tables - List of tables extracted from the document
- Each table is a Kreuzberg.Table struct
- Contains cells, headers, markdown, and other table data
- Empty list [] if no tables are found in the document
:detected_languages - List of detected language codes (ISO 639-1 format)
- Language codes: "en", "de", "fr", "es", "zh", etc.
- May be nil if language detection is disabled
- Contains multiple codes if the document has mixed-language content
- Example: ["en", "de"] for bilingual document
:chunks - Optional list of text chunk structs with embeddings
- nil if chunking/embedding is not enabled
- Each chunk is a Kreuzberg.Chunk struct with text and embedding
- Used for semantic search and RAG applications
:images - Optional list of extracted image structs with OCR results
- nil if image extraction is disabled
- Each image is a Kreuzberg.Image struct with format, data, and ocr_text
- ocr_text holds the output of Tesseract or another OCR backend
:pages - Optional list of per-page content structs
- nil if page-level extraction is not enabled
- Each page is a Kreuzberg.Page struct with number, content, and dimensions
- Useful for documents where position and structure matter
:keywords - Optional list of extracted keyword maps
- nil if keyword extraction is disabled
- Each keyword is a map with "text" and "score" fields
- Used for document classification, tagging, and search optimization
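Because the optional fields are nil when a feature is disabled rather than an empty list, callers usually normalize them before iterating. The module below is a minimal, hypothetical sketch (not part of Kreuzberg's API). It assumes keyword maps use the string keys "text" and "score" as described above, and that a chunk's embedding field is a list of floats; the exact embedding representation is an assumption. It only uses Map.get/2 and field access, so it accepts either a %Kreuzberg.ExtractionResult{} struct or a plain map with the same keys.

```elixir
defmodule ExtractionResultHelpers do
  @moduledoc """
  Hypothetical helpers (not part of Kreuzberg) for reading an extraction result.
  Only Map.get/2 and field access are used, so a %Kreuzberg.ExtractionResult{}
  struct or a plain map with the same keys both work.
  """

  # Fields documented as "nil when the feature is disabled".
  @optional_fields [:detected_languages, :chunks, :images, :pages, :keywords]

  @doc "Returns an optional list field, or [] when it is nil or missing."
  def list_or_empty(result, field) when field in @optional_fields do
    Map.get(result, field) || []
  end

  @doc "True if the ISO 639-1 code `lang` was detected."
  def language?(result, lang), do: lang in list_or_empty(result, :detected_languages)

  @doc "Returns the texts of the n highest-scoring keywords."
  def top_keywords(result, n) do
    result
    |> list_or_empty(:keywords)
    |> Enum.sort_by(& &1["score"], :desc)
    |> Enum.take(n)
    |> Enum.map(& &1["text"])
  end

  @doc """
  Returns up to n {similarity, chunk} tuples, best first, ranked by cosine
  similarity between each chunk's embedding and query_embedding.
  Assumes embeddings are equal-length lists of floats; chunks whose
  embedding is not a list are skipped.
  """
  def rank_chunks(result, query_embedding, n) do
    result
    |> list_or_empty(:chunks)
    |> Enum.filter(&is_list(Map.get(&1, :embedding)))
    |> Enum.map(fn chunk -> {cosine(chunk.embedding, query_embedding), chunk} end)
    |> Enum.sort_by(fn {similarity, _chunk} -> similarity end, :desc)
    |> Enum.take(n)
  end

  # Cosine similarity; returns 0.0 if either vector has zero magnitude.
  defp cosine(a, b) do
    dot = a |> Enum.zip(b) |> Enum.map(fn {x, y} -> x * y end) |> Enum.sum()
    norm = :math.sqrt(sum_of_squares(a)) * :math.sqrt(sum_of_squares(b))
    if norm == 0, do: 0.0, else: dot / norm
  end

  defp sum_of_squares(v), do: v |> Enum.map(&(&1 * &1)) |> Enum.sum()
end
```

For a RAG retrieval step, ExtractionResultHelpers.rank_chunks(result, query_vector, 5) would return up to five {similarity, chunk} tuples; query_vector must come from the same embedding model that produced the chunk embeddings for the scores to be meaningful.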
Examples
# Basic extraction result
iex> result = %Kreuzberg.ExtractionResult{
...> content: "Document content",
...> mime_type: "application/pdf",
...> metadata: %Kreuzberg.Metadata{},
...> tables: [],
...> detected_languages: ["en"]
...> }
iex> result.content
"Document content"
# Rich extraction with metadata and tables
iex> result = %Kreuzberg.ExtractionResult{
...> content: "Sales Report 2024\n\nQ1: 1M, Q2: 1.2M, Q3: 1.5M",
...> mime_type: "application/pdf",
...> metadata: %Kreuzberg.Metadata{title: "Sales Report"},
...> tables: [%Kreuzberg.Table{headers: ["Quarter", "Amount"]}],
...> detected_languages: ["en"],
...> chunks: nil,
...> images: nil,
...> pages: nil
...> }
iex> result.metadata.title
"Sales Report"
# Extraction with tables, chunks, images, and pages populated
iex> result = %Kreuzberg.ExtractionResult{
...> content: "Multi-page document content...",
...> mime_type: "application/pdf",
...> metadata: %Kreuzberg.Metadata{page_count: 5},
...> tables: [%Kreuzberg.Table{cells: [["Data1", "Data2"]]}],
...> detected_languages: ["en", "de"],
...> chunks: [%Kreuzberg.Chunk{text: "chunk1 content"}],
...> images: [%Kreuzberg.Image{format: "png", ocr_text: "Image text"}],
...> pages: [%Kreuzberg.Page{number: 1, content: "Page 1 content"}]
...> }
iex> Enum.count(result.pages)
1
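The per-page list makes it possible to find where a term appears, not just whether it appears. As a minimal sketch, the hypothetical module below (not part of Kreuzberg) returns the numbers of the pages whose content contains a search term. It relies only on the :number and :content page fields shown in the example above and treats pages: nil as "no pages".

```elixir
defmodule PageSearch do
  @moduledoc """
  Hypothetical helper (not part of Kreuzberg). Assumes each page exposes the
  :number and :content fields described for Kreuzberg.Page.
  """

  @doc """
  Returns the numbers of pages whose content contains term (case-insensitive).
  Returns [] when pages is nil, i.e. page-level extraction was not enabled.
  """
  def pages_mentioning(result, term) do
    needle = String.downcase(term)
    pages = Map.get(result, :pages) || []

    pages
    |> Enum.filter(fn page ->
      is_binary(page.content) and String.contains?(String.downcase(page.content), needle)
    end)
    |> Enum.map(& &1.number)
  end
end
```

Applied to the full example above, PageSearch.pages_mentioning(result, "page 1") would return [1].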
Types
@type t() :: %Kreuzberg.ExtractionResult{
  chunks: [Kreuzberg.Chunk.t()] | nil,
  content: String.t(),
  detected_languages: [String.t()] | nil,
  images: [Kreuzberg.Image.t()] | nil,
  keywords: [map()] | nil,
  metadata: Kreuzberg.Metadata.t(),
  mime_type: String.t(),
  pages: [Kreuzberg.Page.t()] | nil,
  tables: [Kreuzberg.Table.t()]
}
Functions
@spec new(
  String.t(),
  String.t(),
  Kreuzberg.Metadata.t() | map(),
  [Kreuzberg.Table.t() | map()],
  keyword()
) :: t()
Creates a new ExtractionResult from extracted data.
Parameters
- content - The extracted text content
- mime_type - The MIME type of the document
- metadata - Document metadata struct or map (defaults to an empty Metadata struct)
- tables - List of extracted table structs or maps (defaults to an empty list)
- opts - Optional keyword list containing:
  - :detected_languages - List of detected language codes
  - :chunks - List of chunk structs or maps
  - :images - List of image structs or maps
  - :pages - List of page structs or maps
  - :keywords - List of keyword structs or maps
Returns
An ExtractionResult struct in which metadata, tables, and any chunks, images, and pages passed as maps have been converted to their corresponding Kreuzberg structs.
Examples
iex> Kreuzberg.ExtractionResult.new("text", "text/plain")
%Kreuzberg.ExtractionResult{
content: "text",
mime_type: "text/plain",
metadata: %Kreuzberg.Metadata{},
tables: [],
detected_languages: nil,
chunks: nil,
images: nil,
pages: nil
}
iex> metadata = %Kreuzberg.Metadata{page_count: 5}
iex> Kreuzberg.ExtractionResult.new("text", "application/pdf", metadata, [],
...> detected_languages: ["en", "de"])
%Kreuzberg.ExtractionResult{
content: "text",
mime_type: "application/pdf",
metadata: %Kreuzberg.Metadata{page_count: 5},
tables: [],
detected_languages: ["en", "de"],
chunks: nil,
images: nil,
pages: nil
}