LeXtract.TextChunk (lextract v0.1.2)

Represents a chunk of text from a document, used for processing long documents.

Fields

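  • :text - The text content of the chunk
  • :document - Reference to the source document (may be nil)
  • :token_interval - Token range of the chunk in the original document (may be nil)
  • :char_interval - Character range of the chunk in the original document (may be nil)
  • :chunk_index - Zero-based position of the chunk in the sequence of chunks (may be nil)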

Examples

iex> chunk = %LeXtract.TextChunk{
...>   text: "Sample chunk",
...>   chunk_index: 0
...> }
iex> chunk.chunk_index
0
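
To illustrate how the text, character range, and chunk index relate to one another, here is a minimal sketch that splits a string into fixed-size pieces. It is not LeXtract's chunking algorithm: the ChunkSketch module, its split/2 function, and the :char_range key are hypothetical names, and plain maps stand in for LeXtract.TextChunk and LeXtract.CharInterval, whose internals are not shown on this page.

```elixir
defmodule ChunkSketch do
  # Hypothetical helper, not part of LeXtract: splits `text` into pieces of
  # at most `size` graphemes and records where each piece came from.
  def split(text, size) when is_binary(text) and is_integer(size) and size > 0 do
    text
    |> String.graphemes()
    |> Enum.chunk_every(size)
    |> Enum.with_index()
    |> Enum.map(fn {graphemes, index} ->
      start_pos = index * size

      %{
        text: Enum.join(graphemes),
        # Half-open character range {start, stop} in the original text.
        char_range: {start_pos, start_pos + length(graphemes)},
        chunk_index: index
      }
    end)
  end
end

chunks = ChunkSketch.split("The quick brown fox", 8)
IO.inspect(Enum.map(chunks, & &1.text))
# prints ["The quic", "k brown ", "fox"]
```

Each map keeps enough positional information to map a result found in one chunk back to its location in the full text, which is presumably the purpose of the interval fields above.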

Summary

Functions

char_count(text_chunk)

Returns the character count of the chunk text.

text_byte_size(text_chunk)

Returns the byte size of the chunk text.

Types

t()

@type t() :: %LeXtract.TextChunk{
  char_interval: LeXtract.CharInterval.t() | nil,
  chunk_index: non_neg_integer() | nil,
  document: LeXtract.Document.t() | nil,
  text: String.t(),
  token_interval: LeXtract.TokenInterval.t() | nil
}

Functions

char_count(text_chunk)

@spec char_count(t()) :: non_neg_integer()

Returns the character count of the chunk text.

Examples

iex> chunk = %LeXtract.TextChunk{text: "Hello"}
iex> LeXtract.TextChunk.char_count(chunk)
5

text_byte_size(text_chunk)

@spec text_byte_size(t()) :: non_neg_integer()

Returns the byte size of the chunk text.

Examples

iex> chunk = %LeXtract.TextChunk{text: "Hello"}
iex> LeXtract.TextChunk.text_byte_size(chunk)
5
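
Both examples use ASCII text, so the two functions return the same number. The sketch below shows how the counts diverge for non-ASCII text, using only the standard library functions String.length/1 and byte_size/1. That char_count/1 and text_byte_size/1 behave like these two functions is an assumption; this page does not show their implementation.

```elixir
# "h\u00E9llo" is "héllo" with a precomposed é (U+00E9).
text = "h\u00E9llo"

# String.length/1 counts Unicode graphemes (user-perceived characters).
char_count = String.length(text)

# byte_size/1 counts bytes of the UTF-8 binary; é is encoded as two bytes.
byte_count = byte_size(text)

IO.inspect({char_count, byte_count})
# prints {5, 6}
```

The distinction matters when a chunk has to fit a limit expressed in bytes rather than characters.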