TextChunker (TextChunker v0.3.1)
Provides a high-level interface for text chunking, employing a configurable splitting strategy (defaults to recursive splitting). Manages options and coordinates the process, tracking chunk metadata.
Key Features
- Customizable Splitting: Allows the splitting strategy to be customized via the :strategy option.
- Size and Overlap Control: Provides options for :chunk_size and :chunk_overlap.
- Metadata Tracking: Generates Chunk structs containing byte range information.
Supported Options
- :chunk_size (positive integer, default: 2000) - Maximum size in code point length for each chunk.
- :chunk_overlap (non-negative integer, default: 200) - Number of overlapping code points between consecutive chunks to preserve context.
- :strategy (module, default: RecursiveChunk) - A module implementing the split function. Currently only RecursiveChunk is supported.
- :format (atom, default: :plaintext) - The format of the input text. Used to determine where to split the text in some strategies.
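The interaction between :chunk_size, :chunk_overlap, and byte ranges can be illustrated with a small, dependency-free sketch. This is a naive fixed-window splitter, not the recursive strategy TextChunker actually uses. The module OverlapSketch, the function windows/3, the map keys, and the exclusive end_byte convention are all hypothetical names and assumptions chosen for illustration.

```elixir
defmodule OverlapSketch do
  # Hypothetical helper, NOT part of TextChunker. It cuts text into windows of
  # `chunk_size` code points, where each window repeats the last
  # `chunk_overlap` code points of the previous one.
  def windows(text, chunk_size, chunk_overlap)
      when is_binary(text) and chunk_size > 0 and chunk_overlap >= 0 and
             chunk_overlap < chunk_size do
    step = chunk_size - chunk_overlap
    do_windows(String.codepoints(text), chunk_size, step, 0, [])
  end

  defp do_windows([], _size, _step, _offset, acc), do: Enum.reverse(acc)

  defp do_windows(codepoints, size, step, offset, acc) do
    text = codepoints |> Enum.take(size) |> Enum.join()

    # Sizes are counted in code points, but positions are recorded in bytes.
    # end_byte is exclusive here (an assumption made for this sketch).
    window = %{text: text, start_byte: offset, end_byte: offset + byte_size(text)}

    if length(codepoints) <= size do
      # The remaining text fits in this window, so it is the last one.
      Enum.reverse([window | acc])
    else
      # Advance by `step` code points and move the byte offset accordingly.
      {skipped, rest} = Enum.split(codepoints, step)
      next_offset = offset + byte_size(Enum.join(skipped))
      do_windows(rest, size, step, next_offset, [window | acc])
    end
  end
end

# "é" is one code point but two bytes in UTF-8, so byte ranges and
# code point counts diverge.
text = "h\u00E9llo"

for window <- OverlapSketch.windows(text, 3, 1) do
  IO.inspect(window)
end
```

With a chunk size of 3 and an overlap of 1, the first window covers bytes 0 to 4 and the second starts at byte 3, so the two ranges share one byte (the repeated "l"). In this sketch, binary_part(text, start_byte, end_byte - start_byte) recovers each window's text from the original string.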
Functions
Splits the provided text into a list of %Chunk{} structs.
Examples
iex> long_text = "This is a very long text that needs to be split into smaller pieces for easier handling."
iex> TextChunker.split(long_text)
# => [%Chunk{}, %Chunk{}, ...]
iex> TextChunker.split(long_text, chunk_size: 10, chunk_overlap: 3)