TextChunker (TextChunker v0.5.2)
Provides a high-level interface for text chunking using a configurable splitting strategy (recursive splitting by default). It manages options, coordinates the splitting process, and tracks chunk metadata.
Key Features
- Customizable Splitting: Allows the splitting strategy to be customized via the :strategy option.
- Size and Overlap Control: Provides options for :chunk_size and :chunk_overlap.
- Metadata Tracking: Generates Chunk structs containing byte range information.
Supported Options
- :chunk_size (positive integer, default: 2000) - Maximum size of each chunk, in tokens.
- :get_chunk_size (function, default: &String.length/1) - A function that returns the number of tokens in a chunk. The default, String.length/1, counts Unicode graphemes.
- :chunk_overlap (non-negative integer, default: 200) - Number of overlapping tokens between consecutive chunks, to preserve context.
- :strategy (module, default: RecursiveChunk) - A module implementing the split function. Currently only RecursiveChunk is supported.
- :format (atom, default: :plaintext) - The format of the input text. Used to determine where to split the text in some strategies.
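To make the relationship between :chunk_size and :chunk_overlap concrete, here is a minimal sketch of a fixed-width sliding window. It is not the RecursiveChunk strategy and not part of TextChunker: the module OverlapSketch and its windows/3 function are hypothetical names used only for illustration, and sizes are measured with String.length/1 semantics (graphemes), matching the default :get_chunk_size.

```elixir
defmodule OverlapSketch do
  # Hypothetical helper, NOT part of TextChunker. It cuts fixed-width windows
  # so that each window repeats the last `chunk_overlap` graphemes of the
  # previous one. The real library splits recursively on separators instead.
  def windows("", _chunk_size, _chunk_overlap), do: []

  def windows(text, chunk_size, chunk_overlap)
      when chunk_size > 0 and chunk_overlap >= 0 and chunk_overlap < chunk_size do
    # Each new window starts this many graphemes after the previous one.
    step = chunk_size - chunk_overlap
    total = String.length(text)

    # Last start offset whose window still contains text beyond the overlap.
    last_start = max(total - chunk_overlap - 1, 0)

    Enum.map(0..last_start//step, fn start ->
      String.slice(text, start, chunk_size)
    end)
  end
end

IO.inspect(OverlapSketch.windows("abcdefgh", 4, 2))
# prints ["abcd", "cdef", "efgh"]
```

With a size of 4 and an overlap of 2, each window advances by only 2 graphemes, so a larger overlap relative to the chunk size produces more chunks for the same input.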
Functions
Splits the provided text into a list of %Chunk{} structs.
Examples
iex> long_text = "This is a very long text that needs to be split into smaller pieces for easier handling."
iex> TextChunker.split(long_text)
# => [%Chunk{}, %Chunk{}, ...]
iex> TextChunker.split(long_text, chunk_size: 10, chunk_overlap: 3)
# => Generates many smaller chunks with significant overlap