Arcana.Chunker.Default (Arcana v1.3.3)
Default text chunker using the text_chunker library.
Supports multiple formats (plaintext, markdown, etc.) and can size chunks by characters or tokens.
Options
:chunk_size - Maximum chunk size (default: 450)
:chunk_overlap - Overlap between chunks (default: 50)
:format - Text format: :plaintext, :markdown, :elixir, etc. (default: :plaintext)
:size_unit - How to measure size: :characters or :tokens (default: :tokens)
Examples
Arcana.Chunker.Default.chunk("Hello world", chunk_size: 100)
Arcana.Chunker.Default.chunk(markdown_text, format: :markdown, chunk_size: 512)
Arcana.Chunker.Default.chunk(text, size_unit: :tokens, chunk_size: 256)
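To make the relationship between :chunk_size and :chunk_overlap concrete, here is a minimal sketch of overlapping windows measured in characters. It is not how Arcana or text_chunker is implemented: the module name OverlapSketch and its helper are hypothetical, it ignores :format and token counting entirely, and it cuts at fixed offsets rather than at natural text boundaries. It only shows that each new chunk starts chunk_size - chunk_overlap characters after the previous one.

```elixir
defmodule OverlapSketch do
  # Hypothetical illustration only; not part of Arcana or text_chunker.
  # Splits `text` into windows of at most `chunk_size` graphemes, where each
  # window repeats the last `chunk_overlap` graphemes of the previous one.
  def chunk(text, opts \\ []) do
    chunk_size = Keyword.get(opts, :chunk_size, 450)
    chunk_overlap = Keyword.get(opts, :chunk_overlap, 50)

    if chunk_overlap < 0 or chunk_overlap >= chunk_size do
      raise ArgumentError, "expected 0 <= chunk_overlap < chunk_size"
    end

    # Each window starts `step` graphemes after the previous window.
    step = chunk_size - chunk_overlap
    windows(String.graphemes(text), chunk_size, step)
  end

  # Empty input produces no chunks.
  defp windows([], _size, _step), do: []

  defp windows(graphemes, size, step) do
    {window, rest} = Enum.split(graphemes, size)
    chunk = Enum.join(window)

    case rest do
      # The window reached the end of the text, so this is the last chunk.
      [] -> [chunk]
      # Otherwise advance by `step` and keep going.
      _ -> [chunk | windows(Enum.drop(graphemes, step), size, step)]
    end
  end
end
```

With a chunk size of 4 and an overlap of 1, OverlapSketch.chunk("abcdefghij", chunk_size: 4, chunk_overlap: 1) returns ["abcd", "defg", "ghij"]: each chunk begins with the last character of the one before it. The actual chunker may produce different boundaries, since it defaults to token-based sizing and takes the text format into account.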