Arcana.Chunker.Default (Arcana v1.3.3)


Default text chunker using the text_chunker library.

Supports several text formats (plaintext, Markdown, Elixir source, etc.) and can measure chunk size in characters or tokens.
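As a rough illustration of the two size units, the sketch below measures the same string both ways. The module name SizeSketch is hypothetical, and the :tokens branch is a crude assumption (one token per whitespace-separated word), not the tokenizer Arcana actually uses.

```elixir
defmodule SizeSketch do
  @moduledoc """
  Hypothetical illustration of :characters vs :tokens sizing.
  Not Arcana's code: the :tokens branch is a whitespace-based stand-in
  for a real tokenizer.
  """

  # Count grapheme clusters, so a character like "é" counts as one.
  def size(text, :characters), do: String.length(text)

  # Assumed rough token estimate: one token per whitespace-separated word.
  def size(text, :tokens), do: text |> String.split() |> length()
end

IO.inspect(SizeSketch.size("Hello world", :characters))
# prints 11
IO.inspect(SizeSketch.size("Hello world", :tokens))
# prints 2
```

Real tokenizers usually produce more tokens than words, so the same :chunk_size yields noticeably different chunk lengths depending on the unit.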

Options

  • :chunk_size - Maximum size of each chunk, measured in the unit selected by :size_unit (default: 450)
  • :chunk_overlap - Amount of text shared between consecutive chunks, in the same unit (default: 50)
  • :format - Text format: :plaintext, :markdown, :elixir, etc. (default: :plaintext)
  • :size_unit - How to measure size: :characters or :tokens (default: :tokens)
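A minimal sketch of how :chunk_size and :chunk_overlap interact, assuming the text has already been split into a list of units (characters or tokens): each window holds at most chunk_size units and starts chunk_size - chunk_overlap units after the previous one. OverlapSketch and windows/3 are hypothetical names; text_chunker itself also splits on format-aware separators, which this sketch ignores.

```elixir
defmodule OverlapSketch do
  @moduledoc """
  Hypothetical fixed-size sliding-window chunking with overlap.
  Not the text_chunker algorithm, which also respects format-specific
  separators such as paragraphs or Markdown headings.
  """

  def windows(units, chunk_size, chunk_overlap)
      when chunk_size > 0 and chunk_overlap >= 0 and chunk_overlap < chunk_size do
    # Each new window starts `step` units after the previous one.
    step = chunk_size - chunk_overlap
    do_windows(units, chunk_size, step, [])
  end

  # Empty input produces no chunks.
  defp do_windows([], _size, _step, acc), do: Enum.reverse(acc)

  defp do_windows(units, size, step, acc) do
    case Enum.split(units, size) do
      {chunk, []} ->
        # The window reaches the end of the input, so this is the last chunk.
        Enum.reverse([chunk | acc])

      {chunk, _rest} ->
        # Slide forward by `step`, so consecutive chunks share `size - step` units.
        do_windows(Enum.drop(units, step), size, step, [chunk | acc])
    end
  end
end

"abcdefghij"
|> String.graphemes()
|> OverlapSketch.windows(4, 1)
|> Enum.map(&Enum.join/1)
|> IO.inspect()
# prints ["abcd", "defg", "ghij"]
```

The guard requires chunk_overlap to be smaller than chunk_size; with an overlap equal to the chunk size the window would never advance.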

Examples

Arcana.Chunker.Default.chunk("Hello world", chunk_size: 100)
Arcana.Chunker.Default.chunk(markdown_text, format: :markdown, chunk_size: 512)
Arcana.Chunker.Default.chunk(text, size_unit: :tokens, chunk_size: 256)
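In the examples above, any option that is left out falls back to its documented default. A hypothetical sketch of that resolution step using Keyword.merge/2; the module OptsSketch and its :size_unit check are assumptions, not Arcana's own option handling.

```elixir
defmodule OptsSketch do
  @moduledoc """
  Hypothetical sketch: fills in the documented defaults for any option
  the caller omits. Not Arcana's actual implementation.
  """

  # Defaults as listed in the Options section.
  @defaults [chunk_size: 450, chunk_overlap: 50, format: :plaintext, size_unit: :tokens]

  def resolve(opts) do
    # Caller-supplied values take precedence over the defaults.
    opts = Keyword.merge(@defaults, opts)

    # Assumed validation: reject size units other than the two documented ones.
    if opts[:size_unit] not in [:characters, :tokens] do
      raise ArgumentError,
            "size_unit must be :characters or :tokens, got: #{inspect(opts[:size_unit])}"
    end

    opts
  end
end

opts = OptsSketch.resolve(chunk_size: 100)
IO.inspect({opts[:chunk_size], opts[:chunk_overlap], opts[:size_unit]})
# prints {100, 50, :tokens}
```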