Arcana.Chunker behaviour (Arcana v1.3.3)

Behaviour for text chunking providers used by Arcana.

Arcana accepts any module that implements this behaviour. A built-in default chunker, based on text_chunker, is provided.

Configuration

Configure your chunking provider in config.exs:

# Default: text_chunker-based chunking
config :arcana, chunker: :default

# Default chunker with custom options
config :arcana, chunker: {:default, chunk_size: 512, chunk_overlap: 100}

# Custom function
config :arcana, chunker: fn text, _opts -> [%{text: text, chunk_index: 0, token_count: 10}] end

# Custom module implementing this behaviour
config :arcana, chunker: MyApp.SemanticChunker
config :arcana, chunker: {MyApp.SemanticChunker, model: "..."}

Implementing a Custom Chunker

Create a module that implements this behaviour:

defmodule MyApp.SemanticChunker do
  @behaviour Arcana.Chunker

  @impl true
  def chunk(text, opts) do
    # Custom chunking logic...
    # Return list of chunk maps
    [
      %{text: "chunk 1", chunk_index: 0, token_count: 50},
      %{text: "chunk 2", chunk_index: 1, token_count: 45}
    ]
  end
end
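
As a more concrete illustration, here is a minimal sketch of a chunker that splits on blank lines. The module name MyApp.ParagraphChunker and the word-count token estimate are assumptions made for this example and are not part of Arcana. The behaviour attributes are commented out so the module compiles on its own; inside a project that depends on Arcana they would be enabled.

```elixir
defmodule MyApp.ParagraphChunker do
  # Hypothetical example module. In a project that depends on Arcana,
  # uncomment the two attributes below to declare the behaviour.
  # @behaviour Arcana.Chunker

  # @impl true
  def chunk(text, _opts) do
    text
    # Treat one or more blank lines as a paragraph boundary.
    |> String.split(~r/\n\s*\n/, trim: true)
    |> Enum.map(&String.trim/1)
    |> Enum.reject(&(&1 == ""))
    |> Enum.with_index()
    |> Enum.map(fn {paragraph, index} ->
      %{
        text: paragraph,
        chunk_index: index,
        # Assumption: approximate tokens as whitespace-separated words.
        token_count: paragraph |> String.split() |> length()
      }
    end)
  end
end
```

A real implementation would likely replace the word count with an estimate that matches the tokenizer of the embedding model in use.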

Then configure:

config :arcana, chunker: {MyApp.SemanticChunker, model: "..."}

Chunk Format

Each returned chunk must be a map containing at least these keys:

  • :text - The chunk text content (required)
  • :chunk_index - Zero-based index of this chunk (required)
  • :token_count - Estimated token count (required)

Additional keys may be included and will be passed through to storage.
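
When testing a custom chunker, a small check along these lines may be useful. This is only a sketch: MyApp.ChunkCheck is a hypothetical helper, and the type constraints (binary text, non-negative integer index and count) are assumptions inferred from the key descriptions above rather than rules documented by Arcana.

```elixir
defmodule MyApp.ChunkCheck do
  # Keys listed as required in the chunk format.
  @required_keys [:text, :chunk_index, :token_count]

  # Returns true when the map has all required keys with plausible values.
  # Extra keys are allowed and ignored by this check.
  def valid_chunk?(chunk) when is_map(chunk) do
    Enum.all?(@required_keys, &Map.has_key?(chunk, &1)) and
      is_binary(chunk.text) and
      is_integer(chunk.chunk_index) and chunk.chunk_index >= 0 and
      is_integer(chunk.token_count) and chunk.token_count >= 0
  end

  def valid_chunk?(_other), do: false
end
```

For example, MyApp.ChunkCheck.valid_chunk?(%{text: "a", chunk_index: 0, token_count: 1, page: 3}) passes, since the additional :page key does not affect the check.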

Callbacks

chunk(text, opts)

@callback chunk(text :: String.t(), opts :: keyword()) :: [map()]

Splits text into chunks.

Returns a list of chunk maps, each containing at minimum :text, :chunk_index, and :token_count.

Options

Options are implementation-specific. Common options include:

  • :chunk_size - Maximum chunk size
  • :chunk_overlap - Overlap between chunks
  • :format - Text format hint (:plaintext, :markdown, etc.)
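
The sketch below shows one way a custom implementation might read :chunk_size and :chunk_overlap from the keyword list. It is not how Arcana's default chunker works. MyApp.WindowChunker, its default values, and the choice to measure sizes in graphemes are all assumptions for illustration, since the unit of these options is implementation-specific.

```elixir
defmodule MyApp.WindowChunker do
  # Hypothetical defaults, used only when the caller passes no option.
  @default_size 512
  @default_overlap 0

  def chunk(text, opts) do
    size = Keyword.get(opts, :chunk_size, @default_size)
    overlap = Keyword.get(opts, :chunk_overlap, @default_overlap)
    # Each window starts (size - overlap) graphemes after the previous one.
    # Clamp to 1 so an overlap >= size cannot stall the window.
    step = max(size - overlap, 1)

    text
    |> String.graphemes()
    # A trailing partial window is kept, so the final chunk may be
    # shorter than chunk_size.
    |> Enum.chunk_every(size, step, [])
    |> Enum.with_index()
    |> Enum.map(fn {graphemes, index} ->
      chunk_text = Enum.join(graphemes)

      %{
        text: chunk_text,
        chunk_index: index,
        # Assumption: approximate tokens as whitespace-separated words.
        token_count: chunk_text |> String.split() |> length()
      }
    end)
  end
end
```

Splitting on fixed grapheme windows can cut words in half; it is used here only because it keeps the option handling easy to follow.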

Functions

chunk(arg, text)

Chunks text using the configured chunker.

The first argument is the chunker, given as a {module, opts} tuple where module implements this behaviour.

chunk(arg, text, extra_opts)

Chunks text using the configured chunker, merging additional options.

Useful when you need to override chunker defaults at call time.
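
To make the dispatch and merge semantics concrete, here is a self-contained sketch that mimics the two functions described above. It is not Arcana's source code: MyApp.ChunkerDispatch and MyApp.EchoChunker are hypothetical stand-ins, and the sketch assumes that call-time options take precedence over the configured ones, as the word "override" suggests.

```elixir
defmodule MyApp.EchoChunker do
  # Hypothetical chunker that returns the whole text as one chunk and
  # records the options it received under an extra :opts key.
  def chunk(text, opts) do
    [%{text: text, chunk_index: 0, token_count: 1, opts: opts}]
  end
end

defmodule MyApp.ChunkerDispatch do
  # Two-argument form: call the module with its configured options.
  def chunk({module, opts}, text) do
    module.chunk(text, opts)
  end

  # Three-argument form: merge call-time options over the configured ones.
  # Keyword.merge/2 lets values from the second list win on duplicate keys.
  def chunk({module, opts}, text, extra_opts) do
    module.chunk(text, Keyword.merge(opts, extra_opts))
  end
end

chunker = {MyApp.EchoChunker, chunk_size: 512, format: :plaintext}
[%{opts: merged}] = MyApp.ChunkerDispatch.chunk(chunker, "hello", chunk_size: 128)
IO.inspect(Keyword.get(merged, :chunk_size))
```

In this sketch the :format option from the configured tuple is preserved, while :chunk_size is replaced by the value passed at call time.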