Rag.Chunker.Semantic (rag v0.3.4)

Semantic chunking using embedding similarity.

Groups sentences by the similarity of their embeddings. A new chunk starts when the similarity drops below threshold or when the chunk reaches max_chars.

Options

  • embedding_fn - Function (String.t() -> [float()]) to generate embeddings (required)
  • threshold - Similarity threshold for grouping (default: 0.8)
  • max_chars - Maximum characters per chunk (default: 500)
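The grouping rule described above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: the module name SemanticSketch, the naive sentence splitter, the use of cosine similarity, and the choice to compare each sentence with the one before it are all assumptions made for the example. Only the roles of embedding_fn, threshold, and max_chars come from the options listed above.

```elixir
defmodule SemanticSketch do
  # Hypothetical illustration only; not the rag library's implementation.

  # Cosine similarity of two equal-length vectors (0.0 if either is all zeros).
  def cosine(a, b) do
    dot = a |> Enum.zip(b) |> Enum.reduce(0.0, fn {x, y}, acc -> acc + x * y end)
    norm = fn v -> :math.sqrt(Enum.reduce(v, 0.0, fn x, acc -> acc + x * x end)) end
    denom = norm.(a) * norm.(b)
    if denom == 0.0, do: 0.0, else: dot / denom
  end

  # Naive sentence splitter: break on whitespace that follows ., ! or ?.
  def sentences(text), do: String.split(text, ~r/(?<=[.!?])\s+/, trim: true)

  # Assumed rule: extend the current chunk while the new sentence is similar
  # enough to the previous sentence and the joined text fits in max_chars.
  def chunk(text, embedding_fn, threshold \\ 0.8, max_chars \\ 500) do
    text
    |> sentences()
    |> Enum.reduce([], fn sentence, acc ->
      emb = embedding_fn.(sentence)

      case acc do
        [{parts, len, prev_emb} | rest] ->
          # +1 accounts for the space used to join sentences.
          new_len = len + 1 + String.length(sentence)

          if cosine(prev_emb, emb) >= threshold and new_len <= max_chars do
            [{[sentence | parts], new_len, emb} | rest]
          else
            [{[sentence], String.length(sentence), emb} | acc]
          end

        [] ->
          [{[sentence], String.length(sentence), emb}]
      end
    end)
    |> Enum.reverse()
    |> Enum.map(fn {parts, _len, _emb} -> parts |> Enum.reverse() |> Enum.join(" ") end)
  end
end

# Toy embedding function (hypothetical): one dimension per keyword.
embed = fn sentence ->
  down = String.downcase(sentence)
  Enum.map(["cat", "dog", "stock", "market"], fn word ->
    if String.contains?(down, word), do: 1.0, else: 0.0
  end)
end

text = "Cats purr. Cats and dogs play. Stock prices fell. The stock market reacted."

chunks = SemanticSketch.chunk(text, embed, 0.5, 500)
IO.inspect(chunks)
```

With the toy embedding and a threshold of 0.5, the two pet sentences land in one chunk and the two finance sentences in another. In this sketch, lowering max_chars forces a split even between similar sentences, and a single sentence longer than max_chars is kept whole rather than cut. A real embedding_fn would call an embedding model and return its vector as a list of floats.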

Types

embedding()

@type embedding() :: [float()]

t()

@type t() :: %Rag.Chunker.Semantic{
  embedding_fn: (String.t() -> embedding()) | nil,
  max_chars: pos_integer(),
  threshold: float()
}

Functions

chunk(chunker, text, opts)

@spec chunk(t(), String.t(), keyword()) :: [Rag.Chunker.Chunk.t()]

Split text into semantic chunks using embedding similarity.

default_opts()

@spec default_opts() :: keyword()

Returns default options for the semantic chunker.