Rag.VectorStore (rag v0.3.4)

Vector store operations for semantic search with pgvector.

This module provides functions for:

  • Building and managing text chunks
  • Semantic (vector) search using L2 distance
  • Full-text search using PostgreSQL tsvector
  • Hybrid search combining both approaches with RRF

Usage

# Build chunks from text
chunks = VectorStore.build_chunks([
  %{content: "First paragraph", source: "doc.md"},
  %{content: "Second paragraph", source: "doc.md"}
])

# Add embeddings (after generating with Router/Gemini)
chunks = VectorStore.add_embeddings(chunks, embeddings)

# Build search queries
query = VectorStore.semantic_search_query(query_embedding, limit: 10)

Search Types

  • Semantic search: Uses pgvector L2 distance for similarity
  • Full-text search: Uses PostgreSQL tsvector for keyword matching
  • Hybrid search: Combines both using Reciprocal Rank Fusion (RRF)
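As a rough illustration of what "L2 distance" means for semantic search, the sketch below computes Euclidean distance in plain Elixir and ranks candidates closest first. In production pgvector does this inside PostgreSQL (its `<->` operator is L2 distance); the `L2Sketch` module and its function names are hypothetical and not part of Rag.VectorStore.

```elixir
defmodule L2Sketch do
  # Hypothetical helper module, for illustration only.

  # Euclidean (L2) distance between two equal-length vectors.
  def distance(a, b) when length(a) == length(b) do
    a
    |> Enum.zip(b)
    |> Enum.map(fn {x, y} -> (x - y) * (x - y) end)
    |> Enum.sum()
    |> :math.sqrt()
  end

  # Rank {id, embedding} pairs by distance to the query, closest first.
  def nearest(query, candidates, limit) do
    candidates
    |> Enum.map(fn {id, embedding} -> {id, distance(query, embedding)} end)
    |> Enum.sort_by(fn {_id, dist} -> dist end)
    |> Enum.take(limit)
  end
end

L2Sketch.distance([0.0, 0.0], [3.0, 4.0])
# => 5.0
```

A smaller distance means a closer match, which is why semantic results are ordered ascending.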

Summary

Functions

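add_embeddings(chunks, embeddings)

Add embeddings to a list of chunks.

build_chunk(attrs)

Build a single chunk struct from attributes.

build_chunks(attrs_list)

Build multiple chunks from a list of attributes.

calculate_rrf_score(semantic_results, fulltext_results)

Calculate RRF (Reciprocal Rank Fusion) score to combine search results.

chunk_text(text, opts \\ [])

Split text into chunks with optional overlap.

from_chunker_chunks(chunks, source)

Convert Chunker.Chunk structs to VectorStore format.

fulltext_search_query(search_text, opts \\ [])

Build an Ecto query for full-text search using PostgreSQL tsvector.

prepare_for_insert(chunk)

Prepare a chunk for database insertion.

semantic_search_query(embedding, opts \\ [])

Build an Ecto query for semantic search using L2 distance.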

Functions

add_embeddings(chunks, embeddings)

@spec add_embeddings([Rag.VectorStore.Chunk.t()], [[float()]]) :: [
  Rag.VectorStore.Chunk.t()
]

Add embeddings to a list of chunks.

Raises ArgumentError if the number of chunks doesn't match the number of embeddings.

Examples

iex> chunks = [%Chunk{content: "a"}, %Chunk{content: "b"}]
iex> embeddings = [[0.1, 0.2], [0.3, 0.4]]
iex> VectorStore.add_embeddings(chunks, embeddings)
[%Chunk{content: "a", embedding: [0.1, 0.2]}, ...]
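The pairing behaviour described above can be sketched as follows. This is a minimal illustration, not the library's source: `EmbeddingSketch` and its nested `Chunk` struct are hypothetical stand-ins, and the real Rag.VectorStore.Chunk may carry other fields.

```elixir
defmodule EmbeddingSketch do
  # Hypothetical stand-in for Rag.VectorStore.Chunk (field list is an assumption).
  defmodule Chunk do
    defstruct [:content, :source, :embedding, metadata: %{}]
  end

  # Pair each chunk with the embedding at the same position.
  # Raises ArgumentError when the two lists differ in length.
  def add_embeddings(chunks, embeddings) do
    if length(chunks) != length(embeddings) do
      raise ArgumentError,
            "expected #{length(chunks)} embeddings, got #{length(embeddings)}"
    end

    chunks
    |> Enum.zip(embeddings)
    |> Enum.map(fn {chunk, embedding} -> %{chunk | embedding: embedding} end)
  end
end
```

Checking the lengths up front matters because `Enum.zip/2` silently truncates to the shorter list, which would otherwise drop chunks without warning.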

build_chunk(attrs)

@spec build_chunk(map()) :: Rag.VectorStore.Chunk.t()

Build a single chunk struct from attributes.

Parameters

  • attrs - Map with :content (required) and optional :source, :embedding, and :metadata keys

Examples

iex> VectorStore.build_chunk(%{content: "Hello", source: "test.ex"})
%Chunk{content: "Hello", source: "test.ex", metadata: %{}}

build_chunks(attrs_list)

@spec build_chunks([map()]) :: [Rag.VectorStore.Chunk.t()]

Build multiple chunks from a list of attributes.

Examples

iex> VectorStore.build_chunks([%{content: "a"}, %{content: "b"}])
[%Chunk{content: "a"}, %Chunk{content: "b"}]

calculate_rrf_score(semantic_results, fulltext_results)

@spec calculate_rrf_score([map()], [map()]) :: [map()]

Calculate RRF (Reciprocal Rank Fusion) score to combine search results.

Combines semantic and full-text search results using RRF. RRF merges the two lists by rank position rather than by raw score, so distances and full-text ranks do not need to be on the same scale.

Formula

RRF(d) = Σ 1 / (k + rank(d))

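where the sum runs over each result list that contains document d, rank(d) is the position of d in that list, and k is a smoothing constant, typically 60.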

Examples

iex> semantic = [%{id: 1, distance: 0.1}, %{id: 2, distance: 0.2}]
iex> fulltext = [%{id: 2, rank: 0.8}, %{id: 3, rank: 0.6}]
iex> VectorStore.calculate_rrf_score(semantic, fulltext)
[%{id: 2, rrf_score: ...}, %{id: 1, rrf_score: ...}, ...]
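The formula above can be worked through with a small sketch. It assumes ranks are 1-based and that each input list is already sorted best first; `RRFSketch.fuse/3` is a hypothetical name, and the library's own implementation may differ in details such as rank offset or extra fields on the result maps.

```elixir
defmodule RRFSketch do
  # Default smoothing constant, matching the "typically 60" in the docs.
  @k 60

  # Each argument is a list of maps with an :id key, sorted best first.
  # Assumption: ranks start at 1.
  def fuse(semantic_results, fulltext_results, k \\ @k) do
    [semantic_results, fulltext_results]
    |> Enum.flat_map(fn results ->
      results
      |> Enum.with_index(1)
      |> Enum.map(fn {%{id: id}, rank} -> {id, 1 / (k + rank)} end)
    end)
    # Sum the per-list contributions for each id.
    |> Enum.reduce(%{}, fn {id, score}, acc ->
      Map.update(acc, id, score, &(&1 + score))
    end)
    |> Enum.map(fn {id, score} -> %{id: id, rrf_score: score} end)
    |> Enum.sort_by(& &1.rrf_score, :desc)
  end
end

semantic = [%{id: 1, distance: 0.1}, %{id: 2, distance: 0.2}]
fulltext = [%{id: 2, rank: 0.8}, %{id: 3, rank: 0.6}]

RRFSketch.fuse(semantic, fulltext) |> Enum.map(& &1.id)
# => [2, 1, 3]
```

Under these assumptions id 2 scores 1/62 + 1/61 because it appears in both lists, id 1 scores 1/61, and id 3 scores 1/62, which gives the ordering shown.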

chunk_text(text, opts \\ [])

@spec chunk_text(
  String.t(),
  keyword()
) :: [String.t()]

Split text into chunks with optional overlap.

Chunk size is measured in characters. Where possible, chunk breaks fall on sentence boundaries.

Options

  • :max_chars - Maximum characters per chunk (default: 500)
  • :overlap - Characters to overlap between chunks (default: 50)

Examples

iex> VectorStore.chunk_text("Long text...", max_chars: 200)
["First chunk...", "Second chunk..."]
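To make the interaction between :max_chars and :overlap concrete, here is a simplified sketch of fixed-size chunking with overlap. It deliberately omits the sentence boundary handling mentioned above, so it is a plain sliding window rather than the library's algorithm; `ChunkTextSketch` is a hypothetical module, and the defaults simply mirror the documented ones.

```elixir
defmodule ChunkTextSketch do
  # Sliding-window chunking by grapheme count. No sentence awareness.
  def chunk_text(text, opts \\ []) do
    max_chars = Keyword.get(opts, :max_chars, 500)
    overlap = Keyword.get(opts, :overlap, 50)

    if overlap >= max_chars do
      raise ArgumentError, ":overlap must be smaller than :max_chars"
    end

    # Each new chunk starts (max_chars - overlap) graphemes after the last.
    do_chunk(String.graphemes(text), max_chars, max_chars - overlap, [])
  end

  defp do_chunk([], _max, _step, acc), do: Enum.reverse(acc)

  defp do_chunk(graphemes, max, step, acc) do
    chunk = graphemes |> Enum.take(max) |> Enum.join()

    if length(graphemes) <= max do
      # The remainder fits in one chunk, so this is the last one.
      Enum.reverse([chunk | acc])
    else
      do_chunk(Enum.drop(graphemes, step), max, step, [chunk | acc])
    end
  end
end

ChunkTextSketch.chunk_text("abcdefghij", max_chars: 4, overlap: 1)
# => ["abcd", "defg", "ghij"]
```

Each chunk repeats the last character of the previous one, which is the overlap of 1 at work; larger overlaps keep more shared context between neighbouring chunks at the cost of more chunks overall.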

from_chunker_chunks(chunks, source)

@spec from_chunker_chunks([Rag.Chunker.Chunk.t()], String.t()) :: [
  Rag.VectorStore.Chunk.t()
]

Convert Chunker.Chunk structs to VectorStore format.

Preserves byte positions in metadata for source highlighting.
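Byte positions differ from character positions as soon as the text contains multi-byte UTF-8 characters, so highlighting has to slice the source as a binary. The sketch below shows the idea; the metadata keys :start_byte and :end_byte are assumed names (the actual keys are not listed here), and :end_byte is treated as exclusive.

```elixir
defmodule HighlightSketch do
  # Assumption: metadata holds :start_byte (inclusive) and :end_byte (exclusive).
  # These key names are hypothetical.
  def excerpt(source_text, %{start_byte: start_byte, end_byte: end_byte}) do
    binary_part(source_text, start_byte, end_byte - start_byte)
  end
end

# "é" and "ö" take two bytes each, so "wörld" starts at byte 7, not 6.
HighlightSketch.excerpt("héllo wörld", %{start_byte: 7, end_byte: 13})
# => "wörld"
```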

fulltext_search_query(search_text, opts \\ [])

@spec fulltext_search_query(
  String.t(),
  keyword()
) :: Ecto.Query.t()

Build an Ecto query for full-text search using PostgreSQL tsvector.

Options

  • :limit - Maximum number of results (default: 10)

Examples

iex> VectorStore.fulltext_search_query("search terms", limit: 10)
#Ecto.Query<...>

prepare_for_insert(chunk)

@spec prepare_for_insert(Rag.VectorStore.Chunk.t()) :: map()

Prepare a chunk for database insertion.

Converts the chunk to a plain map suitable for Ecto's insert_all, adding :inserted_at and :updated_at values for schemas that declare timestamps().

Examples

iex> prepared = VectorStore.prepare_for_insert(%Chunk{content: "Test"})
iex> Map.keys(prepared) |> Enum.sort()
[:content, :embedding, :inserted_at, :metadata, :source, :updated_at]
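Explicit timestamps are needed because Ecto's insert_all does not autogenerate them the way Repo.insert does. A minimal sketch of the conversion follows; `InsertSketch` and its `Chunk` struct are hypothetical stand-ins, and truncating to seconds assumes the schema uses the default :naive_datetime timestamp type.

```elixir
defmodule InsertSketch do
  # Hypothetical stand-in for Rag.VectorStore.Chunk (field list is an assumption).
  defmodule Chunk do
    defstruct [:content, :source, :embedding, metadata: %{}]
  end

  # Turn a chunk struct into a plain map with both timestamps set.
  def prepare_for_insert(%Chunk{} = chunk) do
    # Assumption: :naive_datetime columns, which expect second precision.
    now = NaiveDateTime.utc_now() |> NaiveDateTime.truncate(:second)

    chunk
    |> Map.from_struct()
    |> Map.put(:inserted_at, now)
    |> Map.put(:updated_at, now)
  end
end
```

A list of such maps could then be passed to a repo's insert_all call in one batch; the repo module and table or schema name depend on the application.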

semantic_search_query(embedding, opts \\ [])

@spec semantic_search_query(
  [float()],
  keyword()
) :: Ecto.Query.t()

Build an Ecto query for semantic search using L2 distance.

Returns results ordered by distance (closest first).

Options

  • :limit - Maximum number of results (default: 10)
  • :min_similarity - Minimum similarity threshold (optional)

Examples

iex> VectorStore.semantic_search_query([0.1, 0.2, ...], limit: 5)
#Ecto.Query<...>