Rag.VectorStore (rag v0.3.4)
Vector store operations for semantic search with pgvector.
This module provides functions for:
- Building and managing text chunks
- Semantic (vector) search using L2 distance
- Full-text search using PostgreSQL tsvector
- Hybrid search combining both approaches with RRF
Usage
alias Rag.VectorStore
# Build chunks from text
chunks = VectorStore.build_chunks([
  %{content: "First paragraph", source: "doc.md"},
  %{content: "Second paragraph", source: "doc.md"}
])
# Add embeddings (after generating with Router/Gemini)
chunks = VectorStore.add_embeddings(chunks, embeddings)
# Build search queries
query = VectorStore.semantic_search_query(query_embedding, limit: 10)
Search Types
- Semantic search: Uses pgvector L2 distance for similarity
- Full-text search: Uses PostgreSQL tsvector for keyword matching
- Hybrid search: Combines both using Reciprocal Rank Fusion (RRF)
Summary
Functions
- add_embeddings: Add embeddings to a list of chunks.
- build_chunk: Build a single chunk struct from attributes.
- build_chunks: Build multiple chunks from a list of attributes.
- calculate_rrf_score: Calculate RRF (Reciprocal Rank Fusion) score to combine search results.
- chunk_text: Split text into chunks with optional overlap.
- from_chunker_chunks: Convert Chunker.Chunk structs to VectorStore format.
- fulltext_search_query: Build an Ecto query for full-text search using PostgreSQL tsvector.
- prepare_for_insert: Prepare a chunk for database insertion.
- semantic_search_query: Build an Ecto query for semantic search using L2 distance.
Functions
@spec add_embeddings([Rag.VectorStore.Chunk.t()], [[float()]]) :: [Rag.VectorStore.Chunk.t()]
Add embeddings to a list of chunks.
Raises ArgumentError if the number of chunks doesn't match
the number of embeddings.
Examples
iex> chunks = [%Chunk{content: "a"}, %Chunk{content: "b"}]
iex> embeddings = [[0.1, 0.2], [0.3, 0.4]]
iex> VectorStore.add_embeddings(chunks, embeddings)
[%Chunk{content: "a", embedding: [0.1, 0.2]}, ...]
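The length check and pairing described above can be sketched in plain Elixir. This is a minimal illustration under stated assumptions, not the library's implementation: the module name `EmbeddingPairingSketch` is hypothetical, and plain maps stand in for `%Chunk{}` structs.

```elixir
defmodule EmbeddingPairingSketch do
  # Hypothetical stand-in for VectorStore.add_embeddings/2.
  # Plain maps are used here instead of %Chunk{} structs.

  # Reject mismatched inputs up front, mirroring the documented ArgumentError.
  def add_embeddings(chunks, embeddings) when length(chunks) != length(embeddings) do
    raise ArgumentError,
          "expected #{length(chunks)} embeddings, got #{length(embeddings)}"
  end

  # Pair each chunk with the embedding at the same position.
  def add_embeddings(chunks, embeddings) do
    chunks
    |> Enum.zip(embeddings)
    |> Enum.map(fn {chunk, embedding} -> Map.put(chunk, :embedding, embedding) end)
  end
end
```

The explicit check matters because `Enum.zip/2` stops at the shorter list, so without it a missing embedding would silently drop a chunk.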
@spec build_chunk(map()) :: Rag.VectorStore.Chunk.t()
Build a single chunk struct from attributes.
Parameters
- attrs - Map with :content (required), :source, :embedding, :metadata
Examples
iex> VectorStore.build_chunk(%{content: "Hello", source: "test.ex"})
%Chunk{content: "Hello", source: "test.ex", metadata: %{}}
@spec build_chunks([map()]) :: [Rag.VectorStore.Chunk.t()]
Build multiple chunks from a list of attributes.
Examples
iex> VectorStore.build_chunks([%{content: "a"}, %{content: "b"}])
[%Chunk{content: "a"}, %Chunk{content: "b"}]
Calculate RRF (Reciprocal Rank Fusion) score to combine search results.
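Merges the ranked results of semantic search and full-text search into a single list. RRF scores each document by its rank position in each list rather than by its raw score, so L2 distances and full-text ranks do not need to be normalized to a common scale first.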
Formula
RRF(d) = Σ 1 / (k + rank(d))
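where the sum runs over the result lists that contain d, rank(d) is the position of d in each list, and k is a smoothing constant, typically 60.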
Examples
iex> semantic = [%{id: 1, distance: 0.1}, %{id: 2, distance: 0.2}]
iex> fulltext = [%{id: 2, rank: 0.8}, %{id: 3, rank: 0.6}]
iex> VectorStore.calculate_rrf_score(semantic, fulltext)
[%{id: 2, rrf_score: ...}, %{id: 1, rrf_score: ...}, ...]
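The formula above can be worked through in a few lines of Elixir. This sketch assumes 1-based ranks taken from list order and k = 60. The module name `RrfSketch` and the function `fuse/2` are hypothetical and may differ from how `calculate_rrf_score` is implemented.

```elixir
defmodule RrfSketch do
  # Hypothetical illustration of Reciprocal Rank Fusion; not the library's code.
  # Each result list is assumed to be ordered best-first and to hold maps with :id.
  def fuse(result_lists, k \\ 60) do
    result_lists
    |> Enum.flat_map(fn results ->
      results
      # Assumption: ranks start at 1 for the first entry in each list.
      |> Enum.with_index(1)
      |> Enum.map(fn {%{id: id}, rank} -> {id, 1 / (k + rank)} end)
    end)
    # Sum the per-list contributions for each document id.
    |> Enum.reduce(%{}, fn {id, score}, acc ->
      Map.update(acc, id, score, &(&1 + score))
    end)
    |> Enum.map(fn {id, score} -> %{id: id, rrf_score: score} end)
    # Highest fused score first.
    |> Enum.sort_by(& &1.rrf_score, :desc)
  end
end
```

With the inputs from the example above, id 2 appears in both lists and receives 1/62 + 1/61, id 1 receives 1/61, and id 3 receives 1/62, so the fused order is 2, 1, 3.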
Split text into chunks with optional overlap.
Uses character-based chunking with sentence boundary awareness when possible.
Options
- :max_chars - Maximum characters per chunk (default: 500)
- :overlap - Characters to overlap between chunks (default: 50)
Examples
iex> VectorStore.chunk_text("Long text...", max_chars: 200)
["First chunk...", "Second chunk..."]
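To make the interaction between `:max_chars` and `:overlap` concrete, here is a simplified sketch of fixed-size character windows with overlap. It deliberately omits the sentence boundary handling that `chunk_text` describes. The module name `ChunkSketch` is hypothetical, and the defaults are copied from the options listed above.

```elixir
defmodule ChunkSketch do
  # Hypothetical illustration of overlapping character windows.
  # Unlike the documented function, this does not look for sentence boundaries.
  def chunk_text(text, opts \\ []) do
    max_chars = Keyword.get(opts, :max_chars, 500)
    overlap = Keyword.get(opts, :overlap, 50)
    # Each window starts (max_chars - overlap) characters after the previous one.
    # Guard against a non-positive step when overlap >= max_chars.
    step = max(max_chars - overlap, 1)

    text
    |> String.graphemes()
    |> do_chunk(max_chars, step, [])
  end

  defp do_chunk([], _max_chars, _step, acc), do: Enum.reverse(acc)

  defp do_chunk(graphemes, max_chars, step, acc) do
    chunk = graphemes |> Enum.take(max_chars) |> Enum.join()

    if length(graphemes) <= max_chars do
      # The remaining text fits in one window, so this is the last chunk.
      Enum.reverse([chunk | acc])
    else
      do_chunk(Enum.drop(graphemes, step), max_chars, step, [chunk | acc])
    end
  end
end
```

For example, `ChunkSketch.chunk_text("abcdefghij", max_chars: 4, overlap: 1)` yields three windows, each sharing one character with its neighbor.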
@spec from_chunker_chunks([Rag.Chunker.Chunk.t()], String.t()) :: [Rag.VectorStore.Chunk.t()]
Convert Chunker.Chunk structs to VectorStore format.
Preserves byte positions in metadata for source highlighting.
@spec fulltext_search_query(String.t(), keyword()) :: Ecto.Query.t()
Build an Ecto query for full-text search using PostgreSQL tsvector.
Options
- :limit - Maximum number of results (default: 10)
Examples
iex> VectorStore.fulltext_search_query("search terms", limit: 10)
#Ecto.Query<...>
@spec prepare_for_insert(Rag.VectorStore.Chunk.t()) :: map()
Prepare a chunk for database insertion.
Converts the chunk to a map suitable for Ecto's insert_all. The map includes :inserted_at and :updated_at, because insert_all does not autogenerate timestamps for schemas that use timestamps().
Examples
iex> prepared = VectorStore.prepare_for_insert(%Chunk{content: "Test"})
iex> Map.keys(prepared) |> Enum.sort()
[:content, :embedding, :inserted_at, :metadata, :source, :updated_at]
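A conversion of this kind might look like the following sketch. It is an assumption-laden illustration rather than the library's code: the module name `InsertSketch` is hypothetical, a plain map stands in for `%Chunk{}`, and the timestamps are `NaiveDateTime` values truncated to seconds, which is what Ecto's default `:naive_datetime` timestamp type expects.

```elixir
defmodule InsertSketch do
  # Hypothetical illustration; field names are taken from the documented key list.
  @fields [:content, :source, :embedding, :metadata]

  def prepare_for_insert(chunk) do
    # Assumption: the schema uses Ecto's default :naive_datetime timestamps,
    # which do not accept microsecond precision.
    now = NaiveDateTime.utc_now() |> NaiveDateTime.truncate(:second)

    # Start from defaults so every column is present even if the chunk omits it.
    %{content: nil, source: nil, embedding: nil, metadata: %{}}
    |> Map.merge(Map.take(chunk, @fields))
    |> Map.merge(%{inserted_at: now, updated_at: now})
  end
end
```

Using the same `now` value for both timestamps keeps `:inserted_at` and `:updated_at` equal on first insert.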
@spec semantic_search_query([float()], keyword()) :: Ecto.Query.t()
Build an Ecto query for semantic search using L2 distance.
Returns results ordered by distance (closest first).
Options
- :limit - Maximum number of results (default: 10)
- :min_similarity - Minimum similarity threshold (optional)
Examples
iex> VectorStore.semantic_search_query([0.1, 0.2, ...], limit: 5)
#Ecto.Query<...>
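To show what ordering by L2 distance means, the sketch below computes Euclidean distance in memory and returns the closest rows first. In the real query this work would be done by pgvector inside PostgreSQL. The module name `L2Sketch` and the row shape are hypothetical, and this is not the query the function builds.

```elixir
defmodule L2Sketch do
  # Hypothetical in-memory illustration of L2 (Euclidean) distance ranking.
  def l2_distance(a, b) when length(a) == length(b) do
    a
    |> Enum.zip(b)
    |> Enum.map(fn {x, y} -> (x - y) * (x - y) end)
    |> Enum.sum()
    |> :math.sqrt()
  end

  # Rows are assumed to be maps with an :embedding list of floats.
  def nearest(rows, query_embedding, limit) do
    rows
    |> Enum.map(&Map.put(&1, :distance, l2_distance(&1.embedding, query_embedding)))
    # Smaller distance means more similar, so sort ascending.
    |> Enum.sort_by(& &1.distance)
    |> Enum.take(limit)
  end
end
```

Because a smaller distance means a closer match, the ascending sort corresponds to "closest first" in the description above.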