Semantic search

This guide will walk you through adding semantic search to your application using Torus and PostgreSQL.

Initial requirements

Your app needs to use PostgreSQL and Ecto, and you should be comfortable storing and comparing embedding vectors in PostgreSQL using the pgvector extension.

Before you start, make sure you have the pgvector extension installed:

CREATE EXTENSION IF NOT EXISTS vector;
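
If you manage your database with Ecto migrations, you can enable the extension there instead. A minimal sketch (the module name is illustrative):

defmodule MyApp.Repo.Migrations.AddPgvectorExtension do
  use Ecto.Migration

  def up, do: execute("CREATE EXTENSION IF NOT EXISTS vector")
  def down, do: execute("DROP EXTENSION IF EXISTS vector")
end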

Getting started

Semantic search is a technology that understands the meaning behind words to deliver results that match a user's intent, not just exact keywords. It uses AI and machine learning to interpret natural language in context, improving the relevance of search results.

An embedding is a numerical representation (vector) of a word, sentence, or document in a high‑dimensional space. It captures the semantic meaning of the text, allowing for more accurate comparisons and searches. The more dimensions an embedding has, the more information it can capture. However, higher dimensions also mean greater complexity and computational cost.
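
To make "comparing vectors" concrete, here's a toy sketch with made-up 3-dimensional vectors (real models produce hundreds of dimensions, e.g. 384 for all-MiniLM-L6-v2), comparing them by cosine similarity:

# Hypothetical embeddings - semantically close texts get nearby vectors
cat = [0.8, 0.1, 0.2]
kitten = [0.75, 0.15, 0.25]
car = [0.1, 0.9, 0.3]

cosine_similarity = fn a, b ->
  dot = Enum.zip_with(a, b, fn x, y -> x * y end) |> Enum.sum()
  norm = fn v -> v |> Enum.map(&(&1 * &1)) |> Enum.sum() |> :math.sqrt() end
  dot / (norm.(a) * norm.(b))
end

cosine_similarity.(cat, kitten) # ~0.99 - semantically close
cosine_similarity.(cat, car)    # ~0.29 - semantically distant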

The search process is split into three phases:

  1. Generate and store the embeddings for the data you want to search.

    We'll use the Torus.to_vectors/1 function. We'll dive deeper into how to do this efficiently later.

  2. Generate an embedding for the search term.

    This is also done with the Torus.to_vector/1 function.

  3. Compare the embedding of the search term with the embeddings of the data you want to search.

    We'll do this using the Torus.semantic/5 function.

Overall, our search will look like this:

def search(term) do
  # Generate an embedding for the search term
  search_vector = Torus.to_vector(term)

  Post
  |> Torus.semantic([p], p.embedding, search_vector)
  |> Repo.all()
end

Note: You’ll need to join or preload the associated embeddings if they're stored in a separate table.
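
If the embeddings live in a separate table (as in the schema suggested later in this guide), a sketch of the joined query might look like this (the table and binding names are assumptions based on that schema):

import Ecto.Query

def search(term) do
  search_vector = Torus.to_vector(term)

  Post
  |> join(:inner, [p], pe in "post_embeddings", on: pe.post_id == p.id)
  |> join(:inner, [p, pe], e in "embeddings", on: e.id == pe.embedding_id)
  |> Torus.semantic([p, pe, e], e.embedding, search_vector)
  |> Repo.all()
end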

1. Generating embeddings

There are several ways to generate embeddings. Torus includes a set of built-in Torus.Embeddings modules that implement the Torus.Embedding behaviour, but you're not limited to those: the behaviour is deliberately small, so you can easily implement it yourself.

Here is what Torus provides out of the box:

Torus.Embeddings.HuggingFace

Torus.Embeddings.HuggingFace is a wrapper around the Hugging Face API. It allows you to generate embeddings using a variety of models available on Hugging Face.

To use it:

  • Add the following to your config.exs:

    config :torus, embedding_module: Torus.Embeddings.HuggingFace
  • Add req to your mix.exs dependencies:

    def deps do
      [
        {:req, "~> 0.5"}
      ]
    end
  • Add an API token for Hugging Face to your runtime.exs. You can generate a token in your Hugging Face account settings.

    # `config/runtime.exs`
    config :torus, Torus.Embeddings.HuggingFace, token: System.get_env("HUGGING_FACE_API_KEY")

By default, it uses the sentence-transformers/all-MiniLM-L6-v2 model, but you can specify a different model by passing model in the configuration, or per call via the model option when calling Torus.to_vector/1:

config :torus, Torus.Embeddings.HuggingFace, model: "your/model"
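
A per-call override might then look like this (assuming the model option is passed through as described above; the model name is just an example):

Torus.to_vector("Hello, world!", model: "sentence-transformers/paraphrase-MiniLM-L6-v2")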

Torus.Embeddings.Gemini

Torus.Embeddings.Gemini is a wrapper around the Google Gemini API. It allows you to generate embeddings using Gemini embedding models.

To use it:

  • Add the following to your config.exs:

    config :torus, embedding_module: Torus.Embeddings.Gemini
  • Add req to your mix.exs dependencies:

    def deps do
      [
        {:req, "~> 0.5"}
      ]
    end
  • Add an API token for Gemini to your runtime.exs. You can create an API key in Google AI Studio.

    # `config/runtime.exs`
    config :torus, Torus.Embeddings.Gemini, token: System.get_env("GEMINI_API_KEY")

By default, it uses the text-embedding-004 model, but you can specify a different model by passing model in the configuration, or per call via the model option when calling Torus.to_vector/1:

config :torus, Torus.Embeddings.Gemini, model: "your/model"

Torus.Embeddings.OpenAI

Torus.Embeddings.OpenAI is a wrapper around the OpenAI API. It allows you to generate embeddings using OpenAI models.

To use it:

  • Add the following to your config.exs:

    config :torus, embedding_module: Torus.Embeddings.OpenAI
  • Add req to your mix.exs dependencies:

    def deps do
      [
        {:req, "~> 0.5"}
      ]
    end
  • Add an API token for OpenAI to your runtime.exs. You can create an API key in the OpenAI dashboard.

    # `config/runtime.exs`
    config :torus, Torus.Embeddings.OpenAI, token: System.get_env("OPEN_AI_API_KEY")

By default, it uses the text-embedding-3-small model, but you can specify a different model by passing model in the configuration, or per call via the model option when calling Torus.to_vector/1:

config :torus, Torus.Embeddings.OpenAI, model: "your/model"

Torus.Embeddings.PostgresML

Torus.Embeddings.PostgresML uses the PostgresML PostgreSQL extension to generate embeddings. It allows you to generate embeddings using a variety of models and performs inference directly in the database. This requires your database to have GPU support.

To use it, add the following to your config.exs:

config :torus, embedding_module: Torus.Embeddings.PostgresML
config :torus, Torus.Embeddings.PostgresML, repo: YourApp.Repo

By default, it uses the sentence-transformers/all-MiniLM-L6-v2 model, but you can specify a different model by explicitly passing model in the configuration or when calling Torus.to_vector/1 function via model option:

config :torus, Torus.Embeddings.PostgresML, model: "your/model"

Torus.Embeddings.LocalNxServing

Torus.Embeddings.LocalNxServing allows you to generate embeddings on your local machine using a variety of models available on Hugging Face, leveraging the nx and bumblebee libraries to perform inference. It will most likely require a machine with GPU support.

To use it:

  • Add the following to your config.exs:

    config :torus, embedding_module: Torus.Embeddings.LocalNxServing
  • Add bumblebee and nx to your mix.exs dependencies:

    def deps do
      [
        {:bumblebee, "~> 0.6"},
        {:nx, "~> 0.9"}
      ]
    end
  • Add it to your supervision tree:

    Here you'd probably want to start it only on machines with GPUs. See more information in the Nx.Serving documentation.

    def start(_type, _args) do
      children = [
        # Your deps
        Torus.Embeddings.LocalNxServing
      ]
    
      opts = [strategy: :one_for_one, name: YourApp.Supervisor]
      Supervisor.start_link(children, opts)
    end

Any options you pass to Torus.Embeddings.LocalNxServing when starting it are forwarded directly to the Nx.Serving.start_link/1 function.
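
For example, in your supervision tree (batch_size and batch_timeout are illustrative Nx.Serving options, not recommended values):

children = [
  {Torus.Embeddings.LocalNxServing, batch_size: 16, batch_timeout: 50}
]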

By default, it uses the sentence-transformers/all-MiniLM-L6-v2 model, but you can specify a different model by explicitly passing model in the configuration:

config :torus, Torus.Embeddings.LocalNxServing, model: "your/model"

Torus.Embeddings.Batcher

Torus.Embeddings.Batcher is a long‑running GenServer that collects individual embedding calls, groups them into a single batch, and forwards the batch to the configured embedding_module. It can be used in combination with any of the embedding modules above.

It is considered good practice to batch requests to the embedding module, especially when you are dealing with high‑traffic applications.

To use it:

  • Add the following to your config.exs:

    config :torus, embedding_module: Torus.Embeddings.Batcher
    
    config :torus, Torus.Embeddings.Batcher,
      max_batch_size: 10,
      default_batch_timeout: 100,
      embedding_module: Torus.Embeddings.HuggingFace
  • Add it to your supervision tree:

    def start(_type, _args) do
      children = [
        # Your deps
        Torus.Embeddings.Batcher
      ]
    
      opts = [strategy: :one_for_one, name: YourApp.Supervisor]
      Supervisor.start_link(children, opts)
    end
  • Configure your embedding_module of choice (see the corresponding section).

After that, you can call the Torus.to_vector/1 and Torus.to_vectors/1 functions.
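
For example:

# Individual calls are transparently collected into batches before
# being forwarded to the configured embedding_module
Torus.to_vector("What is semantic search?")
Torus.to_vectors(["First post body", "Second post body"])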

Torus.Embeddings.NebulexCache

Torus.Embeddings.NebulexCache is a wrapper around the Nebulex cache. It allows you to cache embedding calls in memory, so you save resources and cost by avoiding repeated calls to the embedding module for the same input.

To use it:

  • Add the following to your config.exs:

    config :torus, embedding_module: Torus.Embeddings.NebulexCache
    config :torus, Torus.Embeddings.NebulexCache,
      embedding_module: Torus.Embeddings.PostgresML,
      cache: Nebulex.Cache,
      otp_name: :your_app,
      adapter: Nebulex.Adapters.Local,
      # Other adapter‑specific options
      allocated_memory: 1_000_000_000 # 1 GB
    
    # Embedding‑module‑specific options
    config :torus, Torus.Embeddings.PostgresML, repo: TorusExample.Repo
  • Add nebulex and decorator to your mix.exs dependencies:

    def deps do
      [
        {:nebulex, ">= 0.0.0"},
        {:decorator, ">= 0.0.0"}
      ]
    end
  • Add Torus.Embeddings.NebulexCache to your supervision tree:

    def start(_type, _args) do
      children = [
        Torus.Embeddings.NebulexCache
      ]
    
      opts = [strategy: :one_for_one, name: YourApp.Supervisor]
      Supervisor.start_link(children, opts)
    end

See the Nebulex documentation for more information on how to configure the cache.

And you're good to go. As you can see, you can chain embedding modules to compose an embedding pipeline of your choice; as long as each module implements the Torus.Embedding behaviour, you're all set.

Your custom embedding implementing Torus.Embedding

To implement your own embedding module, you can start by copying an existing one and tailoring it to your needs. See Torus.Embeddings.HuggingFace for a reference implementation.
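
For orientation, a skeleton might look like the following. The callback names and signatures below are assumptions for illustration only; copy the exact callbacks from the Torus.Embedding behaviour and the module you use as a template:

defmodule MyApp.Embeddings.Dummy do
  # Illustrative skeleton - verify the real callbacks against Torus.Embedding
  @behaviour Torus.Embedding

  @impl true
  def generate(terms, _opts) when is_list(terms) do
    # Call your model or API of choice here, returning one vector per term.
    # This dummy returns 384-dimensional zero vectors.
    Enum.map(terms, fn _term -> List.duplicate(0.0, 384) end)
  end

  @impl true
  def embedding_model(_opts), do: "my/dummy-model"
end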

Ideally, you'd end up with a chain like the one below: pass Torus.Embeddings.NebulexCache as the embedding_module to :torus, then stack the other embedders on top of each other:

Chain of embeddings

# config.exs
config :torus, embedding_module: Torus.Embeddings.NebulexCache
config :torus, Torus.Embeddings.NebulexCache, embedding_module: Torus.Embeddings.Batcher
config :torus, Torus.Embeddings.Batcher, embedding_module: Torus.Embeddings.HuggingFace

2. Storing the embeddings

Database structure

There are multiple ways to store embeddings, but I'd recommend the following approach:

create table(:embeddings) do
  # Model name as a string so we can differentiate between models
  add :model, :string, null: false
  # Parameters used to generate the embedding (maybe an embedding version) so we can
  # filter by the newest later
  add :metadata, :jsonb, null: false, default: "{}"
  # Maybe more columns to differentiate this embedding from others?
  # Actual embedding vector (384 dimensions for the `all-MiniLM-L6-v2` model)
  add :embedding, :vector, size: 384, null: false
end

create table(:posts) do
  add :title, :string
  add :body, :string

  timestamps(type: :utc_datetime)
end

create table(:post_embeddings) do
  add :post_id, references(:posts, on_delete: :delete_all), null: false
  add :embedding_id, references(:embeddings, on_delete: :delete_all), null: false
end

This gives us a many‑to‑many relationship between posts and embeddings, so we can version the embeddings and even store multiple embeddings for the same post. However, I'd suggest concatenating all string/binary fields together to generate one embedding per post for simpler searches later on.
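
For example, generating a single embedding per post could look like this (the separator and field choice are up to you):

# Concatenate the searchable fields into one text and embed it
text = Enum.join([post.title, post.body], "\n\n")
[vector] = Torus.to_vectors([text])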

There is also a pattern of storing the vector itself without fixed dimensions, keeping the dimensions and metadata in separate columns, but I'd recommend against it: a vector column without a fixed dimension can't be indexed later on.

The embedding process – existing rows

I suggest enqueueing an Oban job that generates embeddings in chunks for all existing rows, using the Torus.to_vectors/1 function.
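
A sketch of such a job, assuming the schema above (module and field names are illustrative, and persistence details are omitted):

defmodule MyApp.Workers.BackfillEmbeddings do
  use Oban.Worker, queue: :embeddings

  import Ecto.Query
  alias MyApp.{Repo, Post}

  @batch_size 100

  @impl Oban.Worker
  def perform(%Oban.Job{}) do
    # Pick a batch of posts that don't have an embedding yet
    posts =
      Post
      |> join(:left, [p], pe in "post_embeddings", on: pe.post_id == p.id)
      |> where([p, pe], is_nil(pe.id))
      |> limit(@batch_size)
      |> Repo.all()

    if posts != [] do
      texts = Enum.map(posts, &"#{&1.title}\n\n#{&1.body}")
      vectors = Torus.to_vectors(texts)

      # Persist one embeddings row plus one post_embeddings row per post
      # here, then schedule the next chunk
      _pairs = Enum.zip(posts, vectors)
      %{} |> new() |> Oban.insert!()
    end

    :ok
  end
end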

The embedding process – new rows

There are a few ways to handle the embedding process for new rows:

  1. Add a cron Oban job to run periodically and, in batches, embed all rows that need it.
  2. Schedule an Oban job to embed the row after inserting it into the database (see the sketch after this list).
  3. (Least recommended) Embed the row in the same transaction used to insert it into the database.
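
A sketch of option 2, enqueueing the job in the same transaction as the insert (Oban jobs are database rows, so the job is only enqueued if the insert commits; MyApp.Workers.EmbedPost is a hypothetical single-row embedding worker):

def create_post(attrs) do
  Repo.transaction(fn ->
    post = %Post{} |> Post.changeset(attrs) |> Repo.insert!()

    # Enqueue the embedding job for the freshly inserted row
    %{post_id: post.id} |> MyApp.Workers.EmbedPost.new() |> Oban.insert!()

    post
  end)
end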

3. Searching

We need to generate the embedding and then compare it with the embeddings in the database. This can be done using the Torus.semantic/5 function.

def search(term) do
  search_vector = Torus.to_vector(term)

  Post
  |> Torus.semantic([p], p.embedding, search_vector, distance: :l2_distance, pre_filter: 0.7)
  |> Repo.all()
end