Getting Started

Mix.install([
  {:rag, "~> 0.2.0"},
  {:kino, "~> 0.15.3"}
])

Basic Idea

The underlying idea of Retrieval Augmented Generation (RAG) is to provide an LLM with helpful context when it generates a response.

So, instead of starting the text generation from a plain question:

What was the weather like in Berlin on 2025-03-13?

We give it context and a question:

Context: Weather in Berlin on 2025-03-13: Cloudy and around 8°C

Question: What was the weather like in Berlin on 2025-03-13?

This way it's much easier for the LLM to respond with correct information (assuming the information in the context is correct).

The remainder of this guide walks through a very basic example of how to implement this idea. For a more sophisticated example, you can run the installer with mix rag.install in an existing Mix project and inspect the generated code.

Ingestion

To be able to provide helpful context at the right time, we first store information in a way that lets us easily find relevant pieces later. We call this process "ingestion".

Usually you will store the information in a suitable database. You might want to calculate embeddings of the information to perform semantic search. In that case, you can use Rag.Embedding.generate_embedding/3 or Rag.Embedding.generate_embeddings_batch/3.
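For illustration, here is a minimal sketch of what embedding-based ingestion could look like. The embed function below is a hypothetical stand-in for a call to your embedding provider (in a real project, for example via Rag.Embedding), and the in-memory list stands in for a real database:

# Hypothetical stand-in for a real embedding provider call. It produces a
# deterministic toy vector; real embeddings carry semantic meaning.
embed = fn text ->
  for i <- 1..3, do: rem(:erlang.phash2({text, i}), 1000) / 1000
end

documents = [
  "Weather in Berlin on 2025-03-13: Cloudy and around 8°C",
  "Weather in Hamburg on 2025-03-13: Rainy and around 6°C"
]

# Store each document together with its embedding so we can search it later.
index = Enum.map(documents, fn text -> %{text: text, embedding: embed.(text)} end)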

Generally speaking, rag leaves the ingestion process up to you. For this guide we will use a simple map to store some weather data.

weather_data = %{
  "berlin" => %{
    ~D[2025-03-13] => "Cloudy and around 8°C"
  }
}
%{"berlin" => %{~D[2025-03-13] => "Cloudy and around 8°C"}}

Retrieval

We want to work with user queries like:

What was the weather like in Berlin on 2025-03-13?

Our next step is to find relevant information. Often, we would calculate an embedding of the query and use it to perform a semantic search for relevant data. We could also perform a text-based search. The code generated by mix rag.install gives you an example of how to combine both.
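As a sketch of the embedding route, using the toy index and the hypothetical embed helper from the ingestion section, a semantic search boils down to embedding the query and ranking the stored chunks by similarity:

# Cosine similarity between two vectors of equal length.
cosine_similarity = fn a, b ->
  dot = a |> Enum.zip(b) |> Enum.map(fn {x, y} -> x * y end) |> Enum.sum()
  norm = fn v -> :math.sqrt(Enum.sum(Enum.map(v, fn x -> x * x end))) end
  dot / (norm.(a) * norm.(b))
end

# Rank chunks by similarity to the query embedding and keep the best matches.
semantic_search = fn index, query ->
  query_embedding = embed.(query)

  index
  |> Enum.sort_by(fn chunk -> cosine_similarity.(chunk.embedding, query_embedding) end, :desc)
  |> Enum.take(3)
end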

In this guide we will perform a lookup in our weather_data. For the sake of simplicity, the user input must follow a strict format:

What was the weather like in [city] on [date]?

city_input = Kino.Input.text("city", default: "Berlin")
date_input = Kino.Input.date("date", default: ~D[2025-03-13])

import Kino.Shorts

grid([text("What was the weather like in"), city_input, text("on"), date_input, text("?")])

city = Kino.Input.read(city_input)
date = Kino.Input.read(date_input)

query = "What was the weather like in #{city} on #{date}?"
"What was the weather like in Berlin on 2025-03-13?"

Alright, we have a user query. Next, we need a function to retrieve weather data. A retrieval function in rag must take a Rag.Generation struct as an argument and return {:ok, result} or {:error, error}.

city_and_date_from_query = fn query ->
  query
  |> String.trim_leading("What was the weather like in ")
  |> String.trim_trailing("?")
  |> String.split(" on ")
end

weather_by_city_and_date = fn generation ->
  case city_and_date_from_query.(generation.query) do
    [city, date] -> {:ok, weather_data[String.downcase(city)][Date.from_iso8601!(date)]}
    _else -> {:error, :bad_format}
  end
end
#Function<42.39164016/1 in :erl_eval.expr/6>

Let's test the function.

Rag.Generation.new(query) |> weather_by_city_and_date.()
{:ok, "Cloudy and around 8°C"}

Nice, we found something for Berlin on 2025-03-13.
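A query that doesn't match the expected format takes the error branch:

Rag.Generation.new("How warm was it yesterday?") |> weather_by_city_and_date.()
{:error, :bad_format}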

Generation

While you can call your retrieval function directly and then use Rag.Generation.put_retrieval_result/3 to store the result in the Rag.Generation struct, you get telemetry events when you pass the function as a callback to Rag.Retrieval.retrieve/3.
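For reference, a sketch of the manual route (assuming put_retrieval_result/3 takes the generation, a key, and the result):

generation = Rag.Generation.new(query)
{:ok, weather} = weather_by_city_and_date.(generation)
generation = Rag.Generation.put_retrieval_result(generation, :weather, weather)

Using Rag.Retrieval.retrieve/3 instead: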

generation =
  Rag.Generation.new(query)
  |> Rag.Retrieval.retrieve(:weather, weather_by_city_and_date)
%Rag.Generation{
  query: "What was the weather like in Berlin on 2025-03-13?",
  query_embedding: nil,
  retrieval_results: %{weather: "Cloudy and around 8°C"},
  context: nil,
  context_sources: [],
  prompt: nil,
  response: nil,
  evaluations: %{},
  halted?: false,
  errors: []
}

Next, we take the retrieved information and construct a context for the LLM and store it in our Rag.Generation struct.

context =
  if weather = Rag.Generation.get_retrieval_result(generation, :weather) do
    [city, date] = city_and_date_from_query.(generation.query)
    "Weather in #{String.capitalize(city)} on #{date}: #{weather}"
  end

generation = Rag.Generation.put_context(generation, context)
%Rag.Generation{
  query: "What was the weather like in Berlin on 2025-03-13?",
  query_embedding: nil,
  retrieval_results: %{weather: "Cloudy and around 8°C"},
  context: "Weather in Berlin on 2025-03-13: Cloudy and around 8°C",
  context_sources: [],
  prompt: nil,
  response: nil,
  evaluations: %{},
  halted?: false,
  errors: []
}

Now, we construct a prompt that we will finally pass to the LLM.

prompt =
  if generation.context do
    """
    Context: #{generation.context}
    Question: #{generation.query}
    """
  else
    generation.query
  end

generation = Rag.Generation.put_prompt(generation, prompt)
%Rag.Generation{
  query: "What was the weather like in Berlin on 2025-03-13?",
  query_embedding: nil,
  retrieval_results: %{weather: "Cloudy and around 8°C"},
  context: "Weather in Berlin on 2025-03-13: Cloudy and around 8°C",
  context_sources: [],
  prompt: "Context: Weather in Berlin on 2025-03-13: Cloudy and around 8°C\nQuestion: What was the weather like in Berlin on 2025-03-13?\n",
  response: nil,
  evaluations: %{},
  halted?: false,
  errors: []
}

For the last step, generating a response, we must first configure a Rag.Ai.Provider. We'll use Rag.Ai.Cohere this time, as you can get a free trial API key.

If you're reading this in Livebook, you can configure a secret COHERE_API_KEY.

api_key = System.get_env("LB_COHERE_API_KEY")

provider = Rag.Ai.Cohere.new(text_model: "command-r-plus", api_key: api_key)

Kino.nothing()

Finally, we can generate a response.

generation = Rag.Generation.generate_response(generation, provider)

generation.response
"The weather in Berlin on 2025-03-13 was cloudy, with temperatures hovering around 8°C."