Getting Started
Mix.install([
  {:rag, "~> 0.2.0"},
  {:kino, "~> 0.15.3"}
])
Basic Idea
The underlying idea of Retrieval-Augmented Generation (RAG) is to provide an LLM with helpful context when we ask it to generate a response.
So, instead of starting the text generation from a plain question:
What was the weather like in Berlin on 2025-03-13?
We give it context and a question:
Context: Weather in Berlin on 2025-03-13: Cloudy and around 8°C
Question: What was the weather like in Berlin on 2025-03-13?
This way it's much easier for the LLM to respond with correct information (assuming the information in the context is correct).
The remainder of this guide walks through a very basic example of how we can implement this idea.
For a more sophisticated example, you can run the installer with mix rag.install in an existing Mix project and inspect the generated code.
Ingestion
To be able to provide helpful context at the right time, we first store information somewhere in a way that we can easily find relevant pieces. We call this process "ingestion".
Usually you will store the information in a suitable database. You might want to calculate embeddings of the information to perform semantic search. In that case, you can use Rag.Embedding.generate_embedding/3 or Rag.Embedding.generate_embeddings_batch/3.
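As a rough sketch of what embedding-based ingestion could look like (the text_key and embedding_key options and the exact return shape are assumptions here, so check the Rag.Embedding docs; provider stands for any Rag.Ai.Provider configured for embeddings):
# Hypothetical sketch, not used in the rest of this guide.
# The option names :text_key and :embedding_key and the return shape are
# assumptions about the Rag.Embedding API.
chunks = [%{text: "Weather in Berlin on 2025-03-13: Cloudy and around 8°C"}]

chunks_with_embeddings =
  Rag.Embedding.generate_embeddings_batch(chunks, provider,
    text_key: :text,
    embedding_key: :embedding
  )

# The returned chunks should now carry an embedding per chunk, ready to be
# stored in a database that supports vector similarity search.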
Generally speaking, rag leaves the ingestion process up to you.
For this guide we will use a simple map to store some weather data.
weather_data = %{
  "berlin" => %{
    ~D[2025-03-13] => "Cloudy and around 8°C"
  }
}
%{"berlin" => %{~D[2025-03-13] => "Cloudy and around 8°C"}}
Retrieval
We want to work with user queries like:
What was the weather like in Berlin on 2025-03-13?
Our next step is to find relevant information.
Oftentimes, we would calculate an embedding of the query, and then use that to perform a semantic search for relevant data.
We could also perform a text-based search.
The code generated by mix rag.install gives you an example of how to combine both.
In this guide, we will simply perform a lookup in our weather_data map.
For the sake of simplicity, the user input must follow a strict format:
What was the weather like in [city] on [date]?
city_input = Kino.Input.text("city", default: "Berlin")
date_input = Kino.Input.date("date", default: ~D[2025-03-13])
import Kino.Shorts
grid([text("What was the weather like in"), city_input, text("on"), date_input, text("?")])
city = Kino.Input.read(city_input)
date = Kino.Input.read(date_input)
query = "What was the weather like in #{city} on #{date}?"
"What was the weather like in Berlin on 2025-03-13?"
Alright, we have a user query.
Next, we need a function to retrieve weather data.
A retrieval function in rag must take a Rag.Generation struct as its argument and return {:ok, result} or {:error, error}.
# Extract [city, date] from a query of the form
# "What was the weather like in [city] on [date]?"
city_and_date_from_query = fn query ->
  query
  |> String.trim_leading("What was the weather like in ")
  |> String.trim_trailing("?")
  |> String.split(" on ")
end
# Retrieval function: takes a Rag.Generation struct and returns
# {:ok, weather} or {:error, :bad_format}.
weather_by_city_and_date = fn generation ->
  case city_and_date_from_query.(generation.query) do
    [city, date] -> {:ok, weather_data[String.downcase(city)][Date.from_iso8601!(date)]}
    _else -> {:error, :bad_format}
  end
end
#Function<42.39164016/1 in :erl_eval.expr/6>
Let's test the function.
Rag.Generation.new(query) |> weather_by_city_and_date.()
{:ok, "Cloudy and around 8°C"}
Nice, we found something for Berlin on 2025-03-13.
Generation
While you can directly use your retrieval function and afterwards use Rag.Generation.put_retrieval_result/3 to store the result in the Rag.Generation struct, you'll get telemetry events when you pass it as a callback to Rag.Retrieval.retrieve/3.
generation =
  Rag.Generation.new(query)
  |> Rag.Retrieval.retrieve(:weather, &weather_by_city_and_date.(&1))
%Rag.Generation{
  query: "What was the weather like in Berlin on 2025-03-13?",
  query_embedding: nil,
  retrieval_results: %{weather: "Cloudy and around 8°C"},
  context: nil,
  context_sources: [],
  prompt: nil,
  response: nil,
  evaluations: %{},
  halted?: false,
  errors: []
}
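For comparison, the manual route without Rag.Retrieval.retrieve/3 would look roughly like this; note that this path does not emit telemetry events, and the argument order of put_retrieval_result/3 is assumed to mirror get_retrieval_result/2:
# A sketch of the manual alternative: call the retrieval function yourself and
# store the result under the :weather key. The argument order of
# put_retrieval_result/3 (generation, key, result) is an assumption.
manual_generation = Rag.Generation.new(query)
{:ok, weather} = weather_by_city_and_date.(manual_generation)
manual_generation = Rag.Generation.put_retrieval_result(manual_generation, :weather, weather)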
Next, we take the retrieved information, construct a context for the LLM, and store it in our Rag.Generation struct.
context =
  if weather = Rag.Generation.get_retrieval_result(generation, :weather) do
    [city, date] = city_and_date_from_query.(generation.query)
    "Weather in #{String.capitalize(city)} on #{date}: #{weather}"
  end
generation = Rag.Generation.put_context(generation, context)
%Rag.Generation{
  query: "What was the weather like in Berlin on 2025-03-13?",
  query_embedding: nil,
  retrieval_results: %{weather: "Cloudy and around 8°C"},
  context: "Weather in Berlin on 2025-03-13: Cloudy and around 8°C",
  context_sources: [],
  prompt: nil,
  response: nil,
  evaluations: %{},
  halted?: false,
  errors: []
}
Now, we construct a prompt that we will finally pass to the LLM.
prompt =
  if generation.context do
    """
    Context: #{generation.context}
    Question: #{generation.query}
    """
  else
    generation.query
  end
generation = Rag.Generation.put_prompt(generation, prompt)
%Rag.Generation{
  query: "What was the weather like in Berlin on 2025-03-13?",
  query_embedding: nil,
  retrieval_results: %{weather: "Cloudy and around 8°C"},
  context: "Weather in Berlin on 2025-03-13: Cloudy and around 8°C",
  context_sources: [],
  prompt: "Context: Weather in Berlin on 2025-03-13: Cloudy and around 8°C\nQuestion: What was the weather like in Berlin on 2025-03-13?\n",
  response: nil,
  evaluations: %{},
  halted?: false,
  errors: []
}
For the last step, generating a response, we must first configure a Rag.Ai.Provider.
We'll use Rag.Ai.Cohere here, as you can get a free trial API key.
If you're reading this in Livebook, you can configure a secret named COHERE_API_KEY.
api_key = System.get_env("LB_COHERE_API_KEY")
provider = Rag.Ai.Cohere.new(text_model: "command-r-plus", api_key: api_key)
Kino.nothing()
Finally, we can generate a response.
generation = Rag.Generation.generate_response(generation, provider)
generation.response
"The weather in Berlin on 2025-03-13 was cloudy, with temperatures hovering around 8°C."