Getting Started
This guide is an introduction to Blink, a fast bulk data insertion library for Ecto and PostgreSQL.
In this guide, we will:
- Create a seeder module for inserting users and posts
- Learn how to reference data from previously declared tables
- Use streams for memory-efficient seeding
- Store auxiliary data in context without inserting it into the database
Adding Blink to an application
Add Blink to your dependencies in mix.exs:
defp deps do
  [
    {:blink, "~> 0.6.1"}
  ]
end

Install the dependencies:
mix deps.get
Configuring the repository
Blink works with any Ecto repository. If you don't have Ecto set up yet, follow the Ecto Getting Started guide to configure your repository and create your database tables.
For this guide, we'll assume you have:
- An Ecto repository (e.g., Blog.Repo) configured
- A users table with columns: id, name, email, inserted_at, updated_at
- A posts table with columns: id, title, body, user_id, inserted_at, updated_at
Creating a seeder
Now that we have our database set up, let's create a seeder module to insert data:
defmodule Blog.Seeder do
  use Blink

  def call do
    new()
    |> with_table("users")
    |> with_table("posts")
    |> run(Blog.Repo)
  end

  def table(_seeder, "users") do
    # The users table also has inserted_at/updated_at columns, so fill them in
    now = ~U[2024-01-01 00:00:00Z]

    [
      %{id: 1, name: "Alice", email: "alice@example.com", inserted_at: now, updated_at: now},
      %{id: 2, name: "Bob", email: "bob@example.com", inserted_at: now, updated_at: now}
    ]
  end

  def table(seeder, "posts") do
    IO.inspect(seeder)
    # %Blink.Seeder{
    #   tables: %{"users" => [%{id: 1, name: "Alice", ...}, ...]},
    #   table_order: ["users"],
    #   table_opts: %{"users" => []},
    #   context: %{}
    # }

    users = seeder.tables["users"]

    Enum.flat_map(users, fn user ->
      for i <- 1..5 do
        %{
          # Each user gets a disjoint block of 5 post ids: Alice 1..5, Bob 6..10
          id: (user.id - 1) * 5 + i,
          title: "Post #{i} by #{user.name}",
          body: "This is the content of post #{i}.",
          user_id: user.id,
          inserted_at: ~U[2024-01-01 00:00:00Z],
          updated_at: ~U[2024-01-01 00:00:00Z]
        }
      end
    end)
  end
end

The seeder above does the following:
- use Blink - Injects Blink's functions and defines the required callbacks
- new() - Creates an empty Seeder struct
- with_table/2 - Declares the tables to insert rows into
- table/2 - Defines what rows to insert into each table
- run/2 - Executes the bulk insertion
Each table/2 callback receives a Seeder struct. The tables field stores data from previously declared tables, allowing the "posts" callback to reference seeder.tables["users"].
Once run/2 is called, data is inserted in the order tables were declared. The context field is covered below.
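To make the ordering rule concrete without a database, here is a toy model of it in plain Elixir. ToySeeder and its build/2 function are hypothetical and are NOT Blink's implementation; they only mimic the documented behaviour: tables are built in declaration order, and each callback sees the rows of every table declared before it. The posts callback also shows how the (user.id - 1) * 5 + i formula gives every user a disjoint block of ids.

```elixir
defmodule ToySeeder do
  # Hypothetical model, NOT Blink's implementation: tables are built in
  # declaration order, and each callback sees the tables built before it.
  def build(table_names, callback) do
    Enum.reduce(table_names, %{tables: %{}, table_order: []}, fn name, seeder ->
      rows = callback.(seeder, name)

      %{
        seeder
        | tables: Map.put(seeder.tables, name, rows),
          table_order: seeder.table_order ++ [name]
      }
    end)
  end
end

callback = fn
  _seeder, "users" ->
    [%{id: 1, name: "Alice"}, %{id: 2, name: "Bob"}]

  seeder, "posts" ->
    # "users" is already available here because it was declared first
    Enum.flat_map(seeder.tables["users"], fn user ->
      # (user.id - 1) * 5 + i gives each user a disjoint block of 5 ids
      for i <- 1..5, do: %{id: (user.id - 1) * 5 + i, user_id: user.id}
    end)
end

seeder = ToySeeder.build(["users", "posts"], callback)

Enum.map(seeder.tables["posts"], & &1.id)
# => [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

seeder.table_order
# => ["users", "posts"]
```

If "posts" were declared before "users" in this model, seeder.tables["users"] would be nil when the posts callback runs, which is why declaration order matters.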
Let's run Blog.Seeder from IEx:
iex -S mix
iex> Blog.Seeder.call()
# => Inserts 2 users and 10 posts

Streams
In the example above, the table/2 clauses return lists. Because Blink keeps the entire Seeder struct in memory, including the rows of every table, large lists can consume a lot of memory.
To avoid this, table/2 can return a stream instead:
def table(_seeder, "users") do
  Stream.map(1..1_000_000, fn i ->
    %{
      id: i,
      name: "User #{i}",
      email: "user#{i}@example.com",
      inserted_at: ~U[2024-01-01 00:00:00Z],
      updated_at: ~U[2024-01-01 00:00:00Z]
    }
  end)
end

def table(seeder, "posts") do
  Stream.flat_map(seeder.tables["users"], fn user ->
    for i <- 1..20 do
      %{
        id: (user.id - 1) * 20 + i,
        title: "Post #{i} by #{user.name}",
        body: "This is the content of post #{i}",
        user_id: user.id,
        inserted_at: ~U[2024-01-01 00:00:00Z],
        updated_at: ~U[2024-01-01 00:00:00Z]
      }
    end
  end)
end

Streams are processed lazily by run/2; no extra configuration is needed.
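The following plain-Elixir sketch, which needs neither Blink nor a database, shows why the stream version stays cheap: the same users and posts streams are built, but only the first batch of posts is pulled. Stream.chunk_every/2 and the batch size of 500 are assumptions standing in for however run/2 batches rows internally, and LazySeedDemo is a hypothetical module name.

```elixir
defmodule LazySeedDemo do
  # Same shape as the "users" stream above: nothing is computed until enumerated.
  def users(count) do
    Stream.map(1..count, fn i ->
      %{id: i, name: "User #{i}", email: "user#{i}@example.com"}
    end)
  end

  # Same shape as the "posts" stream: each user expands lazily into posts_per_user rows.
  def posts(users, posts_per_user) do
    Stream.flat_map(users, fn user ->
      for i <- 1..posts_per_user do
        %{
          id: (user.id - 1) * posts_per_user + i,
          user_id: user.id,
          title: "Post #{i} by #{user.name}"
        }
      end
    end)
  end
end

# Pull a single batch of 500 posts. Although the source range has a million
# users, only the handful needed to fill this batch are ever generated.
[first_batch] =
  LazySeedDemo.users(1_000_000)
  |> LazySeedDemo.posts(20)
  |> Stream.chunk_every(500)
  |> Enum.take(1)

length(first_batch)
# => 500

List.last(first_batch).id
# => 500
```

One design consequence: an Elixir stream is recomputed every time it is enumerated. Assuming Blink stores the stream itself in seeder.tables (as the posts callback above suggests), the users stream is enumerated once for insertion and again by the posts callback, so a stream callback that other tables read should be deterministic; random values would differ between the two passes.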
Using context
Sometimes you need to compute data once and share it across multiple tables. Context data is not inserted into the database but is available when building your table data.
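Stripped of Blink, a context value is just something computed once and handed to several builders. The plain-Elixir sketch below (TimestampContextDemo is a hypothetical name; no Blink or database involved) computes 30 daily timestamps once and uses them both for users and for posts that must come after their author. It also shows an edge case worth guarding: if a user received the last timestamp, no later one would remain and Enum.random/1 would raise Enum.EmptyError on the empty list.

```elixir
defmodule TimestampContextDemo do
  # Computed once and shared, like a "timestamps" context value.
  # 86_400 seconds per day, so this yields 30 daily timestamps.
  def timestamps do
    base = ~U[2024-01-01 00:00:00Z]
    for day <- 0..29, do: DateTime.add(base, day * 86_400, :second)
  end

  # Each user gets its own random timestamp, excluding the last one so that
  # at least one later timestamp always remains for that user's posts.
  def users(timestamps, count) do
    candidates = Enum.drop(timestamps, -1)
    for id <- 1..count, do: %{id: id, inserted_at: Enum.random(candidates)}
  end

  # Each post picks a timestamp strictly after its author's; `later` is never
  # empty thanks to the guard above (Enum.random([]) raises Enum.EmptyError).
  def posts(users, timestamps) do
    for user <- users do
      later = Enum.filter(timestamps, &(DateTime.compare(&1, user.inserted_at) == :gt))
      %{user_id: user.id, inserted_at: Enum.random(later)}
    end
  end
end

timestamps = TimestampContextDemo.timestamps()
users = TimestampContextDemo.users(timestamps, 100)
posts = TimestampContextDemo.posts(users, timestamps)
```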
In the seeder below, we generate timestamps once and reuse them across tables, ensuring that each post is created after its author.
def call do
  new()
  |> with_context("timestamps")
  |> with_table("users")
  |> with_table("posts")
  |> run(Blog.Repo)
end

def context(_seeder, "timestamps") do
  base = ~U[2024-01-01 00:00:00Z]
  for day <- 0..29, do: DateTime.add(base, day, :day)
end

def table(seeder, "users") do
  timestamps = seeder.context["timestamps"]
  # Leave out the last timestamp so every user has a later one for their posts
  user_timestamps = Enum.drop(timestamps, -1)

  for i <- 1..100 do
    # Pick a timestamp per user rather than one shared by all users
    timestamp = Enum.random(user_timestamps)

    %{
      id: i,
      name: "User #{i}",
      email: "user#{i}@example.com",
      inserted_at: timestamp,
      updated_at: timestamp
    }
  end
end

def table(seeder, "posts") do
  users = seeder.tables["users"]
  timestamps = seeder.context["timestamps"]

  Enum.flat_map(users, fn user ->
    # Only use timestamps after the user was created; this list is never
    # empty because users never receive the last timestamp
    valid_timestamps =
      Enum.filter(timestamps, fn ts ->
        DateTime.compare(ts, user.inserted_at) == :gt
      end)

    random_valid_timestamp = Enum.random(valid_timestamps)

    for i <- 1..5 do
      %{
        id: (user.id - 1) * 5 + i,
        title: "Post #{i}",
        body: "Content here",
        user_id: user.id,
        inserted_at: random_valid_timestamp,
        updated_at: random_valid_timestamp
      }
    end
  end)
end

Summary
In this guide, we learned how to:
- Create a seeder module with use Blink
- Reference data from previously declared tables via seeder.tables
- Use streams for memory-efficient seeding of large datasets
- Store auxiliary data in context without inserting it into the database
Next steps
You might also find these guides useful:
- Configuring Options - Set global and per-table options for batch size and concurrency
- Loading Data from Files - Learn how to load data from CSV and JSON files
- Integrating with ExMachina - Generate realistic test data