Blink behaviour (blink v0.4.1)

Blink provides an efficient way to seed large amounts of data into your database.

Overview

Blink simplifies database seeding by providing a structured way to build and insert records:

  1. Create an empty Store.
  2. Assign the records you want to insert to each database table.
  3. Bulk-insert the records into your database.

Stores

Stores are the central data unit in Blink. A Store is a struct that holds the records you want to seed, along with any contextual data you need during the seeding process but do not want to insert into the database.

A Store struct contains the keys tables and context:

Blink.Store{
  tables: %{
    "table_name" => [...]
  },
  context: %{
    "key" => [...]
  }
}
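For example, a populated store holding user records and a list of post ids might look like this (illustrative data):

```elixir
%Blink.Store{
  tables: %{
    # Records destined for the "users" database table.
    "users" => [
      %{id: 1, name: "Alice", email: "alice@example.com"},
      %{id: 2, name: "Bob", email: "bob@example.com"}
    ]
  },
  context: %{
    # Helper data used while building seeds; never inserted into the database.
    "post_ids" => [1, 2, 3]
  }
}
```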

All keys in tables must match the name of a table in your database. Table names can be either atoms or strings.

Tables

A mapping of table names to lists of records. These records are persisted to the database when insert/2 or insert/3 is called.

Context

Stores arbitrary data needed during the seeding process. This data is available when building your seeds but is not inserted into the database by insert/2 or insert/3.
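For example, a table/2 callback can read previously added context when building its records. This is a sketch, assuming the context map is reachable as store.context and that the "post_ids" key was added with add_context/2 before the "posts" table is built:

```elixir
def table(store, "posts") do
  # Build one post per id stored under the "post_ids" context key.
  # The ids themselves are never inserted; only these post records are.
  for id <- store.context["post_ids"] do
    %{id: id, title: "Post #{id}", user_id: 1}
  end
end
```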

Basic Usage

To seed your database with Blink, follow these three steps:

  • Create: Initialize an empty store with new/0.

  • Build: Add seed data with add_table/2 and context data with add_context/2.

  • Insert: Persist records to the database with insert/2 or insert/3.

Example

defmodule MyApp.Seeder do
  use Blink

  def call do
    new()
    |> add_table("users")
    |> add_context("post_ids")
    |> insert(MyApp.Repo, batch_size: 1_200)
  end

  def table(_store, "users") do
    [
      %{id: 1, name: "Alice", email: "alice@example.com"},
      %{id: 2, name: "Bob", email: "bob@example.com"}
    ]
  end

  def context(_store, "post_ids") do
    [1, 2, 3]
  end
end

Custom Logic for Inserting Records

The functions insert/2 and insert/3 bulk-insert the table records of a Store into a Postgres database using Postgres' COPY command. You can override this default implementation by defining your own insert/2 or insert/3 function in your Blink module, which lets you seed databases other than Postgres.
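For instance, here is a sketch of an insert/3 override that seeds any Ecto-supported database with Ecto.Repo.insert_all/2 instead of COPY. It assumes each table's records are plain maps sharing the same keys:

```elixir
def insert(store, repo, opts) do
  batch_size = Keyword.get(opts, :batch_size, 900)

  for {table_name, records} <- store.tables do
    records
    |> Enum.chunk_every(batch_size)
    |> Enum.each(fn batch -> repo.insert_all(to_string(table_name), batch) end)
  end

  {:ok, store}
end
```

Returning {:ok, store} satisfies the callback's {:ok, any()} | {:error, any()} contract.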

Summary

Callbacks

context(store, key) - Builds and returns the data to be stored under a context key in the given Store.

insert(store, repo) - Specifies how to perform a bulk insert of the seed data from a Store into the given Ecto repository.

table(store, table_name) - Builds and returns the records to be stored under a table key in the given Store.

Functions

copy_to_table(items, table_name, repo, opts \\ []) - Copies a list of items into a database table using database-specific bulk copy commands.

from_csv(path, opts \\ []) - Reads a CSV file and returns a list of maps suitable for use in table/2 callbacks.

from_json(path, opts \\ []) - Reads a JSON file and returns a list of maps suitable for use in table/2 callbacks.

Callbacks

context(store, key)

(optional)
@callback context(store :: Blink.Store.t(), key :: Blink.Store.key()) :: [map()]

Builds and returns the data to be stored under a context key in the given Store.

The callback context/2 is called internally by add_context/2, which passes the given context key along. Therefore, each key passed to add_context/2 must have a matching context/2 clause.

insert/2 and insert/3 ignore the :context data and only insert data from :tables.

When no matching context/2 clause is defined, add_context/2 raises an ArgumentError.

insert(store, repo)

(optional)
@callback insert(store :: Blink.Store.t(), repo :: Ecto.Repo.t()) ::
  {:ok, any()} | {:error, any()}

Specifies how to perform a bulk insert of the seed data from a Store into the given Ecto repository.

This callback function is optional, since Blink ships with a default implementation for Postgres databases.

insert(store, repo, opts)

(optional)
@callback insert(store :: Blink.Store.t(), repo :: Ecto.Repo.t(), opts :: Keyword.t()) ::
  {:ok, any()} | {:error, any()}

Like insert/2, but also accepts a keyword list of options (for example, :batch_size) that is forwarded to the bulk-insert implementation.

table(store, table_name)

(optional)
@callback table(store :: Blink.Store.t(), table_name :: Blink.Store.key()) :: [map()]

Builds and returns the records to be stored under a table key in the given Store.

The callback table/2 is called internally by add_table/2, which passes the given database table name along. Therefore, each table name passed to add_table/2 must have a matching table/2 clause.

Data added to a store with table/2 is inserted into the corresponding database table when calling insert/2 or insert/3.

When no matching table/2 clause is defined, add_table/2 raises an ArgumentError.

Functions

copy_to_table(items, table_name, repo, opts \\ [])

@spec copy_to_table(
  items :: [map()],
  table_name :: Blink.Store.key(),
  repo :: Ecto.Repo.t(),
  opts :: Keyword.t()
) :: {:ok, any()} | {:error, any()}

Copies a list of items into a database table using database-specific bulk copy commands.

This function provides an efficient way to insert large amounts of data by using database-specific bulk copy commands. Items are streamed to the database in batches to minimize memory usage.

Parameters

  • items - A list of maps where each map represents a row to insert. All maps must have the same keys, which correspond to the table columns.
  • table_name - The name of the table to insert into (string or atom).
  • repo - An Ecto repository module.
  • opts - Keyword list of options:
    • :adapter - The adapter module to use. Defaults to Blink.Adapter.Postgres.
    • :batch_size - Number of rows to send per batch (default: 900)

Returns

  • {:ok, result} - When the copy operation succeeds
  • {:error, reason} - When the copy operation fails

Examples

iex> items = [%{id: 1, name: "Alice"}, %{id: 2, name: "Bob"}]
iex> copy_to_table(items, "users", MyApp.Repo, batch_size: 1000)
{:ok, _result}

# Using a specific adapter
iex> copy_to_table(items, "users", MyApp.Repo, adapter: Some.Custom.Adapter)
{:ok, _result}

Notes

The function assumes all items have the same structure. Column names are extracted from the first item in the list.

Currently only PostgreSQL is supported via Blink.Adapter.Postgres.

from_csv(path, opts \\ [])

@spec from_csv(path :: String.t(), opts :: Keyword.t()) :: [map()]

Reads a CSV file and returns a list of maps suitable for use in table/2 callbacks.

By default, the CSV file must have a header row. Each column header will become a string key in the resulting maps. All values are returned as strings.
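For instance, given a hypothetical users.csv with a header row, each data row becomes a map of string keys to string values:

```elixir
# users.csv:
#   id,name,email
#   1,Alice,alice@example.com
#   2,Bob,bob@example.com

Blink.from_csv("users.csv")
# => [
#   %{"id" => "1", "name" => "Alice", "email" => "alice@example.com"},
#   %{"id" => "2", "name" => "Bob", "email" => "bob@example.com"}
# ]
```

Use the :transform option (shown below) to convert string values such as "1" into integers before insertion.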

Parameters

  • path - Path to the CSV file (relative or absolute)
  • opts - Keyword list of options:
    • :headers - List of header names to use, or :infer to read from first row (default: :infer)
    • :transform - Function to transform each row map (default: identity function)

Examples

# Simple usage with headers in first row
def table(_store, "users") do
  Blink.from_csv("users.csv")
end

# CSV without headers - provide them explicitly
def table(_store, "users") do
  Blink.from_csv("users.csv", headers: ["id", "name", "email"])
end

# With custom transformation for type conversion
def table(_store, "users") do
  Blink.from_csv("users.csv",
    transform: fn row ->
      row
      |> Map.update!("id", &String.to_integer/1)
      |> Map.update!("age", &String.to_integer/1)
    end
  )
end

Returns

A list of maps, where each map represents a row from the CSV file.

from_json(path, opts \\ [])

@spec from_json(path :: String.t(), opts :: Keyword.t()) :: [map()]

Reads a JSON file and returns a list of maps suitable for use in table/2 callbacks.

The JSON file must contain an array of objects at the root level. Each object becomes a map with string keys.
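For instance, a hypothetical users.json containing an array of objects yields one map per object. Unlike from_csv/2, non-string JSON values such as numbers keep their decoded types, assuming a standard JSON decoder:

```elixir
# users.json:
#   [
#     {"id": 1, "name": "Alice"},
#     {"id": 2, "name": "Bob"}
#   ]

Blink.from_json("users.json")
# => [%{"id" => 1, "name" => "Alice"}, %{"id" => 2, "name" => "Bob"}]
```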

Parameters

  • path - Path to the JSON file
  • opts - Keyword list of options:
    • :transform - Function to transform each row map (default: identity function)

Examples

# Simple usage
def table(_store, "users") do
  Blink.from_json("users.json")
end

# With custom transformation for type conversion
def table(_store, "users") do
  Blink.from_json("users.json",
    transform: fn row ->
      row
      |> Map.update!("id", &String.to_integer/1)
      |> Map.update!("age", &String.to_integer/1)
    end
  )
end

Returns

A list of maps, where each map represents an object from the JSON array.