README

A Phoenix-native workbench for comparing providers, tracking prompt history, and running regression suites.

Aludel gives teams a clean way to evaluate prompt and model behavior without inventing their own tooling first.

Compare the same prompt across OpenAI, Anthropic, Gemini, and Ollama.
Inspect output, latency, token usage, and cost side by side.
Version prompts and see how changes affect results over time.
Run evaluation suites with assertions and document attachments.
Route runs and suites through your app's real LLM workflow with callback execution.
Use it inside an existing Phoenix app or run it standalone.

Why Aludel

Most teams evaluating LLM behavior end up with some combination of scripts, spreadsheets, and ad hoc dashboards. Aludel brings that work into one place with a UI that is practical enough for day-to-day iteration.

Provider comparison: run the same input across models and vendors in one view.
Prompt history: keep prompt changes traceable instead of losing them in copy-pasted variants.
Regression coverage: turn important scenarios into repeatable suites with assertions.
Embedded app callbacks: evaluate your production-facing workflow without rebuilding it in the dashboard.
Phoenix-native deployment: mount it in your app or run it as a standalone dashboard.

Structured Output Scoring

Suites support strict string assertions and structured JSON checks.

For structured outputs, use json_deep_compare to score partial matches instead of forcing all-or-nothing pass/fail outcomes.

[
  {
    "type": "json_deep_compare",
    "expected": {
      "status": "ok",
      "customer": {
        "name": "Jane",
        "tier": "gold"
      }
    },
    "threshold": 75.0
  }
]

Aludel stores field-level comparison details, per-test match scores, and suite-run average scores so prompt evolution and exports can track structured output quality over time.
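As a rough illustration of partial-match scoring, the sketch below computes the percentage of expected leaf fields that appear with an equal value in an actual output, then compares that percentage to a threshold. The module name DeepCompareSketch, the leaf-counting rule, and the reading of threshold as a minimum percentage are assumptions made for illustration; they are not taken from Aludel's implementation.

```elixir
defmodule DeepCompareSketch do
  @moduledoc false
  # Hypothetical helper for illustration only; not part of Aludel.

  # Flattens nested maps into {path, value} pairs, one per leaf field.
  def leaves(map, prefix \\ []) when is_map(map) do
    Enum.flat_map(map, fn
      {key, value} when is_map(value) -> leaves(value, prefix ++ [key])
      {key, value} -> [{prefix ++ [key], value}]
    end)
  end

  # Percentage (0.0 to 100.0) of expected leaves found with an equal value in `actual`.
  def score(expected, actual) do
    expected_leaves = leaves(expected)

    case length(expected_leaves) do
      # Nothing to compare, so treat it as a full match.
      0 ->
        100.0

      total ->
        matched =
          Enum.count(expected_leaves, fn {path, value} ->
            fetch_path(actual, path) == {:ok, value}
          end)

        matched / total * 100
    end
  end

  # Assumes the threshold is a minimum score in percent.
  def passes?(expected, actual, threshold), do: score(expected, actual) >= threshold

  # Walks `path` through nested maps, returning {:ok, value} or :error.
  defp fetch_path(value, []), do: {:ok, value}

  defp fetch_path(map, [key | rest]) when is_map(map) do
    case Map.fetch(map, key) do
      {:ok, value} -> fetch_path(value, rest)
      :error -> :error
    end
  end

  defp fetch_path(_other, _path), do: :error
end
```

Under these assumptions, an output that matches status and customer.name but returns "silver" for customer.tier matches two of three leaves, scores roughly 66.7, and would fall below a threshold of 75.0.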

Quick Start

Embed in an existing Phoenix app

Requirements:

Elixir and Phoenix
PostgreSQL 12+

Aludel depends on PostgreSQL-specific features, including JSONB, percentile_disc(), and DATE()-based aggregations. SQLite and MySQL are not supported.

1. Add the dependency

def deps do
  [
    {:aludel, "~> 0.2"}
  ]
end

mix deps.get

2. Configure the repo

config :aludel, repo: YourApp.Repo

3. Install and run migrations

mix aludel.install
mix ecto.migrate

4. Mount the dashboard

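defmodule YourAppWeb.Router do
  use YourAppWeb, :router
  # Makes aludel_dashboard available in this router
  import Aludel.Web.Router

  # ... your existing pipelines and routes ...

  # Mix.env() is evaluated at compile time, so this route exists only in dev builds
  if Mix.env() == :dev do
    scope "/dev" do
      pipe_through :browser
      aludel_dashboard "/aludel"
    end
  end
end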

5. Start using it

Visit your configured path, for example http://localhost:4000/dev/aludel.

Execution modes

Aludel supports two execution modes:

Native (default): Aludel renders the prompt template and calls the configured provider directly.
App Callback: your host app executes the real workflow and returns a normalized result back to Aludel.

Use callback mode when your production behavior includes orchestration beyond a single prompt, such as retrieval, tool usage, routing, retries, or post-processing.

Configure it in your embedded app:

config :aludel,
  execution_mode: :callback,
  executor: MyApp.AludelExecutor

Example executor:

defmodule MyApp.AludelExecutor do
  @behaviour Aludel.Executor

  @impl true
  def run(%{
        kind: kind,
        variables: variables,
        documents: documents,
        provider: provider,
        metadata: metadata
      }) do
    case MyApp.AI.reply(%{
           question: variables["question"],
           documents: documents,
           provider: provider && provider.provider,
           model: provider && provider.model,
           context: %{source: :aludel, kind: kind, metadata: metadata}
         }) do
      {:ok, reply} ->
        {:ok,
         %{
           output: reply.text,
           input_tokens: Map.get(reply, :input_tokens),
           output_tokens: Map.get(reply, :output_tokens),
           latency_ms: Map.get(reply, :latency_ms),
           cost_usd: Map.get(reply, :cost_usd),
           metadata: %{trace_id: Map.get(reply, :trace_id)}
         }}

      {:error, reason} ->
        {:error, reason}
    end
  end
end

Success responses require only output. The input_tokens, output_tokens, latency_ms, cost_usd, and metadata fields are optional.
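To make the minimal contract concrete, here is a sketch of an executor that returns nothing but output. The module name MyApp.EchoExecutor and the echo behavior are hypothetical. In a host app the module would also declare the Aludel.Executor behaviour and mark run/1 with @impl true, as in the example above; both are left out here so the sketch compiles without Aludel installed.

```elixir
defmodule MyApp.EchoExecutor do
  # Hypothetical smoke-test executor. In a real host app, add
  # `@behaviour Aludel.Executor` and `@impl true` as in the fuller example.

  # Only :variables is matched; the other request keys are ignored.
  def run(%{variables: variables}) do
    # to_string/1 turns a missing (nil) question into an empty string.
    question = to_string(variables["question"])

    # Token counts, latency, cost, and metadata are intentionally omitted.
    {:ok, %{output: "echo: " <> question}}
  end
end
```

An executor like this can be handy for checking that callback mode is wired up before pointing it at a real workflow.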

In callback mode, the existing run and suite UI stays the same:

provider selection remains available
the run and suite screens show Execution Mode
missing token or cost metrics render as N/A
exports include callback metadata when present

Standalone mode

If you want to run Aludel by itself:

git clone https://github.com/ccarvalho-eng/aludel.git
cd aludel/standalone
mix deps.get
mix ecto.create
mix ecto.migrate
mix phx.server

To populate the local database with sample prompts, providers, and suites:

mix aludel.seed

Visit http://localhost:4000.

To smoke-test callback mode in the standalone app, define a local executor module in standalone/lib/aludel_dash.ex, or anywhere else the standalone app loads, then add:

config :aludel,
  execution_mode: :callback,
  executor: AludelDash.Executor

After restarting mix phx.server, create a prompt version and provider in the UI, then:

Launch a run from /runs/new?version=<prompt_version_id>
Run a suite from /suites/<suite_id>
Confirm both screens show Execution Mode
Confirm the outputs come from your executor and optional metrics render cleanly when omitted

Provider support

Aludel supports OpenAI, Anthropic, Google Gemini, and Ollama.

| Provider | API key required | Notes |
| --- | --- | --- |
| OpenAI | Yes | Configure with `OPENAI_API_KEY` |
| Anthropic | Yes | Configure with `ANTHROPIC_API_KEY` |
| Google Gemini | Yes | Configure with `GOOGLE_API_KEY` |
| Ollama | No | Runs locally |

For embedded apps, configure provider keys in config/runtime.exs:

# In config/runtime.exs
config :aludel, :llm,
  openai_api_key: System.get_env("OPENAI_API_KEY"),
  anthropic_api_key: System.get_env("ANTHROPIC_API_KEY"),
  google_api_key: System.get_env("GOOGLE_API_KEY")

Ollama runs locally and does not require an API key.

Callback mode does not require Aludel to use those API keys directly. Provider selection remains part of the run and suite flows, and the selected provider is passed to the executor so the host app can route on it when needed.
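One way a host app might route on the selected provider is sketched below. The provider and model field names follow the executor example earlier in this README. The module name MyApp.ProviderRouting, the "ollama" value, and the returned tuples are assumptions for illustration, and the actual shape of the provider value may differ.

```elixir
defmodule MyApp.ProviderRouting do
  # Hypothetical routing helper; not part of Aludel.

  # No provider selected: fall back to the host app's default path.
  def route(nil), do: {:default, nil}

  def route(%{provider: name, model: model}) do
    # to_string/1 lets the sketch accept either an atom or a string name.
    case to_string(name) do
      "ollama" -> {:local, model}
      _other -> {:hosted, model}
    end
  end
end
```

Inside an executor, the result of route/1 could decide which client the host app calls while the rest of the workflow stays unchanged.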

Document Storage

Uploaded test case documents go through Aludel.Storage. Documents can be attached while creating new suite test cases or while editing existing test cases.

Development uses the local filesystem adapter from config/dev.exs.
Production uses config/runtime.exs and requires ALUDEL_STORAGE_BACKEND.

Development storage

Development stores uploaded documents on the local filesystem.

Production storage

Set ALUDEL_STORAGE_BACKEND to aws or gcs.

For AWS S3:

export ALUDEL_STORAGE_BACKEND=aws
export AWS_S3_BUCKET=aludel-uploads
export AWS_REGION=us-east-1
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...

For Google Cloud Storage:

export ALUDEL_STORAGE_BACKEND=gcs
export GCS_BUCKET=aludel-uploads
export GOOGLE_APPLICATION_CREDENTIALS=/absolute/path/to/service-account.json

If your GCS bucket requires requester-pays access, also set:

export GCS_USER_PROJECT=your-billing-project-id

The GCS adapter uses Goth with standard Google application credentials. GOOGLE_APPLICATION_CREDENTIALS_JSON also works if you prefer inline JSON.
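Before booting in production, it can help to check that the variables for the chosen backend are present. The sketch below is a hypothetical preflight helper, not something Aludel ships. It assumes every variable listed above is required for its backend, which may not hold in setups that supply credentials another way.

```elixir
defmodule StoragePreflight do
  # Hypothetical helper; variable names come from the sections above.
  @aws ~w(AWS_S3_BUCKET AWS_REGION AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY)
  @gcs_credentials ~w(GOOGLE_APPLICATION_CREDENTIALS GOOGLE_APPLICATION_CREDENTIALS_JSON)

  # Returns the names of variables that look missing for the selected backend.
  # `env` defaults to the process environment; pass a map to test it.
  def missing(env \\ System.get_env()) do
    case Map.get(env, "ALUDEL_STORAGE_BACKEND") do
      "aws" -> absent(env, @aws)
      "gcs" -> absent(env, ["GCS_BUCKET"]) ++ gcs_credentials(env)
      _other -> ["ALUDEL_STORAGE_BACKEND"]
    end
  end

  defp absent(env, names), do: Enum.filter(names, &blank?(env, &1))

  # Either the credentials file path or the inline JSON variable is enough.
  defp gcs_credentials(env) do
    if Enum.all?(@gcs_credentials, &blank?(env, &1)) do
      ["GOOGLE_APPLICATION_CREDENTIALS"]
    else
      []
    end
  end

  defp blank?(env, name), do: Map.get(env, name) in [nil, ""]
end
```

Calling StoragePreflight.missing() returns an empty list when nothing appears to be missing, so a deploy script could fail fast on any non-empty result.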

Documentation

The README is intentionally optimized for first contact. Deeper setup, usage, and contribution details live in the project's documentation.

Development

For local development:

mix deps.get
mix compile
mix test
mix precommit

If you are changing frontend assets:

mix assets.build
mix compile --force

For standalone development, run the app from the standalone directory:

cd standalone
mix phx.server

If you change frontend assets, rebuild them from the repo root and restart the standalone server:

mix assets.build
mix compile --force

License

Apache License 2.0
