Synaptic.Eval.Integration behaviour (synaptic v0.2.6)

Behaviour for integrating with third-party eval services such as Braintrust and LangSmith.

Eval integrations observe LLM calls and scorer results via Telemetry events and can combine them into complete eval records for external services.

Example Implementation

defmodule MyApp.Eval.BraintrustIntegration do
  @behaviour Synaptic.Eval.Integration

  @impl Synaptic.Eval.Integration
  def on_llm_call(_event, measurements, metadata, _config) do
    # Log the LLM call with its token usage and duration
    usage = Map.get(metadata, :usage, %{})

    Braintrust.log(%{
      run_id: metadata.run_id,
      step: metadata.step_name,
      # :input and :output are not part of the event metadata;
      # you'd need to capture them separately
      input: Map.get(metadata, :input),
      output: Map.get(metadata, :output),
      model: metadata.model,
      prompt_tokens: Map.get(usage, :prompt_tokens, 0),
      completion_tokens: Map.get(usage, :completion_tokens, 0),
      total_tokens: Map.get(usage, :total_tokens, 0),
      duration_ms: System.convert_time_unit(measurements.duration, :native, :millisecond)
    })

    # The callback spec requires :ok as the return value
    :ok
  end

  @impl Synaptic.Eval.Integration
  def on_scorer_result(_event, _measurements, metadata, _config) do
    # Log the scorer result
    Braintrust.log_score(%{
      run_id: metadata.run_id,
      step: metadata.step_name,
      scorer: metadata.scorer,
      score: metadata.score,
      reason: metadata.reason
    })

    :ok
  end
end

Attaching the Integration

Attach your integration during application startup (e.g., in application.ex):

defmodule MyApp.Application do
  use Application

  @impl Application
  def start(_type, _args) do
    # ... other setup ...

    Synaptic.Eval.Integration.attach(MyApp.Eval.BraintrustIntegration, %{
      # Your config here
    })

    # ... rest of startup ...
  end
end

Combining LLM Metrics with Scorer Results

To combine LLM metrics with scorer results, you can:

1. Store LLM call data in a process dictionary or ETS table keyed by {run_id, step_name}
2. When scorer results arrive, look up the corresponding LLM call
3. Combine both into a single eval record

See the README for more detailed examples.
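
The steps above can be sketched with a small ETS-backed helper. This is a minimal illustration, not part of Synaptic: the module name MyApp.Eval.RecordStore, the table name, and the shape of the combined record are all assumptions. It also assumes at most one LLM call per step, since a later call with the same key overwrites the earlier entry.

```elixir
defmodule MyApp.Eval.RecordStore do
  # Hypothetical helper, not part of Synaptic. Buffers LLM call data in a
  # public ETS table so that a later scorer event can be joined with it.

  @table :my_app_eval_llm_calls

  # Create the table once, e.g. during application startup, from a
  # long-lived process (an ETS table is deleted when its owner exits).
  def init do
    if :ets.whereis(@table) == :undefined do
      :ets.new(@table, [:named_table, :public, :set])
    end

    :ok
  end

  # Step 1: store LLM call data keyed by {run_id, step_name}.
  def put_llm_call(measurements, metadata) do
    key = {metadata.run_id, metadata.step_name}

    call = %{
      model: Map.get(metadata, :model),
      usage: Map.get(metadata, :usage, %{}),
      duration: Map.get(measurements, :duration)
    }

    :ets.insert(@table, {key, call})
    :ok
  end

  # Steps 2 and 3: look up the matching LLM call and merge it with the score.
  # The entry is kept so that several scorers can join against the same call.
  def combine(metadata) do
    key = {metadata.run_id, metadata.step_name}

    llm_call =
      case :ets.lookup(@table, key) do
        [{^key, call}] -> call
        [] -> nil
      end

    %{
      run_id: metadata.run_id,
      step: metadata.step_name,
      scorer: Map.get(metadata, :scorer),
      score: Map.get(metadata, :score),
      reason: Map.get(metadata, :reason),
      llm_call: llm_call
    }
  end

  # Remove the buffered call once the step's records have been sent.
  def delete(run_id, step_name) do
    :ets.delete(@table, {run_id, step_name})
    :ok
  end
end
```

In this sketch, on_llm_call/4 would call put_llm_call/2 and on_scorer_result/4 would call combine/1. An ETS table is used rather than the process dictionary because Telemetry handlers run in whichever process emits the event, so the process dictionary only works when the LLM call and the scorer run in the same process.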

Summary

Callbacks

on_llm_call(event, measurements, metadata, config)

Called when an LLM call completes.

on_scorer_result(event, measurements, metadata, config)

Called when a scorer completes.

on_step_complete(event, measurements, metadata, config)

Called when a step completes.

Functions

attach(integration_module, config \\ %{})

Attaches Telemetry handlers for the given integration module.

detach(integration_module)

Detaches Telemetry handlers for the given integration module.

Callbacks

on_llm_call(event, measurements, metadata, config)

@callback on_llm_call(
  event :: [atom()],
  measurements :: map(),
  metadata :: map(),
  config :: term()
) :: :ok

Called when an LLM call completes.

This callback is invoked via a Telemetry handler attached to [:synaptic, :llm, :stop].

Parameters

event - The Telemetry event name (e.g., [:synaptic, :llm, :stop])
measurements - Map containing :duration (in native time units) and optionally token counts
metadata - Map containing:
- :run_id - Workflow run identifier
- :step_name - Step name (atom)
- :adapter - Adapter module
- :model - Model name
- :stream - Boolean indicating if streaming was used
- :usage - Optional usage map with :prompt_tokens, :completion_tokens, :total_tokens
config - Configuration passed to attach/2
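
Because the callback is an ordinary four-argument function, it can be exercised directly with hand-built arguments, which may be handy in tests. The sketch below is an illustration under assumptions: MyApp.Eval.ProbeIntegration, MyApp.FakeAdapter, the :notify config key, and the record shape are hypothetical names, and the @behaviour line is omitted so the snippet compiles without the synaptic dependency.

```elixir
defmodule MyApp.Eval.ProbeIntegration do
  # Hypothetical test double. A real integration would also declare
  # `@behaviour Synaptic.Eval.Integration`.

  def on_llm_call(event, measurements, metadata, config) do
    # :usage is optional, so fall back to an empty map
    usage = Map.get(metadata, :usage, %{})

    record = %{
      event: event,
      run_id: metadata.run_id,
      step: metadata.step_name,
      model: metadata.model,
      total_tokens: Map.get(usage, :total_tokens, 0),
      duration_ms: System.convert_time_unit(measurements.duration, :native, :millisecond)
    }

    # :notify is an assumed config key holding the pid that receives records
    send(config.notify, {:llm_call, record})
    :ok
  end
end

# Invoke the callback directly, mirroring the documented argument shapes.
:ok =
  MyApp.Eval.ProbeIntegration.on_llm_call(
    [:synaptic, :llm, :stop],
    %{duration: System.convert_time_unit(250, :millisecond, :native)},
    %{
      run_id: "run-1",
      step_name: :draft,
      adapter: MyApp.FakeAdapter,
      model: "example-model",
      stream: false
    },
    %{notify: self()}
  )
```

The calling process then finds an {:llm_call, record} message in its mailbox, which a test can assert on.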

on_scorer_result(event, measurements, metadata, config)

(optional)

@callback on_scorer_result(
  event :: [atom()],
  measurements :: map(),
  metadata :: map(),
  config :: term()
) :: :ok

Called when a scorer completes.

This callback is invoked via a Telemetry handler attached to [:synaptic, :scorer, :stop].

Parameters

event - The Telemetry event name (e.g., [:synaptic, :scorer, :stop])
measurements - Map containing :duration
metadata - Map containing:
- :run_id - Workflow run identifier
- :workflow - Workflow module
- :step_name - Step name (atom)
- :scorer - Scorer module
- :status - :ok or :error
- :score - Score value (number, or nil on error)
- :reason - Reason string (or error message)
config - Configuration passed to attach/2
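
Since :score is nil and :reason holds an error message when :status is :error, an integration will usually want to branch on :status before logging. A minimal sketch of that branching follows; MyApp.Eval.ScoreRecord and the returned record shapes are hypothetical, not part of Synaptic.

```elixir
defmodule MyApp.Eval.ScoreRecord do
  # Hypothetical helper, not part of Synaptic.

  # Scorer succeeded: :score is expected to be a number.
  def from_metadata(%{status: :ok, score: score} = metadata) when is_number(score) do
    record =
      metadata
      |> base()
      |> Map.merge(%{score: score, reason: Map.get(metadata, :reason)})

    {:ok, record}
  end

  # Scorer failed: :score is nil and :reason carries the error message.
  def from_metadata(%{status: :error} = metadata) do
    {:error, Map.put(base(metadata), :error, Map.get(metadata, :reason))}
  end

  # Anything else (for example a missing score) is skipped rather than logged.
  def from_metadata(_metadata), do: :skip

  # Fields shared by both record kinds, taken from the documented metadata keys.
  defp base(metadata) do
    %{
      run_id: Map.get(metadata, :run_id),
      workflow: Map.get(metadata, :workflow),
      step: Map.get(metadata, :step_name),
      scorer: Map.get(metadata, :scorer)
    }
  end
end
```

An on_scorer_result/4 implementation could call from_metadata/1 and send only the {:ok, record} results as scores, routing {:error, record} to an error log instead.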

on_step_complete(event, measurements, metadata, config)

(optional)

@callback on_step_complete(
  event :: [atom()],
  measurements :: map(),
  metadata :: map(),
  config :: term()
) :: :ok

Called when a step completes.

This optional callback can be used to combine LLM metrics with scorer results after a step finishes. It's invoked via a Telemetry handler attached to [:synaptic, :step, :stop].

Parameters

event - The Telemetry event name (e.g., [:synaptic, :step, :stop])
measurements - Map containing :duration
metadata - Map containing:
- :run_id - Workflow run identifier
- :workflow - Workflow module
- :step_name - Step name (atom)
- :type - Step type
- :status - :ok, :suspend, :error, or :unknown
config - Configuration passed to attach/2
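
One way to use this callback for combining data is to buffer entries as they arrive and flush them when the step stops. The sketch below uses an Agent for the buffer. It is an illustration only: MyApp.Eval.StepBuffer is a hypothetical module, and it assumes that the LLM and scorer events for a step are emitted before that step's [:synaptic, :step, :stop] event, which this page does not state.

```elixir
defmodule MyApp.Eval.StepBuffer do
  # Hypothetical helper, not part of Synaptic.
  use Agent

  def start_link(_opts \\ []) do
    Agent.start_link(fn -> %{} end, name: __MODULE__)
  end

  # Called from on_llm_call/4 (kind :llm_calls) or on_scorer_result/4
  # (kind :scores) to buffer one entry for the step.
  def add(kind, metadata, data) when kind in [:llm_calls, :scores] do
    key = {metadata.run_id, metadata.step_name}

    Agent.update(__MODULE__, fn state ->
      entry = Map.get(state, key, %{llm_calls: [], scores: []})
      Map.put(state, key, Map.update!(entry, kind, &[data | &1]))
    end)
  end

  # Called from on_step_complete/4: remove and return everything buffered
  # for the step as a single record.
  def flush(metadata) do
    key = {metadata.run_id, metadata.step_name}

    Agent.get_and_update(__MODULE__, fn state ->
      {entry, rest} = Map.pop(state, key, %{llm_calls: [], scores: []})

      record = %{
        run_id: metadata.run_id,
        step: metadata.step_name,
        status: Map.get(metadata, :status),
        # Entries were prepended, so reverse them to restore arrival order
        llm_calls: Enum.reverse(entry.llm_calls),
        scores: Enum.reverse(entry.scores)
      }

      {record, rest}
    end)
  end
end
```

Flushing on step completion also removes the buffered entries, so state does not grow across runs. Steps that end with :suspend or :error still flush here; whether to send those records is a design choice left to the integration.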

Functions

attach(integration_module, config \\ %{})

@spec attach(module(), config :: term()) :: :ok

Attaches Telemetry handlers for the given integration module.

This function sets up Telemetry handlers that call the integration's callbacks when LLM calls, scorer results, or step completions occur.

Parameters

integration_module - Module implementing Synaptic.Eval.Integration
config - Configuration term (typically a map) passed to all callbacks; defaults to %{}

Example

Synaptic.Eval.Integration.attach(MyApp.Eval.BraintrustIntegration, %{
  api_key: System.get_env("BRAINTRUST_API_KEY"),
  project: "my-project"
})

detach(integration_module)

@spec detach(module()) :: :ok | {:error, :not_found}

Detaches Telemetry handlers for the given integration module.

Parameters

integration_module - Module implementing Synaptic.Eval.Integration