Nous.Eval.Optimizer (nous v0.13.3)

Optimization engine for finding optimal agent configurations.

The optimizer runs an evaluation suite across different parameter combinations to find the configuration that maximizes (or minimizes) a chosen performance metric.

Supported Strategies

  • :grid_search - Exhaustive search over parameter grid
  • :bayesian - Bayesian optimization with TPE (Tree-structured Parzen Estimator)
  • :random - Random search over parameter space
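Grid search evaluates every combination in the Cartesian product of the parameter values, so the trial count grows multiplicatively with each added parameter. A minimal standalone sketch of that expansion (plain Elixir, not the Optimizer's internals):

```elixir
# Hypothetical grid expansion: 3 temperatures x 3 token limits = 9 configs.
temperatures = [0.0, 0.5, 1.0]
max_tokens = [100, 500, 1000]

grid = for t <- temperatures, m <- max_tokens, do: %{temperature: t, max_tokens: m}

length(grid)
# => 9
```

This multiplicative growth is why :bayesian or :random with an :n_trials budget is usually preferable for large parameter spaces.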

Example

# Define parameter space
params = [
  Optimizer.Parameter.float(:temperature, 0.0, 1.0, step: 0.1),
  Optimizer.Parameter.integer(:max_tokens, 100, 1000, step: 100),
  Optimizer.Parameter.choice(:model, [
    "lmstudio:ministral-3-14b-reasoning",
    "lmstudio:qwen-7b"
  ])
]

# Run optimization
{:ok, result} = Optimizer.optimize(suite, params,
  strategy: :grid_search,
  metric: :score,
  maximize: true
)

IO.inspect(result.best_config)
IO.inspect(result.best_score)

Bayesian Optimization

For expensive evaluations, use Bayesian optimization, which learns from previous trials to focus the search on promising regions of the parameter space:

{:ok, result} = Optimizer.optimize(suite, params,
  strategy: :bayesian,
  n_trials: 50,
  metric: :score
)
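As an intuition for the TPE approach (an illustrative sketch only, not the library's implementation): completed trials are split into a "good" set and the rest by score, and new candidates are proposed near the good configurations.

```elixir
# Hypothetical TPE-style proposal step over already-observed trials.
trials = [
  %{config: %{temperature: 0.2}, score: 0.9},
  %{config: %{temperature: 0.8}, score: 0.4},
  %{config: %{temperature: 0.3}, score: 0.85}
]

# Take the top quartile of trials by score as the "good" set.
sorted = Enum.sort_by(trials, & &1.score, :desc)
n_good = max(1, div(length(sorted), 4))
{good, _rest} = Enum.split(sorted, n_good)

# Propose new candidates by perturbing good configs, clamped to [0.0, 1.0].
propose = fn %{config: %{temperature: t}} ->
  %{temperature: min(1.0, max(0.0, t + (:rand.uniform() - 0.5) * 0.2))}
end

candidates = Enum.map(good, propose)
```

Real TPE models the good and bad sets with density estimators and samples where their ratio is most favorable; the sketch above only captures the "search near what worked" idea.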

Metrics

Optimization can target different metrics:

  • :score - Aggregate evaluation score (default)
  • :pass_rate - Percentage of tests passing
  • :latency_p50 - Median latency
  • :latency_p95 - 95th percentile latency
  • :latency_p99 - 99th percentile latency
  • :total_tokens - Token efficiency
  • :cost - Estimated cost
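The module's extract_metric/2 pulls one of these values out of an evaluation result. A minimal local sketch of that behavior, assuming the result is a map keyed by metric name (the real result shape comes from the suite run):

```elixir
# Hypothetical stand-in for Optimizer.extract_metric/2: look up one metric
# from a result map by its atom key.
extract = fn result, metric -> Map.fetch!(result, metric) end

result = %{score: 0.91, pass_rate: 0.95, latency_p50: 320.0}

extract.(result, :latency_p50)
# => 320.0
```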

Summary

Functions

Extract a specific metric from an evaluation result.

Run optimization to find best configuration.

Run a single trial with given configuration.

Types

metric()

@type metric() ::
  :score
  | :pass_rate
  | :latency_p50
  | :latency_p95
  | :latency_p99
  | :total_tokens
  | :cost

optimization_result()

@type optimization_result() :: %{
  best_config: map(),
  best_score: float(),
  all_trials: [trial()],
  total_trials: non_neg_integer(),
  duration_ms: non_neg_integer(),
  strategy: atom(),
  metric: atom(),
  avg_score: float(),
  std_score: float()
}

trial()

@type trial() :: %{
  config: map(),
  score: float(),
  metrics: map(),
  duration_ms: non_neg_integer()
}

Functions

extract_metric(result, arg2)

@spec extract_metric(map(), metric()) :: float()

Extract a specific metric from an evaluation result.

optimize(suite, parameters, opts \\ [])

@spec optimize(Nous.Eval.Suite.t(), [Nous.Eval.Optimizer.Parameter.t()], keyword()) ::
  {:ok, optimization_result()} | {:error, term()}

Run optimization to find best configuration.

Options

  • :strategy - Optimization strategy (:grid_search, :bayesian, :random)
  • :metric - Metric to optimize (default: :score)
  • :maximize - Whether to maximize metric (default: true)
  • :n_trials - Max trials for bayesian/random (default: 100)
  • :timeout - Total timeout in ms (default: 3600000 = 1 hour)
  • :parallel - Run trials in parallel (default: false)
  • :early_stop - Stop early once the metric reaches this threshold
  • :verbose - Print progress (default: true)
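A sketch combining several of these options to minimize a latency metric rather than maximize score (suite and params as defined in the earlier example):

```elixir
{:ok, result} =
  Optimizer.optimize(suite, params,
    strategy: :random,
    metric: :latency_p95,
    maximize: false,   # lower latency is better
    n_trials: 25,
    verbose: false
  )
```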

Returns

{:ok, %{
  best_config: %{temperature: 0.3, max_tokens: 500},
  best_score: 0.95,
  all_trials: [...],
  total_trials: 50,
  duration_ms: 120000,
  strategy: :bayesian,
  metric: :score
}}

run_trial(suite, config, metric, opts \\ [])

@spec run_trial(Nous.Eval.Suite.t(), map(), atom(), keyword()) ::
  {:ok, trial()} | {:error, term()}

Run a single trial with given configuration.
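For example, to spot-check a single configuration (suite as defined earlier; the config keys match the parameters you declared):

```elixir
{:ok, trial} = Optimizer.run_trial(suite, %{temperature: 0.3, max_tokens: 500}, :score)

trial.score        # the extracted metric value for this run
trial.duration_ms  # wall-clock duration of the trial
```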