Changelog

All notable changes to this project will be documented in this file.

[0.9.0] - 2026-01-04

Added

Evaluation Framework: Production-grade testing and benchmarking for AI agents
- Nous.Eval module for defining and running test suites
- Nous.Eval.Suite for test suite management with YAML support
- Nous.Eval.TestCase for individual test case definitions
- Nous.Eval.Runner for sequential and parallel test execution
- Nous.Eval.Metrics for collecting latency, token usage, and cost metrics
- Nous.Eval.Reporter for console and JSON result reporting
- A/B testing support with Nous.Eval.run_ab/2
Six Built-in Evaluators:
- :exact_match - Strict string equality matching
- :fuzzy_match - Jaro-Winkler similarity with configurable thresholds
- :contains - Substring and regex pattern matching
- :tool_usage - Tool call verification with argument validation
- :schema - Ecto schema validation for structured outputs
- :llm_judge - LLM-based quality assessment with custom rubrics
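
As a rough illustration of what threshold-based Jaro-Winkler matching means for :fuzzy_match, the sketch below adds the standard Winkler prefix bonus to Elixir's built-in String.jaro_distance/2. It is not the Nous implementation: the FuzzySketch module, the similar?/3 function, and the 0.9 default threshold are hypothetical names and values chosen for the example.

```elixir
defmodule FuzzySketch do
  # Standard Winkler constants: scale factor 0.1, prefix capped at 4 graphemes.
  @prefix_scale 0.1
  @max_prefix 4

  # Jaro-Winkler similarity: the Jaro score plus a bonus for a shared prefix.
  def jaro_winkler(a, b) do
    jaro = String.jaro_distance(a, b)
    prefix = common_prefix_length(a, b)
    jaro + prefix * @prefix_scale * (1.0 - jaro)
  end

  # Hypothetical evaluator check: pass when similarity reaches the threshold.
  def similar?(expected, actual, threshold \\ 0.9) do
    jaro_winkler(expected, actual) >= threshold
  end

  # Counts matching leading graphemes, looking at no more than @max_prefix.
  defp common_prefix_length(a, b) do
    a
    |> String.graphemes()
    |> Enum.zip(String.graphemes(b))
    |> Enum.take(@max_prefix)
    |> Enum.take_while(fn {x, y} -> x == y end)
    |> length()
  end
end

IO.inspect(FuzzySketch.similar?("same", "same"))
```

A lower threshold tolerates more typos at the cost of more false positives, which is presumably why the evaluator makes it configurable.
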
Optimization Engine: Automated parameter tuning for agents
- Nous.Eval.Optimizer with three strategies: grid search, random search, Bayesian optimization
- Support for float, integer, choice, and boolean parameter types
- Early stopping on threshold achievement
- Detailed trial history and best configuration reporting
New Mix Tasks:
- mix nous.eval - Run evaluation suites with filtering, parallelism, and multiple output formats
- mix nous.optimize - Parameter optimization with configurable strategies and metrics
New Dependency: yaml_elixir ~> 2.9 for YAML test suite parsing

Documentation

New comprehensive evaluation framework guide (docs/guides/evaluation.md)
Five new example scripts in examples/eval/:
- 01_basic_evaluation.exs - Simple test execution
- 02_yaml_suite.exs - Loading and running YAML suites
- 03_optimization.exs - Parameter optimization workflows
- 04_custom_evaluator.exs - Implementing custom evaluators
- 05_ab_testing.exs - A/B testing configurations

[0.8.1] - 2025-12-31

Fixed

Fixed Usage struct not implementing Access behaviour for telemetry metrics
Fixed Task.shutdown/2 nil return case in AgentServer cancellation
Fixed tool call field access for OpenAI-compatible APIs (string vs atom keys)

Added

Vision/multimodal test suite with image fixtures (test/nous/vision_test.exs)
ContentPart test suite for image conversion utilities (test/nous/content_part_test.exs)
Multimodal message examples in conversation demo (examples/04_conversation.exs)

Changed

Updated docs to link examples to GitHub source files
Improved sidebar grouping in hexdocs

[0.8.0] - 2025-12-31

Added

Context Management: New Nous.Agent.Context struct for immutable conversation state, message history, and dependency injection. Supports context continuation between runs:
```
{:ok, result1} = Nous.run(agent, "My name is Alice")
{:ok, result2} = Nous.run(agent, "What's my name?", context: result1.context)
```
Agent Behaviour: New Nous.Agent.Behaviour for implementing custom agents with lifecycle callbacks (init_context/2, build_messages/2, process_response/3, extract_output/2).

Dual Callback System: New Nous.Agent.Callbacks supporting both map-based callbacks and process messages:

```
# Map callbacks
Nous.run(agent, "Hello", callbacks: %{
  on_llm_new_delta: fn _event, delta -> IO.write(delta) end
})

# Process messages (for LiveView)
Nous.run(agent, "Hello", notify_pid: self())
```

Module-Based Tools: New Nous.Tool.Behaviour for defining tools as modules with metadata/0 and execute/2 callbacks. Use Nous.Tool.from_module/2 to create tools from modules.

Tool Context Updates: New Nous.Tool.ContextUpdate struct allowing tools to modify context state:

```
def my_tool(ctx, args) do
  {:ok, result, ContextUpdate.new() |> ContextUpdate.set(:key, value)}
end
```

Tool Testing Helpers: New Nous.Tool.Testing module with mock_tool/2, spy_tool/1, and test_context/1 for testing tool interactions.
Tool Validation: New Nous.Tool.Validator for JSON Schema validation of tool arguments.
Prompt Templates: New Nous.PromptTemplate for EEx-based prompt templates with variable substitution.
Built-in Agent Implementations: Nous.Agents.BasicAgent (default) and Nous.Agents.ReActAgent (reasoning with planning tools).
Structured Errors: New Nous.Errors module with MaxIterationsReached, ToolExecutionError, and ExecutionCancelled error types.
Enhanced Telemetry: New events for iterations (:iteration), tool timeouts (:tool_timeout), and context updates (:context_update).

Changed

Result Structure: Nous.run/3 now returns %{output: _, context: _, usage: _} instead of just the output string.
Tool Function Signature: Tools now receive (ctx, args) instead of (args). The context provides access to ctx.deps for dependency injection.
Examples Modernized: Reduced from ~95 files to 21 files. Flattened directory structure from 4 levels to 2 levels. All examples updated to v0.8.0 API.
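
For the tool signature change, a migration might look roughly like the sketch below. It is self-contained and does not call Nous: a plain map stands in for the context struct, and the module name, the string-keyed args map, the :greeting dependency, and the plain string return value are all assumptions made for illustration.

```elixir
defmodule ToolMigrationSketch do
  # Pre-0.8.0 shape: the tool only receives its arguments.
  def greet_old(%{"name" => name}), do: "Hello, #{name}!"

  # 0.8.0 shape: the context comes first, the arguments second.
  # Dependencies are read from ctx.deps instead of global state.
  def greet(ctx, %{"name" => name}) do
    greeting = Map.get(ctx.deps, :greeting, "Hello")
    "#{greeting}, #{name}!"
  end
end

# A plain map stands in for the real context struct (assumption).
ctx = %{deps: %{greeting: "Welcome"}}
IO.puts(ToolMigrationSketch.greet(ctx, %{"name" => "Alice"}))
```

Existing single-argument tools can usually be adapted by adding a leading context parameter and ignoring it when no dependencies are needed.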

Removed

Removed deprecated provider modules: Nous.Providers.Gemini, Nous.Providers.Mistral, Nous.Providers.VLLM, Nous.Providers.SGLang.
Removed built-in tools: Nous.Tools.BraveSearch, Nous.Tools.DateTimeTools, Nous.Tools.StringTools, Nous.Tools.TodoTools. These can be implemented as custom tools.
Removed Nous.RunContext (replaced by Nous.Agent.Context).
Removed Nous.PromEx.Plugin (users can implement custom Prometheus metrics using telemetry events).

[0.7.2] - 2025-12-29

Fixed

Stream completion events: The [DONE] SSE event now properly emits a {:finish, "stop"} event instead of being silently discarded. This ensures stream consumers always receive a completion signal.
Documentation links: Fixed broken links in hexdocs documentation. Relative links to .exs example files now use absolute GitHub URLs so they work correctly on hexdocs.pm.
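
The stream completion fix can be pictured with a minimal sketch of SSE line handling. Only the {:finish, "stop"} tuple comes from the entry above. The SseSketch module, the {:data, payload} tuple, and the :ignore atom are hypothetical stand-ins, and the library's actual parser may be structured quite differently.

```elixir
defmodule SseSketch do
  # The terminal sentinel becomes an explicit completion event
  # rather than being dropped.
  def parse_line("data: [DONE]"), do: {:finish, "stop"}

  # Any other data line is passed through as a raw payload (hypothetical shape).
  def parse_line("data: " <> payload), do: {:data, payload}

  # Comments, blank lines, and other SSE fields are skipped in this sketch.
  def parse_line(_other), do: :ignore
end

IO.inspect(SseSketch.parse_line("data: [DONE]"))
```

Clause order matters here: the [DONE] clause has to come before the general "data: " clause, otherwise the sentinel would be treated as an ordinary payload.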

[0.7.1] - 2025-12-29

Changed

Optional provider dependencies: openai_ex, anthropix, and gemini_ex are now truly optional. Users only need to install the dependencies for the providers they use.
Runtime dependency checks: Provider modules now check for dependency availability at runtime instead of compile-time, allowing the library to compile without any provider-specific dependencies.
OpenAI message format: Messages are now returned as plain maps with string keys (%{"role" => "user", "content" => "Hi"}) instead of OpenaiEx.ChatMessage structs. This removes the compile-time dependency on openai_ex for message formatting.
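
A runtime dependency check of the kind described above can be sketched with Code.ensure_loaded?/1 from the Elixir standard library. This is an assumption about the general technique, not a copy of the provider code: the DepCheckSketch module, the ensure_dependency/2 function, and the error message wording are hypothetical.

```elixir
defmodule DepCheckSketch do
  # Checks at call time whether an optional dependency's module can be loaded,
  # so the calling library still compiles when the package is absent.
  def ensure_dependency(module, package) do
    if Code.ensure_loaded?(module) do
      :ok
    else
      {:error, "#{package} dependency not available; add it to your mix.exs deps"}
    end
  end
end

# Enum ships with Elixir, so this check succeeds.
IO.inspect(DepCheckSketch.ensure_dependency(Enum, "elixir"))
```

Because the module is referenced only as an atom at runtime, nothing in the check itself requires the optional package to be present at compile time.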

Fixed

Fixed "anthropix dependency not available" errors that occurred when using the library in applications without anthropix installed.
Fixed compile-time errors that occurred when openai_ex was not present in the consuming application.

[0.7.0] - 2025-12-27

Initial public release with multi-provider LLM support:

OpenAI-compatible providers (OpenAI, Groq, OpenRouter, Ollama, LM Studio, vLLM)
Native Anthropic Claude support with extended thinking
Google Gemini support
Mistral AI support
Tool/function calling
Streaming support
ReAct agent implementation
