Changelog

All notable changes to this project will be documented in this file.

[0.9.0] - 2026-01-04

Added

  • Evaluation Framework: Production-grade testing and benchmarking for AI agents

  • Six Built-in Evaluators:

    • :exact_match - Strict string equality matching
    • :fuzzy_match - Jaro-Winkler similarity with configurable thresholds
    • :contains - Substring and regex pattern matching
    • :tool_usage - Tool call verification with argument validation
    • :schema - Ecto schema validation for structured outputs
    • :llm_judge - LLM-based quality assessment with custom rubrics
  • Optimization Engine: Automated parameter tuning for agents

    • Nous.Eval.Optimizer with three strategies: grid search, random search, Bayesian optimization
    • Support for float, integer, choice, and boolean parameter types
    • Early stopping on threshold achievement
    • Detailed trial history and best configuration reporting
  • New Mix Tasks:

    • mix nous.eval - Run evaluation suites with filtering, parallelism, and multiple output formats
    • mix nous.optimize - Parameter optimization with configurable strategies and metrics
  • New Dependency: yaml_elixir ~> 2.9 for YAML test suite parsing
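
The grid search strategy with early stopping listed under the Optimization Engine can be pictured with a small standalone sketch. This is not the Nous.Eval.Optimizer API: the module name GridSearchSketch, the keyword-list search space, and the scoring function are all assumptions made for illustration.

```elixir
defmodule GridSearchSketch do
  # Hypothetical illustration only; not part of Nous.

  # Expands [name: [candidate values]] into every combination,
  # each represented as a map of name => value.
  def combinations(space) do
    Enum.reduce(space, [%{}], fn {name, values}, acc ->
      for config <- acc, value <- values, do: Map.put(config, name, value)
    end)
  end

  # Scores configurations in order, records each trial, and halts as soon
  # as a score reaches `threshold` (early stopping).
  def run(space, score_fun, threshold) do
    trials =
      space
      |> combinations()
      |> Enum.reduce_while([], fn config, trials ->
        score = score_fun.(config)
        trials = [%{config: config, score: score} | trials]
        if score >= threshold, do: {:halt, trials}, else: {:cont, trials}
      end)
      |> Enum.reverse()

    # Report the best trial alongside the full history.
    %{best: Enum.max_by(trials, & &1.score), trials: trials}
  end
end

# Toy search space mixing a float parameter and a boolean parameter.
space = [temperature: [0.0, 0.5, 1.0], use_tools: [true, false]]

# Made-up scoring function standing in for a real evaluation run.
score = fn config -> if config.use_tools, do: 1.0 - config.temperature, else: 0.2 end

result = GridSearchSketch.run(space, score, 0.9)
IO.inspect(result.best.config)
```

With this toy scoring function the first configuration already reaches the threshold, so the search stops after a single trial. Random search would sample from the same space instead of enumerating it.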

Documentation

  • New comprehensive evaluation framework guide (docs/guides/evaluation.md)
  • Five new example scripts in examples/eval/:
    • 01_basic_evaluation.exs - Simple test execution
    • 02_yaml_suite.exs - Loading and running YAML suites
    • 03_optimization.exs - Parameter optimization workflows
    • 04_custom_evaluator.exs - Implementing custom evaluators
    • 05_ab_testing.exs - A/B testing configurations

[0.8.1] - 2025-12-31

Fixed

  • Fixed Usage struct not implementing Access behaviour for telemetry metrics
  • Fixed Task.shutdown/2 nil return case in AgentServer cancellation
  • Fixed tool call field access for OpenAI-compatible APIs (string vs atom keys)
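
The string-versus-atom key issue in the last fix is easy to reproduce with plain maps. The helper below is a minimal sketch of one way to read a field regardless of key type; the module name ToolCallFields and the example field names are assumptions, not the code used inside Nous.

```elixir
defmodule ToolCallFields do
  # Hypothetical helper; not part of Nous.

  # Looks up `key` (an atom) in a map that may use atom or string keys.
  # Map.fetch/2 is used so a stored nil is not mistaken for a missing key.
  def fetch(map, key) when is_map(map) and is_atom(key) do
    with :error <- Map.fetch(map, key) do
      Map.fetch(map, Atom.to_string(key))
    end
  end

  # Convenience wrapper that returns `default` when neither key form exists.
  def get(map, key, default \\ nil) do
    case fetch(map, key) do
      {:ok, value} -> value
      :error -> default
    end
  end
end

# Decoded JSON typically has string keys; internal maps often use atoms.
from_api = %{"name" => "search", "arguments" => "{}"}
internal = %{name: "search", arguments: %{}}

IO.inspect(ToolCallFields.get(from_api, :name))
IO.inspect(ToolCallFields.get(internal, :name))
```

Converting string keys to atoms with String.to_atom/1 would also work, but it creates atoms from external input, which is why this sketch converts in the other direction.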

Added

  • Vision/multimodal test suite with image fixtures (test/nous/vision_test.exs)
  • ContentPart test suite for image conversion utilities (test/nous/content_part_test.exs)
  • Multimodal message examples in conversation demo (examples/04_conversation.exs)

Changed

  • Updated docs to link examples to GitHub source files
  • Improved sidebar grouping in hexdocs

[0.8.0] - 2025-12-31

Added

  • Context Management: New Nous.Agent.Context struct for immutable conversation state, message history, and dependency injection. Supports context continuation between runs:

    {:ok, result1} = Nous.run(agent, "My name is Alice")
    {:ok, result2} = Nous.run(agent, "What's my name?", context: result1.context)
  • Agent Behaviour: New Nous.Agent.Behaviour for implementing custom agents with lifecycle callbacks (init_context/2, build_messages/2, process_response/3, extract_output/2).

  • Dual Callback System: New Nous.Agent.Callbacks supporting both map-based callbacks and process messages:

    # Map callbacks
    Nous.run(agent, "Hello", callbacks: %{
      on_llm_new_delta: fn _event, delta -> IO.write(delta) end
    })
    
    # Process messages (for LiveView)
    Nous.run(agent, "Hello", notify_pid: self())
  • Module-Based Tools: New Nous.Tool.Behaviour for defining tools as modules with metadata/0 and execute/2 callbacks. Use Nous.Tool.from_module/2 to create tools from modules.

  • Tool Context Updates: New Nous.Tool.ContextUpdate struct allowing tools to modify context state:

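    alias Nous.Tool.ContextUpdate

    # `result` and `value` are placeholders for whatever the tool computes from `args`.
    def my_tool(ctx, args) do
      {:ok, result, ContextUpdate.new() |> ContextUpdate.set(:key, value)}
    end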
  • Tool Testing Helpers: New Nous.Tool.Testing module with mock_tool/2, spy_tool/1, and test_context/1 for testing tool interactions.

  • Tool Validation: New Nous.Tool.Validator for JSON Schema validation of tool arguments.

  • Prompt Templates: New Nous.PromptTemplate for EEx-based prompt templates with variable substitution.

  • Built-in Agent Implementations: Nous.Agents.BasicAgent (default) and Nous.Agents.ReActAgent (reasoning with planning tools).

  • Structured Errors: New Nous.Errors module with MaxIterationsReached, ToolExecutionError, and ExecutionCancelled error types.

  • Enhanced Telemetry: New events for iterations (:iteration), tool timeouts (:tool_timeout), and context updates (:context_update).
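
For the Prompt Templates entry above, the underlying EEx substitution can be shown with the standard library alone. This sketch calls EEx.eval_string/2 directly and does not use Nous.PromptTemplate, whose own functions are not described here; the template text and variable names are made up.

```elixir
# An EEx template; `@role` and `@language` are read from the `assigns` binding.
template = "You are a <%= @role %>. Answer in <%= @language %>."

# Render the template by supplying values for each variable.
prompt = EEx.eval_string(template, assigns: [role: "support agent", language: "French"])

IO.puts(prompt)
# prints "You are a support agent. Answer in French."
```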

Changed

  • Result Structure: Nous.run/3 now returns %{output: _, context: _, usage: _} instead of just the output string.

  • Tool Function Signature: Tools now receive (ctx, args) instead of (args). The context provides access to ctx.deps for dependency injection.

  • Examples Modernized: Reduced from ~95 files to 21 files. Flattened directory structure from 4 levels to 2 levels. All examples updated to v0.8.0 API.
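
The tool signature change is the one most likely to affect existing code. The following is a rough sketch of adapting a one-argument tool to the new (ctx, args) shape. It assumes string-keyed args and uses a plain map in place of a real context; the module name ToolMigrationSketch, the :unit dependency, and the return values are all hypothetical.

```elixir
defmodule ToolMigrationSketch do
  # Hypothetical illustration only; not part of Nous.

  # Wraps a pre-0.8.0 style one-argument tool so it accepts (ctx, args).
  # The context is ignored because the legacy function never used one.
  def wrap_legacy(fun) when is_function(fun, 1) do
    fn _ctx, args -> fun.(args) end
  end

  # A tool written directly against the two-argument shape. It reads an
  # injected dependency from ctx.deps; the :unit key is made up here.
  def get_weather(ctx, args) do
    unit = Map.get(ctx.deps, :unit, "celsius")
    "Weather in #{args["city"]} (#{unit})"
  end
end

# A legacy tool that only knows about its arguments.
legacy = fn args -> String.upcase(args["text"]) end
wrapped = ToolMigrationSketch.wrap_legacy(legacy)

# A plain map stands in for the real context struct in this sketch.
ctx = %{deps: %{unit: "kelvin"}}

IO.puts(wrapped.(ctx, %{"text" => "hi"}))
IO.puts(ToolMigrationSketch.get_weather(ctx, %{"city" => "Oslo"}))
```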

Removed

[0.7.2] - 2025-12-29

Fixed

  • Stream completion events: The [DONE] SSE event now properly emits a {:finish, "stop"} event instead of being silently discarded. This ensures stream consumers always receive a completion signal.

  • Documentation links: Fixed broken links in hexdocs documentation. Relative links to .exs example files now use absolute GitHub URLs so they work correctly on hexdocs.pm.
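
The stream completion fix can be illustrated with a small line parser. This is a sketch of the behaviour described above, not the parser used in Nous: only the {:finish, "stop"} tuple comes from this changelog, while the module name SseSketch and the {:data, payload} event are assumptions.

```elixir
defmodule SseSketch do
  # Hypothetical illustration only; not part of Nous.

  # The terminating sentinel becomes an explicit completion event
  # instead of being dropped.
  def parse_line("data: [DONE]"), do: {:finish, "stop"}

  # Any other data line is passed through with its raw payload.
  def parse_line("data: " <> payload), do: {:data, payload}

  # Comments, blank lines, and other fields carry no event in this sketch.
  def parse_line(_other), do: nil

  # Splits a chunk into lines, strips trailing carriage returns,
  # and keeps only the lines that produced an event.
  def parse(chunk) do
    chunk
    |> String.split("\n", trim: true)
    |> Enum.map(&String.trim/1)
    |> Enum.map(&parse_line/1)
    |> Enum.reject(&is_nil/1)
  end
end

events = SseSketch.parse("data: {\"x\":1}\n\ndata: [DONE]\n\n")
IO.inspect(events)
```

Because the sentinel clause comes first, a consumer reducing over these events always sees a final {:finish, "stop"} and can release resources at that point.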

[0.7.1] - 2025-12-29

Changed

  • Make all provider dependencies optional: openai_ex, anthropix, and gemini_ex are now truly optional dependencies. Users only need to install the dependencies for the providers they use.

  • Runtime dependency checks: Provider modules now check for dependency availability at runtime instead of compile-time, allowing the library to compile without any provider-specific dependencies.

  • OpenAI message format: Messages are now returned as plain maps with string keys (%{"role" => "user", "content" => "Hi"}) instead of OpenaiEx.ChatMessage structs. This removes the compile-time dependency on openai_ex for message formatting.
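
A runtime dependency check of the kind described above can be written with Code.ensure_loaded?/1 from the standard library. The sketch below shows the general pattern under stated assumptions: the module name OptionalDepSketch and the error message are made up, and the actual check inside the provider modules may differ.

```elixir
defmodule OptionalDepSketch do
  # Hypothetical illustration only; not part of Nous.

  # Returns :ok when `module` can be loaded at runtime, otherwise an error
  # tuple naming the package the caller would need to add.
  def ensure_available(module, package) do
    if Code.ensure_loaded?(module) do
      :ok
    else
      {:error, "#{package} is not available; add it to your deps to use this provider"}
    end
  end
end

# String ships with Elixir, so this check passes.
IO.inspect(OptionalDepSketch.ensure_available(String, "elixir"))

# OpenaiEx is only present when openai_ex is installed in the application.
IO.inspect(OptionalDepSketch.ensure_available(OpenaiEx, "openai_ex"))
```

Since the check runs when the provider is first used rather than at compile time, an application that never touches a given provider never needs its package.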

Fixed

  • Fixed "anthropix dependency not available" errors that occurred when using the library in applications without anthropix installed.

  • Fixed compile-time errors that occurred when openai_ex was not present in the consuming application.

[0.7.0] - 2025-12-27

Initial public release with multi-provider LLM support:

  • OpenAI-compatible providers (OpenAI, Groq, OpenRouter, Ollama, LM Studio, vLLM)
  • Native Anthropic Claude support with extended thinking
  • Google Gemini support
  • Mistral AI support
  • Tool/function calling
  • Streaming support
  • ReAct agent implementation