Changelog


All notable changes to this project will be documented in this file.

[0.13.1] - 2026-04-03

Added

  • Nous.Transcript — Lightweight conversation compaction without LLM calls.

    • compact/2 — keep last N messages, summarize older ones into a system message
    • maybe_compact/2 — auto-compact based on message count (:every), token budget (:token_budget), or percentage threshold (:threshold)
    • compact_async/2 and compact_async/3 — background compaction via Nous.TaskSupervisor
    • maybe_compact_async/3 — background auto-compact with {:compacted, msgs} / {:unchanged, msgs} callbacks
    • estimate_tokens/1 and estimate_messages_tokens/1 — word-count-based token estimation
  • Built-in Coding Tools — 6 tools implementing Nous.Tool.Behaviour for coding agents:

  • Nous.Permissions — Tool-level permission policy engine complementing InputGuard:

    • Three presets: default_policy/0, permissive_policy/0, strict_policy/0
    • build_policy/1 — custom policies with :deny, :deny_prefixes, :approval_required
    • blocked?/2, requires_approval?/2 — case-insensitive tool name checking
    • filter_tools/2, partition_tools/2 — filter tool lists through policies
  • Nous.Session.Config and Nous.Session.Guardrails — session-level turn limits and token budgets:

    • Config struct with max_turns, max_budget_tokens, compact_after_turns
    • Guardrails.check_limits/4 — returns :ok or {:error, :max_turns_reached | :max_budget_reached}

    • Guardrails.remaining/4, Guardrails.summary/4 — budget tracking and reporting
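The policy checks described above reduce to case-insensitive lookups against a deny list, a set of deny prefixes, and an approval list. The following is a minimal sketch under that reading. The module `PolicySketch` and its map layout are hypothetical and are not the `Nous.Permissions` API.

```elixir
defmodule PolicySketch do
  # Hypothetical policy shape: every name and prefix is stored lowercased.
  def build(opts) do
    %{
      deny: opts |> Keyword.get(:deny, []) |> Enum.map(&String.downcase/1),
      deny_prefixes: opts |> Keyword.get(:deny_prefixes, []) |> Enum.map(&String.downcase/1),
      approval_required:
        opts |> Keyword.get(:approval_required, []) |> Enum.map(&String.downcase/1)
    }
  end

  # Blocked if the lowercased name is denied outright or starts with a denied prefix.
  def blocked?(policy, tool_name) do
    name = String.downcase(tool_name)
    name in policy.deny or Enum.any?(policy.deny_prefixes, &String.starts_with?(name, &1))
  end

  def requires_approval?(policy, tool_name) do
    String.downcase(tool_name) in policy.approval_required
  end

  # Split tool names into {allowed, blocked}, preserving input order.
  def partition(policy, tool_names) do
    Enum.split_with(tool_names, &(not blocked?(policy, &1)))
  end
end

policy =
  PolicySketch.build(deny: ["Bash"], deny_prefixes: ["delete_"], approval_required: ["write_file"])

PolicySketch.partition(policy, ["read_file", "BASH", "Delete_repo", "write_file"])
# => {["read_file", "write_file"], ["BASH", "Delete_repo"]}
```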

Fixed

  • Empty stream silent failure: run_stream now emits {:error, :empty_stream} + warning when a provider returns zero events (e.g. minimax), instead of silently yielding {:complete, %{output: ""}}.
  • Memory.Search crash on vector search error: {:ok, results} = store_mod.search_vector(...) pattern match replaced with case — logs warning and returns empty list on error.
  • Atom table exhaustion in skill loader: String.to_atom/1 replaced with String.to_existing_atom/1 + rescue fallback with debug logging.
  • Context deserialization crash on unknown roles: String.to_existing_atom/1 replaced with explicit role whitelist (:system, :user, :assistant, :tool), defaults to :user with warning.
  • Unbounded inspect in stream normalizer: inspect(chunk, limit: :infinity) capped to limit: 500, printable_limit: 1000.
  • SQLite embedding decode crash: JSON.decode!/1 wrapped in rescue, returns nil with warning on malformed data.
  • Muninn bare rescue: rescue _ -> replaced with specific exception types (MatchError, File.Error, ErlangError, RuntimeError).
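The two atom-safety fixes above follow a common Elixir pattern. Untrusted strings are either mapped through an explicit whitelist or converted only to atoms that already exist, so they can never grow the atom table. The hypothetical module `SafeDecodeSketch` below stands in for the context deserializer and skill loader. The entry above says the skill loader also logs at debug level. This sketch simply returns `:error`.

```elixir
defmodule SafeDecodeSketch do
  require Logger

  @roles %{"system" => :system, "user" => :user, "assistant" => :assistant, "tool" => :tool}

  # Whitelisted role parsing: unknown roles fall back to :user with a warning.
  def parse_role(role) when is_binary(role) do
    case Map.fetch(@roles, role) do
      {:ok, atom} ->
        atom

      :error ->
        Logger.warning("unknown role #{inspect(role)}, defaulting to :user")
        :user
    end
  end

  # Only resolves atoms that already exist; atoms are never garbage-collected,
  # so creating them from arbitrary input could exhaust the atom table.
  def existing_atom(string) do
    {:ok, String.to_existing_atom(string)}
  rescue
    ArgumentError -> :error
  end
end
```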

Documentation

  • Memory System Guide (docs/guides/memory.md) — 630+ line walkthrough covering all 6 store backends, search/scoring, BM25, agent integration, and cross-agent memory sharing.
  • Context & Dependencies Guide (docs/guides/context.md) — RunContext, ContextUpdate operations, stateful agent walkthrough, multi-user patterns.
  • Skills Guide enhanced — added 400+ lines: module-based and file-based skill walkthroughs, skill groups, activation modes, plugin configuration.
  • LiveView examples — chat interface (liveview_chat.exs) and multi-agent dashboard (liveview_multi_agent.exs) reference implementations.
  • PostgreSQL memory example (postgresql_full.exs) — end-to-end Store implementation with tsvector + pgvector, BM25 search, hybrid RRF search.
  • Coding agent example (19_coding_agent.exs) — permissions, tools, guardrails, and transcript compaction.
  • Tool permissions example (tool_permissions.exs) — policy presets, custom deny lists, tool filtering.
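The PostgreSQL example's hybrid search is described as RRF (reciprocal rank fusion) over a keyword ranking and a vector ranking. Below is a minimal sketch of the fusion step. It assumes already-ranked lists of document ids and uses the commonly cited constant k = 60, which the example itself may not use. `RRFSketch` is hypothetical.

```elixir
defmodule RRFSketch do
  # Each ranked list contributes 1 / (k + rank) for every id it contains
  # (ranks are 1-based); contributions are summed per id and sorted descending.
  def fuse(ranked_lists, k \\ 60) do
    ranked_lists
    |> Enum.flat_map(fn ids ->
      ids |> Enum.with_index(1) |> Enum.map(fn {id, rank} -> {id, 1 / (k + rank)} end)
    end)
    |> Enum.reduce(%{}, fn {id, score}, acc -> Map.update(acc, id, score, &(&1 + score)) end)
    |> Enum.sort_by(fn {_id, score} -> score end, :desc)
  end
end

text_hits = ["doc_a", "doc_b", "doc_c"]    # e.g. BM25 / tsvector order
vector_hits = ["doc_b", "doc_d", "doc_a"]  # e.g. pgvector similarity order
RRFSketch.fuse([text_hits, vector_hits])
```

Documents found by both searches (here `doc_b` and `doc_a`) rise above those found by only one, without having to normalize BM25 scores against vector distances.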

[0.13.0] - 2026-03-28

Added

  • Nous.Workflow — DAG/graph-based workflow engine for orchestrating agents, tools, and control flow as executable directed graphs. Complements Decisions (reasoning tracking) and Teams (persistent agent groups).
    • Builder API: Ecto.Multi-style pipes — Workflow.new/1 |> add_node/4 |> connect/3 |> chain/2 |> run/2
    • 8 node types: :agent_step, :tool_step, :transform, :branch, :parallel, :parallel_map, :human_checkpoint, :subworkflow
    • Hand-rolled graph: dual adjacency maps, Kahn's algorithm for topological sort + cycle detection + parallel execution levels in one O(V+E) pass
    • Static parallel: named branches fan-out concurrently via Task.Supervisor
    • Dynamic parallel_map: runtime fan-out over data lists with max_concurrency throttling — the scatter-gather pattern
    • Cycle support: edge-following execution with per-node max-iteration guards for retry/quality-gate loops
    • Workflow hooks: :pre_node, :post_node, :workflow_start, :workflow_end — integrates with existing Nous.Hook struct
    • Pause/resume: via hook ({:pause, reason}), :atomics external signal, or :human_checkpoint auto-suspend
    • Error strategies: :fail_fast, :skip, {:retry, max, delay}, {:fallback, node_id} per node
    • Telemetry: [:nous, :workflow, :run|:node, :start|:stop|:exception] events
    • Execution tracing: opt-in per-node timing and status recording (trace: true)
    • Checkpointing: Checkpoint struct + Store behaviour + ETS backend
    • Subworkflows: nested workflow invocation with input_mapper/output_mapper for data isolation
    • Runtime graph mutation: on_node_complete callback, Graph.insert_after/6, Graph.remove_node/2
    • Mermaid visualization: Workflow.to_mermaid/1 generates flowchart diagrams with type-specific node shapes
    • Scratch ETS: optional per-workflow ETS table for large/binary data exchange between steps
    • 113 new tests covering all workflow features
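The single-pass topological sort with parallel execution levels can be sketched with Kahn's algorithm. Nodes whose in-degree drops to zero at the same time form one level, and any node that never reaches zero indicates a cycle. This is a hypothetical illustration (`KahnSketch`, with atom nodes and `{from, to}` edge tuples), not the `Nous.Workflow` graph module. It only detects cycles, whereas the engine's edge-following mode can execute them.

```elixir
defmodule KahnSketch do
  # Returns {:ok, levels} where each level's predecessors all sit in earlier
  # levels (so a level could run in parallel), or {:error, :cycle}.
  def levels(nodes, edges) do
    succ =
      Enum.reduce(edges, Map.new(nodes, &{&1, []}), fn {from, to}, acc ->
        Map.update!(acc, from, fn list -> [to | list] end)
      end)

    indeg =
      Enum.reduce(edges, Map.new(nodes, &{&1, 0}), fn {_from, to}, acc ->
        Map.update!(acc, to, &(&1 + 1))
      end)

    ready = for n <- nodes, indeg[n] == 0, do: n
    walk(ready, succ, indeg, [], 0, length(nodes))
  end

  # Nothing is ready: either every node was emitted, or the rest lies on a cycle.
  defp walk([], _succ, _indeg, acc, seen, total) do
    if seen == total, do: {:ok, Enum.reverse(acc)}, else: {:error, :cycle}
  end

  defp walk(level, succ, indeg, acc, seen, total) do
    # Decrement each successor's in-degree; those reaching zero form the next level.
    {indeg, next} =
      Enum.reduce(level, {indeg, []}, fn node, {deg, ready} ->
        Enum.reduce(succ[node], {deg, ready}, fn m, {d, r} ->
          d = Map.update!(d, m, &(&1 - 1))
          if d[m] == 0, do: {d, [m | r]}, else: {d, r}
        end)
      end)

    walk(Enum.reverse(next), succ, indeg, [level | acc], seen + length(level), total)
  end
end

{:ok, levels} =
  KahnSketch.levels(
    [:fetch, :parse, :summarize, :classify, :report],
    [{:fetch, :parse}, {:parse, :summarize}, {:parse, :classify},
     {:summarize, :report}, {:classify, :report}]
  )
```

Each node and edge is visited a constant number of times, which gives the O(V+E) bound mentioned above.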

[0.12.17] - 2026-03-28

Removed

  • Dead module Nous.Decisions.Tools: 4 tool functions never used by any plugin or code path.
  • Dead module Nous.StreamNormalizer.Mistral: Mistral provider uses the default OpenAI-compatible normalizer.
  • Dead function emit_fallback_exhausted/3 in Fallback module: Defined but never called.
  • Dead config enable_telemetry: Set in config files but never read — telemetry is always on.
  • Dead config log_level: Set in dev/test configs but never read by Nous.
  • Unused test fixtures: NousTest.Fixtures.LLMResponses and its generator script (generated Oct 2025, never imported).

Fixed

  • Compiler warning in output_schema.ex: Removed always-truthy conditional around to_json_schema/1 return value.

Changed

  • All JSON encoding/decoding now uses the built-in JSON module instead of Jason, and Jason has been removed from the direct dependencies.
  • Added pretty_encode!/1 helper to internal JSON module for pretty-printed JSON output (used in LLM prompts and eval reports).
  • Updated README with Elixir 1.18+ / OTP 27+ requirements.

[0.12.16] - 2026-03-28

Fixed

  • Anthropic multimodal messages silently lost image data: message_to_anthropic/1 matched on content being a list, but Message.user/2 stores content as a string and keeps the content parts in metadata.content_parts. Multimodal messages were therefore sent as plain text, losing all image data. The formatter now reads the parts from metadata, like the OpenAI formatter.
  • Gemini multimodal messages had the same issue: The same pattern-match bug caused all image content to be dropped.
  • Anthropic image format incorrect: The data field contained the full data URL prefix (data:image/jpeg;base64,...) instead of raw base64; media_type was hardcoded to "image/jpeg" regardless of actual format; HTTP URLs were incorrectly wrapped as base64 source instead of "type": "url".
  • Gemini had no image support: All non-text content parts fell through to a [Image: ...] text representation. Now uses inlineData for base64 images and fileData for HTTP URLs.
  • Anthropic duplicate thinking block: Assistant messages with reasoning content emitted the thinking block twice.

Added

  • ContentPart.parse_data_url/1 — extract MIME type and raw base64 data from a data URL string.
  • ContentPart.data_url?/1 and ContentPart.http_url?/1 — URL type predicates.
  • OpenAI formatter: :image content type support (converts to data URL) and detail option passthrough for image_url parts.
  • Comprehensive vision test pipeline (test/nous/vision_pipeline_test.exs) with 19 unit tests covering format conversion across all providers and 4 LLM integration tests.
  • Test fixture images: test_square.png (100x100 red), test_tiny.webp (minimal WebP).
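A data URL has the form `data:<mime>;base64,<payload>`. Extracting the MIME type and the raw base64, which Anthropic's base64 image source expects as `media_type` and `data`, therefore takes two splits. Below is a minimal sketch. `DataUrlSketch` and its return shape are assumptions rather than the `ContentPart` API, and it accepts only the plain `<mime>;base64` form.

```elixir
defmodule DataUrlSketch do
  # "data:image/png;base64,AAAA" -> {:ok, %{mime_type: "image/png", data: "AAAA"}}
  def parse("data:" <> rest) do
    with [meta, payload] <- String.split(rest, ",", parts: 2),
         [mime, "base64"] <- String.split(meta, ";", parts: 2) do
      {:ok, %{mime_type: mime, data: payload}}
    else
      _ -> {:error, :invalid_data_url}
    end
  end

  def parse(_other), do: {:error, :not_a_data_url}

  def data_url?(s), do: String.starts_with?(s, "data:")
  def http_url?(s), do: String.starts_with?(s, ["http://", "https://"])
end
```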

[0.12.15] - 2026-03-26

Fixed

  • receive_timeout silently dropped in Nous.LLM: generate_text/3 and stream_text/3 with a string model only passed [:base_url, :api_key, :llamacpp_model] to Model.parse, so receive_timeout was silently ignored. Now correctly forwarded.

Removed

  • Dead timeout config: Removed unused default_timeout and stream_timeout from config/config.exs. Timeouts are determined by per-provider defaults in Model.default_receive_timeout/1 and each provider module's @default_timeout/@streaming_timeout constants.

Documentation

  • Added "Timeouts" section to README documenting receive_timeout option and default timeouts per provider.

[0.13.0] - 2026-03-21

Added

  • Hooks system: Granular lifecycle interceptors for tool execution and request/response flow.

    • 6 lifecycle events: pre_tool_use, post_tool_use, pre_request, post_response, session_start, session_end
    • 3 handler types: :function (inline), :module (behaviour), :command (shell via NetRunner)
    • Matcher-based dispatch: string (exact tool name), regex, or predicate function
    • Blocking semantics for pre_tool_use and pre_request — hooks can deny or modify tool calls
    • Priority-based execution ordering (lower = earlier)
    • Telemetry events: [:nous, :hook, :execute, :start | :stop], [:nous, :hook, :denied]

    • Nous.Hook, Nous.Hook.Registry, Nous.Hook.Runner
    • New option on Nous.Agent.new/2: :hooks
    • New example: examples/16_hooks.exs
  • Skills system: Reusable instruction/capability packages for agents.

    • Module-based skills with use Nous.Skill macro and behaviour callbacks
    • File-based skills: markdown files with YAML frontmatter, loaded from directories
    • 5 activation modes: :manual, :auto, {:on_match, fn}, {:on_tag, tags}, {:on_glob, patterns}
    • Skill groups: :coding, :review, :testing, :debug, :git, :docs, :planning
    • Registry with load/unload, activate/deactivate, group operations, and input matching
    • Nous.Plugins.Skills — auto-included plugin bridging skills into the agent lifecycle
    • Directory scanning: skill_dirs: option and Nous.Skill.Registry.register_directory/2
    • Telemetry events: [:nous, :skill, :activate | :deactivate | :load | :match]

    • New options on Nous.Agent.new/2: :skills, :skill_dirs
    • New example: examples/17_skills.exs
    • New guides: docs/guides/skills.md, docs/guides/hooks.md
  • 21 built-in skills:

    • Language-agnostic (10): CodeReview, TestGen, Debug, Refactor, ExplainCode, CommitMessage, DocGen, SecurityScan, Architect, TaskBreakdown
    • Elixir-specific (5): PhoenixLiveView, EctoPatterns, OtpPatterns, ElixirTesting, ElixirIdioms
    • Python-specific (6): PythonFastAPI, PythonTesting, PythonTyping, PythonDataScience, PythonSecurity, PythonUv
  • NetRunner dependency (~> 1.0.4): Zero-zombie-process OS command execution for command hooks with SIGTERM→SIGKILL timeout escalation.

  • 76 new tests for hooks and skills systems.
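The hook dispatch described above has three parts: matcher-based selection, priority ordering (lower runs earlier), and blocking semantics for pre_tool_use. Together they can be pictured as a filter, a sort, and a short-circuiting reduce. Below is a hypothetical sketch in which hooks are plain maps with `:matcher`, `:priority`, and `:handler`. The handler return values `:allow`, `{:modify, args}`, and `{:deny, reason}` are assumptions, not the `Nous.Hook` contract.

```elixir
defmodule HookMatchSketch do
  # A matcher is an exact tool-name string, a Regex, or a 1-arity predicate.
  def matches?(name, matcher) when is_binary(matcher), do: name == matcher
  def matches?(name, %Regex{} = matcher), do: Regex.match?(matcher, name)
  def matches?(name, matcher) when is_function(matcher, 1), do: matcher.(name)

  # Hooks whose matcher accepts the tool name, lowest priority first (stable sort).
  def select(hooks, tool_name) do
    hooks
    |> Enum.filter(&matches?(tool_name, &1.matcher))
    |> Enum.sort_by(& &1.priority)
  end

  # The first {:deny, reason} stops the chain; {:modify, args} replaces the
  # arguments seen by later hooks.
  def run_pre_tool(hooks, tool_name, args) do
    hooks
    |> select(tool_name)
    |> Enum.reduce_while({:allow, args}, fn hook, {:allow, acc} ->
      case hook.handler.(tool_name, acc) do
        :allow -> {:cont, {:allow, acc}}
        {:modify, new_args} -> {:cont, {:allow, new_args}}
        {:deny, reason} -> {:halt, {:deny, reason}}
      end
    end)
  end
end
```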

[0.12.11] - 2026-03-19

Added

  • Per-run structured output override: Pass output_type: and structured_output: as options to Nous.Agent.run/3 and Nous.Agent.run_stream/3 to override the agent's defaults per call. The same agent can return raw text or structured data depending on the request.
  • Multi-schema selection ({:one_of, [SchemaA, SchemaB]}): New output_type variant where the LLM dynamically chooses which schema to use per response. Each schema becomes a synthetic tool — the LLM's tool choice acts as schema selection. Includes automatic retry and validation against the selected schema.
    • OutputSchema.schema_name/1 — public helper to get snake_case name for a schema module
    • OutputSchema.tool_name_for_schema/1 — build synthetic tool name from schema module
    • OutputSchema.find_schema_for_tool_name/2 — reverse-map tool name to schema module
    • OutputSchema.synthetic_tool_name?/1 — predicate for synthetic tool call detection
    • OutputSchema.extract_response_for_one_of/2 — extract text and identify matched schema from tool call
    • New example: Example 6 (per-run override) and Example 7 (multi-schema) in examples/14_structured_output.exs
    • New sections in docs/guides/structured_output.md

Fixed

  • Synthetic tool call handling: Structured output tool calls (__structured_output__) in :tool_call mode are now correctly filtered from the tool execution loop. Previously, these synthetic calls would produce "Tool not found" errors and cause an unnecessary extra LLM round-trip. Now they terminate the loop immediately and the structured output is extracted directly.

[0.12.10] - 2026-03-19

Added

  • Fallback model/provider support: Automatic failover to alternative models when the primary model fails with a ProviderError or ModelError (rate limit, server error, timeout, auth issue).
    • Nous.Fallback — core fallback logic: eligibility checks, recursive model chain traversal, model string/struct parsing
    • :fallback option on Nous.Agent.new/2 — ordered list of fallback model strings or Model structs
    • :fallback option on Nous.generate_text/3 and Nous.stream_text/3
    • Tool schemas are automatically re-converted when falling back across providers (e.g., OpenAI → Anthropic)
    • Structured output settings are re-injected for the target provider on cross-provider fallback
    • Agent model is swapped on successful fallback so remaining iterations use the working model
    • Streaming fallback retries stream initialization only, not mid-stream failures
    • New telemetry events: [:nous, :fallback, :activated] and [:nous, :fallback, :exhausted]
    • Only ProviderError and ModelError trigger fallback; application-level errors (ValidationError, MaxIterationsExceeded, ExecutionCancelled, ToolError) are returned immediately
    • 52 new tests across test/nous/fallback_test.exs and test/nous/agent_fallback_test.exs
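The fallback rule amounts to trying each model in order and moving on only for provider- or model-level failures. Below is a minimal sketch. `FallbackSketch` is hypothetical, and the tagged tuples `{:provider_error, _}` / `{:model_error, _}` stand in for the `ProviderError` and `ModelError` structs.

```elixir
defmodule FallbackSketch do
  def run([], _request_fun), do: {:error, :no_models}

  # Only provider/model errors advance to the next candidate; any other error
  # (validation, cancellation, tool errors, ...) is returned immediately.
  def run([model | rest], request_fun) do
    case request_fun.(model) do
      {:ok, result} ->
        {:ok, {model, result}}

      {:error, {kind, _reason} = err} when kind in [:provider_error, :model_error] ->
        if rest == [], do: {:error, {:fallback_exhausted, err}}, else: run(rest, request_fun)

      {:error, other} ->
        {:error, other}
    end
  end
end
```

Returning the model that succeeded mirrors the entry's note that the agent switches to the working model for its remaining iterations.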

Changed

  • Nous.Agent struct gains fallback: [Model.t()] field (default: [])
  • Nous.LLM now uses injectable dispatcher (get_dispatcher/0) for testability, consistent with AgentRunner

[0.12.9] - 2026-03-12

Added

  • InputGuard plugin: Modular malicious input classifier with pluggable strategy pattern. Detects prompt injection, jailbreak attempts, and other malicious inputs before they reach the LLM.

Fixed

  • AgentRunner: before_request plugin hook now short-circuits the LLM call when a plugin sets needs_response: false (e.g., InputGuard blocking). Previously the current iteration would still call the LLM before the block took effect on the next iteration.

[0.12.8] - 2026-03-12

Fixed

  • Vertex AI v1/v1beta1 bug: Model.parse("vertex_ai:gemini-2.5-pro-preview-06-05") with GOOGLE_CLOUD_PROJECT set was storing a hardcoded v1 URL in model.base_url, causing the provider's v1beta1 selection logic to be bypassed. Preview models now correctly use v1beta1 at request time.

Added

  • Vertex AI input validation: Project ID and region from environment variables are now validated with helpful error messages instead of producing opaque DNS/HTTP errors.
  • GOOGLE_CLOUD_LOCATION support: Added as a fallback for GOOGLE_CLOUD_REGION, consistent with other Google Cloud libraries and tooling.
  • Multi-region example script: examples/providers/vertex_ai_multi_region.exs

[0.12.7] - 2026-03-10

Fixed

  • Vertex AI model routing: Fixed build_request_params/3 not including the "model" key in the params map, causing chat/2 and chat_stream/2 to always fall back to "gemini-2.0-flash" regardless of the requested model.
  • Vertex AI 404 on preview models: Use v1beta1 API version for preview and experimental models (e.g., gemini-3.1-pro-preview). The v1 endpoint returns 404 for these models.

Added

[0.12.6] - 2026-03-07

Added

  • Auto-update memory: Nous.Plugins.Memory can now automatically reflect on conversations and update memories after each run — no explicit tool calls needed. Enable with auto_update_memory: true in memory_config. Configurable reflection model, frequency, and context limits.
    • New after_run/3 callback in Nous.Plugin behaviour — runs once after the entire agent run completes. Wired into both AgentRunner.run/3 and run_with_context/3.
    • Nous.Plugin.run_after_run/4 helper for executing the hook across all plugins
    • New config options: :auto_update_memory, :auto_update_every, :reflection_model, :reflection_max_tokens, :reflection_max_messages, :reflection_max_memories
    • New example: examples/memory/auto_update.exs

[0.12.5] - 2026-03-06

Added

  • Vertex AI provider: Nous.Providers.VertexAI for accessing Gemini models through Google Cloud Vertex AI. Supports enterprise features (VPC-SC, CMEK, regional endpoints, IAM).
    • Three auth modes: app config Goth (config :nous, :vertex_ai, goth: MyApp.Goth), per-model Goth (default_settings: %{goth: MyApp.Goth}), or direct access token (api_key / VERTEX_AI_ACCESS_TOKEN)
    • Bearer token auth via api_key option, VERTEX_AI_ACCESS_TOKEN env var, or Goth integration
    • Goth integration ({:goth, "~> 1.4", optional: true}) for automatic service account token management — reuse existing Goth processes from PubSub, etc.
    • URL auto-construction from GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_REGION env vars
    • Nous.Providers.VertexAI.endpoint/2 helper to build endpoint URLs
    • Reuses existing Gemini message format, response parsing, and stream normalization
    • Model string: "vertex_ai:gemini-2.0-flash"

[0.12.2] - 2026-03-04

Fixed

  • Gemini streaming: Fixed streaming responses returning 0 events. The Gemini streamGenerateContent endpoint returns a JSON array (application/json) by default, not Server-Sent Events. Instead of forcing SSE via the alt=sse query parameter, a pluggable stream parser was added to Nous.Providers.HTTP.

Added

  • Nous.Providers.HTTP.JSONArrayParser — stream buffer parser for JSON array responses. Extracts complete JSON objects from a streaming [{...},{...},...] response by tracking {} nesting depth while respecting string literals and escape sequences.
  • :stream_parser option on HTTP.stream/4 — accepts any module implementing parse_buffer/1 with the same {events, remaining_buffer} contract as SSE parsing. Defaults to the existing SSE parser. Enables any provider with a non-SSE streaming format to plug in a custom parser.
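The depth-tracking approach described for `JSONArrayParser` can be sketched as a byte scanner. It ignores braces inside string literals and skips the character after a backslash. The hypothetical `JSONArraySketch` follows the same `{events, remaining_buffer}` contract but returns raw object strings, which could then be decoded with `JSON.decode!/1` on Elixir 1.18+. It is not the library's implementation.

```elixir
defmodule JSONArraySketch do
  def parse_buffer(buffer), do: scan(buffer, 0, 0, :out, nil, 0, [])

  # mode: :out (outside strings), :str (inside a string), :esc (after a backslash).
  # start: offset of the current top-level object; consumed: end of the last one.
  defp scan(buf, pos, _depth, _mode, _start, consumed, acc) when pos >= byte_size(buf) do
    {Enum.reverse(acc), binary_part(buf, consumed, byte_size(buf) - consumed)}
  end

  defp scan(buf, pos, depth, mode, start, consumed, acc) do
    next = pos + 1

    case {mode, :binary.at(buf, pos)} do
      {:esc, _} -> scan(buf, next, depth, :str, start, consumed, acc)
      {:str, ?\\} -> scan(buf, next, depth, :esc, start, consumed, acc)
      {:str, ?"} -> scan(buf, next, depth, :out, start, consumed, acc)
      {:str, _} -> scan(buf, next, depth, :str, start, consumed, acc)
      {:out, ?"} -> scan(buf, next, depth, :str, start, consumed, acc)
      # An opening brace at depth 0 starts a new top-level object.
      {:out, ?{} when depth == 0 -> scan(buf, next, 1, :out, pos, consumed, acc)
      {:out, ?{} -> scan(buf, next, depth + 1, :out, start, consumed, acc)
      # Returning to depth 0 completes the object; everything up to here is consumed.
      {:out, ?}} when depth == 1 ->
        obj = binary_part(buf, start, next - start)
        scan(buf, next, 0, :out, nil, next, [obj | acc])
      {:out, ?}} -> scan(buf, next, depth - 1, :out, start, consumed, acc)
      # '[', ']', ',' and whitespace between objects are skipped.
      {:out, _} -> scan(buf, next, depth, :out, start, consumed, acc)
    end
  end
end
```

Because the remaining buffer begins right after the last complete object, the next network chunk can simply be appended to it and scanned again.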

[0.12.0] - 2026-02-28

Added

  • Memory System: Persistent memory for agents with hybrid text + vector search, temporal decay, importance weighting, and flexible scoping.

  • Graceful degradation: No embedding provider = keyword-only search. No optional deps = Store.ETS with Jaro matching. The core memory system has zero additional dependencies.

[0.11.3] - 2026-02-26

Fixed

Added

  • Nous.StreamNormalizer.Anthropic — normalizes Anthropic SSE events (content_block_delta, message_delta, content_block_start for tool use, thinking deltas, error events)
  • Nous.StreamNormalizer.Gemini — normalizes Gemini SSE events (candidates array with text parts, functionCall, finishReason mapping)
  • 42 tests for both new stream normalizers

[0.11.0] - 2026-02-20

Added

  • Structured Output Mode: Agents return validated, typed data instead of raw strings. Inspired by instructor_ex.

    • Nous.OutputSchema core module: JSON schema generation, provider settings dispatch, parsing and validation
    • use Nous.OutputSchema macro with @llm_doc attribute for schema-level LLM documentation
    • validate_changeset/1 optional callback for custom Ecto validation rules
    • Validation retry loop: failed outputs are sent back to the LLM with error details (max_retries option)
    • System prompt augmentation with schema instructions
  • Output Type Variants:

    • Ecto schema modules — full JSON schema + changeset validation
    • Schemaless Ecto types (%{name: :string, age: :integer}) — lightweight, no module needed
    • Raw JSON schema maps (string keys) — passed through as-is
    • {:regex, pattern} — regex-constrained output (vLLM/SGLang)
    • {:grammar, ebnf} — EBNF grammar-constrained output (vLLM)
    • {:choice, choices} — choice-constrained output (vLLM/SGLang)
  • Provider Modes: Controls how structured output is enforced per-provider

    • :auto (default) — picks best mode for the provider
    • :json_schema — response_format with strict JSON schema (OpenAI, vLLM, SGLang, Gemini)
    • :tool_call — synthetic tool with tool_choice (Anthropic default)
    • :json — response_format: json_object (OpenAI-compatible)
    • :md_json — prompt-only enforcement with markdown fence + stop token (all providers)
  • Provider Passthrough: response_format, guided_json, guided_regex, guided_grammar, guided_choice, json_schema, regex, generationConfig now passed through in build_request_params

  • New Files:

    • lib/nous/output_schema.ex — core module
    • lib/nous/output_schema/validator.ex — behaviour definition
    • lib/nous/output_schema/use_macro.ex — use Nous.OutputSchema macro
    • docs/guides/structured_output.md — comprehensive guide
    • examples/14_structured_output.exs — example script with 5 patterns
    • test/nous/output_schema_test.exs — 42 unit tests
    • test/nous/structured_output_integration_test.exs — 16 integration tests
    • test/eval/agents/structured_output_test.exs — 3 LLM integration tests

Changed

[0.10.1] - 2026-02-14

Changed

  • Sub-Agent plugin unified: Merged ParallelSubAgent into Nous.Plugins.SubAgent

    • Single plugin now provides both delegate_task (single) and spawn_agents (parallel) tools
    • system_prompt/2 callback injects orchestration guidance including available templates
    • Templates accept %Nous.Agent{} structs (recommended) or config maps (legacy)
    • Parallel execution via Task.Supervisor.async_stream_nolink
    • Configurable concurrency (parallel_max_concurrency, default: 5) and timeout (parallel_timeout, default: 120s)
    • Graceful partial failure: crashed/timed-out sub-agents don't block others
  • New Example: examples/13_sub_agents.exs

    • Template-based sub-agents using Nous.Agent.new/2 structs
    • Parallel execution with inline model config
    • Direct programmatic invocation bypassing the LLM
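The parallel behaviour listed above combines bounded concurrency, a per-task timeout, and partial failure that does not block the other sub-agents. These map directly onto `Task.Supervisor.async_stream_nolink/4`. Below is a minimal sketch in which plain functions stand in for sub-agent runs. The job list and the 500 ms timeout are illustrative and are not the plugin's defaults.

```elixir
{:ok, sup} = Task.Supervisor.start_link()

jobs = [
  {"fast", fn -> "done" end},
  {"crash", fn -> raise "boom" end},
  {"slow", fn -> Process.sleep(5_000) end}
]

# Crashed or timed-out tasks come back as {:exit, reason} instead of taking
# down the caller; results stay in input order (ordered: true is the default).
results =
  Task.Supervisor.async_stream_nolink(sup, jobs, fn {_name, job} -> job.() end,
    max_concurrency: 5,
    timeout: 500,
    on_timeout: :kill_task
  )
  |> Enum.zip(jobs)
  |> Enum.map(fn
    {{:ok, value}, {name, _}} -> {name, {:ok, value}}
    {{:exit, reason}, {name, _}} -> {name, {:error, reason}}
  end)
```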

[0.10.0] - 2026-02-14

Added

  • Plugin System: Composable agent extensions via Nous.Plugin behaviour

    • Callbacks: init/2, tools/2, system_prompt/2, before_request/3, after_response/3
    • Add plugins: [MyPlugin] to any agent for cross-cutting concerns
    • AgentRunner iterates plugins at each stage of the execution loop
  • Human-in-the-Loop (HITL): Approval workflows for sensitive tool calls

  • Sub-Agent System: Enable agents to delegate tasks to specialized child agents

    • Nous.Plugins.SubAgent provides delegate_task tool
    • Pre-configured agent templates via deps[:sub_agent_templates]
    • Isolated context per sub-agent with shared deps support
  • Conversation Summarization: Automatic context window management

    • Nous.Plugins.Summarization monitors token usage against configurable threshold
    • LLM-powered summarization with safe split points (never separates tool_call/tool_result pairs)
    • Error-resilient: keeps all messages if summarization fails
  • State Persistence: Save and restore agent conversation state

  • Enhanced Supervision: Production lifecycle management for agents

    • Nous.AgentRegistry for session-based process lookup via Registry
    • Nous.AgentDynamicSupervisor for on-demand agent creation/destruction
    • Configurable inactivity timeout on AgentServer (default: 5 minutes)
    • Added to application supervision tree
  • Dangling Tool Call Recovery: Resilient session resumption

  • PubSub Abstraction Layer: Unified Nous.PubSub module for all PubSub usage

    • Nous.PubSub wraps Phoenix.PubSub with graceful no-op fallback when unavailable
    • Application-level configuration via config :nous, pubsub: MyApp.PubSub
    • Topic builders: agent_topic/1, research_topic/1, approval_topic/1
    • Nous.Agent.Context gains pubsub and pubsub_topic fields (runtime-only, never serialized)
    • Nous.Agent.Callbacks.execute/3 now broadcasts via PubSub as a third channel alongside callbacks and notify_pid
    • AgentServer refactored to use Nous.PubSub — removes ad-hoc setup_pubsub_functions/0 and subscribe_fn/broadcast_fn from state
    • Research Coordinator broadcasts progress via PubSub when :session_id is provided
    • SubAgent plugin propagates parent's PubSub context to child agents
  • Async HITL Approval via PubSub: Nous.PubSub.Approval module

    • handler/1 builds an approval handler compatible with Nous.Plugins.HumanInTheLoop
    • Broadcasts {:approval_required, info} and blocks via receive for response
    • respond/4 sends approval decisions from external processes (e.g., LiveView)
    • Configurable timeout with :reject as default on expiry
    • Enables async approval workflows without synchronous I/O
  • Deep Research Agent: Autonomous multi-step research with citations

  • New Research Tools:

  • New Dependencies:

    • floki ~> 0.36 (optional, for HTML content extraction)
    • phoenix_pubsub ~> 2.1 (test-only, for PubSub integration tests)
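The summarization rule of never separating a tool call from its tool results can be expressed as moving the split point backwards while the first retained message is a tool result. Below is a hypothetical sketch (`SafeSplitSketch`, with messages as maps carrying a `:role` key). It is not the plugin's code.

```elixir
defmodule SafeSplitSketch do
  # Split into {older, recent}, keeping roughly the last `keep` messages, but
  # never start `recent` with a tool result whose tool call would be summarized.
  def split(messages, keep) do
    idx = max(length(messages) - keep, 0)
    Enum.split(messages, back_off(messages, idx))
  end

  defp back_off(_messages, 0), do: 0

  defp back_off(messages, idx) do
    case Enum.at(messages, idx) do
      %{role: :tool} -> back_off(messages, idx - 1)
      _ -> idx
    end
  end
end
```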

Changed

  • Nous.Agent struct now accepts plugins: [module()] option
  • Nous.Tool struct now accepts requires_approval: boolean() option
  • Nous.Agent.Context now includes approval_handler, pubsub, and pubsub_topic fields
  • Nous.AgentServer supports optional :name registration, :persistence backend, and uses Nous.PubSub (removed ad-hoc setup_pubsub_functions/0)
  • Nous.AgentServer :pubsub option now defaults to Nous.PubSub.configured_pubsub() instead of MyApp.PubSub
  • Nous.AgentRunner accepts :pubsub and :pubsub_topic options when building context
  • Application supervision tree includes AgentRegistry and AgentDynamicSupervisor

[0.9.0] - 2026-01-04

Added

  • Evaluation Framework: Production-grade testing and benchmarking for AI agents

  • Six Built-in Evaluators:

    • :exact_match - Strict string equality matching
    • :fuzzy_match - Jaro-Winkler similarity with configurable thresholds
    • :contains - Substring and regex pattern matching
    • :tool_usage - Tool call verification with argument validation
    • :schema - Ecto schema validation for structured outputs
    • :llm_judge - LLM-based quality assessment with custom rubrics
  • Optimization Engine: Automated parameter tuning for agents

    • Nous.Eval.Optimizer with three strategies: grid search, random search, Bayesian optimization
    • Support for float, integer, choice, and boolean parameter types
    • Early stopping on threshold achievement
    • Detailed trial history and best configuration reporting
  • New Mix Tasks:

    • mix nous.eval - Run evaluation suites with filtering, parallelism, and multiple output formats
    • mix nous.optimize - Parameter optimization with configurable strategies and metrics
  • New Dependency: yaml_elixir ~> 2.9 for YAML test suite parsing
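The `:fuzzy_match` evaluator is described as Jaro-Winkler with a configurable threshold. Elixir's standard library provides only the Jaro score (`String.jaro_distance/2`, where 1.0 means identical). The minimal sketch below therefore adds the Winkler boost for a shared prefix on top of it. `FuzzySketch`, the prefix scale of 0.1 (capped at 4 characters, the textbook values), and the 0.9 default threshold are assumptions, not the evaluator's settings.

```elixir
defmodule FuzzySketch do
  # Jaro-Winkler: jaro + prefix_len * p * (1 - jaro), prefix capped at 4 graphemes.
  def jaro_winkler(a, b, p \\ 0.1) do
    jaro = String.jaro_distance(a, b)

    prefix =
      Enum.zip(String.graphemes(a), String.graphemes(b))
      |> Enum.take(4)
      |> Enum.take_while(fn {x, y} -> x == y end)
      |> length()

    jaro + prefix * p * (1 - jaro)
  end

  def fuzzy_match?(output, expected, threshold \\ 0.9) do
    jaro_winkler(output, expected) >= threshold
  end
end
```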

Documentation

  • New comprehensive evaluation framework guide (docs/guides/evaluation.md)
  • Five new example scripts in examples/eval/:
    • 01_basic_evaluation.exs - Simple test execution
    • 02_yaml_suite.exs - Loading and running YAML suites
    • 03_optimization.exs - Parameter optimization workflows
    • 04_custom_evaluator.exs - Implementing custom evaluators
    • 05_ab_testing.exs - A/B testing configurations

[0.8.1] - 2025-12-31

Fixed

  • Fixed Usage struct not implementing Access behaviour for telemetry metrics
  • Fixed Task.shutdown/2 nil return case in AgentServer cancellation
  • Fixed tool call field access for OpenAI-compatible APIs (string vs atom keys)

Added

  • Vision/multimodal test suite with image fixtures (test/nous/vision_test.exs)
  • ContentPart test suite for image conversion utilities (test/nous/content_part_test.exs)
  • Multimodal message examples in conversation demo (examples/04_conversation.exs)

Changed

  • Updated docs to link examples to GitHub source files
  • Improved sidebar grouping in hexdocs

[0.8.0] - 2025-12-31

Added

  • Context Management: New Nous.Agent.Context struct for immutable conversation state, message history, and dependency injection. Supports context continuation between runs:

    {:ok, result1} = Nous.run(agent, "My name is Alice")
    {:ok, result2} = Nous.run(agent, "What's my name?", context: result1.context)
  • Agent Behaviour: New Nous.Agent.Behaviour for implementing custom agents with lifecycle callbacks (init_context/2, build_messages/2, process_response/3, extract_output/2).

  • Dual Callback System: New Nous.Agent.Callbacks supporting both map-based callbacks and process messages:

    # Map callbacks
    Nous.run(agent, "Hello", callbacks: %{
      on_llm_new_delta: fn _event, delta -> IO.write(delta) end
    })
    
    # Process messages (for LiveView)
    Nous.run(agent, "Hello", notify_pid: self())
  • Module-Based Tools: New Nous.Tool.Behaviour for defining tools as modules with metadata/0 and execute/2 callbacks. Use Nous.Tool.from_module/2 to create tools from modules.

  • Tool Context Updates: New Nous.Tool.ContextUpdate struct allowing tools to modify context state:

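    alias Nous.Tool.ContextUpdate
    def my_tool(_ctx, args) do
      # args: LLM-supplied tool arguments (string keys assumed here)
      {:ok, "saved", ContextUpdate.new() |> ContextUpdate.set(:key, args["value"])}
    end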
  • Tool Testing Helpers: New Nous.Tool.Testing module with mock_tool/2, spy_tool/1, and test_context/1 for testing tool interactions.

  • Tool Validation: New Nous.Tool.Validator for JSON Schema validation of tool arguments.

  • Prompt Templates: New Nous.PromptTemplate for EEx-based prompt templates with variable substitution.

  • Built-in Agent Implementations: Nous.Agents.BasicAgent (default) and Nous.Agents.ReActAgent (reasoning with planning tools).

  • Structured Errors: New Nous.Errors module with MaxIterationsReached, ToolExecutionError, and ExecutionCancelled error types.

  • Enhanced Telemetry: New events for iterations (:iteration), tool timeouts (:tool_timeout), and context updates (:context_update).

Changed

  • Result Structure: Nous.run/3 now returns %{output: _, context: _, usage: _} instead of just output string.

  • Tool Function Signature: Tools now receive (ctx, args) instead of (args). The context provides access to ctx.deps for dependency injection.

  • Examples Modernized: Reduced from ~95 files to 21 files. Flattened directory structure from 4 levels to 2 levels. All examples updated to v0.8.0 API.

Removed

[0.7.2] - 2025-12-29

Fixed

  • Stream completion events: The [DONE] SSE event now properly emits a {:finish, "stop"} event instead of being silently discarded. This ensures stream consumers always receive a completion signal.

  • Documentation links: Fixed broken links in hexdocs documentation. Relative links to .exs example files now use absolute GitHub URLs so they work correctly on hexdocs.pm.

[0.7.1] - 2025-12-29

Changed

  • Make all provider dependencies optional: openai_ex, anthropix, and gemini_ex are now truly optional dependencies. Users only need to install the dependencies for the providers they use.

  • Runtime dependency checks: Provider modules now check for dependency availability at runtime instead of compile-time, allowing the library to compile without any provider-specific dependencies.

  • OpenAI message format: Messages are now returned as plain maps with string keys (%{"role" => "user", "content" => "Hi"}) instead of OpenaiEx.ChatMessage structs. This removes the compile-time dependency on openai_ex for message formatting.

Fixed

  • Fixed "anthropix dependency not available" errors that occurred when using the library in applications without anthropix installed.

  • Fixed compile-time errors that occurred when openai_ex was not present in the consuming application.

[0.7.0] - 2025-12-27

Initial public release with multi-provider LLM support:

  • OpenAI-compatible providers (OpenAI, Groq, OpenRouter, Ollama, LM Studio, vLLM)
  • Native Anthropic Claude support with extended thinking
  • Google Gemini support
  • Mistral AI support
  • Tool/function calling
  • Streaming support
  • ReAct agent implementation