Nous.Plugins.InputGuard (nous v0.13.3)

Modular malicious input classifier plugin.

InputGuard detects prompt injection, jailbreak attempts, and other malicious inputs using a composable strategy pattern. Detection backends, aggregation modes, and policy actions are all configurable.

Architecture

User Input → InputGuard (before_request hook)
               ├─ Strategy 1: Pattern matching
               ├─ Strategy 2: LLM Judge
               ├─ Strategy N: Custom function
               ↓
             Aggregator (any / majority / all)
               ↓
             Policy (block / warn / log / callback)
               ↓
             Modified Context (or halted execution)

Configuration

Store configuration in deps under the :input_guard_config key:

agent = Nous.new("openai:gpt-4",
  plugins: [Nous.Plugins.InputGuard]
)

{:ok, result} = Nous.run(agent, "Hello",
  deps: %{
    input_guard_config: %{
      strategies: [
        {Nous.Plugins.InputGuard.Strategies.Pattern, []},
        {Nous.Plugins.InputGuard.Strategies.LLMJudge, model: "openai:gpt-4o-mini"},
        {MyApp.InputGuard.Blocklist, words: ["hack", "exploit"]}
      ],
      policy: %{suspicious: :warn, blocked: :block},
      aggregation: :any,
      short_circuit: false,
      on_violation: &MyApp.log_violation/1,
      skip_empty: true
    }
  }
)
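
The third strategy entry above, MyApp.InputGuard.Blocklist, is application code rather than part of Nous. A minimal sketch of what such a module might look like follows. The callbacks required by Nous.Plugins.InputGuard.Strategy are not listed on this page, so the function name check/2, its arguments, and the result map are assumptions made for illustration; consult the behaviour's own documentation before adapting it.

```elixir
defmodule MyApp.InputGuard.Blocklist do
  # Hypothetical sketch. The real callbacks come from
  # Nous.Plugins.InputGuard.Strategy and may differ in name, arity,
  # and return shape. check/2 and the result map are assumptions.

  # Flags input that contains any word from the :words option,
  # ignoring case. Returns {:ok, result_map}.
  def check(input, opts) when is_binary(input) and is_list(opts) do
    words = Keyword.get(opts, :words, [])
    downcased = String.downcase(input)

    case Enum.find(words, &String.contains?(downcased, String.downcase(&1))) do
      nil -> {:ok, %{severity: :safe, reason: nil}}
      word -> {:ok, %{severity: :blocked, reason: "matched blocklist word: " <> word}}
    end
  end
end
```

The keyword list in the {module, keyword_opts} tuple (here words: ["hack", "exploit"]) is what this sketch receives as opts.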

Configuration Options

:strategies — List of {module, keyword_opts} tuples. Each module must implement Nous.Plugins.InputGuard.Strategy. Default: [{Strategies.Pattern, []}]
:policy — Map of severity to action. Default: %{suspicious: :warn, blocked: :block}
:aggregation — How to combine results from multiple strategies. :any (default) flags the input if at least one strategy flags it, :majority if more than half of the strategies flag it, :all only if every strategy flags it.
:short_circuit — When true, stops running the remaining strategies after the first :blocked result. Default: false
:on_violation — Optional one-argument callback, fn result -> ... end, called when input is flagged.
:skip_empty — Skip checking empty or whitespace-only messages. Default: true
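
The three :aggregation modes can be illustrated with a small stand-alone function. This is a sketch of the rule as described above, not the plugin's implementation: the module name AggregationSketch, the list-of-booleans input, and the choice to treat an empty list as not flagged are all assumptions.

```elixir
defmodule AggregationSketch do
  # Illustration only; not the plugin's code.
  # `flags` holds one boolean per strategy: true if that strategy
  # flagged the input.

  # Assumption: with no strategy results, nothing is flagged.
  def flagged?([], _mode), do: false

  # :any flags when at least one strategy flagged.
  def flagged?(flags, :any), do: Enum.any?(flags)

  # :all flags only when every strategy flagged.
  def flagged?(flags, :all), do: Enum.all?(flags)

  # :majority flags when strictly more than half flagged.
  def flagged?(flags, :majority) do
    Enum.count(flags, & &1) * 2 > length(flags)
  end
end
```

With results [true, false, false], :any flags the input while :majority and :all do not. A tie such as [true, false] does not count as a majority, since the description requires more than half.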

Streaming Limitation

InputGuard operates via the before_request plugin hook, which is not invoked during run_stream in AgentRunner. InputGuard therefore does not apply to streaming runs. If you need input validation when streaming, validate the input yourself before calling run_stream.
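
One way to validate input ahead of a streaming call is to wrap the call in a small guard. The sketch below is hypothetical application code, not a Nous API: the module MyApp.GuardedStream, the two placeholder regexes, and the {:error, :input_rejected} return value are assumptions. The streaming call itself is passed in as a function, so the sketch makes no claim about the signature of run_stream.

```elixir
defmodule MyApp.GuardedStream do
  # Hypothetical helper; not part of Nous.

  # Placeholder patterns. A real application would substitute its
  # own checks here.
  defp suspicious_patterns do
    [
      ~r/ignore (all )?(previous|prior) instructions/i,
      ~r/reveal (your )?system prompt/i
    ]
  end

  # Calls `stream_fun` with `input` only if no pattern matches.
  # Otherwise returns {:error, :input_rejected} without streaming.
  def run(input, stream_fun) when is_binary(input) and is_function(stream_fun, 1) do
    if Enum.any?(suspicious_patterns(), &Regex.match?(&1, input)) do
      {:error, :input_rejected}
    else
      stream_fun.(input)
    end
  end
end
```

The caller supplies a one-argument function that starts the stream, for example a closure around the application's own run_stream call, and handles {:error, :input_rejected} however the application's policy requires.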