# `Jido.Observe`
[🔗](https://github.com/agentjido/jido/blob/v2.3.0/lib/jido/observe.ex#L1)

Unified observability facade for Jido agents.

Wraps `:telemetry` and tracer callbacks with a simple API for observing agent
execution, action invocations, and workflow iterations.

## Features

- Automatic telemetry event emission (start/stop/exception)
- Duration measurement for all spans (nanoseconds)
- Automatic correlation ID enrichment from `Jido.Tracing.Context`
- Pluggable tracer callbacks via `Jido.Observe.Tracer`
- Threshold-based logging compatibility via `Jido.Observe.Log`

## Correlation Tracing Integration

When `Jido.Tracing.Context` has an active trace context (set via signal processing),
all spans automatically include correlation metadata:

- `:jido_trace_id` - shared trace identifier across the call chain
- `:jido_span_id` - unique span identifier for the current signal
- `:jido_parent_span_id` - parent span that triggered this signal
- `:jido_causation_id` - signal ID that caused this signal

This connects timed telemetry spans with signal causation tracking automatically.
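
As a rough sketch of how a consumer might read these keys, the handler below picks the correlation fields out of event metadata. `MyApp.TraceLogger` and its function names are hypothetical; only the four `:jido_*` keys come from the list above.

```elixir
defmodule MyApp.TraceLogger do
  # Hypothetical telemetry handler module, not part of Jido.
  @correlation_keys [
    :jido_trace_id,
    :jido_span_id,
    :jido_parent_span_id,
    :jido_causation_id
  ]

  # Returns only the correlation keys that are present in the metadata.
  def correlation(metadata) when is_map(metadata) do
    Map.take(metadata, @correlation_keys)
  end

  # Matches the 4-arity handler shape that :telemetry expects.
  def handle_event(event, _measurements, metadata, _config) do
    IO.inspect({event, correlation(metadata)}, label: "jido span")
    :ok
  end
end
```

A handler like this could be registered with `:telemetry.attach/4`, for example on `[:jido, :agent, :action, :run, :stop]`. When no trace context is active, `correlation/1` returns an empty map.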

## Configuration

    config :jido, :observability,
      log_level: :info,
      tracer: Jido.Observe.NoopTracer,
      tracer_failure_mode: :warn

`:tracer_failure_mode` controls how tracer callback errors are handled:

- `:warn` (default) isolates tracer failures and logs warnings
- `:strict` raises immediately on tracer callback failures
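
One possible use of `:strict` is surfacing tracer bugs early in a test environment. The fragment below is a sketch; placing it in `config/test.exs` is an assumption, not a documented recommendation.

```elixir
# config/test.exs (file placement is an assumption)
import Config

# Raise on tracer callback failures instead of logging a warning.
config :jido, :observability,
  tracer_failure_mode: :strict
```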

## Usage

### Synchronous work

    Jido.Observe.with_span([:jido, :agent, :action, :run], %{agent_id: id, action: "my_action"}, fn ->
      # Your code here
      {:ok, result}
    end)

If the configured tracer implements the optional `with_span_scope/3` callback,
`with_span/3` uses it to scope synchronous spans. An adapter implementing
`with_span_scope/3` must:

- Call the provided function in the caller process
- Call the provided function exactly once
- Preserve the function return value
- Preserve exception/throw/exit semantics
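
The contract above can be sketched as follows. This is an illustration only: the module name is hypothetical, the process-dictionary key stands in for real tracer state, and the argument order `(event_prefix, metadata, fun)` is an assumption that should be checked against the `Jido.Observe.Tracer` callback documentation.

```elixir
defmodule MyApp.ScopedTracerSketch do
  # Hypothetical adapter sketch, not Jido's implementation.
  # Assumed argument order: (event_prefix, metadata, fun).

  def with_span_scope(event_prefix, metadata, fun) when is_function(fun, 0) do
    # Stand-in for attaching tracer state to the calling process.
    previous = Process.put(:my_app_current_span, {event_prefix, metadata})

    try do
      # Called exactly once, in the caller process; the value is returned as-is.
      fun.()
    after
      # Cleanup only. With no rescue/catch clause, raises, throws, and exits
      # propagate to the caller unchanged.
      restore(previous)
    end
  end

  defp restore(nil), do: Process.delete(:my_app_current_span)
  defp restore(previous), do: Process.put(:my_app_current_span, previous)
end
```

Using `try ... after` rather than `rescue` or `catch` keeps the adapter out of the error path: cleanup still runs, and the caller sees the original failure.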

### Asynchronous work (Tasks)

    span_ctx = Jido.Observe.start_span([:jido, :agent, :async, :request], %{agent_id: id})

    Task.start(fn ->
      try do
        result = do_async_work()
        Jido.Observe.finish_span(span_ctx, %{result_size: byte_size(result)})
        result
      rescue
        e ->
          Jido.Observe.finish_span_error(span_ctx, :error, e, __STACKTRACE__)
          reraise e, __STACKTRACE__
      end
    end)

Async lifecycle spans remain explicit and context-neutral by default. Process-local
tracing context is not implicitly attached across process boundaries.

## Telemetry Events

All spans emit standard telemetry events:

- `event_prefix ++ [:start]` - emitted when span starts
- `event_prefix ++ [:stop]` - emitted on successful completion
- `event_prefix ++ [:exception]` - emitted on error

Measurements include:

- `:system_time` - start timestamp (nanoseconds)
- `:duration` - elapsed time (nanoseconds, on stop/exception)
- Any additional measurements passed to `finish_span/2`
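
To make the event naming and units concrete, here is a small helper sketch. `MyApp.SpanEvents` is hypothetical; it assumes `:duration` is an integer number of nanoseconds, as stated above.

```elixir
defmodule MyApp.SpanEvents do
  # Hypothetical helper module, not part of Jido.

  # Builds the three event names emitted for a given span prefix.
  def event_names(prefix) when is_list(prefix) do
    Enum.map([:start, :stop, :exception], &(prefix ++ [&1]))
  end

  # Converts a span duration to whole milliseconds (rounded down).
  # Returns nil for measurements without :duration, such as :start events.
  def duration_ms(%{duration: duration}) when is_integer(duration) do
    System.convert_time_unit(duration, :nanosecond, :millisecond)
  end

  def duration_ms(_measurements), do: nil
end
```

The list returned by `event_names/1` has the shape that `:telemetry.attach_many/4` accepts, so one handler can cover all three events of a span.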

## Metadata Best Practices

Metadata should be small, identifying data (IDs, step numbers, model names), not full
prompts/responses. For large payloads, include derived measurements (`prompt_tokens`,
`prompt_size_bytes`) rather than the raw content.
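
A minimal sketch of that guideline: derive a size from the payload and drop the payload itself. `MyApp.PromptMetadata` and `summarize/2` are hypothetical names used only for illustration.

```elixir
defmodule MyApp.PromptMetadata do
  # Hypothetical helper, not part of Jido.
  # Keeps identifying data and a derived size; the raw prompt is not included.
  def summarize(prompt, model) when is_binary(prompt) do
    %{model: model, prompt_size_bytes: byte_size(prompt)}
  end
end
```

The resulting map stays small regardless of prompt length, so it is safe to pass as span metadata.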

Exception telemetry uses bounded, public error metadata. Raw exceptions and
stacktraces are passed to tracer callbacks, but telemetry metadata exposes
low-cardinality fields such as `:error_type` and `:retryable?`.

# `event_prefix`

```elixir
@type event_prefix() :: [atom()]
```

# `measurements`

```elixir
@type measurements() :: map()
```

# `metadata`

```elixir
@type metadata() :: map()
```

# `span_ctx`

```elixir
@type span_ctx() :: Jido.Observe.SpanCtx.t()
```

# `tracer_failure_mode`

```elixir
@type tracer_failure_mode() :: :warn | :strict
```

# `debug_enabled?`

```elixir
@spec debug_enabled?() :: boolean()
```

Checks if debug events are enabled in configuration.

## Returns

`true` if `:debug_events` is `:all` or `:minimal`, `false` otherwise.

# `emit_debug_event`

```elixir
@spec emit_debug_event(event_prefix(), measurements(), metadata()) :: :ok
```

Emits a debug event only if debug events are enabled in config.

This helper checks the `:debug_events` config before emitting, so no event is
dispatched to telemetry when debugging is disabled (the production default).

## Configuration

    # config/dev.exs
    config :jido, :observability,
      debug_events: :all  # or :minimal, :off

    # config/prod.exs
    config :jido, :observability,
      debug_events: :off

## Parameters

- `event_prefix` - Telemetry event name
- `measurements` - Map of measurements (durations, counts, etc.)
- `metadata` - Map of metadata (agent_id, iteration, etc.)

## Example

    Jido.Observe.emit_debug_event(
      [:jido, :agent, :iteration, :stop],
      %{duration: 1_234_567},
      %{agent_id: agent.id, iteration: 3, status: :awaiting_tool}
    )

# `emit_event`

```elixir
@spec emit_event(event_prefix(), measurements(), metadata()) :: :ok
```

Emits a telemetry event unconditionally.

Unlike `emit_debug_event/3`, this helper does not check debug configuration.
It is intended for domain-level events that should always be emitted.

Trace correlation metadata from `Jido.Tracing.Context` is merged in automatically
when present.

## Parameters

- `event_prefix` - Telemetry event name
- `measurements` - Map of measurements (durations, counts, etc.)
- `metadata` - Map of metadata (agent_id, iteration, etc.)

## Example

    Jido.Observe.emit_event(
      [:jido, :agent, :workflow, :step],
      %{step_duration_ns: 1_234_567},
      %{agent_id: agent.id, step: "plan"}
    )

# `exception_metadata`

```elixir
@spec exception_metadata(atom(), term()) :: metadata()
```

Builds bounded exception metadata for telemetry events.

The returned metadata preserves the historical `:kind` and `:error` keys,
but `:error` is a public `Jido.Error.to_map/1` payload instead of a raw
exception term. Top-level fields stay low-cardinality for metrics and
alerting; richer bounded details remain under `:error`.

Stacktraces are intentionally excluded from telemetry metadata.

# `finish_span`

```elixir
@spec finish_span(span_ctx(), measurements()) :: :ok
```

Finishes a span successfully.

## Parameters

- `span_ctx` - The span context returned by `start_span/2`
- `extra_measurements` - Additional measurements to include (e.g., token counts)

## Example

    Jido.Observe.finish_span(span_ctx, %{prompt_tokens: 100, completion_tokens: 50})

# `finish_span_error`

```elixir
@spec finish_span_error(span_ctx(), atom(), term(), list()) :: :ok
```

Finishes a span with an error.

## Parameters

- `span_ctx` - The span context returned by `start_span/2`
- `kind` - The error kind (`:error`, `:exit`, `:throw`)
- `reason` - The error reason/exception
- `stacktrace` - The stacktrace

## Example

    try do
      # do_work/0 is a placeholder for the traced operation
      do_work()
    rescue
      e ->
        Jido.Observe.finish_span_error(span_ctx, :error, e, __STACKTRACE__)
        reraise e, __STACKTRACE__
    end

# `log`

```elixir
@spec log(Logger.level(), Logger.message(), keyword()) :: :ok
```

Conditionally logs a message based on the observability threshold.

Delegates to `Jido.Observe.Log.log/3`.

# `redact`

```elixir
@spec redact(
  term(),
  keyword()
) :: term()
```

Redacts sensitive data based on configuration.

When `:redact_sensitive` is true (production default), replaces the value
with `"[REDACTED]"`. Otherwise returns the value unchanged.

## Configuration

    # config/prod.exs
    config :jido, :observability,
      redact_sensitive: true

    # config/dev.exs
    config :jido, :observability,
      redact_sensitive: false

## Parameters

- `value` - The value to potentially redact
- `opts` - Optional keyword list with `:force_redact` override

## Examples

    # In production (redact_sensitive: true)
    redact("secret data")
    # => "[REDACTED]"

    # In development (redact_sensitive: false)
    redact("secret data")
    # => "secret data"

    # Force redaction regardless of config
    redact("secret data", force_redact: true)
    # => "[REDACTED]"

# `start_span`

```elixir
@spec start_span(event_prefix(), metadata()) :: span_ctx()
```

Starts an async span for work that will complete later.

Use this for Task-based operations where you can't use `with_span/3`.
You must call `finish_span/2` or `finish_span_error/4` when the work completes.

## Parameters

- `event_prefix` - List of atoms for the telemetry event name
- `metadata` - Map of metadata to include in all events

## Returns

A span context struct to pass to `finish_span/2` or `finish_span_error/4`.

## Example

    span_ctx = Jido.Observe.start_span([:jido, :ai, :llm, :request], %{model: "claude"})

    Task.start(fn ->
      result = do_work()
      Jido.Observe.finish_span(span_ctx, %{output_bytes: byte_size(result)})
    end)

# `with_span`

```elixir
@spec with_span(event_prefix(), metadata(), (-> result)) :: result when result: term()
```

Wraps synchronous work with telemetry span events.

Emits a `:start` event before executing the function, then either `:stop` on
success or `:exception` if an error is raised. Duration is measured automatically.

## Parameters

- `event_prefix` - List of atoms for the telemetry event name (e.g., `[:jido, :ai, :react, :step]`)
- `metadata` - Map of metadata to include in all events
- `fun` - Zero-arity function to execute

## Returns

The return value of `fun`.

## Example

    Jido.Observe.with_span([:jido, :ai, :tool, :invoke], %{tool: "search"}, fn ->
      perform_search(query)
    end)

