Memory & Guardrails

This guide covers how to manage conversation history size with memory strategies and how to enforce token or cost budgets with guardrail policies.

Memory Strategies

AI providers impose context window limits. When a conversation grows beyond those limits you need to trim messages before sending them. PhoenixAI.Store provides three built-in strategies.

SlidingWindow

PhoenixAI.Store.Memory.Strategies.SlidingWindow keeps the last N messages and discards older ones.

When to use: Most conversations. Fast, zero dependencies, predictable cost.

Options:

:last — number of most recent messages to retain (default: 50)

Priority: 100 (runs after higher-priority strategies).

TokenTruncation

PhoenixAI.Store.Memory.Strategies.TokenTruncation removes oldest messages until the total token count fits within a budget.

When to use: When you need hard token limits rather than message count limits, or when messages vary wildly in length.

Options:

:max_tokens — maximum token budget (required)

Token counting uses token_count from the message struct when available. Falls back to a heuristic counter (chars / 4) from PhoenixAI.Store.Memory.TokenCounter.Default.

Priority: 200.

Summarization

PhoenixAI.Store.Memory.Strategies.Summarization condenses older messages into an AI-generated summary and injects it as a pinned system message.

When to use: Long-running conversations where you want to preserve context from older messages rather than dropping them entirely.

Options:

:threshold — minimum message count before summarization kicks in (default: 20)
:provider — AI provider override (falls back to pipeline context)
:model — model override (falls back to pipeline context)
:summarize_fn — fun(messages, context, opts) :: {:ok, binary} | {:error, term} override for testing without real AI calls

Priority: 300 (runs before SlidingWindow and TokenTruncation so the summary is in place when the window strategy runs).

Pipeline Composition

PhoenixAI.Store.Memory.Pipeline chains strategies together. Strategies are sorted by priority before execution — lower priority number runs first.

The pipeline always:

Extracts pinned messages (role :system or pinned: true) before running strategies
Runs strategies on the remaining messages in priority order
Re-injects pinned messages at the beginning of the result

`Pipeline.new/1`

Create a pipeline from a list of {strategy_module, opts} tuples:

alias PhoenixAI.Store.Memory.Pipeline
alias PhoenixAI.Store.Memory.Strategies.{SlidingWindow, TokenTruncation, Summarization}

# Keep last 30 messages
pipeline = Pipeline.new([{SlidingWindow, [last: 30]}])

# Trim to 8000 tokens
pipeline = Pipeline.new([{TokenTruncation, [max_tokens: 8_000]}])

# Summarize long history, then keep last 20 messages
pipeline = Pipeline.new([
  {Summarization, [threshold: 40, provider: :openai, model: "gpt-4o-mini"]},
  {SlidingWindow, [last: 20]}
])

Presets

Pipeline.preset/1 returns common configurations:

Pipeline.preset(:default)    # SlidingWindow last: 50
Pipeline.preset(:aggressive) # TokenTruncation max_tokens: 4096
Pipeline.preset(:summarize)  # Summarization threshold: 20, then SlidingWindow last: 20

Using Memory with `converse/3`

Pass a pipeline via the :memory_pipeline option to Store.converse/3. The pipeline runs between loading messages and calling the AI provider:

alias PhoenixAI.Store
alias PhoenixAI.Store.Memory.Pipeline
alias PhoenixAI.Store.Memory.Strategies.SlidingWindow

pipeline = Pipeline.new([{SlidingWindow, [last: 20]}])

{:ok, response} =
  Store.converse(conv.id, "What did we discuss earlier?",
    store: :my_store,
    memory_pipeline: pipeline
  )

Applying Memory Manually

Call Store.apply_memory/3 to get the filtered message list without starting a converse/3 turn — useful when you want to call AI.chat/2 directly:

pipeline = Pipeline.preset(:default)

{:ok, messages} = Store.apply_memory(conv.id, pipeline, store: :my_store)
# messages is a list of %PhoenixAI.Message{} ready for AI.chat/2

{:ok, response} = AI.chat(messages, provider: :openai, model: "gpt-4o")

Options accepted by apply_memory/3:

:store — store instance name
:model / :provider — passed to strategy context
:max_tokens — token budget override
:token_counter — token counter module override

Long-Term Memory Injection

To automatically inject stored facts and user profiles into the message list before the pipeline runs, pass :inject_long_term_memory and :user_id:

{:ok, messages} =
  Store.apply_memory(conv.id, pipeline,
    store: :my_store,
    inject_long_term_memory: true,
    user_id: "user-123"
  )

Guardrails

Guardrail policies run before the AI provider call in converse/3. They receive a %PhoenixAI.Guardrails.Request{} and either approve it or halt with a %PhoenixAI.Guardrails.PolicyViolation{}.

TokenBudget

PhoenixAI.Store.Guardrails.TokenBudget reads accumulated token usage from the store adapter and rejects requests that would exceed the budget.

Requires: The adapter must implement PhoenixAI.Store.Adapter.TokenUsage (sum_conversation_tokens/2 and sum_user_tokens/2). Both built-in adapters implement this.

Options:

:max (required) — maximum allowed token count
:scope — :conversation (default), :user, or :time_window
:mode — :accumulated (default, count only stored tokens) or :estimated (add estimated tokens for the current request)
:window_ms — required for :time_window scope; window duration in milliseconds
:key_prefix — key prefix for rate limiter (:time_window scope)
:token_counter — token counter module

Scope behavior:

:conversation — sums tokens across all messages in the conversation
:user — sums tokens across all conversations for the user
:time_window — uses Hammer for sliding window rate limiting (requires {:hammer, "~> 7.3"} in your deps)

CostBudget

PhoenixAI.Store.Guardrails.CostBudget reads accumulated cost from the store adapter and rejects requests that would exceed the budget.

Requires: The adapter must implement PhoenixAI.Store.Adapter.CostStore (sum_cost/2).

Options:

:max (required) — maximum allowed cost as a Decimal or string (e.g. "5.00")
:scope — :conversation (default) or :user

Using Guardrails with `converse/3`

Pass a list of policy entries via the :guardrails option:

alias PhoenixAI.Store
alias PhoenixAI.Store.Guardrails.{TokenBudget, CostBudget}

{:ok, response} =
  Store.converse(conv.id, "Continue our analysis",
    store: :my_store,
    user_id: "user-123",
    guardrails: [
      {TokenBudget, [max: 100_000, scope: :conversation]},
      {CostBudget, [max: "5.00", scope: :user]}
    ]
  )

When a policy rejects the request, converse/3 returns {:error, %PolicyViolation{}}:

case Store.converse(conv.id, message, store: :my_store, guardrails: [...]) do
  {:ok, response} ->
    response.content

  {:error, %PhoenixAI.Guardrails.PolicyViolation{} = violation} ->
    "Blocked: #{violation.reason}"
end

Checking Guardrails Manually

Use Store.check_guardrails/3 to run policies without starting a converse/3 turn. The store injects the adapter into request.assigns automatically:

alias PhoenixAI.Guardrails.Request

request = %Request{
  messages: messages,
  conversation_id: conv.id,
  user_id: "user-123"
}

policies = [
  {TokenBudget, [max: 50_000, scope: :conversation]},
  {CostBudget, [max: "2.50", scope: :user]}
]

case Store.check_guardrails(request, policies, store: :my_store) do
  {:ok, _request} ->
    AI.chat(messages, provider: :openai, model: "gpt-4o")

  {:error, violation} ->
    {:error, violation.reason}
end

Long-Term Memory

Long-term memory (LTM) stores per-user facts and profile summaries that persist across conversations.

Requires: The adapter must implement FactStore and optionally ProfileStore.

Storing Facts

alias PhoenixAI.Store
alias PhoenixAI.Store.LongTermMemory.Fact

fact = %Fact{user_id: "user-123", key: "preferred_language", value: "Elixir"}
{:ok, saved_fact} = Store.save_fact(fact, store: :my_store)

save_fact/2 uses upsert semantics — writing to the same {user_id, key} overwrites the previous value.

Retrieve and delete facts:

{:ok, facts} = Store.get_facts("user-123", store: :my_store)

:ok = Store.delete_fact("user-123", "preferred_language", store: :my_store)

Automatic Fact Extraction

Configure automatic extraction in converse/3 via the :extract_facts option or the store-level converse: [extract_facts: true] default:

{:ok, response} =
  Store.converse(conv.id, "My name is Alice and I prefer dark mode",
    store: :my_store,
    user_id: "user-123",
    extract_facts: true
  )

Trigger extraction manually:

{:ok, facts} = Store.extract_facts(conv.id, store: :my_store, user_id: "user-123")

# Run in background (returns immediately)
{:ok, :async} =
  Store.extract_facts(conv.id,
    store: :my_store,
    extraction_mode: :async
  )

User Profiles

Profiles are AI-generated summaries built from a user's stored facts:

# Generate and save a profile from stored facts
{:ok, profile} =
  Store.update_profile("user-123",
    store: :my_store,
    provider: :openai,
    model: "gpt-4o-mini"
  )

# Load an existing profile
{:ok, profile} = Store.get_profile("user-123", store: :my_store)
IO.puts(profile.summary)

LTM Configuration

Configure LTM behavior at the store level:

{PhoenixAI.Store,
 name: :my_store,
 adapter: PhoenixAI.Store.Adapters.ETS,
 long_term_memory: [
   enabled: true,
   max_facts_per_user: 200,
   extraction_trigger: :per_turn,   # :manual | :per_turn | :on_close
   extraction_mode: :async,          # :sync | :async
   extraction_provider: :openai,
   extraction_model: "gpt-4o-mini",
   inject_long_term_memory: true     # auto-inject into apply_memory/3
 ]}

Memory & Guardrails

Memory Strategies

SlidingWindow

TokenTruncation

Summarization

Pipeline Composition

Pipeline.new/1

Presets

Using Memory with converse/3

Applying Memory Manually

Long-Term Memory Injection

Guardrails

TokenBudget

CostBudget

Using Guardrails with converse/3

Checking Guardrails Manually

Long-Term Memory

Storing Facts

Automatic Fact Extraction

User Profiles

LTM Configuration

See Also

`Pipeline.new/1`

Using Memory with `converse/3`

Using Guardrails with `converse/3`