This guide covers how to manage conversation history size with memory strategies and how to enforce token or cost budgets with guardrail policies.
Memory Strategies
AI providers impose context window limits. When a conversation grows beyond those limits you need to trim messages before sending them. PhoenixAI.Store provides three built-in strategies.
SlidingWindow
PhoenixAI.Store.Memory.Strategies.SlidingWindow keeps the last N messages and
discards older ones.
When to use: Most conversations. Fast, zero dependencies, predictable cost.
Options:
:last— number of most recent messages to retain (default:50)
Priority: 100 (runs after higher-priority strategies).
TokenTruncation
PhoenixAI.Store.Memory.Strategies.TokenTruncation removes oldest messages until the
total token count fits within a budget.
When to use: When you need hard token limits rather than message count limits, or when messages vary wildly in length.
Options:
:max_tokens— maximum token budget (required)
Token counting uses token_count from the message struct when available. Falls back to
a heuristic counter (chars / 4) from PhoenixAI.Store.Memory.TokenCounter.Default.
Priority: 200.
Summarization
PhoenixAI.Store.Memory.Strategies.Summarization condenses older messages into an
AI-generated summary and injects it as a pinned system message.
When to use: Long-running conversations where you want to preserve context from older messages rather than dropping them entirely.
Options:
:threshold— minimum message count before summarization kicks in (default:20):provider— AI provider override (falls back to pipeline context):model— model override (falls back to pipeline context):summarize_fn—fun(messages, context, opts) :: {:ok, binary} | {:error, term}override for testing without real AI calls
Priority: 300 (runs before SlidingWindow and TokenTruncation so the summary is in
place when the window strategy runs).
Pipeline Composition
PhoenixAI.Store.Memory.Pipeline chains strategies together. Strategies are sorted by
priority before execution — lower priority number runs first.
The pipeline always:
- Extracts pinned messages (role
:systemorpinned: true) before running strategies - Runs strategies on the remaining messages in priority order
- Re-injects pinned messages at the beginning of the result
Pipeline.new/1
Create a pipeline from a list of {strategy_module, opts} tuples:
alias PhoenixAI.Store.Memory.Pipeline
alias PhoenixAI.Store.Memory.Strategies.{SlidingWindow, TokenTruncation, Summarization}
# Keep last 30 messages
pipeline = Pipeline.new([{SlidingWindow, [last: 30]}])
# Trim to 8000 tokens
pipeline = Pipeline.new([{TokenTruncation, [max_tokens: 8_000]}])
# Summarize long history, then keep last 20 messages
pipeline = Pipeline.new([
{Summarization, [threshold: 40, provider: :openai, model: "gpt-4o-mini"]},
{SlidingWindow, [last: 20]}
])Presets
Pipeline.preset/1 returns common configurations:
Pipeline.preset(:default) # SlidingWindow last: 50
Pipeline.preset(:aggressive) # TokenTruncation max_tokens: 4096
Pipeline.preset(:summarize) # Summarization threshold: 20, then SlidingWindow last: 20Using Memory with converse/3
Pass a pipeline via the :memory_pipeline option to Store.converse/3. The pipeline
runs between loading messages and calling the AI provider:
alias PhoenixAI.Store
alias PhoenixAI.Store.Memory.Pipeline
alias PhoenixAI.Store.Memory.Strategies.SlidingWindow
pipeline = Pipeline.new([{SlidingWindow, [last: 20]}])
{:ok, response} =
Store.converse(conv.id, "What did we discuss earlier?",
store: :my_store,
memory_pipeline: pipeline
)Applying Memory Manually
Call Store.apply_memory/3 to get the filtered message list without starting a
converse/3 turn — useful when you want to call AI.chat/2 directly:
pipeline = Pipeline.preset(:default)
{:ok, messages} = Store.apply_memory(conv.id, pipeline, store: :my_store)
# messages is a list of %PhoenixAI.Message{} ready for AI.chat/2
{:ok, response} = AI.chat(messages, provider: :openai, model: "gpt-4o")Options accepted by apply_memory/3:
:store— store instance name:model/:provider— passed to strategy context:max_tokens— token budget override:token_counter— token counter module override
Long-Term Memory Injection
To automatically inject stored facts and user profiles into the message list before
the pipeline runs, pass :inject_long_term_memory and :user_id:
{:ok, messages} =
Store.apply_memory(conv.id, pipeline,
store: :my_store,
inject_long_term_memory: true,
user_id: "user-123"
)Guardrails
Guardrail policies run before the AI provider call in converse/3. They receive a
%PhoenixAI.Guardrails.Request{} and either approve it or halt with a
%PhoenixAI.Guardrails.PolicyViolation{}.
TokenBudget
PhoenixAI.Store.Guardrails.TokenBudget reads accumulated token usage from the store
adapter and rejects requests that would exceed the budget.
Requires: The adapter must implement PhoenixAI.Store.Adapter.TokenUsage
(sum_conversation_tokens/2 and sum_user_tokens/2). Both built-in adapters
implement this.
Options:
:max(required) — maximum allowed token count:scope—:conversation(default),:user, or:time_window:mode—:accumulated(default, count only stored tokens) or:estimated(add estimated tokens for the current request):window_ms— required for:time_windowscope; window duration in milliseconds:key_prefix— key prefix for rate limiter (:time_windowscope):token_counter— token counter module
Scope behavior:
:conversation— sums tokens across all messages in the conversation:user— sums tokens across all conversations for the user:time_window— uses Hammer for sliding window rate limiting (requires{:hammer, "~> 7.3"}in your deps)
CostBudget
PhoenixAI.Store.Guardrails.CostBudget reads accumulated cost from the store adapter
and rejects requests that would exceed the budget.
Requires: The adapter must implement PhoenixAI.Store.Adapter.CostStore
(sum_cost/2).
Options:
:max(required) — maximum allowed cost as aDecimalor string (e.g."5.00"):scope—:conversation(default) or:user
Using Guardrails with converse/3
Pass a list of policy entries via the :guardrails option:
alias PhoenixAI.Store
alias PhoenixAI.Store.Guardrails.{TokenBudget, CostBudget}
{:ok, response} =
Store.converse(conv.id, "Continue our analysis",
store: :my_store,
user_id: "user-123",
guardrails: [
{TokenBudget, [max: 100_000, scope: :conversation]},
{CostBudget, [max: "5.00", scope: :user]}
]
)When a policy rejects the request, converse/3 returns {:error, %PolicyViolation{}}:
case Store.converse(conv.id, message, store: :my_store, guardrails: [...]) do
{:ok, response} ->
response.content
{:error, %PhoenixAI.Guardrails.PolicyViolation{} = violation} ->
"Blocked: #{violation.reason}"
endChecking Guardrails Manually
Use Store.check_guardrails/3 to run policies without starting a converse/3 turn.
The store injects the adapter into request.assigns automatically:
alias PhoenixAI.Guardrails.Request
request = %Request{
messages: messages,
conversation_id: conv.id,
user_id: "user-123"
}
policies = [
{TokenBudget, [max: 50_000, scope: :conversation]},
{CostBudget, [max: "2.50", scope: :user]}
]
case Store.check_guardrails(request, policies, store: :my_store) do
{:ok, _request} ->
AI.chat(messages, provider: :openai, model: "gpt-4o")
{:error, violation} ->
{:error, violation.reason}
endLong-Term Memory
Long-term memory (LTM) stores per-user facts and profile summaries that persist across conversations.
Requires: The adapter must implement FactStore and optionally ProfileStore.
Storing Facts
alias PhoenixAI.Store
alias PhoenixAI.Store.LongTermMemory.Fact
fact = %Fact{user_id: "user-123", key: "preferred_language", value: "Elixir"}
{:ok, saved_fact} = Store.save_fact(fact, store: :my_store)save_fact/2 uses upsert semantics — writing to the same {user_id, key} overwrites
the previous value.
Retrieve and delete facts:
{:ok, facts} = Store.get_facts("user-123", store: :my_store)
:ok = Store.delete_fact("user-123", "preferred_language", store: :my_store)Automatic Fact Extraction
Configure automatic extraction in converse/3 via the :extract_facts option or the
store-level converse: [extract_facts: true] default:
{:ok, response} =
Store.converse(conv.id, "My name is Alice and I prefer dark mode",
store: :my_store,
user_id: "user-123",
extract_facts: true
)Trigger extraction manually:
{:ok, facts} = Store.extract_facts(conv.id, store: :my_store, user_id: "user-123")
# Run in background (returns immediately)
{:ok, :async} =
Store.extract_facts(conv.id,
store: :my_store,
extraction_mode: :async
)User Profiles
Profiles are AI-generated summaries built from a user's stored facts:
# Generate and save a profile from stored facts
{:ok, profile} =
Store.update_profile("user-123",
store: :my_store,
provider: :openai,
model: "gpt-4o-mini"
)
# Load an existing profile
{:ok, profile} = Store.get_profile("user-123", store: :my_store)
IO.puts(profile.summary)LTM Configuration
Configure LTM behavior at the store level:
{PhoenixAI.Store,
name: :my_store,
adapter: PhoenixAI.Store.Adapters.ETS,
long_term_memory: [
enabled: true,
max_facts_per_user: 200,
extraction_trigger: :per_turn, # :manual | :per_turn | :on_close
extraction_mode: :async, # :sync | :async
extraction_provider: :openai,
extraction_model: "gpt-4o-mini",
inject_long_term_memory: true # auto-inject into apply_memory/3
]}See Also
- Adapters — choose a backend for memory and fact storage
- Telemetry & Events — observe memory trim and guardrail violations
- Getting Started — initial setup