ADR-0022: Pluggable Storage Strategy
View SourceStatus
Accepted
Context
Currently, the Docker backend uses a hardcoded working directory pattern with a default of /workspace, which is invalid as a host path and causes test failures on macOS/Linux. More broadly, different deployment scenarios require different storage backends:
- Local Development: Ephemeral tmp directories, fast, no external dependencies
- Multi-Node Clusters: Shared storage (S3, Tigris) for session state across nodes
- Fly.io Deployments: Tigris Object Storage for persistent, low-latency storage
- Session Resume: Persistent storage enabling conversation continuation after restarts
- Compliance/Audit: Durable storage of session artifacts for record-keeping
Additionally, the Anthropic backend creates files server-side that need to be synchronized to client storage. Consumers of this library need callbacks to capture file references for their own persistence needs (e.g., associating files with users in a database).
Forces
- Docker requires a local mount path - remote storage must be synced locally
- Anthropic backend creates files remotely that may need client-side persistence
- Native backend operates in-memory but may want to persist results
- Library consumers need hooks to store file metadata in their own systems
- Storage lifecycle must align with session lifecycle
- Cleanup must be reliable, even on crashes
Decision
We will introduce a Conjure.Storage behaviour that abstracts storage operations, with implementations for Local, S3, and Tigris. All backends will use this abstraction for session working directories and file operations.
Storage Behaviour
defmodule Conjure.Storage do
@moduledoc "Behaviour for session storage backends"
@type session_id :: String.t()
@type path :: String.t()
@type content :: binary()
@type state :: term()
@type file_ref :: %{
path: path(),
size: non_neg_integer(),
content_type: String.t() | nil,
checksum: String.t() | nil,
storage_url: String.t() | nil,
created_at: DateTime.t()
}
# Lifecycle
@callback init(session_id, opts :: keyword()) :: {:ok, state} | {:error, term()}
@callback cleanup(state) :: :ok | {:error, term()}
# File operations
@callback write(state, path, content) :: {:ok, file_ref} | {:error, term()}
@callback read(state, path) :: {:ok, content} | {:error, term()}
@callback exists?(state, path) :: boolean()
@callback delete(state, path) :: :ok | {:error, term()}
@callback list(state) :: {:ok, [file_ref]} | {:error, term()}
# For Docker mounting - returns a local filesystem path
@callback local_path(state) :: {:ok, path} | {:error, :not_supported}
# For syncing remote files (Anthropic backend)
@callback sync_from_remote(state, remote_files :: [map()]) :: {:ok, [file_ref]} | {:error, term()}
@callback sync_to_remote(state) :: {:ok, [file_ref]} | {:error, term()}
# Optional callbacks
@optional_callbacks [sync_from_remote: 2, sync_to_remote: 1]
endImplementations
Local Storage (Default)
defmodule Conjure.Storage.Local do
@behaviour Conjure.Storage
@impl true
def init(session_id, opts) do
base = Keyword.get(opts, :base_path, System.tmp_dir!())
prefix = Keyword.get(opts, :prefix, "conjure")
path = Path.join(base, "#{prefix}_#{session_id}")
File.mkdir_p!(path)
{:ok, %{path: path, session_id: session_id}}
end
@impl true
def local_path(%{path: path}), do: {:ok, path}
@impl true
def cleanup(%{path: path}) do
File.rm_rf!(path)
:ok
end
# ... other callbacks operate directly on filesystem
endS3 Storage
defmodule Conjure.Storage.S3 do
@behaviour Conjure.Storage
@impl true
def init(session_id, opts) do
bucket = Keyword.fetch!(opts, :bucket)
prefix = Keyword.get(opts, :prefix, "sessions/")
region = Keyword.get(opts, :region, "us-east-1")
# Local cache for Docker mounting
cache_path = Path.join(System.tmp_dir!(), "conjure_s3_#{session_id}")
File.mkdir_p!(cache_path)
{:ok, %{
bucket: bucket,
prefix: "#{prefix}#{session_id}/",
region: region,
cache_path: cache_path,
session_id: session_id
}}
end
@impl true
def local_path(%{cache_path: path}), do: {:ok, path}
@impl true
def write(state, path, content) do
# Write to local cache
local_file = Path.join(state.cache_path, path)
File.mkdir_p!(Path.dirname(local_file))
File.write!(local_file, content)
# Async upload to S3
s3_key = "#{state.prefix}#{path}"
:ok = upload_to_s3(state.bucket, s3_key, content, state.region)
{:ok, build_file_ref(path, content, s3_url(state, path))}
end
@impl true
def cleanup(state) do
# Delete from S3
delete_prefix(state.bucket, state.prefix, state.region)
# Cleanup local cache
File.rm_rf!(state.cache_path)
:ok
end
# ... other callbacks
endTigris Storage (Fly.io)
Tigris is Fly.io's globally-distributed, S3-compatible object storage. It automatically replicates data to regions close to your Fly Machines, providing low-latency access worldwide.
Fly.io Setup:
# Create a Tigris bucket for your Fly app
fly storage create
# Or with a specific name
fly storage create conjure-sessions
# Credentials are automatically injected into your Fly Machines as:
# - AWS_ACCESS_KEY_ID
# - AWS_SECRET_ACCESS_KEY
# - AWS_ENDPOINT_URL_S3
# - BUCKET_NAME
fly.toml configuration:
[env]
SKILLEX_STORAGE = "tigris"
# Tigris bucket is attached automatically after `fly storage create`defmodule Conjure.Storage.Tigris do
@moduledoc """
Fly.io Tigris object storage backend.
Tigris provides:
- Global distribution with automatic replication
- S3-compatible API
- Zero-config when running on Fly.io Machines
- Low-latency access from any Fly.io region
Credentials are automatically available when running on Fly.io.
For local development, use `fly storage` credentials or set env vars.
"""
@behaviour Conjure.Storage
@impl true
def init(session_id, opts) do
# Fly.io injects these automatically into Machines
bucket = Keyword.get_lazy(opts, :bucket, fn ->
System.get_env("BUCKET_NAME") || raise "BUCKET_NAME not set"
end)
endpoint = Keyword.get(opts, :endpoint, System.get_env("AWS_ENDPOINT_URL_S3"))
access_key = Keyword.get(opts, :access_key, System.get_env("AWS_ACCESS_KEY_ID"))
secret_key = Keyword.get(opts, :secret_key, System.get_env("AWS_SECRET_ACCESS_KEY"))
# Optional: specify region hint for read affinity
# Fly.io sets FLY_REGION automatically
region = Keyword.get(opts, :region, System.get_env("FLY_REGION", "auto"))
# Local cache for Docker mounting (uses Fly volume if available)
cache_base = Keyword.get(opts, :cache_path,
System.get_env("FLY_VOLUME_PATH") || System.tmp_dir!()
)
cache_path = Path.join(cache_base, "conjure_sessions/#{session_id}")
File.mkdir_p!(cache_path)
{:ok, %{
bucket: bucket,
prefix: "sessions/#{session_id}/",
endpoint: endpoint,
access_key: access_key,
secret_key: secret_key,
region: region,
cache_path: cache_path,
session_id: session_id
}}
end
@impl true
def local_path(%{cache_path: path}), do: {:ok, path}
@impl true
def write(state, path, content) do
# Write to local cache first (fast, synchronous)
local_file = Path.join(state.cache_path, path)
File.mkdir_p!(Path.dirname(local_file))
File.write!(local_file, content)
# Upload to Tigris (replicated globally automatically)
s3_key = "#{state.prefix}#{path}"
:ok = put_object(state, s3_key, content)
{:ok, build_file_ref(path, content, public_url(state, path))}
end
@impl true
def cleanup(state) do
# Delete from Tigris
delete_prefix(state, state.prefix)
# Cleanup local cache
File.rm_rf!(state.cache_path)
:ok
end
# Uses ex_aws or req for S3-compatible API calls
defp put_object(state, key, content) do
# Implementation using S3-compatible API against state.endpoint
end
defp delete_prefix(state, prefix) do
# List and delete all objects with prefix
end
defp public_url(state, path) do
"#{state.endpoint}/#{state.bucket}/#{state.prefix}#{path}"
end
endLocal Development with Tigris:
# Get credentials for local dev
fly storage auth
# Or proxy through Fly
fly proxy 4566:443 -a <your-tigris-app>
# config/dev.exs
config :conjure, :storage,
strategy: Conjure.Storage.Tigris,
endpoint: "http://localhost:4566", # Local proxy
bucket: "conjure-dev"Multi-Region Session Affinity:
Tigris automatically serves data from the nearest region. For session locality:
# Include region in session_id for debugging/routing
session_id = "#{System.get_env("FLY_REGION")}_#{UUID.uuid4()}"Consumer Callbacks
To allow library consumers to capture file references for their own persistence:
defmodule Conjure.Storage.Callbacks do
@type file_event :: :created | :modified | :deleted | :synced
@type callback :: (file_event, Conjure.Storage.file_ref(), session_id :: String.t() -> :ok)
@callback on_file_event(callback) :: :ok
endCallbacks are configured per-session:
session = Conjure.Session.new_docker(skills,
storage: {Conjure.Storage.S3, bucket: "my-bucket"},
on_file_created: fn file_ref, session_id ->
# Store reference in your database
MyApp.Repo.insert!(%MyApp.SessionFile{
user_id: current_user.id,
session_id: session_id,
path: file_ref.path,
storage_url: file_ref.storage_url,
size: file_ref.size
})
end,
on_file_deleted: fn file_ref, session_id ->
MyApp.Repo.delete_all(
from f in MyApp.SessionFile,
where: f.session_id == ^session_id and f.path == ^file_ref.path
)
end
)Anthropic Backend Sync
The Anthropic backend creates files server-side. Storage sync pulls these to client storage:
defmodule Conjure.Backend.Anthropic do
def chat(session, message, api_callback, opts) do
# ... conversation logic ...
# After turn completes, sync created files
case session.storage do
nil -> {:ok, response, session}
storage ->
remote_files = get_created_files_from_response(response)
{:ok, file_refs} = Conjure.Storage.sync_from_remote(storage, remote_files)
session = %{session | created_files: session.created_files ++ file_refs}
# Fire callbacks
Enum.each(file_refs, fn ref ->
fire_callback(session, :synced, ref)
end)
{:ok, response, session}
end
end
endSession Integration
defmodule Conjure.Session do
defstruct [
# ... existing fields ...
:storage, # Storage state
:storage_strategy, # Module implementing Storage behaviour
:file_callbacks # Map of event -> callback function
]
def new_docker(skills, opts) do
{storage_strategy, storage_opts} = resolve_storage(opts)
session_id = generate_session_id()
{:ok, storage} = storage_strategy.init(session_id, storage_opts)
# Get local path for Docker mounting
{:ok, work_dir} = storage_strategy.local_path(storage)
%Session{
execution_mode: :docker,
storage: storage,
storage_strategy: storage_strategy,
context: %ExecutionContext{working_directory: work_dir},
# ...
}
end
defp resolve_storage(opts) do
case Keyword.get(opts, :storage) do
nil -> {Conjure.Storage.Local, []}
{module, opts} -> {module, opts}
module when is_atom(module) -> {module, []}
end
end
endConfiguration
# config/dev.exs - Local development
config :conjure, :storage,
strategy: Conjure.Storage.Local,
base_path: System.tmp_dir!(),
prefix: "conjure"
# config/prod.exs - Fly.io with Tigris (zero-config)
# Credentials auto-injected by Fly.io Machines
config :conjure, :storage,
strategy: Conjure.Storage.Tigris
# bucket, endpoint, credentials all from env vars automatically
# config/prod.exs - AWS S3
config :conjure, :storage,
strategy: Conjure.Storage.S3,
bucket: System.get_env("S3_BUCKET"),
region: System.get_env("AWS_REGION", "us-east-1")
# Per-session override always takes precedence
session = Conjure.Session.new_docker(skills,
storage: {Conjure.Storage.S3, bucket: "custom-bucket"}
)Consequences
Positive
- Fixes Docker Bug: Default storage now creates valid temp directories
- Multi-Node Support: S3/Tigris enable shared state across cluster nodes
- Fly.io Ready: First-class Tigris support for Fly.io deployments
- Session Persistence: Storage backends can persist beyond session lifetime
- Consumer Integration: Callbacks enable tight integration with consumer applications
- Anthropic File Sync: Remote files can be captured and persisted client-side
- Unified Interface: All backends use the same storage abstraction
- Testable: Easy to mock storage in tests
Negative
- Complexity: Adds new abstraction layer and 3+ modules
- S3 Dependency: S3/Tigris implementations need HTTP client (optional dep)
- Sync Latency: Remote storage adds latency vs local filesystem
- Local Cache Management: S3/Tigris need local cache for Docker, adding cleanup complexity
Neutral
- Backwards Compatible: Default Local storage maintains current behavior (once fixed)
- Optional Feature: Consumers only pay complexity cost if using cloud storage
File Structure
lib/conjure/
├── storage.ex # Storage behaviour
├── storage/
│ ├── local.ex # Local filesystem (default)
│ ├── s3.ex # AWS S3
│ └── tigris.ex # Fly.io Tigris
└── session.ex # Updated with storage integrationAlternatives Considered
1. Docker Volumes Instead of Mounts
Could use Docker volumes for persistence, but:
- Less portable across storage backends
- Harder to access files from host
- Doesn't solve multi-node problem
2. Database Storage (Ecto)
Store files directly in PostgreSQL/etc:
- Adds hard Ecto dependency
- Not suitable for large files
- Tighter coupling than callbacks
3. GenStage/Broadway for File Events
Use GenStage for file event streaming:
- Overkill for most use cases
- Adds significant complexity
- Simple callbacks sufficient for MVP
References
- Tigris Object Storage
- Fly.io Tigris Integration
- AWS S3 API Reference
- ADR-0010: Docker Production Executor
- ADR-0020: Backend Behaviour Architecture