Extraction Profiles

Mnemosyne's extraction pipeline is domain-agnostic by default -- it extracts the same kinds of facts, procedures, and rewards regardless of context. Extraction profiles let you steer the pipeline toward domain-specific knowledge without changing output schemas or pipeline structure.

How Profiles Work

An ExtractionProfile injects overlay text into prompt system messages at extraction time. The LLM sees domain-specific guidance after the base instructions, focusing its output on what matters for your use case.

Profiles affect two things:

Prompt overlays -- per-step text appended to system messages that guide what the LLM extracts
Value function overrides -- per-node-type parameter tweaks that shift what gets surfaced during recall

Built-in Profiles

Mnemosyne ships three profiles as factory functions:

Coding

Tuned for software engineering: debugging, architecture, code patterns.

profile = Mnemosyne.ExtractionProfile.coding()

Semantic extraction: Prioritizes API behaviors, error patterns, architectural constraints, dependency relationships. Weights empirically verified facts higher.
Procedural extraction: Focuses on debugging steps, resolution patterns, build/deploy procedures. Captures language/framework context in conditions.
Reward scoring: Weights concrete outcomes (tests pass, bug resolved) over discussion.
Retrieval: Slightly elevates procedural nodes (base_floor: 0.15).

Research

Tuned for knowledge work: analysis, literature review, information synthesis.

profile = Mnemosyne.ExtractionProfile.research()

Semantic extraction: Prioritizes factual claims with evidence, source relationships, contradictions. Distinguishes empirical findings from speculation.
Procedural extraction: Extracts only high-level methodologies and strategies, not granular steps.
Reward scoring: Weights information novelty and accuracy over task completion.
Retrieval: Elevates semantic nodes (base_floor: 0.15), deprioritizes procedural (base_floor: 0.05).

Customer Support

Tuned for issue resolution: product knowledge, diagnostics, escalation.

profile = Mnemosyne.ExtractionProfile.customer_support()

Semantic extraction: Prioritizes product behaviors, known issues, policy rules, customer-reported symptoms.
Procedural extraction: Focuses on resolution workflows, diagnostic trees, escalation criteria. Captures product version and plan tier in conditions.
Reward scoring: Weights issue resolution and correct escalation paths.
Retrieval: Slightly elevates procedural nodes (base_floor: 0.12).

Using a Profile

At Repo Level

Set the profile in your config. All sessions in the repo use it:

config = %Mnemosyne.Config{
  llm: %{model: "gpt-4o-mini", opts: %{}},
  embedding: %{model: "text-embedding-3-small", opts: %{}},
  extraction_profile: Mnemosyne.ExtractionProfile.coding()
}

{Mnemosyne.Supervisor, config: config, llm: llm_adapter, embedding: embedding_adapter}

At Session Level

Override the repo-level profile for a specific session by passing a modified config:

research_config = %{config | extraction_profile: Mnemosyne.ExtractionProfile.research()}

{:ok, session_id} = Mnemosyne.start_session("Investigate caching strategies",
  repo: "my-project",
  config: research_config)

Custom Profiles

Build your own profile for any domain:

profile = %Mnemosyne.ExtractionProfile{
  name: :legal,
  domain_context: "Legal document analysis and contract review.",
  overlays: %{
    get_semantic: """
    Domain: Legal Analysis.
    Prioritize extracting: contractual obligations, defined terms, liability clauses,
    compliance requirements, jurisdictional constraints. When assessing confidence,
    weight verbatim clause text higher than paraphrased summaries.\
    """,
    get_procedural: """
    Domain: Legal Analysis.
    Focus on: review checklists, compliance verification steps, escalation criteria
    for flagged clauses. In the condition field, capture contract type, jurisdiction,
    and regulatory framework.\
    """,
    get_reward: """
    Domain: Legal Analysis.
    Weight reward toward completeness: Were all relevant clauses identified?
    Were risks properly flagged? Were ambiguities noted?\
    """
  },
  value_function_overrides: %{
    semantic: %{base_floor: 0.2}
  }
}

Overlay Keys

Overlays are keyed by pipeline step. Only steps with an explicit overlay receive injected text -- there is no automatic fallback.

Pipeline Stage	Available Keys
Episode	`:get_state`, `:get_subgoal`, `:get_reward`
Structuring	`:get_semantic`, `:get_procedural`, `:get_return`
Retrieval	`:get_mode`, `:get_plan`, `:get_refined_query`
Reasoning	`:reason_episodic`, `:reason_semantic`, `:reason_procedural`
Intent Merger	`:merge_intent`

Most profiles only need overlays for :get_semantic, :get_procedural, and :get_reward -- the steps where domain context has the most impact on extraction quality.

Value Function Overrides

Profiles can shift retrieval emphasis by overriding value function parameters per node type:

value_function_overrides: %{
  semantic: %{base_floor: 0.2, top_k: 30},
  procedural: %{base_floor: 0.1, top_k: 5}
}

These merge on top of the base config parameters. See the value function parameter reference for available parameters.

Interaction with Other Config

Extraction profiles are orthogonal to other configuration:

Per-step model overrides (config.overrides) control which LLM model runs each step. Profiles control what the prompt asks for. Both can be used together.
Value function module (config.value_function.module) is unchanged. Profile overrides only affect the parameters passed to the scoring function.
Session config overrides the entire config, including the profile. There is no merge between session and repo profiles.

Next Steps

Core Concepts -- understand the three memory types that profiles steer
Retrieval and Recall -- value function parameters and tuning
Custom Adapters -- per-step model overrides (complementary to profiles)

← Previous Page Sessions and Episodes

Next Page → Retrieval and Recall