LeXtract.Annotator (lextract v0.1.2)
Annotates documents with extractions using LLMs.
The core extraction orchestrator that:
- Chunks documents
- Generates prompts
- Calls the LLM via ReqLLM
- Parses and aligns results
- Aggregates into AnnotatedDocument
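The alignment step above maps each extracted string back to a position in the source text. A minimal sketch of the idea, assuming exact substring matching: `AlignSketch` and `align/2` are hypothetical names introduced here for illustration, and LeXtract's own aligner may work differently (for example with fuzzy or token-level matching).

```elixir
defmodule AlignSketch do
  # Hypothetical exact-match aligner, not LeXtract's implementation.
  # Returns {:ok, {start, stop}} as character offsets (stop is exclusive),
  # or :error when the extraction text does not occur in the source.
  def align(source, extraction_text) when extraction_text != "" do
    case :binary.match(source, extraction_text) do
      {byte_start, _byte_len} ->
        # Convert the byte offset to a character offset so that multi-byte
        # UTF-8 characters before the match are counted once each.
        start = source |> binary_part(0, byte_start) |> String.length()
        {:ok, {start, start + String.length(extraction_text)}}

      :nomatch ->
        :error
    end
  end
end

AlignSketch.align("Patient takes aspirin 100mg daily", "aspirin")
# => {:ok, {14, 21}}
```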
Extraction Modes
The Annotator supports two modes of operation:
Text Generation Mode (Default)
Uses ReqLLM.generate_text/3 to generate free-form text responses in JSON or YAML
format. The LLM response is parsed and converted to extractions.
template = %{
  description: "Extract medication entities",
  examples: [
    %{
      text: "Patient takes aspirin 100mg",
      extractions: [
        %{extraction_class: "Medication", name: "aspirin", dosage: "100mg"}
      ]
    }
  ]
}
annotator = LeXtract.Annotator.new(template,
  model: "gemini-2.0-flash",
  provider: :gemini,
  api_key: "your-api-key"
)
doc = LeXtract.Annotator.annotate_text(annotator, "Patient takes aspirin 100mg daily")
Structured Output Mode
Uses ReqLLM.generate_object/4 to generate structured output with schema validation.
This mode automatically generates a schema from your examples and ensures the LLM
response conforms to the expected structure.
Enable it with the :use_structured_output option:
template = %{
  description: "Extract medication entities with structured output",
  examples: [
    %{
      text: "Patient takes aspirin 100mg twice daily",
      extractions: [
        %{
          extraction_class: "Medication",
          name: "aspirin",
          dosage: "100mg",
          frequency: "twice daily"
        }
      ]
    }
  ]
}
annotator = LeXtract.Annotator.new(template,
  [model: "gemini-2.0-flash", provider: :gemini, api_key: "your-api-key"],
  use_structured_output: true
)
doc = LeXtract.Annotator.annotate_text(annotator, "Patient takes aspirin 100mg twice daily")
Structured output mode offers several benefits:
- Automatic schema generation from examples
- Built-in validation by the LLM provider
- More reliable parsing (no JSON/YAML parsing errors)
- Better support for complex nested structures
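To make "automatic schema generation from examples" concrete, here is a rough sketch of how a field list could be derived from example extractions. `SchemaSketch` and `from_examples/1` are hypothetical names, the all-strings typing is an assumption, and the keyword-list shape is only illustrative. The schema LeXtract actually passes to ReqLLM may look different.

```elixir
defmodule SchemaSketch do
  # Hypothetical schema derivation, not LeXtract's implementation: every key
  # that appears in any example extraction becomes a string-typed field.
  def from_examples(examples) do
    examples
    |> Enum.flat_map(fn example -> example.extractions end)
    |> Enum.flat_map(&Map.keys/1)
    |> Enum.uniq()
    |> Enum.sort()
    |> Enum.map(fn key -> {key, [type: :string]} end)
  end
end

examples = [
  %{
    text: "Patient takes aspirin 100mg twice daily",
    extractions: [
      %{extraction_class: "Medication", name: "aspirin", dosage: "100mg", frequency: "twice daily"}
    ]
  }
]

SchemaSketch.from_examples(examples)
# => [dosage: [type: :string], extraction_class: [type: :string],
#     frequency: [type: :string], name: [type: :string]]
```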
Examples
iex> template = %{
...>   description: "Extract medication entities",
...>   examples: [
...>     %{
...>       text: "Patient takes aspirin",
...>       extractions: [%{medication: "aspirin"}]
...>     }
...>   ]
...> }
iex> annotator = LeXtract.Annotator.new(template,
...>   model: "gemini-2.0-flash",
...>   provider: :gemini,
...>   api_key: "test-key"
...> )
iex> is_struct(annotator, LeXtract.Annotator)
true
Summary
Types
@type t() :: %LeXtract.Annotator{
  format_handler: LeXtract.FormatHandler.t(),
  prompt_generator: LeXtract.Prompting.t(),
  req_llm_config: keyword(),
  use_structured_output: boolean()
}
Functions
@spec annotate_documents(t(), Enumerable.t(LeXtract.Document.t()), keyword()) :: Enumerable.t(LeXtract.AnnotatedDocument.t())
Annotates a stream of documents.
Main API for batch processing. Handles:
- Chunking of long documents
- Batch inference for efficiency
- Alignment of extractions
- Multi-pass extraction (if enabled)
Parameters
- annotator - The annotator instance
- documents - Enumerable of %Document{} structs
- opts - Options (see below)
Options
- :max_char_buffer - Maximum chunk size in characters (default: 1000)
- :batch_size - Number of chunks per LLM batch (default: 5)
- :extraction_passes - Number of passes for multi-pass extraction (default: 1)
- :show_progress - Show a progress bar (default: false)
- :chunk_overlap - Chunk overlap in characters (default: 200)
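The interaction between :max_char_buffer, :chunk_overlap, and :batch_size can be illustrated with a small character-based sketch. `ChunkSketch` is a hypothetical module, and the windowing rule (each chunk starts `max_chars - overlap` characters after the previous one) is an assumption. LeXtract's real chunker may instead split on sentence or token boundaries.

```elixir
defmodule ChunkSketch do
  # Hypothetical fixed-window chunker, not LeXtract's implementation.
  # Each window holds at most `max_chars` characters and shares `overlap`
  # characters with the window before it.
  def chunk(text, max_chars \\ 1000, overlap \\ 200)
      when overlap >= 0 and overlap < max_chars do
    step = max_chars - overlap
    length = String.length(text)

    0
    |> Stream.iterate(&(&1 + step))
    # Keep a start offset only if its window adds characters that the
    # previous window (which ends at offset + overlap) did not cover.
    |> Enum.take_while(&(&1 == 0 or &1 + overlap < length))
    |> Enum.map(&String.slice(text, &1, max_chars))
  end

  # Group chunks into batches of `batch_size`; the last batch may be smaller.
  def batch(chunks, batch_size \\ 5), do: Enum.chunk_every(chunks, batch_size)
end

chunks = ChunkSketch.chunk("abcdefghij", 4, 1)
# => ["abcd", "defg", "ghij"]

ChunkSketch.batch(chunks, 2)
# => [["abcd", "defg"], ["ghij"]]
```

With the documented defaults (1000 and 200), each chunk after the first would contribute up to 800 new characters under this windowing rule.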
Returns
A stream of %AnnotatedDocument{} structs with extractions.
@spec annotate_text(t(), String.t(), keyword()) :: LeXtract.AnnotatedDocument.t()
Annotates a single text string.
Convenience wrapper around annotate_documents/3 for single text inputs.
Parameters
- annotator - The annotator instance
- text - Text to extract from
- opts - Options (see annotate_documents/3)
Returns
A single %AnnotatedDocument{} with extractions aligned to the text.
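The relationship between annotate_text/3 and annotate_documents/3 follows a common wrap, delegate, unwrap pattern. The sketch below uses stand-in functions in a hypothetical `WrapperSketch` module, with no LLM calls. It only shows that a lazy stream does no work until it is consumed, and how a single-input wrapper can take the first element.

```elixir
defmodule WrapperSketch do
  # Stand-in for a streaming batch API: lazily maps each document to a
  # result tuple. Nothing is computed until the stream is consumed.
  def annotate_documents(documents, opts \\ []) do
    tag = Keyword.get(opts, :tag, :annotated)
    Stream.map(documents, fn doc -> {tag, doc.text} end)
  end

  # Stand-in for a single-text convenience wrapper: wrap the text in a
  # one-element list, delegate to the batch API, and take the only result.
  def annotate_text(text, opts \\ []) do
    [%{text: text}]
    |> annotate_documents(opts)
    |> Enum.at(0)
  end
end

WrapperSketch.annotate_text("Patient takes aspirin")
# => {:annotated, "Patient takes aspirin"}
```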
Examples
iex> template = %{description: "Extract entities", examples: []}
iex> annotator = LeXtract.Annotator.new(template,
...>   model: "gemini-2.0-flash",
...>   provider: :gemini,
...>   api_key: "test"
...> )
iex> # Note: This example would require mocking ReqLLM in real tests
iex> is_struct(annotator, LeXtract.Annotator)
true
@spec new(LeXtract.Prompting.template(), keyword(), keyword()) :: t()
Creates a new annotator.
Parameters
- prompt_template - Template with description and examples
- req_llm_config - ReqLLM configuration (model, provider, API keys, etc.)
- opts - Options (see below)
Options
- :format - Output format (:json or :yaml, default: :yaml)
- :fence_output - Whether to expect fenced output (default: false)
- :attribute_suffix - Suffix for attributes (default: "_attributes")
- :use_structured_output - Use ReqLLM's generate_object/4 for structured output (default: false)
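As a sketch of how the documented defaults could combine with caller-supplied options, the example below uses `Keyword.validate!/2` from the Elixir standard library (available since Elixir 1.13). `OptionsSketch` is a hypothetical module. Whether LeXtract resolves its options this way, or rejects unknown keys, is an assumption.

```elixir
defmodule OptionsSketch do
  # Defaults copied from the option list documented above.
  @defaults [
    format: :yaml,
    fence_output: false,
    attribute_suffix: "_attributes",
    use_structured_output: false
  ]

  # Keyword.validate!/2 fills in missing keys from the defaults and raises
  # ArgumentError for keys that are not listed. Rejecting unknown keys is
  # an assumption of this sketch, not documented LeXtract behavior.
  def resolve(opts), do: Keyword.validate!(opts, @defaults)
end

opts = OptionsSketch.resolve(format: :json, fence_output: true)
Keyword.fetch!(opts, :format)           # => :json
Keyword.fetch!(opts, :attribute_suffix) # => "_attributes"
```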
Examples
iex> template = %{description: "Extract entities", examples: []}
iex> config = [model: "gemini-2.0-flash", provider: :gemini, api_key: "test"]
iex> annotator = LeXtract.Annotator.new(template, config)
iex> annotator.format_handler.format
:yaml