LeXtract (lextract v0.1.2)
LLM-powered text extraction library for Elixir, based on Google's LangExtract.
LeXtract enables you to extract structured information from unstructured text using Large Language Models (LLMs). It provides a simple, streaming API with support for multiple LLM providers.
Features
- Multi-Provider LLM Support - Works with OpenAI, Gemini, Anthropic, and other providers through ReqLLM
- Streaming API - Memory-efficient batch processing with lazy streams
- Automatic Text Chunking - Handles long documents with configurable chunk sizes and overlap
- Character-Level Alignment - Precise alignment of extractions to source text positions
- Schema Generation - Automatic schema inference from examples
- Template-Based Configuration - Reusable extraction templates in JSON or YAML
- Structured Output Mode - Enhanced reliability with schema validation
- Multi-Pass Extraction - Improved recall through multiple extraction passes
- Flexible Output Formats - Support for JSON and YAML output formats
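To make the chunking feature concrete, here is a minimal sketch of fixed-size chunking with overlap. It is not LeXtract's implementation: the module name `ChunkSketch`, the function `chunk/3`, and its parameters are hypothetical, and this page does not show the library's actual chunking strategy or option names. Each chunk keeps its starting character offset, which is the kind of bookkeeping needed to map results back to source positions.

```elixir
defmodule ChunkSketch do
  # Hypothetical helper, not part of LeXtract.
  # Splits `text` into windows of `size` characters where consecutive
  # windows share `overlap` characters. Each chunk records the character
  # offset at which it starts in the original text.
  def chunk("", _size, _overlap), do: []

  def chunk(text, size, overlap)
      when is_binary(text) and size > 0 and overlap >= 0 and overlap < size do
    step = size - overlap
    total = String.length(text)

    0
    |> Stream.iterate(&(&1 + step))
    # Stop once a window would contain only text the previous window covered.
    |> Enum.take_while(fn start -> start == 0 or start + overlap < total end)
    |> Enum.map(fn start -> %{start: start, text: String.slice(text, start, size)} end)
  end
end

chunks = ChunkSketch.chunk("abcdefghij", 4, 1)
IO.inspect(chunks)
```

A larger overlap lowers the chance that an entity is cut in half at a chunk boundary, at the cost of sending more text to the model.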
Installation
Add lextract to your list of dependencies in mix.exs:
def deps do
  [
    {:lextract, "~> 0.1.0"}
  ]
end
Quick Start
Basic Entity Extraction
Extract named entities from text with inline template options:
{:ok, stream} = LeXtract.extract(
  "Dr. Smith prescribed aspirin 100mg to the patient.",
  prompt: "Extract medical entities from the text",
  examples: [
    %{
      text: "Patient takes ibuprofen 200mg",
      extractions: [
        %{extraction_class: "Medication", name: "ibuprofen", dosage: "200mg"}
      ]
    }
  ],
  model: "gpt-4o-mini",
  provider: :openai
)
annotated_docs = Enum.to_list(stream)
Using Template Files
Create a template file (JSON or YAML) for reusable extraction configurations:
# medication_template.yaml
description: Extract medication entities with dosage and frequency
examples:
  - text: "Patient takes aspirin 100mg twice daily"
    extractions:
      - extraction_class: Medication
        name: aspirin
        dosage: 100mg
        frequency: twice daily
Then extract using the template:
{:ok, stream} = LeXtract.extract(
  "Dr. Jones prescribed metformin 500mg once daily.",
  template_file: "medication_template.yaml",
  model: "gpt-4o-mini",
  provider: :openai
)
Batch Processing with Streams
Process multiple documents efficiently with streaming:
documents = [
  "First patient document...",
  "Second patient document...",
  "Third patient document..."
]
{:ok, stream} = LeXtract.extract(
  documents,
  prompt: "Extract medical conditions",
  examples: [...],
  model: "gpt-4o-mini",
  provider: :openai,
  batch_size: 5
)
stream
|> Stream.each(fn annotated_doc ->
  IO.puts("Document: #{annotated_doc.document_id}")
  IO.puts("Extractions: #{length(annotated_doc.extractions)}")
end)
|> Stream.run()
Structured Output Mode
For better reliability and schema validation, use structured output mode:
{:ok, stream} = LeXtract.extract(
  "Patient has hypertension and diabetes.",
  prompt: "Extract medical conditions",
  examples: [
    %{
      text: "Patient diagnosed with asthma",
      extractions: [
        %{extraction_class: "Condition", name: "asthma", severity: "mild"}
      ]
    }
  ],
  model: "gpt-4o-mini",
  provider: :openai,
  use_structured_output: true
)
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Summary
Functions
- extract/2 - Extracts structured information from text using LLMs.
- extract!/2 - Extracts structured information from text, raising on error.
- extract_from_file/2 - Extracts structured information from a text file.
- validate_options/1 - Validates extraction options against the schema.
Functions
@spec extract( source_document :: String.t() | [String.t()] | [LeXtract.Document.t()], options :: LeXtract.Config.options() ) :: {:ok, Enumerable.t(LeXtract.AnnotatedDocument.t())} | {:error, Exception.t()}
Extracts structured information from text using LLMs.
This is the main entry point for the library. It accepts text (string, list of strings, or list of Document structs) and returns a lazy Stream of AnnotatedDocument results.
Parameters
- input - Text to extract from (String.t(), [String.t()], or [Document.t()])
- opts - Extraction options (see module documentation for full list)
Returns
{:ok, stream}, where stream is a lazy enumerable of AnnotatedDocument.t() structs, or {:error, exception}
Examples
iex> {:ok, _stream} = LeXtract.extract(
...> "Sample text",
...> prompt: "Extract entities",
...> examples: [],
...> model: "gpt-4o-mini",
...> provider: :openai,
...> api_key: "test-key"
...> )
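Because extract/2 returns either `{:ok, stream}` or `{:error, exception}`, callers usually want to handle both shapes in one place. The following is a minimal sketch of that pattern. `ResultSketch` and `summarize/1` are hypothetical names, and plain maps with `document_id` and `extractions` keys stand in for AnnotatedDocument structs so the snippet runs without an API key.

```elixir
defmodule ResultSketch do
  # Hypothetical helper, not part of LeXtract.
  # Turns the documented return shapes into a list of
  # {document_id, extraction_count} pairs, or a readable error message.
  def summarize({:ok, stream}) do
    {:ok, Enum.map(stream, fn doc -> {doc.document_id, length(doc.extractions)} end)}
  end

  def summarize({:error, exception}) when is_exception(exception) do
    {:error, Exception.message(exception)}
  end
end

# Stand-in data; a real call would pass the result of LeXtract.extract/2.
fake_docs = [
  %{document_id: "a", extractions: [:x, :y]},
  %{document_id: "b", extractions: []}
]

ok_summary = ResultSketch.summarize({:ok, fake_docs})
error_summary = ResultSketch.summarize({:error, %RuntimeError{message: "boom"}})
IO.inspect({ok_summary, error_summary})
```

In application code the same function can be fed directly, for example `input |> LeXtract.extract(opts) |> ResultSketch.summarize()`.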
@spec extract!( source_document :: String.t() | [String.t()] | [LeXtract.Document.t()], options :: LeXtract.Config.options() ) :: Enumerable.t(LeXtract.AnnotatedDocument.t())
Extracts structured information from text, raising on error.
Same as extract/2 but returns the stream directly or raises an exception on error.
Examples
iex> stream = LeXtract.extract!(
...> "Sample text",
...> prompt: "Extract entities",
...> examples: [],
...> model: "gpt-4o-mini",
...> provider: :openai,
...> api_key: "test-key"
...> )
iex> is_struct(stream, Stream)
true
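Since the result is a stream, work only happens when the stream is consumed. The sketch below illustrates this with Elixir's standard `Stream` and `Enum` functions on stand-in data. The maps are placeholders for AnnotatedDocument structs, and how far LeXtract's own stream reads ahead (for example, per batch) is not specified on this page.

```elixir
# Stand-in for the stream returned by LeXtract.extract!/2: plain maps
# with the two fields used elsewhere in this README.
docs =
  Stream.map(1..1_000, fn i ->
    %{document_id: "doc-#{i}", extractions: List.duplicate(:placeholder, rem(i, 3))}
  end)

# Nothing has been computed yet; Enum.take/2 pulls only the elements it needs.
first_two = Enum.take(docs, 2)

# Stream.reject/2 stays lazy as well; Enum.count/1 forces evaluation.
non_empty = docs |> Stream.reject(&(&1.extractions == [])) |> Enum.count()

IO.inspect(Enum.map(first_two, & &1.document_id))
IO.inspect(non_empty)
```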
@spec extract_from_file(file_path :: Path.t(), options :: LeXtract.Config.options()) :: {:ok, Enumerable.t(LeXtract.AnnotatedDocument.t())} | {:error, Exception.t()}
Extracts structured information from a text file.
Reads the file content and then calls extract/2. Useful for processing
single documents stored on disk.
Parameters
- file_path - Path to text file
- opts - Extraction options (see extract/2)
Returns
{:ok, stream}, where stream is a lazy enumerable of AnnotatedDocument.t() structs, or {:error, exception}
Examples
iex> File.write!("/tmp/test_doc.txt", "Sample text")
iex> {:ok, stream} = LeXtract.extract_from_file(
...> "/tmp/test_doc.txt",
...> prompt: "Extract entities",
...> examples: [],
...> model: "gpt-4o-mini",
...> provider: :openai,
...> api_key: "test-key"
...> )
iex> is_struct(stream, Stream)
true
iex> File.rm("/tmp/test_doc.txt")
:ok
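extract_from_file/2 handles one file at a time. For several files, one option is to read them yourself and pass the resulting list of strings to extract/2, which accepts `[String.t()]`. The sketch below covers only the reading step. `FileBatchSketch` and `read_all/1` are hypothetical names, not part of the library.

```elixir
defmodule FileBatchSketch do
  # Hypothetical helper, not part of LeXtract.
  # Reads every path and returns {:ok, texts} in the same order,
  # or stops at the first failure with {:error, {path, reason}}.
  def read_all(paths) do
    paths
    |> Enum.reduce_while({:ok, []}, fn path, {:ok, acc} ->
      case File.read(path) do
        {:ok, text} -> {:cont, {:ok, [text | acc]}}
        {:error, reason} -> {:halt, {:error, {path, reason}}}
      end
    end)
    |> case do
      {:ok, texts} -> {:ok, Enum.reverse(texts)}
      error -> error
    end
  end
end

dir = System.tmp_dir!()
path_a = Path.join(dir, "lextract_sketch_a.txt")
path_b = Path.join(dir, "lextract_sketch_b.txt")
File.write!(path_a, "first")
File.write!(path_b, "second")

read_result = FileBatchSketch.read_all([path_a, path_b])
IO.inspect(read_result)
```

On success, the list inside `{:ok, texts}` can be passed as the first argument to extract/2.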
@spec validate_options(LeXtract.Config.options()) :: {:ok, LeXtract.Config.options()} | {:error, Exception.t()}
Validates extraction options against the schema.
Useful for validating options before processing or for debugging configuration issues.
Parameters
- opts - Keyword list of options
Returns
{:ok, validated_opts} or {:error, validation_error}
Examples
iex> {:ok, opts} = LeXtract.validate_options(
...> prompt: "Extract",
...> model: "gpt-4o-mini",
...> provider: :openai,
...> api_key: "key"
...> )
iex> Keyword.get(opts, :prompt)
"Extract"
iex> Keyword.get(opts, :format)
:yaml
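Because options are an ordinary keyword list, shared settings can be kept in one place and overridden per call before validation. This is a small sketch using only the standard `Keyword` module. The variable names are illustrative, and the option keys are the ones that appear in the examples on this page.

```elixir
# Settings shared by every extraction in an application.
base_opts = [model: "gpt-4o-mini", provider: :openai, prompt: "Extract entities"]

# Keyword.merge/2 lets the right-hand list win on duplicate keys,
# so per-call overrides replace the shared defaults.
condition_opts =
  Keyword.merge(base_opts,
    prompt: "Extract medical conditions",
    use_structured_output: true
  )

IO.inspect(condition_opts)
```

The merged list can then be passed to validate_options/1 or directly to extract/2.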