TantivyEx.Document (TantivyEx v0.4.1)
Comprehensive document operations for TantivyEx with schema-aware field mapping, validation, and batch processing capabilities.
This module addresses the 70% gap in document operations by providing:
- Proper field-to-value mapping using schema information
- Document validation against schema constraints
- Support for all Tantivy field types in documents
- Batch document operations for performance
- Document updates and deletions (via term-based deletion and re-addition)
- Enhanced JSON document handling with type conversion
Core Concepts
Schema-Aware Operations
All document operations use the schema to ensure proper field mapping and type validation. Fields are mapped to their correct Tantivy field types based on schema definitions.
Document Validation
Documents are validated against the schema before indexing to catch type mismatches and missing required fields early.
Batch Processing
Batch operations provide significant performance improvements for bulk indexing scenarios.
Field Type Support
Supports all Tantivy field types with proper type conversion:
- Text: String values with optional tokenization
- U64/I64/F64: Numeric values with range validation
- Bool: Boolean true/false values
- Date: DateTime values (Unix timestamps or ISO 8601 strings)
- Facet: Hierarchical path strings (e.g., "/category/subcategory")
- Bytes: Base64-encoded binary data
- JSON: Complex JSON objects with schema-aware field extraction
- IpAddr: IPv4 and IPv6 address strings
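The conversions listed above can be sketched in plain Elixir. This is a minimal illustration under stated assumptions, not TantivyEx's implementation: the module name `FieldCoercionSketch`, the `coerce/2` function, and the type atoms are hypothetical, and the real library may accept or reject different inputs.

```elixir
defmodule FieldCoercionSketch do
  # Hypothetical helper: maps a field type atom and a raw value to
  # {:ok, converted} or {:error, reason}. Not part of TantivyEx.

  # Text: accept strings unchanged.
  def coerce(:text, v) when is_binary(v), do: {:ok, v}

  # U64: accept non-negative integers only (range validation).
  def coerce(:u64, v) when is_integer(v) and v >= 0, do: {:ok, v}

  # F64: accept numbers, or strings that parse completely as a float.
  def coerce(:f64, v) when is_number(v), do: {:ok, v * 1.0}

  def coerce(:f64, v) when is_binary(v) do
    case Float.parse(v) do
      {float, ""} -> {:ok, float}
      _ -> {:error, "not a float: #{inspect(v)}"}
    end
  end

  # Bool: accept true/false only.
  def coerce(:bool, v) when is_boolean(v), do: {:ok, v}

  # Date: pass Unix timestamps through, convert ISO 8601 strings to Unix seconds.
  def coerce(:date, v) when is_integer(v), do: {:ok, v}

  def coerce(:date, v) when is_binary(v) do
    case DateTime.from_iso8601(v) do
      {:ok, datetime, _offset} -> {:ok, DateTime.to_unix(datetime)}
      {:error, reason} -> {:error, "invalid date: #{inspect(reason)}"}
    end
  end

  # Facet: require a leading "/" as in "/category/subcategory".
  def coerce(:facet, "/" <> _ = v), do: {:ok, v}

  # Bytes: require a string that decodes as Base64.
  def coerce(:bytes, v) when is_binary(v) do
    case Base.decode64(v) do
      {:ok, _raw} -> {:ok, v}
      :error -> {:error, "not valid Base64"}
    end
  end

  # IpAddr: require a string that parses as an IPv4 or IPv6 address.
  def coerce(:ip_addr, v) when is_binary(v) do
    case :inet.parse_address(String.to_charlist(v)) do
      {:ok, _tuple} -> {:ok, v}
      {:error, _} -> {:error, "invalid IP address: #{inspect(v)}"}
    end
  end

  # Anything else is a type mismatch.
  def coerce(type, v), do: {:error, "cannot use #{inspect(v)} as #{type}"}
end
```

For example, `FieldCoercionSketch.coerce(:f64, "29.99")` returns `{:ok, 29.99}`, while `FieldCoercionSketch.coerce(:facet, "books")` falls through to the mismatch clause and returns an error tuple.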
Usage Examples
# Basic document operations
{:ok, index} = TantivyEx.create_index_in_ram(schema)
{:ok, writer} = TantivyEx.writer(index)
# Single document with validation
doc = %{
"title" => "Getting Started with TantivyEx",
"content" => "This is a comprehensive guide...",
"price" => 29.99,
"published_at" => "2024-01-15T10:30:00Z",
"category" => "/books/programming/elixir"
}
{:ok, validated_doc} = TantivyEx.Document.validate(doc, schema)
:ok = TantivyEx.Document.add(writer, validated_doc, schema)
# Batch operations
documents = [doc1, doc2, doc3]
{:ok, results} = TantivyEx.Document.add_batch(writer, documents, schema)
# Document updates (deletes the matching document by term, then re-adds it)
{:ok, :updated} = TantivyEx.Document.update(writer, "id", "doc_123", updated_doc, schema)
Summary
Functions
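- add(writer, document, schema): Adds a single document to the index with schema validation.
- add_batch(writer, documents, schema, options): Adds multiple documents to the index in a batch operation.
- delete(writer, term_field, term_value, schema): Deletes a document by term matching.
- prepare_json(json_doc, schema, field_mapping): Prepares a JSON document for indexing by extracting and validating nested fields.
- update(writer, term_field, term_value, updated_document, schema): Updates a document by term-based deletion and re-addition.
- validate(document, schema): Validates a document against the provided schema.
- validate_batch(documents, schema): Validates a batch of documents against the schema.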
Types
Functions
@spec add(TantivyEx.IndexWriter.t(), document(), TantivyEx.Schema.t()) :: :ok | {:error, String.t()}
Adds a single document to the index with schema validation.
Parameters
- writer: IndexWriter reference
- document: Document map to add
- schema: Schema reference for validation and field mapping
Returns
- :ok - Document successfully added
- {:error, reason} - Addition failed with specific error
Examples
iex> doc = %{"title" => "Test Document", "content" => "Sample content"}
iex> :ok = TantivyEx.Document.add(writer, doc, schema)
@spec add_batch(TantivyEx.IndexWriter.t(), [document()], TantivyEx.Schema.t(), keyword() | map()) :: batch_result()
Adds multiple documents to the index in a batch operation.
Batch operations are significantly more efficient than individual additions for large document sets.
Parameters
- writer: IndexWriter reference
- documents: List of document maps
- schema: Schema reference for validation and field mapping
- options: Batch processing options
Options
- :batch_size - Number of documents to process in each batch (default: 1000)
- :validate - Whether to validate documents (default: true)
- :continue_on_error - Whether to continue processing if a document fails (default: false)
Returns
- {:ok, results} - List of results for each document
- {:error, [{index, error}, ...]} - Errors with document indices
Examples
iex> docs = [%{"title" => "Doc 1"}, %{"title" => "Doc 2"}]
iex> {:ok, results} = TantivyEx.Document.add_batch(writer, docs, schema)
iex> length(results)
2
iex> # With options
iex> {:ok, results} = TantivyEx.Document.add_batch(writer, docs, schema,
...> batch_size: 500, continue_on_error: true)
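How `:batch_size` and `:continue_on_error` might interact can be sketched without the library. The following is an assumption-laden illustration, not the actual `add_batch` implementation: `BatchSketch` and the injected `add_fun` (a stand-in for whatever adds one document) are hypothetical, and the success value is simplified to a document count rather than a per-document result list.

```elixir
defmodule BatchSketch do
  # Hypothetical sketch. add_fun must return :ok or {:error, reason}.
  def add_batch(docs, add_fun, opts \\ []) do
    batch_size = Keyword.get(opts, :batch_size, 1000)
    continue? = Keyword.get(opts, :continue_on_error, false)

    errors =
      docs
      # Remember each document's position so errors can report it.
      |> Enum.with_index()
      # Process the input in chunks of at most batch_size documents.
      |> Enum.chunk_every(batch_size)
      |> Enum.reduce_while([], fn chunk, errors ->
        errors = add_chunk(chunk, add_fun, continue?, errors)

        # Stop after the first failing chunk unless told to continue.
        if errors != [] and not continue?, do: {:halt, errors}, else: {:cont, errors}
      end)

    case errors do
      [] -> {:ok, length(docs)}
      _ -> {:error, Enum.reverse(errors)}
    end
  end

  # Adds one chunk, collecting {index, reason} tuples for failures.
  defp add_chunk(chunk, add_fun, continue?, errors) do
    Enum.reduce_while(chunk, errors, fn {doc, index}, acc ->
      case add_fun.(doc) do
        :ok -> {:cont, acc}
        {:error, reason} when continue? -> {:cont, [{index, reason} | acc]}
        {:error, reason} -> {:halt, [{index, reason} | acc]}
      end
    end)
  end
end
```

With the default `continue_on_error: false`, the sketch reports only the first failing document; with `continue_on_error: true` it reports every failure along with its position in the input list.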
@spec delete(TantivyEx.IndexWriter.t(), String.t(), String.t(), TantivyEx.Schema.t()) :: {:ok, :deleted} | {:error, String.t()}
Deletes a document by term matching.
Uses Tantivy's term-based deletion to remove documents that match the specified field and value combination.
Parameters
- writer: IndexWriter reference
- term_field: Field name to use for identifying the document (e.g., "id")
- term_value: Value to match for document identification
- schema: Schema reference
Returns
- {:ok, :deleted} - Document successfully deleted
- {:error, reason} - Deletion failed
Examples
iex> {:ok, :deleted} = TantivyEx.Document.delete(writer, "id", "doc_123", schema)
@spec prepare_json(map() | String.t(), TantivyEx.Schema.t(), map()) :: {:ok, document()} | {:error, String.t()}
Prepares a JSON document for indexing by extracting and validating nested fields.
Parameters
- json_doc: JSON document as a map or JSON string
- schema: Schema reference for field extraction
- field_mapping: Optional mapping of JSON paths to schema fields
Returns
- {:ok, prepared_document} - Document ready for indexing
- {:error, reason} - JSON processing failed
Examples
iex> json_doc = %{"metadata" => %{"title" => "Test", "tags" => ["elixir", "search"]}}
iex> mapping = %{"metadata.title" => "title", "metadata.tags" => "tags"}
iex> {:ok, doc} = TantivyEx.Document.prepare_json(json_doc, schema, mapping)
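The path-to-field mapping in the example above can be pictured as a lookup of each dotted path in the nested map. This is a minimal sketch assuming paths are split on "." and unresolved paths are skipped; `JsonMappingSketch.flatten/2` is a hypothetical name, and the library's actual extraction and validation rules may differ.

```elixir
defmodule JsonMappingSketch do
  # Hypothetical sketch: resolves each dotted JSON path in json_doc and
  # stores the value under the mapped schema field name.
  def flatten(json_doc, field_mapping) do
    Enum.reduce(field_mapping, %{}, fn {path, field}, acc ->
      # "metadata.title" becomes ["metadata", "title"] for get_in/2.
      case get_in(json_doc, String.split(path, ".")) do
        # Skip paths that do not resolve to a value.
        nil -> acc
        value -> Map.put(acc, field, value)
      end
    end)
  end
end
```

Applied to the `json_doc` and `mapping` from the example, the sketch produces `%{"title" => "Test", "tags" => ["elixir", "search"]}`.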
@spec update(TantivyEx.IndexWriter.t(), String.t(), any(), map(), TantivyEx.Schema.t()) :: {:ok, :updated} | {:error, String.t()}
Updates a document by term-based deletion and re-addition.
This implementation uses Tantivy's term-based document deletion followed by adding the updated document. This is more efficient than full index rebuilding for sparse updates.
Parameters
- writer: IndexWriter reference
- term_field: Field name to use for identifying the document (e.g., "id")
- term_value: Value to match for document identification
- updated_document: Complete updated document map
- schema: Schema reference
Returns
- {:ok, :updated} - Document successfully updated
- {:error, reason} - Update failed
Examples
iex> updated_doc = %{"id" => "doc_123", "title" => "Updated Title", "price" => 39.99}
iex> {:ok, :updated} = TantivyEx.Document.update(writer, "id", "doc_123", updated_doc, schema)
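The delete-then-add behaviour described above can be modelled with an ordinary list standing in for the index. This is only an illustration of the semantics, with a hypothetical `UpdateSketch` module; it suggests why the updated document should be complete, since a replacement built this way does not inherit fields from the document it replaces.

```elixir
defmodule UpdateSketch do
  # Hypothetical model: the "index" is just a list of document maps.
  def update(docs, term_field, term_value, updated_document) do
    # Step 1: drop every document whose term_field equals term_value.
    remaining = Enum.reject(docs, &(Map.get(&1, term_field) == term_value))

    # Step 2: append the complete replacement document.
    remaining ++ [updated_document]
  end
end
```

In this model, updating "doc_123" with a map that omits "price" leaves the replacement without a price, because nothing is merged from the deleted document.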
@spec validate(document(), TantivyEx.Schema.t()) :: {:ok, document()} | validation_error()
Validates a document against the provided schema.
Ensures all field types match schema expectations and converts values to appropriate types where possible.
Parameters
- document: Map containing field names and values
- schema: Schema reference to validate against
Returns
- {:ok, validated_document} - Document with type-converted values
- {:error, reason} - Validation error with specific details
Examples
iex> doc = %{"title" => "Test", "price" => "29.99", "published_at" => "2024-01-15T10:30:00Z"}
iex> {:ok, validated} = TantivyEx.Document.validate(doc, schema)
iex> validated["price"]
29.99
iex> is_integer(validated["published_at"])
true
@spec validate_batch([document()], TantivyEx.Schema.t()) :: {:ok, [document()]} | {:error, [{integer(), String.t()}]}
Validates a batch of documents against the schema.
Parameters
- documents: List of document maps
- schema: Schema reference to validate against
Returns
- {:ok, validated_documents} - All documents successfully validated
- {:error, [{index, error}, ...]} - List of validation errors with document indices
Examples
iex> docs = [%{"title" => "Doc 1"}, %{"title" => "Doc 2"}]
iex> {:ok, validated} = TantivyEx.Document.validate_batch(docs, schema)
iex> length(validated)
2
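The all-or-nothing return shape documented above (either every validated document, or a list of indexed errors) can be sketched as a single pass over the input. The `ValidateBatchSketch` module and the injected `validate_fun` are hypothetical stand-ins, assumed to return `{:ok, doc}` or `{:error, reason}` for one document.

```elixir
defmodule ValidateBatchSketch do
  # Hypothetical sketch: validate_fun checks a single document.
  def validate_batch(docs, validate_fun) do
    {valid, errors} =
      docs
      |> Enum.with_index()
      |> Enum.reduce({[], []}, fn {doc, index}, {valid, errors} ->
        case validate_fun.(doc) do
          {:ok, validated} -> {[validated | valid], errors}
          # Record the zero-based position of the failing document.
          {:error, reason} -> {valid, [{index, reason} | errors]}
        end
      end)

    # Any error makes the whole batch fail; otherwise return all documents.
    case errors do
      [] -> {:ok, Enum.reverse(valid)}
      _ -> {:error, Enum.reverse(errors)}
    end
  end
end
```

Unlike a fail-fast loop, this version visits every document, so a caller sees all validation problems in one call.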