Arcana.Ingest (Arcana v1.3.3)

Document ingestion for Arcana.

Handles chunking, embedding, and storing documents with optional GraphRAG entity/relationship extraction.

Functions

ingest(text, opts)

Ingests text content, creating a document with embedded chunks.

Options

  • :repo - The Ecto repo to use (required)
  • :source_id - An optional identifier for grouping/filtering
  • :metadata - Optional map of metadata to store with the document
  • :chunk_size - Maximum chunk size in characters (default: 1024)
  • :chunk_overlap - Overlap between chunks (default: 200)
  • :collection - Collection name (string) or map with name and description (default: "default")
  • :graph - Enable GraphRAG extraction (default: from config)
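
As a rough sketch of how these options fit together, the call below passes them as a keyword list. `MyApp.Repo` is a hypothetical Ecto repo, the `IngestExample` module and all option values are illustrative, and the return value is left unmatched because this section does not state its shape.

```elixir
defmodule IngestExample do
  # Hypothetical wrapper module; MyApp.Repo stands in for your own Ecto repo.

  # Builds the options described above. All values are illustrative.
  def opts do
    [
      repo: MyApp.Repo,               # required
      source_id: "handbook",          # optional grouping/filtering identifier
      metadata: %{"author" => "docs-team"},
      chunk_size: 512,                # overrides the default of 1024
      chunk_overlap: 64,              # overrides the default of 200
      collection: "handbook",         # a plain string collection name
      graph: false                    # explicit value instead of the config default
    ]
  end

  # Calls ingest/2 as documented above. The return shape is not given in
  # this section, so the result is passed back to the caller unmatched.
  def run(text) do
    Arcana.Ingest.ingest(text, opts())
  end
end
```

Keeping the options in one function makes it easy to reuse the same chunking settings across calls; only `:repo` must be present.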

ingest_file(path, opts)

Ingests a file, parsing its content and creating a document with embedded chunks.

Supports multiple file formats, including plain text, Markdown, and PDF.

Options

  • :repo - The Ecto repo to use (required)
  • :source_id - An optional identifier for grouping/filtering
  • :metadata - Optional map of metadata to store with the document
  • :chunk_size - Maximum chunk size in characters (default: 1024)
  • :chunk_overlap - Overlap between chunks (default: 200)
  • :collection - Collection name to organize the document (default: "default")
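
A minimal sketch of ingesting several files with shared options follows. `MyApp.Repo`, the `IngestFileExample` module, the metadata, and the collection name are hypothetical, and using the file's base name as `:source_id` is just one possible convention. As above, the result of each call is returned as-is because its shape is not stated in this section.

```elixir
defmodule IngestFileExample do
  # Hypothetical wrapper module; MyApp.Repo stands in for your own Ecto repo.

  # Options for one file. The source_id is supplied by the caller so that
  # each file can be filtered on later.
  def opts(source_id) do
    [
      repo: MyApp.Repo,               # required
      source_id: source_id,
      metadata: %{"kind" => "manual"},
      collection: "manuals"
    ]
  end

  # Ingests each path in turn and pairs it with whatever ingest_file/2
  # returned, so the caller can inspect the outcome per file.
  def run(paths) do
    Enum.map(paths, fn path ->
      {path, Arcana.Ingest.ingest_file(path, opts(Path.basename(path)))}
    end)
  end
end
```

Chunk size and overlap are omitted here, so the documented defaults (1024 and 200) apply.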