Arcana.Graph.GraphBuilder (Arcana v1.2.0)

Builds knowledge graph data from document chunks.

GraphBuilder orchestrates entity extraction, relationship extraction, and mention tracking to create a knowledge graph structure from text.

Usage

GraphBuilder is an optional step in the ingest pipeline:

# During ingest (when graph: true option is passed)
chunks = Chunker.chunk(text, opts)
{:ok, graph_data} = GraphBuilder.build(chunks,
  entity_extractor: &Arcana.Graph.EntityExtractor.NER.extract/2,
  relationship_extractor: &RelationshipExtractor.extract/3
)

# Convert to queryable format
graph = GraphBuilder.to_query_graph(graph_data, chunks)

Output Structure

The builder outputs a map with:

%{
  entities: [%{id: "...", name: "...", type: :atom}],
  relationships: [%{source: "...", target: "...", type: "..."}],
  mentions: [%{entity_name: "...", chunk_id: "..."}]
}

This intermediate format can be persisted to a database or converted to the in-memory format used by GraphQuery.
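
As a concrete illustration of this shape, the sketch below builds a small graph_data map by hand and checks that every mention points at a known entity. The sample values are hypothetical, and treating relationship source and target as entity names (rather than ids) is an assumption; the type only says they are strings.

```elixir
# Hypothetical sample data in the documented graph_data shape.
graph_data = %{
  entities: [
    %{id: "e1", name: "Ada Lovelace", type: :person},
    %{id: "e2", name: "Analytical Engine", type: :technology}
  ],
  relationships: [
    # Assumption: source/target hold entity names.
    %{source: "Ada Lovelace", target: "Analytical Engine", type: "WROTE_ABOUT"}
  ],
  mentions: [
    %{entity_name: "Ada Lovelace", chunk_id: "chunk-1"},
    %{entity_name: "Analytical Engine", chunk_id: "chunk-1"},
    %{entity_name: "Ada Lovelace", chunk_id: "chunk-2"}
  ]
}

# Collect the known entity names, then find mentions that reference none of them.
known_names = MapSet.new(graph_data.entities, & &1.name)

dangling_mentions =
  Enum.reject(graph_data.mentions, &MapSet.member?(known_names, &1.entity_name))
```

A check like this may be worth running before persisting the data, since mentions refer to entities by name.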

Summary

Functions

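build(chunks, opts)

Builds graph data from a list of chunks.

build_from_text(text, opts)

Builds graph data from a single text string.

merge(graph1, graph2)

Merges two graph data structures.

to_query_graph(graph_data, chunks)

Converts builder output to the format used by GraphQuery.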

Types

chunk()

@type chunk() :: %{id: String.t(), text: String.t()}

entity()

@type entity() :: %{id: String.t(), name: String.t(), type: atom()}

graph_data()

@type graph_data() :: %{
  entities: [entity()],
  relationships: [relationship()],
  mentions: [mention()]
}

mention()

@type mention() :: %{entity_name: String.t(), chunk_id: String.t()}

relationship()

@type relationship() :: %{source: String.t(), target: String.t(), type: String.t()}

Functions

build(chunks, opts)

@spec build(
  [chunk()],
  keyword()
) :: {:ok, graph_data()} | {:error, term()}

Builds graph data from a list of chunks.

Extracts entities and relationships from each chunk, tracking which entities appear in which chunks (mentions).
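
The mention tracking described above can be sketched roughly as follows. This is not GraphBuilder's implementation: MentionSketch and its capitalized-word extractor are hypothetical stand-ins, and skipping chunks whose extraction fails is an assumption.

```elixir
defmodule MentionSketch do
  # Hypothetical entity extractor: treats every capitalized word as an
  # entity of type :unknown. Follows the (text, opts) -> {:ok, entities} shape.
  def extract_entities(text, _opts) do
    entities =
      ~r/\b[A-Z][a-z]+\b/
      |> Regex.scan(text)
      |> List.flatten()
      |> Enum.uniq()
      |> Enum.map(fn name -> %{name: name, type: :unknown} end)

    {:ok, entities}
  end

  # Walks the chunks and records one mention per (entity, chunk) pair.
  def mentions(chunks, extractor) do
    Enum.flat_map(chunks, fn chunk ->
      case extractor.(chunk.text, []) do
        {:ok, entities} ->
          Enum.map(entities, fn entity ->
            %{entity_name: entity.name, chunk_id: chunk.id}
          end)

        {:error, _reason} ->
          # Assumption: a failed chunk contributes no mentions.
          []
      end
    end)
  end
end

chunks = [
  %{id: "c1", text: "Alice met Bob."},
  %{id: "c2", text: "Bob left."}
]

mentions = MentionSketch.mentions(chunks, &MentionSketch.extract_entities/2)
```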

Options

  • :extractor - Combined extractor (text, opts) -> {:ok, %{entities: [...], relationships: [...]}}. When provided, this takes priority over separate extractors.
  • :entity_extractor - Function (text, opts) -> {:ok, entities} | {:error, reason}. Used when :extractor is not provided.
  • :relationship_extractor - Function (text, entities, opts) -> {:ok, rels} | {:error, reason}. Used when :extractor is not provided.
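
To make the three function shapes concrete, here is a minimal sketch of extractors that match the signatures listed above. ExtractorSketch, its canned results, and the :empty_text error reason are hypothetical; the exact fields an extractor must return for each entity are also an assumption.

```elixir
defmodule ExtractorSketch do
  # Shape of :entity_extractor: (text, opts) -> {:ok, entities} | {:error, reason}
  def extract_entities(text, _opts) when is_binary(text) do
    if String.trim(text) == "" do
      {:error, :empty_text}
    else
      # Canned result for illustration; a real extractor would inspect the text.
      {:ok, [%{name: "Alice", type: :person}, %{name: "Acme", type: :organization}]}
    end
  end

  # Shape of :relationship_extractor: (text, entities, opts) -> {:ok, rels} | {:error, reason}
  def extract_relationships(_text, entities, _opts) do
    case entities do
      [first, second | _rest] ->
        {:ok, [%{source: first.name, target: second.name, type: "RELATED_TO"}]}

      _fewer_than_two ->
        {:ok, []}
    end
  end

  # Shape of :extractor: (text, opts) -> {:ok, %{entities: [...], relationships: [...]}}
  def extract(text, opts) do
    with {:ok, entities} <- extract_entities(text, opts),
         {:ok, rels} <- extract_relationships(text, entities, opts) do
      {:ok, %{entities: entities, relationships: rels}}
    end
  end
end

# Either style can then be placed in the options keyword list.
combined_opts = [extractor: &ExtractorSketch.extract/2]

separate_opts = [
  entity_extractor: &ExtractorSketch.extract_entities/2,
  relationship_extractor: &ExtractorSketch.extract_relationships/3
]
```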

Returns

  • {:ok, graph_data} - Successfully built graph data
  • {:error, reason} - If all extractions fail

build_from_text(text, opts)

@spec build_from_text(
  String.t(),
  keyword()
) :: {:ok, graph_data()} | {:error, term()}

Builds graph data from a single text string.

Convenience function for processing a single document without chunking it first.

merge(graph1, graph2)

@spec merge(graph_data(), graph_data()) :: graph_data()

Merges two graph data structures.

Combines entities (deduplicating by name), relationships, and mentions. Useful for incremental graph building across multiple documents.
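
The merge semantics can be approximated with the sketch below. MergeSketch is a hypothetical stand-in, not the library's code: keeping the first occurrence of a duplicated entity name, and concatenating relationships and mentions without deduplication, are both assumptions.

```elixir
defmodule MergeSketch do
  # Combines two graph_data maps. Entities are deduplicated by name,
  # keeping the first occurrence (an assumption about tie-breaking).
  def merge(graph1, graph2) do
    %{
      entities: Enum.uniq_by(graph1.entities ++ graph2.entities, & &1.name),
      # Assumption: relationships and mentions are simply concatenated.
      relationships: graph1.relationships ++ graph2.relationships,
      mentions: graph1.mentions ++ graph2.mentions
    }
  end
end

doc1 = %{
  entities: [%{id: "e1", name: "Alice", type: :person}],
  relationships: [],
  mentions: [%{entity_name: "Alice", chunk_id: "doc1-c1"}]
}

doc2 = %{
  entities: [
    %{id: "e7", name: "Alice", type: :person},
    %{id: "e8", name: "Acme", type: :organization}
  ],
  relationships: [%{source: "Alice", target: "Acme", type: "WORKS_AT"}],
  mentions: [%{entity_name: "Acme", chunk_id: "doc2-c1"}]
}

merged = MergeSketch.merge(doc1, doc2)
```

For more than two documents, a fold such as Enum.reduce(graphs, &MergeSketch.merge(&2, &1)) would apply the same step incrementally.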

to_query_graph(graph_data, chunks)

@spec to_query_graph(graph_data(), [chunk()]) :: Arcana.Graph.GraphQuery.graph()

Converts builder output to the format used by GraphQuery.

Takes the graph data and original chunks to build an indexed graph structure suitable for querying.
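
The actual shape of Arcana.Graph.GraphQuery.graph() is not shown here, so the following is only a sketch of what an indexed structure could look like. QueryIndexSketch and its three lookup maps are hypothetical and may differ from what to_query_graph/2 returns.

```elixir
defmodule QueryIndexSketch do
  # Builds lookup maps from graph data and the original chunks.
  def index(graph_data, chunks) do
    %{
      entities_by_name: Map.new(graph_data.entities, &{&1.name, &1}),
      chunks_by_id: Map.new(chunks, &{&1.id, &1}),
      chunk_ids_by_entity:
        Enum.group_by(graph_data.mentions, & &1.entity_name, & &1.chunk_id)
    }
  end

  # Returns the chunks in which the named entity is mentioned.
  def chunks_mentioning(index, entity_name) do
    index.chunk_ids_by_entity
    |> Map.get(entity_name, [])
    |> Enum.map(&Map.fetch!(index.chunks_by_id, &1))
  end
end

chunks = [
  %{id: "c1", text: "Alice met Bob."},
  %{id: "c2", text: "Bob left."}
]

graph_data = %{
  entities: [
    %{id: "e1", name: "Alice", type: :person},
    %{id: "e2", name: "Bob", type: :person}
  ],
  relationships: [%{source: "Alice", target: "Bob", type: "MET"}],
  mentions: [
    %{entity_name: "Alice", chunk_id: "c1"},
    %{entity_name: "Bob", chunk_id: "c1"},
    %{entity_name: "Bob", chunk_id: "c2"}
  ]
}

index = QueryIndexSketch.index(graph_data, chunks)
bob_chunks = QueryIndexSketch.chunks_mentioning(index, "Bob")
```

Indexing by name and chunk id up front trades a little memory for constant-time lookups at query time, which is presumably why the original chunks are passed in alongside the graph data.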