DCATR.Repository (DCAT-R.ex v0.1.0)

A distributable catalog combining a dataset with operational infrastructure.

What is a Repository?

A DCAT-R Repository is a managed collection following the pattern of software repositories (npm, Maven, Git, Docker registries): it combines core content with operational mechanisms to support specialized services.

Core characteristics:

  • Managed Collection: Like software repositories that combine content with infrastructure (npm: modules + metadata, Git: files + history, Maven: artifacts + POMs), a DCAT-R Repository combines a Dataset (user data) with SystemGraphs (operational infrastructure) to support service operations.

  • Distribution Unit: Defines what is replicated together when sharing across DCATR.Service instances - dataset, repository manifest, and distributed system graphs. This boundary enables multi-instance deployments where different services serve the same repository with instance-specific configurations.

  • Single Dataset Focus: Unlike standard DCAT catalogs (which catalog multiple independent datasets), a Repository focuses on one cohesive dataset with rich supporting infrastructure.

  • Extensible Structure: The schema is not fixed. Different service types extend it by adding specialized distributed SystemGraphs: versioning services add HistoryGraphs, inference services add InferenceGraphs, API services add shared index graphs, and so on.

  • Self-Describing Catalog: A Repository is itself a dcat:Catalog with rich DCAT metadata in its RepositoryManifestGraph, enabling uniform catalog navigation across all hierarchy levels.

Structure

A repository contains:

  • Dataset (DCATR.Dataset) - catalog of user data graphs (multi-graph mode)
  • Data graph (DCATR.DataGraph) - single-graph shortcut via dcatr:repositoryDataGraph
  • Primary graph (DCATR.DataGraph) - semantic designation of the primary graph
  • Repository manifest (DCATR.RepositoryManifestGraph) - DCAT catalog description (required)
  • Distributed system graphs (DCATR.SystemGraph) - operational infrastructure replicated with the repository (e.g., history, provenance, shared indexes)

The repository manifest contains the DCAT catalog description of the repository itself, including descriptions of the dataset and system graphs.

Distribution Boundary: Everything in a Repository is distributed/replicated. For local, instance-specific data (service configuration, working graphs, local caches), see DCATR.ServiceData.

Primary Graph

The primary_graph field designates the primary graph. How it gets set depends on the mode:

Single-graph mode (via dcatr:repositoryDataGraph, no dataset wrapper needed):

iex> repo = DCATR.Repository.new!(~I<http://example.org/repo>,
...>   data_graph: DCATR.DataGraph.new!(~I<http://example.org/graph>),
...>   manifest_graph: DCATR.RepositoryManifestGraph.new!(~I<http://example.org/manifest>)
...> )
iex> repo.dataset
nil
iex> repo.data_graph
%DCATR.DataGraph{__id__: ~I<http://example.org/graph>}
iex> DCATR.Repository.graph(repo, :primary)
%DCATR.DataGraph{__id__: ~I<http://example.org/graph>}

data_graph is a sub-property of both repositoryPrimaryGraph (designation) and member (containment). When set, it automatically propagates to primary_graph.

Multi-graph mode (primary designates default among many):

iex> main_graph = DCATR.DataGraph.new!(~I<http://example.org/main>)
iex> repo = DCATR.Repository.new!(~I<http://example.org/repo>,
...>   primary_graph: main_graph,
...>   dataset: DCATR.Dataset.new!(~I<http://example.org/dataset>,
...>     graphs: [
...>       main_graph,
...>       DCATR.DataGraph.new!(~I<http://example.org/aux>)
...>     ]
...>   ),
...>   manifest_graph: DCATR.RepositoryManifestGraph.new!(~I<http://example.org/manifest>)
...> )
iex> DCATR.Repository.graph(repo, :primary)
%DCATR.DataGraph{__id__: ~I<http://example.org/main>}

At least one of dataset or data_graph is required. When both dataset and primary_graph are present, primary_graph must be included in the dataset's graphs.
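
The two rules above can be illustrated with a minimal sketch. It works on plain maps rather than DCATR structs; the module name RepoRulesSketch, the map shape, and the error atoms are hypothetical and only model the constraints, not how the library enforces or reports them.

```elixir
defmodule RepoRulesSketch do
  # Hypothetical stand-in for the rules described above.
  # Operates on plain maps and is not part of DCATR.

  # Rule 1: at least one of dataset or data_graph must be present.
  def validate(%{dataset: nil, data_graph: nil}),
    do: {:error, :dataset_or_data_graph_required}

  # Rule 2: with both a dataset and a primary graph, the primary graph
  # must be one of the dataset's graphs.
  def validate(%{dataset: %{graphs: graphs}, primary_graph: primary})
      when not is_nil(primary) do
    if primary in graphs, do: :ok, else: {:error, :primary_graph_not_in_dataset}
  end

  def validate(_repo), do: :ok
end

RepoRulesSketch.validate(%{dataset: nil, data_graph: nil, primary_graph: nil})
# => {:error, :dataset_or_data_graph_required}

RepoRulesSketch.validate(%{dataset: %{graphs: ["main", "aux"]}, data_graph: nil, primary_graph: "main"})
# => :ok

RepoRulesSketch.validate(%{dataset: %{graphs: ["aux"]}, data_graph: nil, primary_graph: "main"})
# => {:error, :primary_graph_not_in_dataset}
```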

Extension

Service types requiring distributed operational data define custom Repository types via DCATR.Repository.Type. This allows adding specialized SystemGraph fields (e.g., history_graph for versioning services, inference_graph for reasoning services).

See DCATR.Service.Type for the complete extension pattern.

Schema Mapping

Ontologically, dcatr:Repository is defined as rdfs:subClassOf dcat:Catalog in the DCAT-R vocabulary. However, this Grax schema does not directly inherit from DCAT.Catalog to avoid bloating the Elixir structs with all DCAT properties. Any DCAT metadata on a repository is still preserved in the __additional_statements__ field of the struct.

When needed, Grax schema mapping allows accessing a repository as a dcat:Catalog with all DCAT properties mapped to struct fields via the DCAT.Catalog schema from DCAT.ex:

repo = %DCATR.Repository{
  __id__: ~I<http://example.org/repo>,
  dataset: %DCATR.Dataset{__id__: ~I<http://example.org/dataset>},
  __additional_statements__: %{
    ~I<http://purl.org/dc/terms/title> => %{~L"My Repository" => nil}
  }
}

catalog = DCAT.Catalog.from(repo)
catalog.title  # => "My Repository"

Summary

Functions

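all_graphs(container)
Returns all graphs through the entire sub-directory tree.

all_members(container)
Returns all elements through the entire sub-directory tree.

directories(container)
Returns all direct directories in the repository.

find_graph(container, id)
Finds a graph by ID recursively through the sub-directory tree.

graph(container, id_or_selector)
Returns a graph by ID or symbolic selector.

graphs(container)
Returns all direct graphs in the repository.

has_graph?(container, id_or_selector)
Checks if a graph exists in the container.

members(container)
Returns all direct element members.

primary_graph(repo)
Returns the primary graph if one is designated.

resolve_graph_selector(repo, selector)
Resolves a symbolic selector to a graph.

system_graphs(repo)
Returns all system graphs in the repository.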

Types

t()

@type t() :: %DCATR.Repository{
  __additional_statements__: term(),
  __id__: term(),
  data_graph: term(),
  dataset: term(),
  manifest_graph: term(),
  primary_graph: term(),
  system_graphs: term()
}

Functions

all_graphs(container)

@spec all_graphs(DCATR.Directory.Type.schema()) :: [DCATR.Graph.t()]

Returns all graphs through the entire sub-directory tree.

all_members(container)

@spec all_members(DCATR.Directory.Type.schema()) :: [DCATR.Element.t()]

Returns all elements through the entire sub-directory tree.

build(id)

build(id, initial)

build!(id)

build!(id, initial)

build_id(attributes)

directories(container)

Returns all direct directories in the repository.

This implementation of DCATR.Directory.Type.directories/1 delegates to DCATR.Repository.Type.directories/1.

find_graph(container, id)

@spec find_graph(DCATR.Directory.Type.schema(), RDF.IRI.coercible()) ::
  DCATR.Graph.t() | nil

Finds a graph by ID recursively through the sub-directory tree.
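
As a rough illustration of what "recursively through the sub-directory tree" means for all_graphs/1 and find_graph/2, here is a minimal sketch over plain maps. The module name DirectoryWalkSketch and the container shape (a map with :graphs and :directories keys) are hypothetical and do not reflect DCATR's actual structs or traversal order.

```elixir
defmodule DirectoryWalkSketch do
  # Hypothetical container shape: %{graphs: [graph], directories: [container]},
  # where each graph is a map with an :id key. Not DCATR's actual structs.

  # Collects the direct graphs, then descends into every sub-directory.
  def all_graphs(%{graphs: graphs, directories: dirs}) do
    graphs ++ Enum.flat_map(dirs, &all_graphs/1)
  end

  # Returns the first graph with a matching id, or nil when none exists.
  def find_graph(container, id) do
    container
    |> all_graphs()
    |> Enum.find(&(&1.id == id))
  end
end

tree = %{
  graphs: [%{id: "http://example.org/main"}],
  directories: [
    %{graphs: [%{id: "http://example.org/aux"}], directories: []}
  ]
}

DirectoryWalkSketch.find_graph(tree, "http://example.org/aux")
# => %{id: "http://example.org/aux"}

DirectoryWalkSketch.find_graph(tree, "http://example.org/missing")
# => nil
```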

from(value)

@spec from(Grax.Schema.t()) :: {:ok, t()} | {:error, any()}

from!(value)

@spec from!(Grax.Schema.t()) :: t()

graph(container, id_or_selector)

Returns a graph by ID or symbolic selector.

Tries resolve_graph_selector/2 first. On :undefined, falls back to find_graph/2 (from DCATR.Directory.Type) for ID-based lookup.

graphs(container)

Returns all direct graphs in the repository.

This implementation of DCATR.Directory.Type.graphs/1 delegates to DCATR.Repository.Type.graphs/1.

has_graph?(container, id_or_selector)

Checks if a graph exists in the container.

Convenience function based on graph/2 - returns true if the graph exists, false otherwise.
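
The lookup order used by graph/2 (selector first, ID lookup on :undefined) and the way has_graph?/2 builds on it can be sketched as follows. This is a simplified model over plain maps: the module name GraphLookupSketch and the repository shape are hypothetical, and it assumes a resolved selector yields the graph itself, which the documentation above does not spell out.

```elixir
defmodule GraphLookupSketch do
  # Hypothetical repository shape: %{primary: graph | nil, graphs: [graph]}.
  # Not DCATR's actual structs.

  # Assumption: a known selector resolves to a graph (or nil);
  # anything else is reported as :undefined.
  def resolve_graph_selector(repo, :primary), do: repo.primary
  def resolve_graph_selector(_repo, _other), do: :undefined

  # Flat ID lookup standing in for the recursive find_graph/2.
  def find_graph(repo, id), do: Enum.find(repo.graphs, &(&1.id == id))

  # Selector resolution first; ID lookup only when the selector is undefined.
  def graph(repo, id_or_selector) do
    case resolve_graph_selector(repo, id_or_selector) do
      :undefined -> find_graph(repo, id_or_selector)
      graph -> graph
    end
  end

  # True when graph/2 finds something, false otherwise.
  def has_graph?(repo, id_or_selector), do: graph(repo, id_or_selector) != nil
end

main = %{id: "http://example.org/main"}
repo = %{primary: main, graphs: [main, %{id: "http://example.org/aux"}]}

GraphLookupSketch.graph(repo, :primary)
# => %{id: "http://example.org/main"}

GraphLookupSketch.has_graph?(repo, "http://example.org/aux")
# => true

GraphLookupSketch.has_graph?(repo, "http://example.org/missing")
# => false
```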

load(graph, id, opts \\ [])

@spec load(
  RDF.Graph.t() | RDF.Description.t(),
  RDF.IRI.coercible() | RDF.BlankNode.t(),
  opts :: keyword()
) :: {:ok, t()} | {:error, any()}

load!(graph, id, opts \\ [])

@spec load!(
  RDF.Graph.t() | RDF.Description.t(),
  RDF.IRI.coercible() | RDF.BlankNode.t(),
  opts :: keyword()
) :: t()

members(container)

Returns all direct element members.

new(id, attrs)

new!(id, attrs)

primary_graph(repo)

Returns the primary graph if one is designated.

Implements the DCATR.Repository.Type.primary_graph/1 callback by delegating to the function DCATR.Repository.Type.primary_graph/1.

resolve_graph_selector(repo, selector)

Resolves a symbolic selector to a graph.

This implementation of DCATR.GraphResolver.resolve_graph_selector/2 delegates to DCATR.Repository.Type.resolve_graph_selector/2.

system_graphs(repo)

Returns all system graphs in the repository.

Implements the DCATR.Repository.Type.system_graphs/1 callback by delegating to the function DCATR.Repository.Type.system_graphs/1.