Smee.Metadata (Smee v0.4.1)

The Metadata module wraps up Metadata XML into a struct and contains functions that may be helpful when working with them. The metadata is either an aggregate (as used by federations to contain many entity records) or a single entity.

Many of the functions mirror those in the Smee.Entity module - the same actions but on larger source XML rather than on fragments.

The XML in metadata structs can be compressed or decompressed, but unlike Entities there is no cached, parsed xmerl record by default. This saves time and memory.

Wherever possible use Metadata.update/2 to make changes; do not write to the Metadata struct directly. If you must write directly, call Metadata.update/1 afterwards to resync the state of the record.

Functions in Smee.Metadata can be used to extract individual entity records, each containing a fragment of XML. It's strongly recommended to stream these using stream_entities/2 to save on memory, or to select a particular entity using entity/2.
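
As a rough usage sketch of the streaming advice above: the module `MetadataStreamingSketch` and its function names are hypothetical, and the code assumes the Smee package is available as a dependency and that `metadata` is an existing `%Smee.Metadata{}` struct. Only functions documented on this page are called.

```elixir
defmodule MetadataStreamingSketch do
  # Hypothetical helper module, not part of Smee.

  # Lazily takes the first `n` entities instead of building the full list.
  def first_entities(metadata, n) do
    metadata
    |> Smee.Metadata.stream_entities()
    |> Enum.take(n)
  end

  # Looks up a single entity by URI. The @spec for entity/2 allows nil,
  # so the nil case is handled explicitly here.
  def find(metadata, uri) do
    case Smee.Metadata.entity(metadata, uri) do
      nil -> {:error, :not_found}
      entity -> {:ok, entity}
    end
  end
end
```

Because the stream is lazy, `first_entities/2` should only need to process as much of the aggregate as is required to produce `n` entities.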

Types

@type t() :: %Smee.Metadata{
  cache_duration: nil | binary(),
  cert_fingerprint: nil | binary(),
  cert_url: nil | binary(),
  changes: integer(),
  compressed: boolean(),
  data: nil | binary(),
  data_hash: nil | binary(),
  downloaded_at: nil | DateTime.t(),
  entity_count: integer(),
  etag: nil | binary(),
  file_uid: nil | binary(),
  id: nil | binary(),
  label: nil | binary(),
  modified_at: nil | DateTime.t(),
  priority: integer(),
  size: integer(),
  tags: [binary()],
  trustiness: float(),
  type: atom(),
  uri: nil | binary(),
  uri_hash: nil | binary(),
  url: nil | binary(),
  url_hash: nil | binary(),
  valid_until: nil | DateTime.t(),
  verified: boolean()
}

Functions

@spec check_date!(metadata :: t()) :: t()

Raises an exception if the metadata has expired (based on valid_until datetime), otherwise returns the metadata.

If no valid_until has been set (if it's nil) then the metadata will always be returned.

@spec compress(metadata :: t()) :: t()

Returns compressed metadata, containing gzipped XML. This greatly reduces the size of the metadata record.

@spec compressed?(metadata :: t()) :: boolean()

Returns true if the XML data in a metadata struct has been compressed.

@spec count(metadata :: t()) :: integer()

Returns the number of entities in the metadata file

@spec decompress(metadata :: t()) :: t()

Returns decompressed metadata, with plain-text XML data. This makes the struct much larger.
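
The compress/decompress round trip can be illustrated with Erlang's `:zlib` module. This is a minimal sketch under stated assumptions, not Smee's implementation: `GzipSketch` is a hypothetical module, and a plain map with `data` and `compressed` keys stands in for the metadata struct.

```elixir
defmodule GzipSketch do
  # Hypothetical stand-in: operates on plain maps, not %Smee.Metadata{}.

  # Already-compressed records are returned unchanged.
  def compress(%{compressed: true} = record), do: record

  def compress(%{compressed: false, data: xml} = record) do
    %{record | data: :zlib.gzip(xml), compressed: true}
  end

  # Already-decompressed records are returned unchanged.
  def decompress(%{compressed: false} = record), do: record

  def decompress(%{compressed: true, data: gzipped} = record) do
    %{record | data: :zlib.gunzip(gzipped), compressed: false}
  end
end

# Repetitive XML, as found in federation aggregates, compresses well.
xml = String.duplicate("<EntityDescriptor entityID=\"https://idp.example.org/\"/>", 200)
record = %{data: xml, compressed: false}
small = GzipSketch.compress(record)
IO.puts("plain: #{byte_size(xml)} bytes, gzipped: #{byte_size(small.data)} bytes")
```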

derive(enum, options \\ [])

@spec derive(data :: Enumerable.t() | Smee.Entity.t(), options :: keyword()) :: t()

Returns a new metadata struct based on the streamed entities passed as the first parameter.

You can set or override various parts of the struct by passing options:

  • url - the original location of the metadata
  • uri - a URI that identifies the metadata (Name)
  • downloaded_at - A DateTime to record when the record was downloaded
  • modified_at - A DateTime to record when the record was updated upstream
  • valid_until - A DateTime to indicate when the metadata expires
  • priority - An integer between 0 and 9 to show priority
  • trustiness - a Float between 0.0 and 0.9 to indicate, well, trustiness.
  • etag - a string to use as an etag (unique content identifier).
  • cert_url - location of a certificate to use for signature verification
  • cert_fingerprint - fingerprint of the certificate to use for certificate verification
  • label - a description for the metadata

@spec entities(metadata :: t()) :: [Smee.Entity.t()]

Returns all entities in the metadata as a list of entity structs.

This can produce very large lists very slowly. The stream_entities/2 function is usually a much better choice because it avoids holding every entity in memory at once.

@spec entity(metadata :: t(), uri :: binary()) :: Smee.Entity.t() | nil

Returns the specified entity from the metadata, or nil if it cannot be found.

@spec entity!(metadata :: t(), uri :: binary()) :: Smee.Entity.t()

Returns the specified entity from the metadata or raises an exception if not found

@spec entity_ids(metadata :: t()) :: [binary()]

Returns a list of all entity IDs in the metadata

@spec expired?(metadata :: t()) :: boolean()

Returns true if the metadata has expired (based on valid_until datetime)

If no valid_until has been set (if it's nil) then false will be returned
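
The nil-handling rule described here (and the raising behaviour of check_date!/1) can be sketched with the standard library alone. This is an illustration under assumptions, not Smee's code: `ExpirySketch` is hypothetical, a plain map stands in for the metadata struct, the current time is passed in explicitly to keep the functions deterministic, and treating a `valid_until` exactly equal to "now" as not yet expired is a guess.

```elixir
defmodule ExpirySketch do
  # Hypothetical stand-in: operates on plain maps, not %Smee.Metadata{}.

  # nil means no expiry was recorded, so the record never counts as expired.
  def expired?(%{valid_until: nil}, _now), do: false

  def expired?(%{valid_until: %DateTime{} = valid_until}, now) do
    DateTime.compare(now, valid_until) == :gt
  end

  # Bang variant: returns the record untouched, or raises a RuntimeError.
  def check_date!(record, now \\ DateTime.utc_now()) do
    if expired?(record, now) do
      raise "Metadata has expired"
    else
      record
    end
  end
end
```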

@spec filename(metadata :: t()) :: binary()

Returns a suggested filename for the metadata.

filename(metadata, atom)

@spec filename(metadata :: t(), format :: atom()) :: binary()

Returns a suggested filename for the metadata in the specified format.

Two formats can be specified: :sha1 and :uri

new(data, options \\ [])

@spec new(data :: binary(), options :: keyword()) :: t()

Returns a new metadata struct based on the XML data passed as the first parameter.

You can set or override various parts of the struct by passing options:

  • url - the original location of the metadata
  • uri - a URI that identifies the metadata (Name)
  • downloaded_at - A DateTime to record when the record was downloaded
  • modified_at - A DateTime to record when the record was updated upstream
  • valid_until - A DateTime to indicate when the metadata expires
  • priority - An integer between 0 and 9 to show priority
  • trustiness - a Float between 0.0 and 0.9 to indicate, well, trustiness.
  • etag - a string to use as an etag (unique content identifier).
  • cert_url - location of a certificate to use for signature verification
  • cert_fingerprint - fingerprint of the certificate to use for certificate verification
  • label - a description for the metadata

In most cases it is better to use Smee.Source and then Smee.Fetch to generate a metadata struct.
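
For cases where the XML is already in hand, a call might look like the following sketch. It assumes Smee is available as a dependency. The wrapper module `NewMetadataSketch`, the URL, and the label are hypothetical placeholders; the option keys are taken from the list above and passed as an ordinary keyword list.

```elixir
defmodule NewMetadataSketch do
  # Hypothetical wrapper, not part of Smee. All option values are placeholders.
  def from_xml(xml) when is_binary(xml) do
    Smee.Metadata.new(xml,
      url: "https://metadata.example.org/federation.xml",
      downloaded_at: DateTime.utc_now(),
      priority: 5,
      trustiness: 0.5,
      label: "Example federation aggregate"
    )
  end
end
```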

@spec random_entity(metadata :: t()) :: Smee.Entity.t()

Returns one randomly selected entity from the metadata

stream_entities(metadata, options \\ [])

@spec stream_entities(metadata :: t(), options :: keyword()) :: Enumerable.t()

Returns a stream of all entities in the metadata.

@spec tag(metadata :: t(), tags :: list() | nil | binary()) :: t()

Tags a metadata record with one or more tags, replacing existing tags.

Tags are arbitrary classifiers, initially inherited from sources

@spec tags(metadata :: t()) :: [binary()]

Returns the tags of the metadata struct, a list of binary strings

Tags are arbitrary strings, which may be initially inherited from source records, and will be passed on to entities.
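
The @spec for tag/2 accepts a list, a single binary, or nil, while tags/1 always returns a list of binaries. One way those inputs could be normalised is sketched below. This is purely an assumption for illustration: `TagSketch` is hypothetical, a plain map stands in for the struct, and Smee may treat tags differently (for example by sorting or de-duplicating them).

```elixir
defmodule TagSketch do
  # Hypothetical stand-in: operates on plain maps, not %Smee.Metadata{}.

  # nil becomes [], a single value becomes a one-element list,
  # and every element is converted to a binary string.
  def normalise(tags) do
    tags
    |> List.wrap()
    |> Enum.map(&to_string/1)
  end

  # Replaces (rather than appends to) any existing tags.
  def tag(%{tags: _old} = record, tags) do
    %{record | tags: normalise(tags)}
  end
end
```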

@spec update(metadata :: t()) :: t()

Resyncs the internal state of a %Metadata{} struct

If changes have been made using Metadata.update/2 then this will not be needed - it's there for when the struct has been changed directly

@spec update(metadata :: t(), xml :: binary()) :: t()

Returns an updated %Metadata{} struct with new XML, refreshing various parts of the struct correctly.

This should be the only way updated Metadata structs are produced - the raw struct should not be changed directly.

@spec validate!(metadata :: t()) :: t()

Raises an exception if the metadata has invalid XML, otherwise returns the metadata.
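
Because check_date!/1 and validate!/1 both return the metadata or raise, they can be chained with the pipe operator. The sketch below assumes Smee is available as a dependency; `MetadataChecksSketch` and its function names are hypothetical, and the exception types raised by Smee are not known here, so the rescue clause is deliberately generic.

```elixir
defmodule MetadataChecksSketch do
  # Hypothetical helper module, not part of Smee.

  # Raises if the metadata has expired or contains invalid XML,
  # otherwise returns it unchanged.
  def checked!(metadata) do
    metadata
    |> Smee.Metadata.check_date!()
    |> Smee.Metadata.validate!()
  end

  # Converts the raising style into a tagged tuple for callers that prefer it.
  def check(metadata) do
    {:ok, checked!(metadata)}
  rescue
    error -> {:error, Exception.message(error)}
  end
end
```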

@spec xdoc(entity :: t()) :: tuple()

Returns a parsed Erlang xmerl structure representing the metadata XML, for use with xmerl, SweetXml and other tools.

Using this is not recommended because it creates a very large xmerl structure. The Smee.Transform and Smee.Extract modules may be more efficient for working with large metadata files, and the best approach is to stream Smee.Entity records using Smee.Metadata.stream_entities/2 and work with those.

Unlike the similar function for Entity it is not possible to cache this in the struct, so it will be regenerated every time.

@spec xml(metadata :: t()) :: binary()

Returns the XML for the metadata, unchanged, and decompressed.

The XML is returned as a binary string - it may be very large, and larger than the struct it comes from.

xml_processed(metadata, process_type \\ :default)

@spec xml_processed(metadata :: t(), process_type :: atom()) :: binary()

Returns the XML for the metadata, decompressed, after a processing stage.

Available processing options:

  • :default and :none - Nothing is changed, so it will be the same output as Smee.Metadata.xml/1
  • :strip - XML has comments removed, signature removed, and XML declaration removed.

The XML is returned as a binary string - it may be very large.
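
To give a feel for what a :strip-style pass does, here is a deliberately simplified sketch that removes only the XML declaration and comments using regular expressions. It is not Smee's implementation: `StripSketch` is a hypothetical module, signature removal is not attempted, and production code would more likely rely on a proper XML parser.

```elixir
defmodule StripSketch do
  # Hypothetical illustration only; handles the declaration and comments,
  # not XML signatures.
  def strip(xml) when is_binary(xml) do
    xml
    # Drop a leading <?xml ... ?> declaration and any whitespace after it.
    |> String.replace(~r/\A\s*<\?xml[^>]*\?>\s*/, "")
    # Drop comments, including ones that span several lines.
    |> String.replace(~r/<!--.*?-->/s, "")
  end
end
```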