Neotomex v0.1.7 Neotomex.Grammar View Source

Neotomex.Grammar

A Neotomex PEG grammar specifies a directed graph of definitons. It consists of:

  • A set of definitions, where each definition consists of an identifier and an expression. e.g. Definition <- Identifier '<-' Expression
  • The root definition’s identifier. This root definition will be used as the entry point for matching.
  • A transform function can be associated with a Neotomex expression. They are applied after the parse

The definition data types (see the definition/0 typespec below) are verbose and machine readable as opposed to human readable. Typical usage will be to compile a grammar from a more machine readable format, bootstrapping with Neotomex grammars as necessary.

Parsing consists of first using the grammar to match a valid parse tree, and then applying a transform node by node to this parse tree.

Match

A grammar consists of definitions which are labeled by an identifier and consist of an expression. An expression can be a

  • terminal: references a char, string, or regex. It successfully matches when it’s reference appears at the beginning of the input.
  • nonterminal: the identifier of a definition. It matches if the identified definition matches.
  • sequence: an ordered list of expressions. It succesfully matches by folding the expressions against the input.
  • priority: an ordered list of expressions. It matches using the first successful match of the list.
  • zero_or_more: matches when its subexpression greedily matches zero or more times.
  • one_or_more: matches when its subexpression greedily matches one or more times.
  • zero_or_one: matches when its subexpression greedily matches zero or one times.
  • and: matches when its subexpression matches. Consumes no input.
  • not: matches when its subexpression does not match. Consumes no input.
  • prune: matches when it’s subexpression matches, however the match is pruned from the result.

A grammar’s definitions come together to form a conditional directed graph. Matching starts at the root node and performs a depth first search for the first successful route through the tree. This represents the valid parse tree for the grammar and input.

Transform

A transform is a function that modifies the match for an expression. This can be used to evaluate the parse tree. When applying transforms to the matched tree, they are applied depth first.

The default transform is the identity function.

Transforms must return transformed when successful, where transformed is the transformed match.

Link to this section Summary

Functions

Match the input using the grammar

Returns an empty grammar

Parse the input using the grammar by matching a parse tree and then applying all transforms

Transform the parse tree returned by match by applying the the expressions’ transform functions via depth first recursion

Validate the grammar. This is especially useful for debugging a grammar since it is exhaustive and provides richer error reporting than a failed match

Link to this section Types

Link to this type expression() View Source
expression ::
  :empty |
  {:terminal, terminal} |
  {:nonterminal, nonterminal} |
  {:sequence[<a href="#t:expression/0">expression</a>]} |
  {:priority, [expression]} |
  {:zero_or_more, expression} |
  {:one_or_more, expression} |
  {:zero_or_one, expression} |
  {:and, expression} |
  {:not, expression} |
  {:prune, expression}
Link to this type grammar() View Source
grammar() :: %{root: nonterminal | false, definitions: %{optional(nonterminal) => expression}, cache: pid | false}
Link to this type nonterminal() View Source
nonterminal() :: atom
Link to this type terminal() View Source
terminal() :: binary | char | Regex.t | nil
Link to this type transform() View Source
transform ::
  {:transform, (term -> {:ok, term})} |
  {:transform, {atom, atom}} |
  nil

Link to this section Functions

Link to this function match(grammar, input) View Source
match(%Neotomex.Grammar{breadcrumbs: term, definitions: term, root: term}, binary) ::
  {:ok, {expression, transform}, binary} |
  :mismatch |
  {:error, term}

Match the input using the grammar.

NB: Not tail call optimized. Possible?

Link to this function new(root \\ nil, definitions \\ %{}) View Source

Returns an empty grammar.

Link to this function parse(grammar, input) View Source
parse(%Neotomex.Grammar{breadcrumbs: term, definitions: term, root: term}, binary) ::
  {:ok, any, binary} |
  :mismatch |
  {:error, term}

Parse the input using the grammar by matching a parse tree and then applying all transforms.

Examples

iex> grammar = new(:root, %{root: {{:terminal, ~r/^[0-9]+/},
...>                               {:transform, &String.to_integer/1}}})
iex> parse(grammar, "1")
{:ok, 1, ""}
iex> parse(grammar, "100")
{:ok, 100, ""}
Link to this function transform_match(arg1) View Source
transform_match(match) :: any

Transform the parse tree returned by match by applying the the expressions’ transform functions via depth first recursion.

NB: Not tail call optimized. Possible? Pack rat?

Examples

iex> transform_match({{nil, {:transform, fn x -> String.to_integer(x) end}},
...>                  {{nil, nil}, "1"}})
1
iex> transform_match({{nil, {:transform, fn [x, y] -> x + y end}},
...>                  [{{nil, nil}, 1}, {{nil, nil}, 1}]})
2
Link to this function validate(grammar) View Source
validate(%Neotomex.Grammar{breadcrumbs: term, definitions: term, root: term}) ::
  :ok |
  {:error, {:missing, :root | :definitions | :root_definitions | {:definition, atom}}}

Validate the grammar. This is especially useful for debugging a grammar since it is exhaustive and provides richer error reporting than a failed match.

Notes:

  • Dialyzer is your friend — validate/1 augments it
  • Grammar’s are not validated by default due to the performance overhead.
  • The validation will return the result of the first failure. There may be more issues with the grammar.

Validation checks that:

  • The grammar has a :root fields.
  • The grammar has a :definitions fields.
  • The root references a definition
  • All nonterminals reference a definition.
  • There are no unreferenced definitions [TODO]

Examples

More complex examples can be found in test/neotomex/grammar_test.exs [todo]

iex> validate(%Neotomex.Grammar{})
{:error, {:missing, :root}}
iex> validate(%Neotomex.Grammar{:root => :root})
{:error, {:missing, :definitions}}
iex> validate(%Neotomex.Grammar{:root => :root, :definitions => %{}})
{:error, {:missing, :root_definition}}
iex> validate(%Neotomex.Grammar{root: :root,
...>                            definitions: %{root: {:bad_type, :nonsense}}})
{:error, {:bad_definition, {:root, {:bad_expr_type, :bad_type}}}}
iex> validate(%Neotomex.Grammar{:root => :root,
...>                            :definitions => %{:root => :the_bad_expr}})
{:error, {:bad_definition, {:root, {:bad_expr, :the_bad_expr}}}}
iex> validate(%Neotomex.Grammar{:root => :root,
...>                            :definitions => %{:root => {:terminal, ?a}}})
:ok