Neotomex v0.1.7 Neotomex.Grammar View Source

Neotomex.Grammar

A Neotomex PEG grammar specifies a directed graph of definitons. It consists of:

A set of definitions, where each definition consists of an identifier and an expression. e.g. Definition <- Identifier '<-' Expression
The root definition’s identifier. This root definition will be used as the entry point for matching.
A transform function can be associated with a Neotomex expression. They are applied after the parse

The definition data types (see the definition/0 typespec below) are verbose and machine readable as opposed to human readable. Typical usage will be to compile a grammar from a more machine readable format, bootstrapping with Neotomex grammars as necessary.

Parsing consists of first using the grammar to match a valid parse tree, and then applying a transform node by node to this parse tree.

Match

A grammar consists of definitions which are labeled by an identifier and consist of an expression. An expression can be a

terminal: references a char, string, or regex. It successfully matches when it’s reference appears at the beginning of the input.
nonterminal: the identifier of a definition. It matches if the identified definition matches.
sequence: an ordered list of expressions. It succesfully matches by folding the expressions against the input.
priority: an ordered list of expressions. It matches using the first successful match of the list.
zero_or_more: matches when its subexpression greedily matches zero or more times.
one_or_more: matches when its subexpression greedily matches one or more times.
zero_or_one: matches when its subexpression greedily matches zero or one times.
and: matches when its subexpression matches. Consumes no input.
not: matches when its subexpression does not match. Consumes no input.
prune: matches when it’s subexpression matches, however the match is pruned from the result.

A grammar’s definitions come together to form a conditional directed graph. Matching starts at the root node and performs a depth first search for the first successful route through the tree. This represents the valid parse tree for the grammar and input.

Transform

A transform is a function that modifies the match for an expression. This can be used to evaluate the parse tree. When applying transforms to the matched tree, they are applied depth first.

The default transform is the identity function.

Transforms must return transformed when successful, where transformed is the transformed match.

Link to this section Summary

Types

definition()

expr_trans()

expression()

grammar()

nonterminal()

terminal()

transform()

Functions

match(grammar, input)

Match the input using the grammar

new(root \\ nil, definitions \\ %{})

Returns an empty grammar

parse(grammar, input)

Parse the input using the grammar by matching a parse tree and then applying all transforms

transform_match(arg1)

Transform the parse tree returned by match by applying the the expressions’ transform functions via depth first recursion

validate(grammar)

Validate the grammar. This is especially useful for debugging a grammar since it is exhaustive and provides richer error reporting than a failed match

Link to this section Types

definition()

definition() :: {nonterminal, expression}

expr_trans()

expr_trans() :: {expression, transform}

expression()

expression ::
  :empty |
  {:terminal, terminal} |
  {:nonterminal, nonterminal} |
  {:sequence[<a href="#t:expression/0">expression</a>]} |
  {:priority, [expression]} |
  {:zero_or_more, expression} |
  {:one_or_more, expression} |
  {:zero_or_one, expression} |
  {:and, expression} |
  {:not, expression} |
  {:prune, expression}

grammar()

grammar() :: %{root: nonterminal | false, definitions: %{optional(nonterminal) => expression}, cache: pid | false}

nonterminal()

nonterminal() :: atom

terminal()

terminal() :: binary | char | Regex.t | nil

transform()

transform ::
  {:transform, (term -> {:ok, term})} |
  {:transform, {atom, atom}} |
  nil

Link to this section Functions

match(grammar, input)

match(%Neotomex.Grammar{breadcrumbs: term, definitions: term, root: term}, binary) ::
  {:ok, {expression, transform}, binary} |
  :mismatch |
  {:error, term}

Match the input using the grammar.

NB: Not tail call optimized. Possible?

new(root \\ nil, definitions \\ %{})

Returns an empty grammar.

parse(grammar, input)

parse(%Neotomex.Grammar{breadcrumbs: term, definitions: term, root: term}, binary) ::
  {:ok, any, binary} |
  :mismatch |
  {:error, term}

Parse the input using the grammar by matching a parse tree and then applying all transforms.

Examples

iex> grammar = new(:root, %{root: {{:terminal, ~r/^[0-9]+/},
...>                               {:transform, &String.to_integer/1}}})
iex> parse(grammar, "1")
{:ok, 1, ""}
iex> parse(grammar, "100")
{:ok, 100, ""}

transform_match(arg1)

transform_match(match) :: any

Transform the parse tree returned by match by applying the the expressions’ transform functions via depth first recursion.

NB: Not tail call optimized. Possible? Pack rat?

Examples

iex> transform_match({{nil, {:transform, fn x -> String.to_integer(x) end}},
...>                  {{nil, nil}, "1"}})
1
iex> transform_match({{nil, {:transform, fn [x, y] -> x + y end}},
...>                  [{{nil, nil}, 1}, {{nil, nil}, 1}]})
2

validate(grammar)

validate(%Neotomex.Grammar{breadcrumbs: term, definitions: term, root: term}) ::
  :ok |
  {:error, {:missing, :root | :definitions | :root_definitions | {:definition, atom}}}

Validate the grammar. This is especially useful for debugging a grammar since it is exhaustive and provides richer error reporting than a failed match.

Notes:

Dialyzer is your friend — validate/1 augments it
Grammar’s are not validated by default due to the performance overhead.
The validation will return the result of the first failure. There may be more issues with the grammar.

Validation checks that:

The grammar has a :root fields.
The grammar has a :definitions fields.
The root references a definition
All nonterminals reference a definition.
There are no unreferenced definitions [TODO]

Examples

More complex examples can be found in test/neotomex/grammar_test.exs [todo]

iex> validate(%Neotomex.Grammar{})
{:error, {:missing, :root}}
iex> validate(%Neotomex.Grammar{:root => :root})
{:error, {:missing, :definitions}}
iex> validate(%Neotomex.Grammar{:root => :root, :definitions => %{}})
{:error, {:missing, :root_definition}}
iex> validate(%Neotomex.Grammar{root: :root,
...>                            definitions: %{root: {:bad_type, :nonsense}}})
{:error, {:bad_definition, {:root, {:bad_expr_type, :bad_type}}}}
iex> validate(%Neotomex.Grammar{:root => :root,
...>                            :definitions => %{:root => :the_bad_expr}})
{:error, {:bad_definition, {:root, {:bad_expr, :the_bad_expr}}}}
iex> validate(%Neotomex.Grammar{:root => :root,
...>                            :definitions => %{:root => {:terminal, ?a}}})
:ok