Neotomex v0.1.7 Neotomex.Grammar View Source
Neotomex.Grammar
A Neotomex PEG grammar specifies a directed graph of definitons. It consists of:
- A set of definitions, where each definition consists of an
identifierand anexpression. e.g.Definition <- Identifier '<-' Expression - The root definition’s identifier. This root definition will be used as the entry point for matching.
- A transform function can be associated with a Neotomex expression. They are applied after the parse
The definition data types (see the definition/0 typespec below) are verbose and machine readable as opposed to human readable. Typical usage will be to compile a grammar from a more machine readable format, bootstrapping with Neotomex grammars as necessary.
Parsing consists of first using the grammar to match a valid
parse tree, and then applying a transform node by node to
this parse tree.
Match
A grammar consists of definitions which are labeled by an identifier and consist of an expression. An expression can be a
terminal: references a char, string, or regex. It successfully matches when it’s reference appears at the beginning of the input.nonterminal: the identifier of a definition. It matches if the identified definition matches.sequence: an ordered list of expressions. It succesfully matches by folding the expressions against the input.priority: an ordered list of expressions. It matches using the first successful match of the list.zero_or_more: matches when its subexpression greedily matches zero or more times.one_or_more: matches when its subexpression greedily matches one or more times.zero_or_one: matches when its subexpression greedily matches zero or one times.and: matches when its subexpression matches. Consumes no input.not: matches when its subexpression does not match. Consumes no input.prune: matches when it’s subexpression matches, however the match is pruned from the result.
A grammar’s definitions come together to form a conditional directed graph. Matching starts at the root node and performs a depth first search for the first successful route through the tree. This represents the valid parse tree for the grammar and input.
Transform
A transform is a function that modifies the match for an expression. This can be used to evaluate the parse tree. When applying transforms to the matched tree, they are applied depth first.
The default transform is the identity function.
Transforms must return transformed when successful, where
transformed is the transformed match.
Link to this section Summary
Functions
Match the input using the grammar
Returns an empty grammar
Parse the input using the grammar by matching a parse tree and
then applying all transforms
Transform the parse tree returned by match by applying the the expressions’ transform functions via depth first recursion
Validate the grammar. This is especially useful for debugging a grammar since it is exhaustive and provides richer error reporting than a failed match
Link to this section Types
expression :: :empty | {:terminal, terminal} | {:nonterminal, nonterminal} | {:sequence[<a href="#t:expression/0">expression</a>]} | {:priority, [expression]} | {:zero_or_more, expression} | {:one_or_more, expression} | {:zero_or_one, expression} | {:and, expression} | {:not, expression} | {:prune, expression}
grammar() :: %{root: nonterminal | false, definitions: %{optional(nonterminal) => expression}, cache: pid | false}
transform :: {:transform, (term -> {:ok, term})} | {:transform, {atom, atom}} | nil
Link to this section Functions
match(%Neotomex.Grammar{breadcrumbs: term, definitions: term, root: term}, binary) ::
{:ok, {expression, transform}, binary} |
:mismatch |
{:error, term}
Match the input using the grammar.
NB: Not tail call optimized. Possible?
Returns an empty grammar.
parse(%Neotomex.Grammar{breadcrumbs: term, definitions: term, root: term}, binary) ::
{:ok, any, binary} |
:mismatch |
{:error, term}
Parse the input using the grammar by matching a parse tree and
then applying all transforms.
Examples
iex> grammar = new(:root, %{root: {{:terminal, ~r/^[0-9]+/},
...> {:transform, &String.to_integer/1}}})
iex> parse(grammar, "1")
{:ok, 1, ""}
iex> parse(grammar, "100")
{:ok, 100, ""}
Transform the parse tree returned by match by applying the the expressions’ transform functions via depth first recursion.
NB: Not tail call optimized. Possible? Pack rat?
Examples
iex> transform_match({{nil, {:transform, fn x -> String.to_integer(x) end}},
...> {{nil, nil}, "1"}})
1
iex> transform_match({{nil, {:transform, fn [x, y] -> x + y end}},
...> [{{nil, nil}, 1}, {{nil, nil}, 1}]})
2
validate(%Neotomex.Grammar{breadcrumbs: term, definitions: term, root: term}) ::
:ok |
{:error, {:missing, :root | :definitions | :root_definitions | {:definition, atom}}}
Validate the grammar. This is especially useful for debugging a grammar since it is exhaustive and provides richer error reporting than a failed match.
Notes:
- Dialyzer is your friend —
validate/1augments it - Grammar’s are not validated by default due to the performance overhead.
- The validation will return the result of the first failure. There may be more issues with the grammar.
Validation checks that:
- The grammar has a
:rootfields. - The grammar has a
:definitionsfields. - The root references a definition
- All nonterminals reference a definition.
- There are no unreferenced definitions [TODO]
Examples
More complex examples can be found in test/neotomex/grammar_test.exs [todo]
iex> validate(%Neotomex.Grammar{})
{:error, {:missing, :root}}
iex> validate(%Neotomex.Grammar{:root => :root})
{:error, {:missing, :definitions}}
iex> validate(%Neotomex.Grammar{:root => :root, :definitions => %{}})
{:error, {:missing, :root_definition}}
iex> validate(%Neotomex.Grammar{root: :root,
...> definitions: %{root: {:bad_type, :nonsense}}})
{:error, {:bad_definition, {:root, {:bad_expr_type, :bad_type}}}}
iex> validate(%Neotomex.Grammar{:root => :root,
...> :definitions => %{:root => :the_bad_expr}})
{:error, {:bad_definition, {:root, {:bad_expr, :the_bad_expr}}}}
iex> validate(%Neotomex.Grammar{:root => :root,
...> :definitions => %{:root => {:terminal, ?a}}})
:ok