Pegasus (pegasus v1.0.0)

A PEG (Parsing Expression Grammar) parser generator for Elixir.

Pegasus compiles PEG grammar definitions into efficient NimbleParsec parsers at compile time. This gives you the familiar, readable PEG syntax while leveraging NimbleParsec's optimized parsing engine.

Quick Start

defmodule MyParser do
  require Pegasus

  Pegasus.parser_from_string("""
    numbers <- number (',' number)*
    number  <- [0-9]+
  """,
    numbers: [parser: true],
    number: [collect: true]
  )
end

MyParser.numbers("1,2,3")
# => {:ok, ["1", ",", "2", ",", "3"], "", %{}, {1, 0}, 5}

Main API

PEG Grammar Syntax

Pegasus supports the standard PEG syntax. For the full specification, see the PEG reference.

Rules

Rules are defined with the <- operator:

identifier <- expression

Expressions

Syntax           Description               Example
'...' or "..."   Literal string            'hello'
[...]            Character class           [a-zA-Z]
[^...]           Negated character class   [^0-9]
.                Any character             .
e1 e2            Sequence                  'a' 'b'
e1 / e2          Ordered choice            'a' / 'b'
e*               Zero or more              [0-9]*
e+               One or more               [0-9]+
e?               Optional                  '-'?
&e               Positive lookahead        &'x'
!e               Negative lookahead        !'x'
(e)              Grouping                  ('a' 'b')*
<e>              Extracted group           <[a-z]+>
# ...            Comment                   # this is ignored
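
As a rough illustration of how several of these expression forms combine, the sketch below follows the Quick Start pattern. The module name ExprDemo and the rule names are made up for this example, and it assumes pegasus is available as a dependency of the project.

```elixir
defmodule ExprDemo do
  require Pegasus

  # value  : ordered choice between two rules
  # bool   : ordered choice between two literals
  # signed : optional literal followed by another rule
  # digits : one-or-more repetition of a character class
  Pegasus.parser_from_string("""
    value  <- bool / signed
    bool   <- 'true' / 'false'
    signed <- '-'? digits
    digits <- [0-9]+
  """,
    value: [parser: true],
    digits: [collect: true]
  )
end

# Both inputs should be consumed in full by the exported parser.
ExprDemo.value("true")
ExprDemo.value("-42")
```

Here value tries bool first and falls back to signed only if bool fails, which is the ordered-choice behavior of the / operator.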

Escape Sequences

Pegasus supports ANSI C escape sequences in literals and character classes:

  • \a - bell
  • \b - backspace
  • \e - escape
  • \f - form feed
  • \n - newline
  • \r - carriage return
  • \t - tab
  • \v - vertical tab
  • \' - single quote
  • \" - double quote
  • \\ - backslash
  • \[ and \] - brackets (useful in character classes)
  • \- - literal hyphen (in character classes)
  • \377 - octal escape (1-3 octal digits)

Parser Options

Options control how each grammar rule is compiled. Pass them as a keyword list where keys are rule names:

Pegasus.parser_from_string(grammar,
  rule_name: [option: value, ...]
)

Options are applied in the order specified.

:parser

Export the rule as a parser function (an entry point that can be called directly). Without this option, rules become private combinators.

Pegasus.parser_from_string("""
  start <- greeting name
  greeting <- 'Hello, '
  name <- [a-zA-Z]+
""", start: [parser: true])

The :parser option also accepts an atom to rename the parser:

Pegasus.parser_from_string("foo <- 'foo'", foo: [parser: :parse])
# Creates `parse/1` instead of `foo/1`

:export

Make a combinator public instead of private. Use this when you need to reference the combinator from other modules or compose it with other NimbleParsec combinators.

Pegasus.parser_from_string("""
  foo <- 'foo'
""", foo: [export: true])

:collect

Merge all matched content into a single binary string. Useful for rules that match multiple characters you want combined.

Pegasus.parser_from_string("""
  number <- [0-9]+
""", number: [collect: true])

# Without collect: ["1", "2", "3"]
# With collect: "123"

Collect requirements

When using :collect, all nested combinators must leave only iodata (binaries/charlists) in the result. Tags and tokens will cause errors.
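
The iodata requirement can be illustrated in plain Elixir. The sketch below assumes, for illustration only, that collecting behaves like IO.iodata_to_binary/1. CollectDemo is a hypothetical stand-in, not Pegasus's actual implementation.

```elixir
defmodule CollectDemo do
  # Stand-in for collecting a rule's results into one binary.
  # Assumption: results are flattened the way IO.iodata_to_binary/1 does it.
  def collect(results) do
    {:ok, IO.iodata_to_binary(results)}
  rescue
    # Raised when an element is not iodata (for example a tagged tuple).
    ArgumentError -> {:error, :not_iodata}
  end
end

# Binaries, charlists, and nested lists of them flatten cleanly.
CollectDemo.collect(["1", ~c"23", ["4"]])
# => {:ok, "1234"}

# A tagged tuple is not iodata, so flattening fails.
CollectDemo.collect([{:num, "1"}, "2"])
# => {:error, :not_iodata}
```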

:token

Replace the matched content with a token value.

  • token: true - Use the rule name as the token

  • token: :custom - Use a custom atom as the token

    Pegasus.parser_from_string("""
      operator <- '+' / '-' / '*' / '/'
    """, operator: [collect: true, token: :op])

    # Matched "+" becomes :op

:tag

Wrap the result in a tagged tuple {tag, content}.

  • tag: true - Use the rule name as the tag

  • tag: :custom - Use a custom atom as the tag

    Pegasus.parser_from_string("""
      number <- [0-9]+
    """, number: [collect: true, tag: :num])

    # Result: {:num, "123"}

:ignore

Discard the matched content. Useful for whitespace and delimiters.

Pegasus.parser_from_string("""
  list  <- item (comma item)*
  item  <- [a-z]+
  comma <- ','
""",
  list: [parser: true],
  item: [collect: true],
  comma: [ignore: true]
)

:start_position

Inject position information at the start of the match. Adds a map with :line, :column, and :offset keys.

Pegasus.parser_from_string("""
  token <- [a-z]+
""", token: [start_position: true, collect: true])

# Result: [%{line: 1, column: 0, offset: 0}, "hello"]

:post_traverse

Apply a custom transformation function after the rule matches. The function receives the parsing state and can transform the results.

Pegasus.parser_from_string("""
  number <- [0-9]+
""", number: [collect: true, post_traverse: {:to_integer, []}])

defp to_integer(rest, [num_string], context, _position, _offset) do
  {rest, [String.to_integer(num_string)], context}
end

Arguments are reversed

The second argument (the matched content) arrives in reverse order: the most recent match comes first. Account for this when pattern matching on it.
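
To make the ordering concrete, here is a minimal sketch of a callback for a hypothetical rule pair <- key '=' value whose key and value rules use collect: true. The module PairDemo and the function to_pair/5 are made up for this example. The function is public only so it can be called by hand here; inside a parser module it would typically be a defp.

```elixir
defmodule PairDemo do
  # Hypothetical post_traverse callback for a rule such as
  #   pair <- key '=' value
  # The rule matches key, then "=", then value, so the results
  # arrive as [value, "=", key] (most recent match first).
  def to_pair(rest, [value, "=", key], context, _position, _offset) do
    {rest, [{key, value}], context}
  end
end

# Calling the callback directly with the kind of arguments a parser passes in:
PairDemo.to_pair("", ["1", "=", "a"], %{}, {1, 0}, 3)
# => {"", [{"a", "1"}], %{}}
```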

:alias

Substitute a custom combinator in place of the grammar rule. Useful when you need special handling that PEG syntax can't express.

Pegasus.parser_from_string("""
  special <- 'x'
""", special: [alias: :my_custom_combinator])

You must define my_custom_combinator as a NimbleParsec combinator in your module.

Capitalized Identifiers

Capitalized PEG identifiers like Statement or Expression work fine. In Elixir a bare capitalized name is an alias, so refer to such rules by atom (:Foo). In the options keyword list, the key Foo: is already that atom:

Pegasus.parser_from_string("Foo <- 'foo'", Foo: [parser: :parse])

Capitalized identifiers also require special handling when called directly. You can wrap in a lowercase combinator or use apply/3:

defparsec :parse, parsec(:Foo)
# or
apply(MyParser, :Foo, ["foo"])
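
The difference between the atom and the alias can be checked in plain Elixir. In the sketch below, CapsDemo is a hypothetical stand-in for a module with a generated function named Foo; it does not use Pegasus.

```elixir
# In a keyword list, the key `Foo:` is the plain atom :Foo.
[Foo: 1] == [{:Foo, 1}]
# => true

# A bare capitalized name is an alias and expands to a different atom.
Foo == :"Elixir.Foo"
# => true

defmodule CapsDemo do
  # Hypothetical stand-in for a generated function with a capitalized name.
  def unquote(:Foo)(input), do: {:ok, input}
end

# The atom :Foo names the function, so apply/3 can call it.
apply(CapsDemo, :Foo, ["foo"])
# => {:ok, "foo"}
```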

Loading from Files

For larger grammars, store them in .peg files:

# In lib/my_parser.ex
Pegasus.parser_from_file("priv/grammar.peg",
  start: [parser: true]
)

Output Format

Parsers return the standard NimbleParsec result tuple:

{:ok, results, remaining, context, position, byte_offset}

Or on failure:

{:error, message, remaining, context, position, byte_offset}

See the NimbleParsec documentation for details.
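
Callers usually pattern match on these tuples. The sketch below shows one possible way to do that; ResultDemo and unwrap/1 are hypothetical names, and the tuples passed in are hand-written stand-ins for real parser output. Note that a successful parse can still leave unconsumed input in the remaining binary.

```elixir
defmodule ResultDemo do
  # Success with all input consumed.
  def unwrap({:ok, results, "", _context, _position, _byte_offset}) do
    {:ok, results}
  end

  # Success, but the parser stopped before the end of the input.
  def unwrap({:ok, _results, rest, _context, {line, _line_start}, _byte_offset}) do
    {:error, "unexpected input on line #{line}: #{inspect(rest)}"}
  end

  # Failure: report the line taken from the position tuple.
  def unwrap({:error, message, _rest, _context, {line, _line_start}, _byte_offset}) do
    {:error, "line #{line}: #{message}"}
  end
end

ResultDemo.unwrap({:ok, ["1", "2"], "", %{}, {1, 0}, 3})
# => {:ok, ["1", "2"]}

ResultDemo.unwrap({:error, "expected number", "x", %{}, {1, 0}, 0})
# => {:error, "line 1: expected number"}
```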

Not Implemented

PEG actions (C code blocks like { code }) are not supported, as they are specific to the C implementation. Use :post_traverse for custom transformations.

Functions

parse(binary, opts \\ [])

@spec parse(binary(), keyword()) ::
  {:ok, [term()], rest, context, line, byte_offset}
  | {:error, reason, rest, context, line, byte_offset}
when line: {pos_integer(), byte_offset},
     byte_offset: pos_integer(),
     rest: binary(),
     reason: String.t(),
     context: map()

Parses the given binary as parse.

Returns {:ok, [token], rest, context, position, byte_offset} or {:error, reason, rest, context, position, byte_offset}, where position describes the location of the parse (start position) as {line, offset_to_start_of_line}.

The column where the error occurred can be inferred from byte_offset - offset_to_start_of_line.
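
As a small worked example of that subtraction, the sketch below uses a hypothetical helper, ColumnDemo.column/2, with hand-written values. The resulting column is zero-based and counted in bytes.

```elixir
defmodule ColumnDemo do
  # position is {line, offset_to_start_of_line}; byte_offset is absolute.
  def column({_line, line_start}, byte_offset), do: byte_offset - line_start
end

# In "ab\ncd!", line 2 starts at byte 3 and the "!" sits at byte 5.
ColumnDemo.column({2, 3}, 5)
# => 2
```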

Options

  • :byte_offset - the byte offset for the whole binary, defaults to 0
  • :line - the line and the byte offset into that line, defaults to {1, byte_offset}
  • :context - the initial context value. It will be converted to a map

parser_from_ast(ast, opts)

(macro)

parser_from_file(file, opts \\ [])

(macro)

parser_from_string(string, opts \\ [])

(macro)