Pegasus (pegasus v1.0.0)
A PEG (Parsing Expression Grammar) parser generator for Elixir.
Pegasus compiles PEG grammar definitions into efficient NimbleParsec parsers
at compile time. This gives you the familiar, readable PEG syntax while leveraging
NimbleParsec's optimized parsing engine.
Quick Start
defmodule MyParser do
  require Pegasus

  Pegasus.parser_from_string("""
  numbers <- number (',' number)*
  number <- [0-9]+
  """,
    numbers: [parser: true],
    number: [collect: true]
  )
end

MyParser.numbers("1,2,3")
# => {:ok, ["1", "2", "3"], "", %{}, {1, 0}, 5}
Main API
- parser_from_string/2 - Define parsers from a PEG grammar string
- parser_from_file/2 - Load and compile a PEG grammar from a file
- parser_from_ast/2 - Advanced: compile a pre-parsed AST
PEG Grammar Syntax
Pegasus supports the standard PEG syntax. For the full specification, see the PEG reference.
Rules
Rules are defined with the <- operator:
identifier <- expression
Expressions
| Syntax | Description | Example |
|---|---|---|
| '...' or "..." | Literal string | 'hello' |
| [...] | Character class | [a-zA-Z] |
| [^...] | Negated character class | [^0-9] |
| . | Any character | . |
| e1 e2 | Sequence | 'a' 'b' |
| e1 / e2 | Ordered choice | 'a' / 'b' |
| e* | Zero or more | [0-9]* |
| e+ | One or more | [0-9]+ |
| e? | Optional | '-'? |
| &e | Positive lookahead | &'x' |
| !e | Negative lookahead | !'x' |
| (e) | Grouping | ('a' 'b')* |
| <e> | Extracted group | <[a-z]+> |
| # ... | Comment | # this is ignored |
Escape Sequences
Pegasus supports ANSI C escape sequences in literals and character classes:
- \a - bell
- \b - backspace
- \e - escape
- \f - form feed
- \n - newline
- \r - carriage return
- \t - tab
- \v - vertical tab
- \' - single quote
- \" - double quote
- \\ - backslash
- \[ and \] - brackets (useful in character classes)
- \- - literal hyphen (in character classes)
- \377 - octal escape (1-3 octal digits)
Parser Options
Options control how each grammar rule is compiled. Pass them as a keyword list where keys are rule names:
Pegasus.parser_from_string(grammar,
  rule_name: [option: value, ...]
)
Options are applied in the order specified.
:parser
Export the rule as a parser function (an entry point that can be called directly). Without this option, rules become private combinators.
Pegasus.parser_from_string("""
start <- greeting name
greeting <- 'Hello, '
name <- [a-zA-Z]+
""", start: [parser: true])
The :parser option also accepts an atom to rename the parser:
Pegasus.parser_from_string("foo <- 'foo'", foo: [parser: :parse])
# Creates `parse/1` instead of `foo/1`
:export
Make a combinator public instead of private. Use this when you need to reference the combinator from other modules or compose it with other NimbleParsec combinators.
Pegasus.parser_from_string("""
foo <- 'foo'
""", foo: [export: true])
:collect
Merge all matched content into a single binary string. Useful for rules that match multiple characters you want combined.
Pegasus.parser_from_string("""
number <- [0-9]+
""", number: [collect: true])
# Without collect: ["1", "2", "3"]
# With collect: "123"
Collect requirements
When using :collect, all nested combinators must leave only iodata (binaries/charlists) in the result. Tags and tokens will cause errors.
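To make the iodata requirement concrete, the sketch below uses plain Elixir rather than Pegasus itself. It assumes that collecting behaves like flattening iodata into one binary with IO.iodata_to_binary/1; whether Pegasus calls that exact function internally is an assumption, and the CollectSketch module is hypothetical.

```elixir
defmodule CollectSketch do
  # Hypothetical helper: mimics what collecting is assumed to do,
  # namely flattening iodata into a single binary.
  def collect(parts), do: IO.iodata_to_binary(parts)

  # Returns true when `parts` is valid iodata, false otherwise.
  def collectable?(parts) do
    _ = IO.iodata_to_binary(parts)
    true
  rescue
    ArgumentError -> false
  end
end

# Binaries (and nested lists of them) flatten cleanly.
IO.inspect(CollectSketch.collect(["1", "2", "3"]))
# => "123"

# A tagged tuple is not iodata, so it cannot be flattened.
IO.inspect(CollectSketch.collectable?(["1", {:num, "2"}]))
# => false
```

If a nested rule carries :tag or :token, one way to avoid this failure is to apply :collect to the inner rule and leave the outer rule uncollected.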
:token
Replace the matched content with a token value.
- token: true - Use the rule name as the token
- token: :custom - Use a custom atom as the token
Pegasus.parser_from_string("""
operator <- '+' / '-' / '*' / '/'
""", operator: [collect: true, token: :op])
# Matched "+" becomes :op
:tag
Wrap the result in a tagged tuple {tag, content}.
- tag: true - Use the rule name as the tag
- tag: :custom - Use a custom atom as the tag
Pegasus.parser_from_string("""
number <- [0-9]+
""", number: [collect: true, tag: :num])
# Result: {:num, "123"}
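Tagged results lend themselves to pattern matching in downstream code. The sketch below works on a hand-written result list containing {:num, digits} tuples of the shape shown above; the list and the TagSketch module are illustrative and are not output captured from Pegasus.

```elixir
defmodule TagSketch do
  # Converts every {:num, digits} tuple into an integer and
  # passes any other item through unchanged.
  def to_integers(results) do
    Enum.map(results, fn
      {:num, digits} -> String.to_integer(digits)
      other -> other
    end)
  end
end

IO.inspect(TagSketch.to_integers([{:num, "12"}, ",", {:num, "3"}]))
# => [12, ",", 3]
```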
:ignore
Discard the matched content. Useful for whitespace and delimiters.
Pegasus.parser_from_string("""
list <- item (comma item)*
item <- [a-z]+
comma <- ','
""",
  list: [parser: true],
  item: [collect: true],
  comma: [ignore: true]
)
:start_position
Inject position information at the start of the match. Adds a map with
:line, :column, and :offset keys.
Pegasus.parser_from_string("""
token <- [a-z]+
""", token: [start_position: true, collect: true])
# Result: [%{line: 1, column: 0, offset: 0}, "hello"]
:post_traverse
Apply a custom transformation function after the rule matches. The function receives the parsing state and can transform the results.
Pegasus.parser_from_string("""
number <- [0-9]+
""", number: [collect: true, post_traverse: {:to_integer, []}])
defp to_integer(rest, [num_string], context, _position, _offset) do
  {rest, [String.to_integer(num_string)], context}
end
Arguments are reversed
The second argument (the matched content) arrives in the reverse of the order in which it was matched, so the most recently matched item comes first. Plan accordingly when pattern matching.
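The reversal matters as soon as a rule yields more than one item. The sketch below calls a callback of the same shape directly, with a hand-built accumulator, to show the order in which items arrive. The ReversedArgsSketch module and to_pair/5 are hypothetical names, and no parser is involved.

```elixir
defmodule ReversedArgsSketch do
  # Same argument shape as a post_traverse callback. For input matched as
  # "a" then "b", the accumulator is ["b", "a"], so the pattern names the
  # last-matched item first.
  def to_pair(rest, [second, first], context, _position, _offset) do
    {rest, [{first, second}], context}
  end
end

# Simulate the call for a rule that matched "a" followed by "b".
{_rest, result, _context} = ReversedArgsSketch.to_pair("", ["b", "a"], %{}, {1, 0}, 2)
IO.inspect(result)
# => [{"a", "b"}]
```

When the number of items varies, calling Enum.reverse/1 on the accumulator first is often simpler than matching on reversed positions.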
:alias
Substitute a custom combinator in place of the grammar rule. Useful when you need special handling that PEG syntax can't express.
Pegasus.parser_from_string("""
special <- 'x'
""", special: [alias: :my_custom_combinator])
You must define my_custom_combinator as a NimbleParsec combinator in your module.
Capitalized Identifiers
Capitalized PEG identifiers like Statement or Expression work fine. Just remember to put a colon in front of them in the options keyword list, since capitalized names in Elixir are aliases:
Pegasus.parser_from_string("Foo <- 'foo'", Foo: [parser: :parse])
Capitalized identifiers also require special handling when called directly. You can wrap in a lowercase combinator or use apply/3:
defparsec :parse, parsec(:Foo)
# or
apply(MyParser, :Foo, ["foo"])
Loading from Files
For larger grammars, store them in .peg files:
# In lib/my_parser.ex
Pegasus.parser_from_file("priv/grammar.peg",
  start: [parser: true]
)
Output Format
Parsers return the standard NimbleParsec result tuple:
{:ok, results, remaining, context, position, byte_offset}
Or on failure:
{:error, message, remaining, context, position, byte_offset}
See NimbleParsec documentation for details.
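Callers usually want something smaller than the six-element tuple. The sketch below shows one possible way to collapse it with pattern matching. ResultSketch and simplify/1 are hypothetical names, and the tuples passed in are hand-written values of the documented shape rather than captured parser output. It also treats leftover input as a failure, which is a design choice of this sketch and not something the parser enforces.

```elixir
defmodule ResultSketch do
  # Success with nothing left over: return just the results.
  def simplify({:ok, results, "", _context, _position, _byte_offset}) do
    {:ok, results}
  end

  # Success, but the parser stopped before the end of the input.
  def simplify({:ok, _results, remaining, _context, {line, _line_start}, _byte_offset}) do
    {:error, "unparsed input on line #{line}: #{inspect(remaining)}"}
  end

  # Failure: keep the message and prefix it with the line number.
  def simplify({:error, message, _remaining, _context, {line, _line_start}, _byte_offset}) do
    {:error, "line #{line}: #{message}"}
  end
end

IO.inspect(ResultSketch.simplify({:ok, ["1", "2", "3"], "", %{}, {1, 0}, 5}))
# => {:ok, ["1", "2", "3"]}

IO.inspect(ResultSketch.simplify({:error, "expected a digit", "x", %{}, {1, 0}, 0}))
# => {:error, "line 1: expected a digit"}
```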
Not Implemented
PEG actions (C code blocks like { code }) are not supported, as they are specific to the C implementation. Use :post_traverse for custom transformations.
Summary
Functions
Parses the given binary as parse.
Functions
@spec parse(binary(), keyword()) :: {:ok, [term()], rest, context, line, byte_offset} | {:error, reason, rest, context, line, byte_offset} when line: {pos_integer(), byte_offset}, byte_offset: pos_integer(), rest: binary(), reason: String.t(), context: map()
Parses the given binary as parse.
Returns {:ok, [token], rest, context, position, byte_offset} or
{:error, reason, rest, context, line, byte_offset} where position
describes the location of the parse (start position) as {line, offset_to_start_of_line}.
The column where the error occurred can be inferred from byte_offset - offset_to_start_of_line.
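The column arithmetic can be sketched as a small helper. ColumnSketch and column/2 are hypothetical names, and the values passed in are hand-written. The result is a zero-based offset in bytes, so it differs from a character count when the line contains multi-byte characters.

```elixir
defmodule ColumnSketch do
  # Zero-based byte column: byte_offset - offset_to_start_of_line.
  def column({_line, line_start_offset}, byte_offset) do
    byte_offset - line_start_offset
  end
end

# Hand-written values: line 2 begins at byte 4 and parsing stopped at byte 6.
IO.inspect(ColumnSketch.column({2, 4}, 6))
# => 2
```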
Options
- :byte_offset - the byte offset for the whole binary, defaults to 0
- :line - the line and the byte offset into that line, defaults to {1, byte_offset}
- :context - the initial context value. It will be converted to a map