PEG Grammar Reference
This guide provides a complete reference for the PEG (Parsing Expression Grammar) syntax supported by Pegasus.
Grammar Structure
A PEG grammar consists of one or more definitions (rules). Each definition has an identifier and an expression:
identifier <- expression
The first rule is typically the entry point, though in Pegasus you explicitly mark entry points with the :parser option.
Identifiers
Rule names (identifiers) must start with a letter or underscore, followed by letters, digits, or underscores:
my_rule <- ...
Rule2 <- ...
_private <- ...
Capitalized Identifiers
Capitalized identifiers like MyRule work but require special handling when calling from Elixir code. See the Capitalized Identifiers section.
Expressions
Literals
Match exact strings using single or double quotes:
rule <- 'hello'
rule <- "world"
Both quote styles are equivalent. Use one when you need to include the other:
quoted <- "it's"
also <- 'say "hello"'
Character Classes
Match a single character from a set:
digit <- [0-9]
letter <- [a-zA-Z]
hex <- [0-9a-fA-F]
Combine multiple ranges and individual characters:
alphanum <- [a-zA-Z0-9_]
vowel <- [aeiouAEIOU]
Negated Classes
Match any character NOT in the set:
not_digit <- [^0-9]
not_quote <- [^"]
Special Characters in Classes
Use backslash escapes for special characters:
bracket <- [\[\]] # matches [ or ]
hyphen <- [a\-z] # matches a, -, or z (hyphen as literal)
backslash <- [\\] # matches \
Dot (Any Character)
Match any single character:
any <- .
This matches any character including newlines.
Sequences
Match expressions in order:
hello_world <- 'hello' ' ' 'world'
All parts must match for the sequence to succeed.
Ordered Choice
Try alternatives in order:
bool <- 'true' / 'false'
digit <- '0' / '1' / '2' / '3' / '4' / '5' / '6' / '7' / '8' / '9'
The first matching alternative wins. Unlike regular expressions, PEG choices are deterministic and ordered: once an alternative has matched, the later alternatives are not tried.
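To make "first matching alternative wins" concrete, here is a minimal Python sketch that models ordered choice over string literals. It only illustrates the semantics and is not how Pegasus is implemented; `ordered_choice` is a hypothetical helper invented for this example.

```python
def ordered_choice(alternatives, text, pos=0):
    """Model PEG ordered choice over literal strings (hypothetical helper).

    Returns the position just after the first alternative that matches at
    `pos`, or None if no alternative matches. Once an alternative matches,
    the remaining ones are never tried.
    """
    for literal in alternatives:
        if text.startswith(literal, pos):
            return pos + len(literal)
    return None


# 'true' / 'false'
print(ordered_choice(["true", "false"], "false"))   # 5
# 'if' / 'ifelse': the shorter literal wins and "else" is left unconsumed
print(ordered_choice(["if", "ifelse"], "ifelse"))   # 2
# 'ifelse' / 'if': the longer literal is tried first
print(ordered_choice(["ifelse", "if"], "ifelse"))   # 6
```

The second call shows why the order of alternatives matters whenever one literal is a prefix of another.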
Repetition
Zero or More (*)
digits <- [0-9]*
ws <- [ \t\n]*
One or More (+)
identifier <- [a-zA-Z] [a-zA-Z0-9]*
number <- [0-9]+
Optional (?)
signed_number <- '-'? [0-9]+
optional_semicolon <- ';'?
Grouping
Use parentheses to group expressions:
term <- ('+' / '-') number
list <- item (',' item)*
Grouping is essential for combining operators:
# Without grouping: matches 'a' or ('b' followed by 'c')
wrong <- 'a' / 'b' 'c'
# With grouping: matches ('a' or 'b') followed by 'c'
right <- ('a' / 'b') 'c'
Lookahead
Positive Lookahead (&)
Match only if the expression would match, but don't consume input:
# Match 'a' only if followed by 'b'
a_before_b <- 'a' &'b'
Negative Lookahead (!)
Match only if the expression would NOT match:
# Match any character except newline
not_newline <- !'\n' .
# Match identifier that isn't a keyword
identifier <- !keyword [a-zA-Z]+
keyword <- 'if' / 'else' / 'while'
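The keyword example has a subtlety that a small model makes visible. The Python sketch below mimics `identifier <- !keyword [a-zA-Z]+` by hand. It illustrates the semantics only and is not Pegasus code; the function names are hypothetical.

```python
import re

KEYWORDS = ("if", "else", "while")
LETTERS = re.compile(r"[a-zA-Z]+")


def keyword_matches(text, pos=0):
    """keyword <- 'if' / 'else' / 'while' (only success or failure matters here)."""
    return any(text.startswith(k, pos) for k in KEYWORDS)


def identifier(text, pos=0):
    """identifier <- !keyword [a-zA-Z]+

    Returns the matched text, or None on failure. The lookahead consumes
    nothing; it only vetoes the match when a keyword starts at `pos`.
    """
    if keyword_matches(text, pos):
        return None
    m = LETTERS.match(text, pos)
    return m.group(0) if m else None


print(identifier("count"))  # count
print(identifier("while"))  # None
print(identifier("iffy"))   # None, because 'if' matches as a prefix of "iffy"
```

As the last call shows, the grammar as written also rejects identifiers that merely begin with a keyword. If that is not wanted, one common fix is to make the keyword rule check its own boundary, for example `keyword <- ('if' / 'else' / 'while') ![a-zA-Z]`.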
Extracted Groups (<...>)
Mark content for extraction:
quoted <- '"' <[^"]*> '"'
Extracted groups filter the result to include only the matched text, excluding surrounding syntax.
Escape Sequences
Pegasus supports ANSI C escape sequences in literals and character classes:
| Escape | Meaning |
|---|---|
| \a | Bell (alert) |
| \b | Backspace |
| \e | Escape |
| \f | Form feed |
| \n | Newline |
| \r | Carriage return |
| \t | Horizontal tab |
| \v | Vertical tab |
| \' | Single quote |
| \" | Double quote |
| \\ | Backslash |
| \[ | Left bracket |
| \] | Right bracket |
| \- | Hyphen (in character classes) |
Octal Escapes
Specify characters by octal code:
null <- '\0'
bell <- '\7'
tab <- '\11'
max <- '\377'
Octal escapes use 1-3 digits, with values from 0-377 (octal).
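For readers less used to octal, the following Python sketch converts the escapes above to decimal code points. It is an illustration of the arithmetic only; `octal_escape_to_codepoint` is a hypothetical helper, not part of Pegasus.

```python
def octal_escape_to_codepoint(escape):
    """Return the code point for a backslash followed by 1-3 octal digits."""
    digits = escape[1:]
    if not escape.startswith("\\") or not 1 <= len(digits) <= 3:
        raise ValueError(f"not an octal escape: {escape!r}")
    if any(d not in "01234567" for d in digits):
        raise ValueError(f"non-octal digit in: {escape!r}")
    value = int(digits, 8)
    if value > 0o377:  # the documented upper bound (255 decimal)
        raise ValueError(f"octal escape out of range: {escape!r}")
    return value


for esc in (r"\0", r"\7", r"\11", r"\377"):
    print(esc, "->", octal_escape_to_codepoint(esc))
# \0 -> 0
# \7 -> 7
# \11 -> 9
# \377 -> 255
```

So `'\11'` is the tab character (code point 9) and `'\377'` is the largest value, 255.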
Comments
Line comments start with #:
# This is a comment
rule <- 'hello' # inline comment
Operator Precedence
From highest to lowest precedence:
1. () - Grouping
2. *, +, ? - Repetition
3. &, ! - Lookahead
4. Sequence (implicit)
5. / - Choice
Example:
# This parses as: (a b*) / c
rule <- a b* / c
# Use grouping to change precedence:
rule <- a (b / c)*
Capitalized Identifiers
Capitalized PEG identifiers like Statement or Expression work fine. Just remember to put a colon in front of them in the options keyword list, since capitalized names in Elixir are aliases:
Pegasus.parser_from_string("Foo <- 'foo'", Foo: [parser: :parse])
Capitalized identifiers also require special handling when called directly. You can wrap in a lowercase combinator or use apply/3:
defparsec :parse, parsec(:Foo)
# or
apply(MyParser, :Foo, ["foo"])
Common Patterns
Whitespace Handling
ws <- [ \t\n\r]*
token <- ws content ws
Quoted Strings
string <- '"' (!'"' .)* '"'
With escape sequences:
string <- '"' (escape / !'"' .)* '"'
escape <- '\\' [nrt"\\]
Comments (C-style)
line_comment <- '//' (!'\n' .)* '\n'
block_comment <- '/*' (!'*/' .)* '*/'
Identifiers
identifier <- [a-zA-Z_] [a-zA-Z0-9_]*
Numbers
integer <- '-'? [0-9]+
float <- '-'? [0-9]+ '.' [0-9]+
Separated Lists
# Comma-separated with optional trailing comma
list <- item (',' item)* ','?
# At least one item
nonempty_list <- item (',' item)*
Debugging Tips
Start simple: Build your grammar incrementally, testing each rule.
Use :parser: Mark rules you want to test with parser: true so you can call them directly.
Check precedence: When in doubt, add parentheses to make grouping explicit.
Ordered choice matters: Put more specific alternatives first:
# Wrong: 'if' matches before 'ifelse'
keyword <- 'if' / 'ifelse'
# Right: longer match first
keyword <- 'ifelse' / 'if'
Avoid left recursion: PEG parsers don't support left recursion.
Left Recursion
Left-recursive rules will cause infinite loops:
# This will infinite loop!
expr <- expr '+' term
# Use iteration instead
expr <- term ('+' term)*
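The iterative form can be read as "parse one term, then loop while another '+' term follows". The hand-written Python sketch below mirrors that reading and evaluates the sum as it goes. It is an illustration only, not Pegasus output, and it assumes `term <- [0-9]+`, which the grammar above leaves undefined.

```python
import re

NUMBER = re.compile(r"[0-9]+")


def parse_term(text, pos):
    """term <- [0-9]+ (an assumption made for this example)."""
    m = NUMBER.match(text, pos)
    if m is None:
        return None
    return int(m.group(0)), m.end()


def parse_expr(text, pos=0):
    """expr <- term ('+' term)*

    Returns (value, next_position) or None. The loop plays the role of the
    repetition, so the function never calls itself before consuming input.
    """
    first = parse_term(text, pos)
    if first is None:
        return None
    value, pos = first
    while text.startswith("+", pos):
        following = parse_term(text, pos + 1)
        if following is None:
            break  # '+' without a term: the group fails and the repetition stops
        operand, pos = following
        value += operand  # accumulate left to right
    return value, pos


print(parse_expr("1+2+3"))  # (6, 5)
print(parse_expr("4+"))     # (4, 1), the dangling '+' is left unconsumed
```

Because the loop accumulates from left to right, the same structure keeps left associativity for operators where it matters, such as subtraction.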