chomp
Types
The Error
type represents all of the ways that a parser can fail. It has
two type parameters, e
and tok
. See the Parser
type for more
information about them.
pub type Error(e, tok) {
  Custom(e)
  EndOfInput
  Expected(tok, got: tok)
  Unexpected(tok)
  BadParser(String)
}
Constructors
-
Custom(e)
A custom error.
-
EndOfInput
There are no more tokens to consume, but the parser required some.
-
Expected(tok, got: tok)
The parser expected a certain token but got a different one instead.
-
Unexpected(tok)
The parser encountered an unexpected token. This error is not very specific, so it’s often best to replace it using the
or_error
function when possible.
-
BadParser(String)
A parser was called with incorrect input.
This type is very similar to
list.ContinueOrStop
.
pub type Loop(a, state) {
  Continue(state)
  Break(a)
}
Constructors
-
Continue(state)
Continue parsing with the new state.
-
Break(a)
Stop parsing and return a result.
The Parser
type has four parameters; let’s take a look at each of them:
Parser(a, e, tok, ctx)
-
a
is the type of value that the parser knows how to produce. If you were writing a parser for a programming language, this might be your expression type.
-
e
is the type of custom error that you can choose to throw. This could be a String
for simple errors such as “I expected an expression” or something more complex.
-
tok
is the type of tokens that the parser knows how to consume. You can take a look at the Token
type for a bit more info, but note that it’s not necessary for the token stream to come from chomp’s lexer.
-
ctx
is used to make error reporting nicer. You can place a parser into a custom context. When the parser runs, the context gets pushed onto a stack. If the parser fails, you can see the context stack in the error message, which can make error reporting and debugging much easier! See the in
function for more details.
It can get a bit repetitive to write out the Parser
type with all of the
parameters in type annotations, so it’s common to make a type alias for it in
your parser.
import chomp

type Parser(a) = chomp.Parser(a, MyErrorType, TokenType, Context)

// now your parsers can use the alias like so:
fn expression() -> Parser(Expression) {
  ...
}
pub opaque type Parser(a, e, tok, ctx)
Functions
pub fn backtrackable(
parser: Parser(a, b, c, d),
) -> Parser(a, b, c, d)
By default, parsers will not backtrack if they fail after consuming at least
one token. Passing a parser to backtrackable
changes this behaviour: if the parser fails, we jump back to the state it was
in before it consumed any input, so that another parser can be tried from the
same position.
This is most useful when you want to quickly try a few different parsers using
one_of
.
🚨 Backtracking parsers can drastically reduce performance, so you should avoid them where possible. A common reason folks reach for backtracking is when they want to try multiple branches that start with the same token or same sequence of tokens.
To avoid backtracking in these cases, you can create an intermediate parser
that consumes the common tokens and then use one_of
to try
the different branches.
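As a rough sketch of that advice, the two functions below parse the same input. The token type Tok, the Binding type, and the name helper are hypothetical and exist only for this illustration; the first version relies on backtrackable because both branches begin with Let, while the second consumes the shared Let once and only then branches.

```gleam
import chomp.{do, return}
import gleam/option.{None, Some}

// Hypothetical token and result types, used only for this illustration.
pub type Tok {
  Let
  Mut
  Name(String)
}

pub type Binding {
  Immutable(String)
  Mutable(String)
}

pub type Parser(a) =
  chomp.Parser(a, String, Tok, Nil)

// Hypothetical helper: take a Name token and return its text.
fn name() -> Parser(String) {
  chomp.take_map(fn(tok) {
    case tok {
      Name(text) -> Some(text)
      _ -> None
    }
  })
}

// Both branches start with Let, so the first must be backtrackable.
pub fn binding_with_backtracking() -> Parser(Binding) {
  chomp.one_of([
    chomp.backtrackable({
      use _ <- do(chomp.token(Let))
      use _ <- do(chomp.token(Mut))
      use text <- do(name())
      return(Mutable(text))
    }),
    {
      use _ <- do(chomp.token(Let))
      use text <- do(name())
      return(Immutable(text))
    },
  ])
}

// Consume the shared Let once, then branch on whatever follows it.
pub fn binding() -> Parser(Binding) {
  use _ <- do(chomp.token(Let))
  chomp.one_of([
    {
      use _ <- do(chomp.token(Mut))
      use text <- do(name())
      return(Mutable(text))
    },
    name() |> chomp.map(Immutable),
  ])
}
```

In the second version no branch of one_of shares a leading token with another, so nothing needs to be wrapped in backtrackable.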
pub fn do(
parser: Parser(a, b, c, d),
f: fn(a) -> Parser(e, b, c, d),
) -> Parser(e, b, c, d)
Do the first parser
, then apply f
to its result, which returns another
parser and is subsequently run. If the first parser fails, f
is not called.
Because this parser is so common, we normally import it unqualified.
This is very useful for running parsers in sequence with Gleam’s use
syntax.
import chomp.{do, return}

fn point() {
  use _ <- do(chomp.token(LParen))
  use x <- do(int_parser())
  use _ <- do(chomp.token(Comma))
  use y <- do(int_parser())
  use _ <- do(chomp.token(RParen))
  return(Point(x, y))
}
(See the main page for a complete example)
pub fn do_in(
context: a,
parser: Parser(b, c, d, a),
f: fn(b) -> Parser(e, c, d, a),
) -> Parser(e, c, d, a)
pub fn end() -> Parser(Nil, a, b, c)
Parse successfully only if at the end of the token stream.
pub fn fail(error: a) -> Parser(b, a, c, d)
Create a parser that consumes no tokens and always fails with the given error.
pub fn get_pos() -> Parser(Span, a, b, c)
A parser that returns the current token position.
pub fn in(
parser: Parser(a, b, c, d),
context: d,
) -> Parser(a, b, c, d)
Run a parser in a certain context. This allows you to include useful
information—context—in error messages. For example, instead of a message
such as “I expected a value” you could say “I expected a value inside of this
list in the function foo
”. The latter is far more user-friendly!
Context also holds on to the position of the parser where it was entered.
So if you had an InList
context entered right after a [
token, any errors
encountered inside the list can reference where the list started.
I found a problem in this list:

| the list starts here
v
[a, b, ]
       ~
       |
       I wanted a value
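A minimal sketch of entering such a context might look like the following. The Tok and Context types and the number helper are assumptions made up for this example; only in, do, return, token, sequence, take_map, and or_error come from this module.

```gleam
import chomp.{do, return}
import gleam/option.{None, Some}

// Hypothetical token and context types, used only for this illustration.
pub type Tok {
  LBracket
  RBracket
  Comma
  Num(Int)
}

pub type Context {
  InList
}

pub type Parser(a) =
  chomp.Parser(a, String, Tok, Context)

// Hypothetical helper: take a Num token and return its value.
fn number() -> Parser(Int) {
  chomp.take_map(fn(tok) {
    case tok {
      Num(n) -> Some(n)
      _ -> None
    }
  })
  |> chomp.or_error("I wanted a value")
}

pub fn list() -> Parser(List(Int)) {
  use _ <- do(chomp.token(LBracket))
  // The context is entered right after the opening bracket, so errors raised
  // by the parsers in this block carry InList on their context stack.
  {
    use items <- do(chomp.sequence(number(), separator: chomp.token(Comma)))
    use _ <- do(chomp.token(RBracket))
    return(items)
  }
  |> chomp.in(InList)
}
```

How the context stack is turned into a message like the one above is left to the code that handles the result of run.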
pub fn inspect(
parser: Parser(a, b, c, d),
message: String,
) -> Parser(a, b, c, d)
Run the given parser and then inspect its state.
pub fn is_at_end() -> Parser(Bool, a, b, c)
Returns True
if there are no tokens left to consume.
pub fn lazy(
parser: fn() -> Parser(a, b, c, d),
) -> Parser(a, b, c, d)
Defer the creation of a parser until it is needed. This is often most useful when creating a parser that is recursive and is not a function.
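One situation where this matters is a recursive grammar in which a parser is passed as an argument to a helper. The sketch below assumes hypothetical Tok and Value types and a hypothetical bracketed helper; the point is only that array must not call value() while array itself is still being built.

```gleam
import chomp.{do, return}
import gleam/option.{None, Some}

// Hypothetical token and result types, used only for this illustration.
pub type Tok {
  LBracket
  RBracket
  Comma
  Num(Int)
}

pub type Value {
  Number(Int)
  Array(List(Value))
}

pub type Parser(a) =
  chomp.Parser(a, String, Tok, Nil)

// Hypothetical helper: parse comma-separated items between brackets.
fn bracketed(inner: Parser(a)) -> Parser(List(a)) {
  use _ <- do(chomp.token(LBracket))
  use items <- do(chomp.sequence(inner, separator: chomp.token(Comma)))
  use _ <- do(chomp.token(RBracket))
  return(items)
}

fn number() -> Parser(Value) {
  chomp.take_map(fn(tok) {
    case tok {
      Num(n) -> Some(Number(n))
      _ -> None
    }
  })
}

pub fn value() -> Parser(Value) {
  chomp.one_of([number(), array()])
}

fn array() -> Parser(Value) {
  // Writing bracketed(value()) here would call value() immediately, which
  // calls array(), which calls value() again, and so on without end.
  // chomp.lazy postpones the call to value until the parser actually runs.
  bracketed(chomp.lazy(value))
  |> chomp.map(Array)
}
```

Parsers written inside the callback of do or then are already deferred, which is why lazy is mostly needed when a recursive parser is handed directly to another function.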
pub fn many(
parser: Parser(a, b, c, d),
) -> Parser(List(a), b, c, d)
Parse zero or more of the given parser.
pub fn many1(
parser: Parser(a, b, c, d),
) -> Parser(List(a), b, c, d)
Parse one or more of the given parser.
💡 If this parser succeeds, the list produced is guaranteed to be non-empty.
Feel free to let assert
the result!
pub fn map(
parser: Parser(a, b, c, d),
f: fn(a) -> e,
) -> Parser(e, b, c, d)
Run parser
, applying f
to its result.
pub fn map_error(
parser: Parser(a, b, c, d),
map: fn(Error(b, c)) -> Error(b, c),
) -> Parser(a, b, c, d)
Run a parser and if it fails and did not consume any tokens, apply the given function to the error and return the new error.
This function provides a way to add more information to error messages. For example, if you just parsed some indentation in an indentation-sensitive language and now want to parse a statement, you could map the statement parser’s error to add extra info:
fn parse_statement() {
  // parse a statement
  |> chomp.or_error("I wanted a statement")
}

fn parse_something() {
  use _ <- do(parse_indentation())
  use statement <- do(
    parse_statement() |> chomp.map_error(because_of_indent)
  )
  // ...
}

fn because_of_indent(error) {
  case error {
    chomp.Custom(error) ->
      chomp.Custom("Since there was an indent, " <> error)
    _ -> error
  }
}
Now if parsing the statement in parse_something
fails, the error message
will be Since there was an indent, I wanted a statement
. Sweet!
Note that, like or_error
, EndOfInput
errors are not changed to minimize
confusing error messages.
pub fn one_of(
parsers: List(Parser(a, b, c, d)),
) -> Parser(a, b, c, d)
Try each parser in order until one succeeds. If none succeed, the last parser’s
error is returned. It is recommended to use a custom error with
or_error
for better error messages.
pub fn optional(
parser: Parser(a, b, c, d),
) -> Parser(Option(a), b, c, d)
Try the given parser, but if it fails return
None
instead
of failing.
pub fn or(
parser: Parser(a, b, c, d),
default: a,
) -> Parser(a, b, c, d)
Try the given parser, but if it fails return the given default value instead of failing.
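To illustrate optional and or together, here is a small sketch built on a hypothetical token type Tok and a hypothetical number helper. signed treats a leading Minus as optional, and has_trailing_comma turns the presence of a Comma into a Bool.

```gleam
import chomp.{do, return}
import gleam/option.{None, Some}

// Hypothetical token type, used only for this illustration.
pub type Tok {
  Minus
  Comma
  Num(Int)
}

pub type Parser(a) =
  chomp.Parser(a, String, Tok, Nil)

// Hypothetical helper: take a Num token and return its value.
fn number() -> Parser(Int) {
  chomp.take_map(fn(tok) {
    case tok {
      Num(n) -> Some(n)
      _ -> None
    }
  })
}

// optional turns "no Minus token here" into None instead of an error.
pub fn signed() -> Parser(Int) {
  use sign <- do(chomp.optional(chomp.token(Minus)))
  use n <- do(number())
  case sign {
    Some(_) -> return(0 - n)
    None -> return(n)
  }
}

// or supplies a default value when the parser fails.
pub fn has_trailing_comma() -> Parser(Bool) {
  chomp.token(Comma)
  |> chomp.replace(with: True)
  |> chomp.or(False)
}
```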
pub fn or_error(
parser: Parser(a, b, c, d),
error: b,
) -> Parser(a, b, c, d)
Run a parser and if it fails and did not consume any tokens, return the given
error. This function is extremely useful for providing custom error messages
to parsers such as one_of
, take_map
, and take_if
.
Note that EndOfInput
errors are not replaced to minimize confusing error
messages.
import chomp

fn value() {
  chomp.one_of([
    string(),
    number(),
    variable(),
    list(),
    group(),
  ])
  |> chomp.or_error("I expected a value (string, number, variable, or list)")
}
pub fn replace(
parser: Parser(a, b, c, d),
with b: e,
) -> Parser(e, b, c, d)
Run parser
, replacing its result with b
.
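As a short sketch of when each is handy: replace suits tokens whose mere presence decides the value, while map suits values that need transforming. The Tok type and the number helper below are hypothetical.

```gleam
import chomp
import gleam/option.{None, Some}

// Hypothetical token type, used only for this illustration.
pub type Tok {
  TrueKeyword
  FalseKeyword
  Num(Int)
}

pub type Parser(a) =
  chomp.Parser(a, String, Tok, Nil)

// Hypothetical helper: take a Num token and return its value.
fn number() -> Parser(Int) {
  chomp.take_map(fn(tok) {
    case tok {
      Num(n) -> Some(n)
      _ -> None
    }
  })
}

// token returns a position, which is not needed here, so it is replaced.
pub fn boolean() -> Parser(Bool) {
  chomp.one_of([
    chomp.token(TrueKeyword) |> chomp.replace(with: True),
    chomp.token(FalseKeyword) |> chomp.replace(with: False),
  ])
}

// map keeps the parsed value and transforms it.
pub fn doubled() -> Parser(Int) {
  number() |> chomp.map(fn(n) { n * 2 })
}
```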
pub fn replace_error(
parser: Parser(a, b, c, d),
error: b,
) -> Parser(a, b, c, d)
Run a parser and if it fails, return the given error. This parser is similar
to or_error
, but it’s not quite the same—this one does not
care whether the parser consumed any tokens when it failed. That means that
if you have a parser like this:
chomp.one_of([function(), constant()])
|> chomp.replace_error("I expected a function or constant")
You will always get the error message “I expected a function or constant” even
if any of the one_of
parsers consumed tokens—a behavior that probably isn’t
what you want.
For instance, if this was parsed:
function 23() {}
// ^^ this is the error
We would get the error message “I expected a function or constant” because the
function
parser’s error was swallowed by replace_error
. Contrast that to
or_error
, where we would get something much better such as “I expected an
identifier”.
Of course there are cases where replace_error
can be helpful; just be
aware that you’ll often want to reach for or_error
first.
pub fn return(value: a) -> Parser(a, b, c, d)
The simplest kind of parser. return
consumes no tokens and always
produces the given value. Sometimes called succeed
instead.
This function might seem useless at first, but it is very useful when used in
combination with do
or then
.
import chomp.{do, return, throw}

fn uint8_parser() {
  use int <- do(int_parser())
  case int >= 0, int <= 255 {
    True, True ->
      return(int)
    False, _ ->
      throw("Expected an int >= 0")
    _, False ->
      throw("Expected an int <= 255")
  }
}
💡 return
and succeed
are names for the same thing.
We suggest using return
unqualified when using do
and Gleam’s use
syntax, and chomp.succeed
in a pipeline with chomp.then
.
pub fn run(
src: List(Token(a)),
parser: Parser(b, c, a, d),
) -> Result(b, #(Error(c, a), Span, List(#(Span, d))))
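Going by the signature alone, a failed run carries three things: the error, a span, and a list of span and context pairs. The sketch below shows one way to take that result apart; the summarise function is hypothetical, it assumes the custom error type is String, and it assumes the list holds the context stack described under the Parser type.

```gleam
import chomp
import gleam/int
import gleam/list

// Hypothetical helper: turn the result of chomp.run into a short message.
// Assumes custom errors are plain Strings.
pub fn summarise(result) -> String {
  case result {
    Ok(_) -> "parsed successfully"
    Error(#(error, _span, context)) -> {
      let reason = case error {
        chomp.Custom(message) -> message
        chomp.EndOfInput -> "ran out of tokens"
        chomp.Expected(_, _) -> "got a different token than expected"
        chomp.Unexpected(_) -> "unexpected token"
        chomp.BadParser(message) -> "bad parser: " <> message
      }
      // Each entry pairs a span with a context value.
      let depth = int.to_string(list.length(context))
      reason <> " (context depth " <> depth <> ")"
    }
  }
}
```

A call site would pass the result straight through, for example summarise(chomp.run(tokens, parser)), where tokens and parser are your own values.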
pub fn sequence(
parser: Parser(a, b, c, d),
separator sep: Parser(e, b, c, d),
) -> Parser(List(a), b, c, d)
Parse a sequence separated by the separator
parser.
pub fn succeed(value: a) -> Parser(a, b, c, d)
pub fn take_if(predicate: fn(a) -> Bool) -> Parser(a, b, a, c)
Take and return a token if it satisfies the predicate.
pub fn take_map(f: fn(a) -> Option(b)) -> Parser(b, c, a, d)
Take the next token and attempt to transform it with the given function. This
is useful when creating reusable primitive parsers for your own tokens such as
take_identifier
or take_number
.
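A minimal sketch of such primitives, assuming a hypothetical token type Tok, could look like this. Each one is paired with or_error so that a failure produces a readable message.

```gleam
import chomp
import gleam/option.{None, Some}

// Hypothetical token type, used only for this illustration.
pub type Tok {
  Ident(String)
  Num(Int)
  Plus
  Star
}

pub type Parser(a) =
  chomp.Parser(a, String, Tok, Nil)

// Succeeds only on Ident tokens and returns the name they carry.
pub fn take_identifier() -> Parser(String) {
  chomp.take_map(fn(tok) {
    case tok {
      Ident(name) -> Some(name)
      _ -> None
    }
  })
  |> chomp.or_error("I expected an identifier")
}

// Succeeds only on Num tokens and returns the number they carry.
pub fn take_number() -> Parser(Int) {
  chomp.take_map(fn(tok) {
    case tok {
      Num(n) -> Some(n)
      _ -> None
    }
  })
  |> chomp.or_error("I expected a number")
}

// take_if returns the token itself when the predicate holds.
pub fn take_operator() -> Parser(Tok) {
  chomp.take_if(fn(tok) { tok == Plus || tok == Star })
  |> chomp.or_error("I expected an operator")
}
```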
pub fn then(
parser: Parser(a, b, c, d),
f: fn(a) -> Parser(e, b, c, d),
) -> Parser(e, b, c, d)
Another name for do
.
pub fn throw(error: a) -> Parser(b, a, c, d)
pub fn token(tok: a) -> Parser(Span, b, a, c)
Parse a token of a particular type, returning its position.
use start_pos <- do(chomp.token(LParen))
// ...
use end_pos <- do(chomp.token(RParen))
pub fn until(
parser: Parser(a, b, c, d),
tok: c,
) -> Parser(List(a), b, c, d)
Parse until the given token is encountered, returning a list of the results.
pub fn until_end(
parser: Parser(a, b, c, d),
) -> Parser(List(a), b, c, d)
Parse until the end of the token stream, returning a list of the results.
💡 This parser produces better error messages than using both many
and end
in order, so you’ll often want to use this function instead.
A common use-case is when you’re parsing top-level statements or expressions
in a programming language until the end of the file.
use statements <- do(chomp.until_end(statement()))
return(statements)
// is generally better than
use statements <- do(chomp.many(statement()))
use _ <- do(chomp.end())
return(statements)