chomp

Types

The Error type represents all of the ways that a parser can fail. It has two type parameters, e and tok. See the Parser type for more information about them.

pub type Error(e, tok) {
  Custom(e)
  EndOfInput
  Expected(tok, got: tok)
  Unexpected(tok)
  BadParser(String)
}

Constructors

  • Custom(e)

    A custom error.

  • EndOfInput

    There are no more tokens to consume, but the parser required some.

  • Expected(tok, got: tok)

    The parser expected a certain token but got a different one instead.

  • Unexpected(tok)

    The parser encountered an unexpected token. This error is not very specific, so it’s often best to replace it using the or_error function when possible.

  • BadParser(String)

    A parser was called with incorrect input.

The Loop type is very similar to list.ContinueOrStop: it tells the loop function whether to keep going with a new state or to stop with a result.

pub type Loop(a, state) {
  Continue(state)
  Break(a)
}

Constructors

  • Continue(state)

    Continue parsing with the new state.

  • Break(a)

    Stop parsing and return a result.

The Parser type has four parameters; let’s take a look at each of them:

Parser(a, e, tok, ctx)
  1. a is the type of value that the parser knows how to produce. If you were writing a parser for a programming language, this might be your expression type.

  2. e is the type of custom error that you can choose to throw. This could be a String for simple errors such as “I expected an expression” or something more complex.

  3. tok is the type of tokens that the parser knows how to consume. You can take a look at the Token type for a bit more info, but note that it’s not necessary for the token stream to come from chomp’s lexer.

  4. ctx is used to make error reporting nicer. You can place a parser into a custom context. When the parser runs, the context gets pushed onto a stack. If the parser fails, you can see the context stack in the error message, which can make error reporting and debugging much easier! See the in function for more details.

It can get a bit repetitive to write out the Parser type with all of the parameters in type annotations, so it’s common to make a type alias for it in your parser.

import chomp

type Parser(a) = chomp.Parser(a, MyErrorType, TokenType, Context)

// now your parsers can use the alias like so:
fn expression() -> Parser(Expression) {
  ...
}

pub opaque type Parser(a, e, tok, ctx)

Functions

pub fn any() -> Parser(a, b, a, c)

Parse a single token of any type and return it.

pub fn backtrackable(
  parser: Parser(a, b, c, d),
) -> Parser(a, b, c, d)

By default, parsers will not backtrack if they fail after consuming at least one token. Passing a parser to backtrackable changes this behaviour: if the parser fails, we jump back to the state from before it consumed any input and can try another parser.

This is most useful when you want to quickly try a few different parsers using one_of.

🚨 Backtracking parsers can drastically reduce performance, so you should avoid them where possible. A common reason folks reach for backtracking is when they want to try multiple branches that start with the same token or same sequence of tokens.

To avoid backtracking in these cases, you can create an intermediate parser that consumes the common tokens and then use one_of to try the different branches.
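
As a rough sketch of that advice, suppose a language has both `let x` and `let mut x`. The Tok and Binding types and the helper names below are hypothetical and exist only for illustration; the chomp functions used are the ones documented on this page. The shared Let token is consumed once, and one_of then chooses between branches that start with different tokens.

```gleam
import chomp.{do, return}
import gleam/option.{None, Some}

// Hypothetical token type, for illustration only.
pub type Tok {
  Let
  Mut
  Name(String)
}

// Hypothetical result type.
pub type Binding {
  Immutable(String)
  Mutable(String)
}

// Hypothetical primitive parser that accepts a Name token.
pub fn name() {
  chomp.take_map(fn(tok) {
    case tok {
      Name(n) -> Some(n)
      _ -> None
    }
  })
}

pub fn binding() {
  // Consume the shared `let` token once, up front.
  use _ <- do(chomp.token(Let))

  // The remaining branches start with different tokens, so one_of can
  // pick between them without backtrackable.
  chomp.one_of([mutable_binding(), immutable_binding()])
}

fn mutable_binding() {
  use _ <- do(chomp.token(Mut))
  use n <- do(name())
  return(Mutable(n))
}

fn immutable_binding() {
  use n <- do(name())
  return(Immutable(n))
}
```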

pub fn do(
  parser: Parser(a, b, c, d),
  f: fn(a) -> Parser(e, b, c, d),
) -> Parser(e, b, c, d)

Run the first parser, then apply f to its result. f returns another parser, which is run next. If the first parser fails, f is not called. Because this function is so common, we normally import it unqualified.

This is very useful for running parsers in sequence with Gleam’s use syntax.

import chomp.{do, return}

fn point() {
  use _ <- do(chomp.token(LParen))
  use x <- do(int_parser())
  use _ <- do(chomp.token(Comma))
  use y <- do(int_parser())
  use _ <- do(chomp.token(RParen))

  return(Point(x, y))
}

(See the main page for a complete example)

pub fn do_in(
  context: a,
  parser: Parser(b, c, d, a),
  f: fn(b) -> Parser(e, c, d, a),
) -> Parser(e, c, d, a)

A combination of do and in.

pub fn end() -> Parser(Nil, a, b, c)

Parse successfully only if at the end of the token stream.

pub fn fail(error: a) -> Parser(b, a, c, d)

Create a parser that consumes no tokens and always fails with the given error.

pub fn get_pos() -> Parser(Span, a, b, c)

A parser that returns the current token position.

pub fn in(
  parser: Parser(a, b, c, d),
  context: d,
) -> Parser(a, b, c, d)

Run a parser in a certain context. This allows you to include useful information—context—in error messages. For example, instead of a message such as “I expected a value” you could say “I expected a value inside of this list in the function foo”. The latter is far more user-friendly!

Context also holds on to the position of the parser where it was entered. So if you had an InList context entered right after a [ token, any errors encountered inside the list can reference where the list started.

I found a problem in this list:

| the list starts here
v
[a, b, ]
       ~
       |
I wanted a value
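
A minimal sketch of how such a context might be attached, assuming a hypothetical Tok type, a hypothetical Context type with a single InList constructor, and illustrative helper names. The context is entered right after the opening bracket, as described above.

```gleam
import chomp.{do, return}
import gleam/option.{None, Some}

// Hypothetical token type, for illustration only.
pub type Tok {
  LBracket
  RBracket
  Comma
  Num(Int)
}

// Hypothetical context type.
pub type Context {
  InList
}

// Hypothetical primitive parser that accepts a Num token.
pub fn number() {
  chomp.take_map(fn(tok) {
    case tok {
      Num(n) -> Some(n)
      _ -> None
    }
  })
}

pub fn list() {
  use _ <- do(chomp.token(LBracket))

  // Everything after the `[` runs inside the InList context.
  chomp.in(list_body(), InList)
}

fn list_body() {
  use items <- do(chomp.sequence(number(), separator: chomp.token(Comma)))
  use _ <- do(chomp.token(RBracket))
  return(items)
}
```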

pub fn inspect(
  parser: Parser(a, b, c, d),
  message: String,
) -> Parser(a, b, c, d)

Run the given parser and then inspect its state.

pub fn is_at_end() -> Parser(Bool, a, b, c)

Returns True if there are no tokens left to consume.

pub fn lazy(
  parser: fn() -> Parser(a, b, c, d),
) -> Parser(a, b, c, d)

Defer the creation of a parser until it is needed. This is often most useful when creating a parser that is recursive and is not a function.
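
Here is a small sketch of where lazy can matter, under the assumption of a hypothetical Tok type and a hypothetical recursive Value type. The array helper receives its item parser as an argument, so writing array(value()) would call value again while the parser is still being built and never finish. Passing chomp.lazy(value) defers that call until the parser actually runs.

```gleam
import chomp.{do, return}
import gleam/option.{None, Some}

// Hypothetical token type, for illustration only.
pub type Tok {
  LBracket
  RBracket
  Comma
  Num(Int)
}

// Hypothetical recursive result type.
pub type Value {
  Number(Int)
  Array(List(Value))
}

// Hypothetical primitive parser that accepts a Num token.
pub fn number() {
  chomp.take_map(fn(tok) {
    case tok {
      Num(n) -> Some(n)
      _ -> None
    }
  })
}

pub fn value() -> chomp.Parser(Value, e, Tok, ctx) {
  chomp.one_of([
    number() |> chomp.map(Number),
    // lazy postpones the recursive call to value until it is needed.
    array(chomp.lazy(value)),
  ])
}

fn array(item) {
  use _ <- do(chomp.token(LBracket))
  use items <- do(chomp.sequence(item, separator: chomp.token(Comma)))
  use _ <- do(chomp.token(RBracket))
  return(Array(items))
}
```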

pub fn loop(
  init: a,
  step: fn(a) -> Parser(Loop(b, a), c, d, e),
) -> Parser(b, c, d, e)
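
Judging from the Loop type, loop appears to call step with the current state repeatedly: Continue supplies the state for the next round and Break ends the loop with a result. The sketch below rests on that reading. The Tok type and helper names are hypothetical. It adds up consecutive numbers and stops at the first token that is not a number.

```gleam
import chomp
import gleam/option.{None, Some}

// Hypothetical token type, for illustration only.
pub type Tok {
  Comma
  Num(Int)
}

// Hypothetical primitive parser that accepts a Num token.
pub fn number() {
  chomp.take_map(fn(tok) {
    case tok {
      Num(n) -> Some(n)
      _ -> None
    }
  })
}

pub fn sum() {
  chomp.loop(0, fn(total) {
    chomp.one_of([
      // Another number: add it to the running total and go around again.
      number() |> chomp.map(fn(n) { chomp.Continue(total + n) }),
      // No number next: stop and produce the total.
      chomp.return(chomp.Break(total)),
    ])
  })
}
```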

pub fn many(
  parser: Parser(a, b, c, d),
) -> Parser(List(a), b, c, d)

Parse zero or more of the given parser.

pub fn many1(
  parser: Parser(a, b, c, d),
) -> Parser(List(a), b, c, d)

Parse one or more of the given parser.

💡 If this parser succeeds, the list produced is guaranteed to be non-empty. Feel free to let assert the result!
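
For instance, a parser that returns the largest of one or more numbers could lean on that guarantee. This is only a sketch: the Tok type and the number helper are hypothetical, while list.fold and int.max come from the Gleam standard library.

```gleam
import chomp.{do, return}
import gleam/int
import gleam/list
import gleam/option.{None, Some}

// Hypothetical token type, for illustration only.
pub type Tok {
  Comma
  Num(Int)
}

// Hypothetical primitive parser that accepts a Num token.
pub fn number() {
  chomp.take_map(fn(tok) {
    case tok {
      Num(n) -> Some(n)
      _ -> None
    }
  })
}

pub fn largest() {
  use numbers <- do(chomp.many1(number()))

  // many1 succeeded, so the list holds at least one element.
  let assert [first, ..rest] = numbers

  return(list.fold(rest, first, int.max))
}
```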

pub fn map(
  parser: Parser(a, b, c, d),
  f: fn(a) -> e,
) -> Parser(e, b, c, d)

Run parser, applying f to its result.

pub fn map_error(
  parser: Parser(a, b, c, d),
  map: fn(Error(b, c)) -> Error(b, c),
) -> Parser(a, b, c, d)

Run a parser and if it fails and did not consume any tokens, apply the given function to the error and return the new error. This function provides a way to add more information to error messages. For example, if you just parsed some indentation in an indentation-sensitive language and now want to parse a statement, you could map the statement parser’s error to add extra info:

fn parse_statement() {
  // parse a statement
  |> chomp.or_error("I wanted a statement")
}

fn parse_something() {
  use _ <- do(parse_indentation())
  use statement <- do(
    parse_statement() |> chomp.map_error(because_of_indent)
  )
  // ...
}

fn because_of_indent(error) {
  case error {
    chomp.Custom(error) ->
      chomp.Custom("Since there was an indent, " <> error)
    _ -> error
  }
}

Now if parsing the statement in parse_something fails, the error message will be “Since there was an indent, I wanted a statement”. Sweet!

Note that, as with or_error, EndOfInput errors are left unchanged to avoid confusing error messages.

pub fn one_of(
  parsers: List(Parser(a, b, c, d)),
) -> Parser(a, b, c, d)

Try each parser in order until one succeeds. If none succeed, the last parser’s error is returned. It is recommended to use a custom error with or_error for better error messages.

pub fn optional(
  parser: Parser(a, b, c, d),
) -> Parser(Option(a), b, c, d)

Try the given parser, but if it fails return None instead of failing.

pub fn or(
  parser: Parser(a, b, c, d),
  default: a,
) -> Parser(a, b, c, d)

Try the given parser, but if it fails return the given default value instead of failing.
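
The following sketch contrasts optional and or. The Tok type and helper names are hypothetical. signed_number uses optional to look for a leading minus sign, and number_or_zero uses or to fall back to 0 when no number is present.

```gleam
import chomp.{do, return}
import gleam/option.{None, Some}

// Hypothetical token type, for illustration only.
pub type Tok {
  Minus
  Num(Int)
}

// Hypothetical primitive parser that accepts a Num token.
pub fn number() {
  chomp.take_map(fn(tok) {
    case tok {
      Num(n) -> Some(n)
      _ -> None
    }
  })
}

pub fn signed_number() {
  // optional wraps the result in an Option instead of failing.
  use sign <- do(chomp.optional(chomp.token(Minus)))
  use n <- do(number())

  case sign {
    Some(_) -> return(0 - n)
    None -> return(n)
  }
}

pub fn number_or_zero() {
  // or substitutes a plain default value when the parser fails.
  chomp.or(number(), 0)
}
```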

pub fn or_error(
  parser: Parser(a, b, c, d),
  error: b,
) -> Parser(a, b, c, d)

Run a parser and if it fails and did not consume any tokens, return the given error. This function is extremely useful for providing custom error messages to parsers such as one_of, take_map, and take_if.

Note that EndOfInput errors are not replaced, which avoids confusing error messages.

import chomp

fn value() {
  chomp.one_of([
    string(),
    number(),
    variable(),
    list(),
    group(),
  ])
  |> chomp.or_error("I expected a value (string, number, variable, or list)")
}

pub fn replace(
  parser: Parser(a, b, c, d),
  with b: e,
) -> Parser(e, b, c, d)

Run parser, replacing its result with b.
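
A short sketch of replace next to map, assuming a hypothetical Tok type with keyword tokens. replace suits tokens whose presence alone determines the value, while map transforms a value the parser already produced.

```gleam
import chomp
import gleam/option.{None, Some}

// Hypothetical token type, for illustration only.
pub type Tok {
  TrueKeyword
  FalseKeyword
  Num(Int)
}

// Hypothetical primitive parser that accepts a Num token.
pub fn number() {
  chomp.take_map(fn(tok) {
    case tok {
      Num(n) -> Some(n)
      _ -> None
    }
  })
}

pub fn boolean() {
  chomp.one_of([
    // token returns a position; replace swaps it for the value we want.
    chomp.token(TrueKeyword) |> chomp.replace(True),
    chomp.token(FalseKeyword) |> chomp.replace(False),
  ])
}

pub fn doubled() {
  // map keeps the parsed number but transforms it.
  number() |> chomp.map(fn(n) { n * 2 })
}
```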

pub fn replace_error(
  parser: Parser(a, b, c, d),
  error: b,
) -> Parser(a, b, c, d)

Run a parser and if it fails, return the given error. This parser is similar to or_error, but it’s not quite the same—this one does not care whether the parser consumed any tokens when it failed. That means that if you have a parser like this:

chomp.one_of([function(), constant()])
|> chomp.replace_error("I expected a function or constant")

You will always get the error message “I expected a function or constant” even if any of the one_of parsers consumed tokens—a behavior that probably isn’t what you want.

For instance, if this was parsed:

function 23() {}
//       ^^ this is the error

We would get the error message “I expected a function or constant” because the function parser’s error was swallowed by replace_error. Contrast that to or_error, where we would get something much better such as “I expected an identifier”.

Of course there are cases where replace_error can be helpful; just be aware that you’ll often want to reach for or_error first.

pub fn return(value: a) -> Parser(a, b, c, d)

The simplest kind of parser. return consumes no tokens and always produces the given value. Sometimes called succeed instead.

This function might seem useless at first, but it is very useful when used in combination with do or then.

import chomp.{do, return, throw}

fn uint8_parser() {
  use int <- do(int_parser())

  case int >= 0, int <= 255 {
    True, True ->
      return(int)

    False, _ ->
      throw("Expected an int >= 0")

    _, False ->
      throw("Expected an int <= 255")
  }
}

💡 return and succeed are names for the same thing. We suggest using return unqualified when using do and Gleam’s use syntax, and chomp.succeed in a pipeline with chomp.then.

pub fn run(
  src: List(Token(a)),
  parser: Parser(b, c, a, d),
) -> Result(b, #(Error(c, a), Span, List(#(Span, d))))

Parsers don’t do anything until they’re run! The run function takes a list of Tokens and a Parser and runs the parser over the tokens, returning either the parsed value or a tuple of the Error, the position where the parser failed, and the final context stack.
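
One way to consume that result is to pattern match on the tuple, as in the sketch below. The Tok type, the number helper, and the describe function are hypothetical, and the custom error type is assumed to be String. The tokens argument is expected to be the token list described above.

```gleam
import chomp
import gleam/int
import gleam/option.{None, Some}

// Hypothetical token type, for illustration only.
pub type Tok {
  Comma
  Num(Int)
}

// Hypothetical primitive parser that accepts a Num token.
pub fn number() {
  chomp.take_map(fn(tok) {
    case tok {
      Num(n) -> Some(n)
      _ -> None
    }
  })
}

// Turn the outcome of a parse into a human-readable message.
pub fn describe(tokens) -> String {
  case chomp.run(tokens, number()) {
    Ok(n) -> "Parsed " <> int.to_string(n)

    // The error tuple holds the error, its position, and the context stack.
    Error(#(chomp.Custom(message), _span, _contexts)) ->
      "Custom error: " <> message

    Error(#(chomp.EndOfInput, _span, _contexts)) -> "Ran out of tokens"

    Error(_) -> "Some other parse error"
  }
}
```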

pub fn sequence(
  parser: Parser(a, b, c, d),
  separator sep: Parser(e, b, c, d),
) -> Parser(List(a), b, c, d)

Parse a sequence separated by the separator parser.

pub fn succeed(value: a) -> Parser(a, b, c, d)

💡 succeed and return are names for the same thing. We suggest using succeed in a pipeline with chomp.then, and return unqualified when using do with Gleam’s use syntax.

pub fn take_if(predicate: fn(a) -> Bool) -> Parser(a, b, a, c)

Take and return a token if it satisfies the predicate.

pub fn take_map(f: fn(a) -> Option(b)) -> Parser(b, c, a, d)

Take the next token and attempt to transform it with the given function. This is useful when creating reusable primitive parsers for your own tokens such as take_identifier or take_number.
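
A sketch of such primitives, assuming a hypothetical Tok type. take_identifier and take_number use take_map to unwrap a token's payload, while take_even_number uses take_if and therefore returns the token itself. The error messages are illustrative and assume String as the custom error type.

```gleam
import chomp
import gleam/option.{None, Some}

// Hypothetical token type, for illustration only.
pub type Tok {
  Ident(String)
  Num(Int)
  Comma
}

pub fn take_identifier() {
  chomp.take_map(fn(tok) {
    case tok {
      Ident(name) -> Some(name)
      _ -> None
    }
  })
  |> chomp.or_error("I expected an identifier")
}

pub fn take_number() {
  chomp.take_map(fn(tok) {
    case tok {
      Num(n) -> Some(n)
      _ -> None
    }
  })
  |> chomp.or_error("I expected a number")
}

// take_if keeps the token as-is, so this produces a Tok rather than an Int.
pub fn take_even_number() {
  chomp.take_if(fn(tok) {
    case tok {
      Num(n) -> n % 2 == 0
      _ -> False
    }
  })
  |> chomp.or_error("I expected an even number")
}
```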

pub fn then(
  parser: Parser(a, b, c, d),
  f: fn(a) -> Parser(e, b, c, d),
) -> Parser(e, b, c, d)

Another name for do.

pub fn throw(error: a) -> Parser(b, a, c, d)

The opposite of return, this parser always fails with the given error. Sometimes called fail instead.

pub fn token(tok: a) -> Parser(Span, b, a, c)

Parse a token of a particular type, returning its position.

use start_pos <- do(chomp.token(LParen))
// ...
use end_pos <- do(chomp.token(RParen))

pub fn until(
  parser: Parser(a, b, c, d),
  tok: c,
) -> Parser(List(a), b, c, d)

Parse until the given token is encountered, returning a list of the results.

pub fn until_end(
  parser: Parser(a, b, c, d),
) -> Parser(List(a), b, c, d)

Parse until the end of the token stream, returning a list of the results.

💡 This parser produces better error messages than using both many and end in order, so you’ll often want to use this function instead. A common use-case is when you’re parsing top-level statements or expressions in a programming language until the end of the file.

use statements <- do(chomp.until_end(statement()))
return(statements)

// is generally better than

use statements <- do(chomp.many(statement()))
use _ <- do(chomp.end())
return(statements)