Getting Started

This guide will help you get familiar with Pickle by working through several challenges of increasing complexity.

Parsing Simple Structures

Let’s assume we’re dealing with a simple format that represents a point with two axes that we receive as (x,y).

To get started, let’s first define a type for this.

type Point {
  Point(x: Int, y: Int)
}

The format we receive is really simple. We have an opening bracket, some integer value that represents x, a comma, some integer value that represents y, and a closing bracket.

Let’s take a look at what a parser for this can look like.

import pickle.{type Parser}

// ...

fn point_parser() -> Parser(Point, Point, Nil) {
  pickle.string("(", pickle.drop)
  |> pickle.then(pickle.integer(fn(point, x) { Point(..point, x: x) }))
  |> pickle.then(pickle.string(",", pickle.drop))
  |> pickle.then(pickle.integer(fn(point, y) { Point(..point, y: y) }))
  |> pickle.then(pickle.string(")", pickle.drop))
}

This is what parsers in Pickle look like, no matter how complex they are.

The Parser type has three type parameters. The first one is the type of the value we start with, the second one the type of the value we end up with, and the third one is our custom error type.

We start with pickle/string to parse a specific string, in this case the opening bracket. Since we don’t need it in the result, we drop it via pickle/drop, which is a mapper provided by Pickle to drop the parsed value.

We then (no pun intended) use pickle/then to combine two parsers. You’ll be using this a lot with Pickle, so for brevity, pickle/then won’t be mentioned again from this point on.

The prior parser is combined with pickle/integer to parse a decimal integer, our x value, which we use to create a new point with our acquired x value. pickle/integer parses the given tokens as long as they can be represented as a decimal integer, so it doesn’t expect an integer of a specific length.

If you need or want to parse an integer of a different base, you can take a look at Pickle’s module documentation. In this guide we’ll only be using decimal integers, but feel free to play around.

We then continue our adventure with pickle/string to parse and drop the comma.

Afterwards we need to parse the y value of our point and, as you might have guessed, we use pickle/integer for this again.

Lastly, we hit the jackpot by using pickle/string to parse and drop the closing bracket.

Well done! Now we have a basic parser that we can use to parse points.

To apply the parser we use pickle/parse.

import gleam/io
import gleam/string
import pickle.{type Parser}

// ...

fn new_point() -> Point {
  Point(0, 0)
}

pub fn main() {
  let assert Ok(point) =
    pickle.parse("(20,10)", new_point(), point_parser())

  string.inspect(point) |> io.print() // prints "Point(20, 10)"
}

pickle/parse takes three arguments: the input string, the initial value, and the parser to apply. The initial value doesn’t have to be a simple data structure. When parsing DSLs or even programming languages, you will most likely want to initialize the parser value with a custom AST type.
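Since pickle/parse returns a Result, the failure case can also be handled explicitly instead of asserting with let assert. The following is a minimal sketch that reuses the point parser from above. The describe function is a hypothetical helper introduced only for this illustration, and the Error branch deliberately ignores which kind of failure occurred.

```gleam
import gleam/io
import gleam/string
import pickle.{type Parser}

type Point {
  Point(x: Int, y: Int)
}

fn point_parser() -> Parser(Point, Point, Nil) {
  pickle.string("(", pickle.drop)
  |> pickle.then(pickle.integer(fn(point, x) { Point(..point, x: x) }))
  |> pickle.then(pickle.string(",", pickle.drop))
  |> pickle.then(pickle.integer(fn(point, y) { Point(..point, y: y) }))
  |> pickle.then(pickle.string(")", pickle.drop))
}

// Hypothetical helper (not part of Pickle): turns the parse result
// into a human-readable message instead of crashing on failure.
pub fn describe(input: String) -> String {
  case pickle.parse(input, Point(0, 0), point_parser()) {
    Ok(point) -> "parsed " <> string.inspect(point)
    // We don't inspect the failure here, any error ends up in this branch.
    Error(_) -> "could not parse " <> input
  }
}

pub fn main() {
  io.println(describe("(20,10)"))
  // The semicolon is not the expected delimiter, so parsing fails.
  io.println(describe("(20;10)"))
}
```

Matching on the result keeps the program running on malformed input, which is usually preferable to let assert outside of examples and tests.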

Parsing Variants

Let’s add some seasoning to our problem domain here. Our point can now come in different shapes, (x,y) and [x,y].

Before you write two parsers with a lot of duplication for each shape, take a look at the following parsers and compare them to the one we wrote before.

import gleam/io
import gleam/string
import pickle.{type Parser}

// ...

fn do_point_parser(
  opening_bracket: String,
  closing_bracket: String,
) -> Parser(Point, Point, Nil) {
  pickle.string(opening_bracket, pickle.drop)
  |> pickle.then(pickle.integer(fn(point, x) { Point(..point, x: x) }))
  |> pickle.then(pickle.string(",", pickle.drop))
  |> pickle.then(pickle.integer(fn(point, y) { Point(..point, y: y) }))
  |> pickle.then(pickle.string(closing_bracket, pickle.drop))
}

fn point_parser() -> Parser(Point, Point, Nil) {
  pickle.one_of([do_point_parser("(", ")"), do_point_parser("[", "]")])
}

We handle both kinds of brackets with pickle/one_of, which takes zero to n parsers and tries them in order. Here we feed it two parsers built with our parameterized do_point_parser function, which lets us specify different opening and closing brackets.

Let’s see it in action.

import gleam/io
import gleam/string
import pickle.{type Parser}

// ...

pub fn main() {
  let assert Ok(first_point) =
    pickle.parse("(20,-5)", new_point(), point_parser())

  let assert Ok(second_point) =
    pickle.parse("[10,325]", new_point(), point_parser())

  string.inspect(first_point) |> io.print() // prints "Point(20, -5)"
  string.inspect(second_point) |> io.print() // prints "Point(10, 325)"
}

Perform Validation

Pickle lets you validate the value of the parser. To showcase this, a new requirement has been delivered by our fellow UPS driver.

The x and y values of our point cannot be less than -10 or greater than 10. Why? Why not.

Let’s first define a custom error type for our validation purposes to reflect this requirement.

type PointAxis {
  X
  Y
}

type PointError {
  ValueIsLessThanMinusTen(axis: PointAxis)
  ValueIsGreaterThanTen(axis: PointAxis)
}

We then need to replace the third type parameter of Parser with our custom error type to tell Pickle what error type to expect in case of a validation failure.

import gleam/io
import gleam/string
import pickle.{type Parser}

// ...

fn do_point_parser(
  opening_bracket: String,
  closing_bracket: String,
) -> Parser(Point, Point, PointError) {
  pickle.string(opening_bracket, pickle.drop)
  |> pickle.then(pickle.integer(fn(point, x) { Point(..point, x: x) }))
  |> pickle.then(pickle.string(",", pickle.drop))
  |> pickle.then(pickle.integer(fn(point, y) { Point(..point, y: y) }))
  |> pickle.then(pickle.string(closing_bracket, pickle.drop))
}

fn point_parser() -> Parser(Point, Point, PointError) {
  pickle.one_of([do_point_parser("(", ")"), do_point_parser("[", "]")])
}

Now we need to add some validation. For this purpose we use pickle/guard.

import gleam/io
import gleam/string
import pickle.{type Parser}

// ...

fn validate_x_value() -> Parser(Point, Point, PointError) {
  pickle.guard(fn(point: Point) { point.x >= -10 }, ValueIsLessThanMinusTen(X))
  |> pickle.then(pickle.guard(
    fn(point: Point) { point.x <= 10 },
    ValueIsGreaterThanTen(X),
  ))
}

fn validate_y_value() -> Parser(Point, Point, PointError) {
  pickle.guard(fn(point: Point) { point.y >= -10 }, ValueIsLessThanMinusTen(Y))
  |> pickle.then(pickle.guard(
    fn(point: Point) { point.y <= 10 },
    ValueIsGreaterThanTen(Y),
  ))
}

fn do_point_parser(
  opening_bracket: String,
  closing_bracket: String,
) -> Parser(Point, Point, PointError) {
  pickle.string(opening_bracket, pickle.drop)
  |> pickle.then(pickle.integer(fn(point, x) { Point(..point, x: x) }))
  |> pickle.then(pickle.string(",", pickle.drop))
  |> pickle.then(pickle.integer(fn(point, y) { Point(..point, y: y) }))
  |> pickle.then(pickle.string(closing_bracket, pickle.drop))
}

fn point_parser() -> Parser(Point, Point, PointError) {
  pickle.one_of([do_point_parser("(", ")"), do_point_parser("[", "]")])
  |> pickle.then(validate_x_value())
  |> pickle.then(validate_y_value())
}

We could certainly reduce the duplication in this validation logic and replace these magic numbers with constants, but that’s not the focus here.
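For the curious, one way to reduce that duplication could look like the sketch below. It only uses the Pickle functions already shown in this guide. The validate_axis helper and the min_value and max_value constants are hypothetical names introduced for this illustration and are not part of Pickle.

```gleam
import pickle.{type Parser}

pub type Point {
  Point(x: Int, y: Int)
}

pub type PointAxis {
  X
  Y
}

pub type PointError {
  ValueIsLessThanMinusTen(axis: PointAxis)
  ValueIsGreaterThanTen(axis: PointAxis)
}

// Named bounds instead of magic numbers.
const min_value = -10

const max_value = 10

// Hypothetical helper: validates one axis. The value_of function
// selects the axis value to check from the point.
fn validate_axis(
  axis: PointAxis,
  value_of: fn(Point) -> Int,
) -> Parser(Point, Point, PointError) {
  pickle.guard(
    fn(point: Point) { value_of(point) >= min_value },
    ValueIsLessThanMinusTen(axis),
  )
  |> pickle.then(pickle.guard(
    fn(point: Point) { value_of(point) <= max_value },
    ValueIsGreaterThanTen(axis),
  ))
}

fn do_point_parser(
  opening_bracket: String,
  closing_bracket: String,
) -> Parser(Point, Point, PointError) {
  pickle.string(opening_bracket, pickle.drop)
  |> pickle.then(pickle.integer(fn(point, x) { Point(..point, x: x) }))
  |> pickle.then(pickle.string(",", pickle.drop))
  |> pickle.then(pickle.integer(fn(point, y) { Point(..point, y: y) }))
  |> pickle.then(pickle.string(closing_bracket, pickle.drop))
}

pub fn point_parser() -> Parser(Point, Point, PointError) {
  pickle.one_of([do_point_parser("(", ")"), do_point_parser("[", "]")])
  |> pickle.then(validate_axis(X, fn(point: Point) { point.x }))
  |> pickle.then(validate_axis(Y, fn(point: Point) { point.y }))
}
```

Passing the axis accessor as a function keeps a single copy of the range check, at the cost of one extra level of indirection.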

We now have a parser with validation logic that ensures our points cannot have x or y values that are less than -10 or greater than 10.

Trying to parse a point with invalid values now results in a GuardError, which contains our error value.

import gleam/io
import gleam/string
import pickle.{type Parser, GuardError}

// ...

pub fn main() {
  // Both values are within -10 and 10, so validation passes.
  let assert Ok(point) =
    pickle.parse("(10,-5)", new_point(), point_parser())

  // The y value is greater than 10, so the guard fails with our error value.
  let assert Error(GuardError(error)) =
    pickle.parse("[10,325]", new_point(), point_parser())

  string.inspect(point) |> io.print() // prints "Point(10, -5)"
  string.inspect(error) |> io.print() // prints "ValueIsGreaterThanTen(Y)"
}

Pickle returns errors not only when a validation fails, but also when a parser fails to parse the input. Keep in mind that the GuardError type is exclusive to validation-specific failures.

Parsing Sequences

Fine, we’re able to parse a point, but what about a list of points?

Let’s assume we receive the points as a comma-separated list (e.g., (2,-4),(10,0),[-5,6]).

Pickle happens to offer just the right tool for this job: pickle/many. This parser applies the given parser zero to n times until it fails and gives us a way to accumulate the collected points.

import gleam/io
import gleam/list
import gleam/string
import pickle.{type Parser}

// ...

fn points_parser() -> Parser(List(Point), List(Point), PointError) {
  pickle.many(
    new_point(),
    point_parser()
      |> pickle.then(
        pickle.one_of([pickle.string(",", pickle.drop), pickle.eof()]),
      ),
    list.prepend,
  )
}

Here we combine our point_parser function with a parser that consumes either a comma or EOF, which moves the head of the parser to the next point. pickle/many runs this combined parser zero to n times until it fails. Each run is given a blank point as its initial value. Afterwards the parsed point is prepended to our list of points via gleam/list/prepend, so the resulting list is in reverse input order.

Keep in mind that pickle/many never fails and adheres to the best-effort error handling strategy. As soon as it encounters invalid input it just stops consuming any more tokens and returns the collected items that could be parsed until the point of failure.

This means that you could end up with no collected points (an empty list) because you provided invalid input to the parser.

If the given parser can fail due to a validation constraint, moving the validation logic outside of pickle/many might be a viable option. That way pickle/many collects all items, including ones in an invalid state, and the validation runs afterwards, so you’re still able to convey validation issues to the consumer. The best approach ultimately depends on your use case.
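As a rough sketch of that idea, the parser below collects points without validating them and then runs a single guard over the whole list. It assumes that pickle/guard works for any parser value type, not only for Point. The PointsError type, the in_range function, and the check helper are hypothetical and exist only for this illustration. Only round brackets are handled to keep it short.

```gleam
import gleam/int
import gleam/io
import gleam/list
import pickle.{type Parser, GuardError}

pub type Point {
  Point(x: Int, y: Int)
}

// Hypothetical error type for this sketch.
pub type PointsError {
  PointOutOfRange
}

// Parses a point without validating its values.
fn unvalidated_point_parser() -> Parser(Point, Point, PointsError) {
  pickle.string("(", pickle.drop)
  |> pickle.then(pickle.integer(fn(point, x) { Point(..point, x: x) }))
  |> pickle.then(pickle.string(",", pickle.drop))
  |> pickle.then(pickle.integer(fn(point, y) { Point(..point, y: y) }))
  |> pickle.then(pickle.string(")", pickle.drop))
}

fn in_range(value: Int) -> Bool {
  value >= -10 && value <= 10
}

fn points_parser() -> Parser(List(Point), List(Point), PointsError) {
  pickle.many(
    Point(0, 0),
    unvalidated_point_parser()
      |> pickle.then(
        pickle.one_of([pickle.string(",", pickle.drop), pickle.eof()]),
      ),
    list.prepend,
  )
  // The validation runs once, after all points have been collected.
  |> pickle.then(pickle.guard(
    fn(points: List(Point)) {
      list.all(points, fn(point: Point) {
        in_range(point.x) && in_range(point.y)
      })
    },
    PointOutOfRange,
  ))
}

// Hypothetical helper that reports what happened.
pub fn check(input: String) -> String {
  case pickle.parse(input, [], points_parser()) {
    Ok(points) ->
      "all " <> int.to_string(list.length(points)) <> " points are valid"
    Error(GuardError(PointOutOfRange)) -> "a point is out of range"
    Error(_) -> "could not parse the input"
  }
}

pub fn main() {
  io.println(check("(10,-5),(0,10)"))
  io.println(check("(10,-5),(50,0)"))
}
```

With this arrangement an out-of-range point surfaces as a GuardError instead of silently ending the collection, but a single invalid point rejects the entire list.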

import gleam/io
import gleam/list
import gleam/string
import pickle.{type Parser}

// ...

pub fn main() {
  // All values are within -10 and 10, so both points are collected.
  let assert Ok(points) =
    pickle.parse("(10,-5),[0,10]", [], points_parser())

  // Collection stops at the invalid input after the first point.
  let assert Ok(points2) =
    pickle.parse("(10,-5),gibberish", [], points_parser())

  // The first point already fails validation, so nothing is collected.
  let assert Ok(nothing) =
    pickle.parse("[50,-50],(-100,25)", [], points_parser())

  let assert Ok(nothing2) =
    pickle.parse("gibberish", [], points_parser())

  string.inspect(points) |> io.print() // prints "[Point(0, 10), Point(10, -5)]"
  string.inspect(points2) |> io.print() // prints "[Point(10, -5)]"
  string.inspect(nothing) |> io.print() // prints "[]"
  string.inspect(nothing2) |> io.print() // prints "[]"
}

You’ve Made It!

Congratulations! You’ve finished the getting started guide and learned about the fundamentals of Pickle. Happy parsing!

The tested final implementation of this parser can be found in test/examples/point_test.gleam.

Additional Challenges

You could think about adding further shapes like {x,y}, or add support for another delimiter like a semicolon.
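One possible starting point for both ideas is sketched below. It reuses the parameterized do_point_parser approach from this guide and only relies on Pickle functions shown earlier. The delimiter_parser and parse_point functions are hypothetical helpers introduced for this illustration.

```gleam
import pickle.{type Parser}

pub type Point {
  Point(x: Int, y: Int)
}

// Hypothetical helper: accepts either a comma or a semicolon
// between the two values and drops it.
fn delimiter_parser() -> Parser(Point, Point, Nil) {
  pickle.one_of([
    pickle.string(",", pickle.drop),
    pickle.string(";", pickle.drop),
  ])
}

fn do_point_parser(
  opening_bracket: String,
  closing_bracket: String,
) -> Parser(Point, Point, Nil) {
  pickle.string(opening_bracket, pickle.drop)
  |> pickle.then(pickle.integer(fn(point, x) { Point(..point, x: x) }))
  |> pickle.then(delimiter_parser())
  |> pickle.then(pickle.integer(fn(point, y) { Point(..point, y: y) }))
  |> pickle.then(pickle.string(closing_bracket, pickle.drop))
}

fn point_parser() -> Parser(Point, Point, Nil) {
  pickle.one_of([
    do_point_parser("(", ")"),
    do_point_parser("[", "]"),
    // The additional shape is just one more entry in the list.
    do_point_parser("{", "}"),
  ])
}

// Hypothetical convenience wrapper around pickle.parse.
pub fn parse_point(input: String) {
  pickle.parse(input, Point(0, 0), point_parser())
}
```

With this sketch, input such as {3;4} should parse just like (3,4), while mismatched brackets such as (3,4] are still rejected.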

One thing to keep in mind is that Pickle is scannerless, thus there’s no separate lexer to tokenize the input. This means that the parser covers responsibilities usually taken care of by a lexer like handling whitespace. As of now, our point parser cannot handle input with whitespace sprinkled in.

The parser will fail if we provide input like (x, y) or ( x,y ).

You could extend the parser to handle whitespace, in this case by ignoring it. For this you can use pickle/skip_whitespace.
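The sketch below shows where such calls might go. It assumes that pickle.skip_whitespace takes no arguments, returns a parser that leaves the parser value untouched, and also succeeds when there is no whitespace to skip. This guide does not show its signature, so please check Pickle’s module documentation before relying on it. The parse_point function is a hypothetical wrapper for this illustration.

```gleam
import pickle.{type Parser}

pub type Point {
  Point(x: Int, y: Int)
}

// ASSUMPTION: pickle.skip_whitespace() takes no arguments and
// succeeds even if there is no whitespace at the current position.
fn point_parser() -> Parser(Point, Point, Nil) {
  pickle.string("(", pickle.drop)
  |> pickle.then(pickle.skip_whitespace())
  |> pickle.then(pickle.integer(fn(point, x) { Point(..point, x: x) }))
  |> pickle.then(pickle.skip_whitespace())
  |> pickle.then(pickle.string(",", pickle.drop))
  |> pickle.then(pickle.skip_whitespace())
  |> pickle.then(pickle.integer(fn(point, y) { Point(..point, y: y) }))
  |> pickle.then(pickle.skip_whitespace())
  |> pickle.then(pickle.string(")", pickle.drop))
}

// Hypothetical convenience wrapper around pickle.parse.
pub fn parse_point(input: String) {
  pickle.parse(input, Point(0, 0), point_parser())
}
```

Under those assumptions, both ( 3, 4 ) and (3,4) would be accepted by the same parser.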
