# Creating a Basic Parser

This tutorial explores how Ergo works by building a parser that matches numbers: integers such as -5 or 42, and decimals such as 25.6. The parser will match them and convert them into the appropriate Elixir numeric value. We will build it up in stages, starting with bare digits.

## Parsing a digit

To begin with, we need to be able to parse digits. One option is to use the basic `char` parser to match digit characters. Using IEx, here is what you would do:

```
alias Ergo
alias Ergo.Context
import Ergo.Terminals
digit = char(?0..?9)
Ergo.parse(digit, "42")
%Context{status: :ok, ast: 52}
```

Here the integer 52 is the character code of the digit '4' (try typing `?4` into the IEx console to see for yourself).
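As a quick check outside Ergo, the character-code relationship can be confirmed in plain Elixir. This is a minimal sketch; the variable names are illustrative only and are not part of Ergo.

```elixir
# `?4` is Elixir syntax for the code point of the character '4'.
code = ?4

# The digit characters '0'..'9' occupy the contiguous code points 48..57,
# which is why a range such as ?0..?9 covers exactly the ten digits.
digit_codes = Enum.to_list(?0..?9)

IO.inspect(code)
# Print as a plain list of integers rather than as a charlist.
IO.inspect(digit_codes, charlists: :as_lists)
```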

Now that we can parse a single digit, how about multiple digits?

## Parsing many digits

To parse multiple digits we use the `many` parser in conjunction with the `digit` parser as follows:

```
import Ergo.Combinators
digits = many(digit)
Ergo.parse(digits, "42")
%Context{status: :ok, ast: [52, 50]}
```

In fact you might see `'42'` (or `~c"42"` on recent Elixir versions) in your version of the AST because IEx will try to render the list `[52, 50]` as a charlist. This is a hangover from Erlang. If you would prefer to see the list, add the following to your `~/.iex.exs` file:

`IEx.configure(inspect: [charlists: :as_lists])`

The `many` combinator parser repeatedly invokes the `digit` parser to match as many digit characters as possible, generating an AST that is a list of those digits. To transform them into a numeric value we need to apply another function to the AST.

In our case the AST list contains the character values of the digits matched, e.g. `[52, 50]` for the digits `['4', '2']` respectively. Since the digit character '0' has a character value of 48, we can turn the characters into digit values by subtracting 48.
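The subtraction can be tried on its own in plain Elixir, without Ergo. A minimal sketch, where `ast` and `values` are illustrative names:

```elixir
# The AST produced for "42": the code points of '4' and '2'.
ast = [52, 50]

# Subtracting the code point of '0' (48, also written ?0) yields digit values.
values = Enum.map(ast, fn code -> code - ?0 end)

IO.inspect(values)  # [4, 2]
```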

The heavy lifting of transforming digits is done by `c_transform` below, which is a pipeline that transforms `['4', '2']` -> `[{4, 10}, {2, 1}]` -> `[40, 2]` -> `42`:

```
c_transform = fn ast ->
  bases = Stream.unfold(1, fn n -> {n, n * 10} end)
  digits = Enum.map(ast, fn digit -> digit - 48 end)

  Enum.zip(Enum.reverse(digits), bases)
  |> Enum.map(&Tuple.product/1)
  |> Enum.sum()
end
digits = many(digit) |> transform(c_transform)
Ergo.parse(digits, "42")
%Context{status: :ok, ast: 42}
```
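The same conversion can also be written as a single left-to-right fold, which avoids reversing the list and building the stream of bases. This is an alternative sketch in plain Elixir rather than the approach used above, and `fold_digits` is a hypothetical name:

```elixir
# Each step shifts the running total one decimal place to the left
# and adds the value of the next digit.
fold_digits = fn ast ->
  Enum.reduce(ast, 0, fn code, acc -> acc * 10 + (code - ?0) end)
end

IO.inspect(fold_digits.([52, 50]))    # 42
IO.inspect(fold_digits.(~c"918212"))  # 918212
```

Like `c_transform`, it takes a list of character codes and returns an integer, so it should be usable in the same position.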

In this case we are applying the `transform` parser to the `many` parser. `transform` operates only on the AST of the parser it is given, applying a function to it. In this case we could also have used:

`digits = many(digit, ast: c_transform)`

This works because many of the combinator parsers support an optional `ctx:` or `ast:` argument as a shortcut.

At this point we can parse positive integers of any length:

```
Ergo.parse(digits, "918212812783918723")
%Context{status: :ok, ast: 918212812783918723}
```

## Parsing negative numbers

What about negative values? We need to look for a leading '-' character; however, unlike the digits, the minus is optional. Parsing the minus is simple enough:

```
minus = char(?-)
Ergo.parse(minus, "-")
%Context{status: :ok, ast: 45}
```

45 is the character value of '-'. We can now use the `optional` combinator to allow a minus to be matched, or not:

```
minus = optional(char(?-))
Ergo.parse(minus, "-42")
%Context{status: :ok, ast: 45}
Ergo.parse(minus, "42")
%Context{status: :ok, ast: nil}
```

In the second case the status is `:ok`, meaning the optional parser succeeded; however, the AST is `nil`, meaning nothing was matched. Let's make this a bit more useful:

```
minus = optional(char(?-)) |> transform(fn ast ->
  case ast do
    nil -> 1
    45 -> -1
  end
end)
Ergo.parse(minus, "-42")
%Context{status: :ok, ast: -1}
Ergo.parse(minus, "42")
%Context{status: :ok, ast: 1}
```

Now when `minus` matches a '-' it will transform it to the value -1. When it doesn't match anything it will transform it to the value 1. Now let's combine it with the `digits` parser:

```
integer = sequence([
  minus,
  digits
])
```

The `sequence` parser tries to match a list of parsers in turn and, if they all match, generates an AST composed of a list of each of their results. Let's see how it works:

```
Ergo.parse(integer, "1234")
%Context{status: :ok, ast: [1, 1234]}
Ergo.parse(integer, "-5678")
%Context{status: :ok, ast: [-1, 5678]}
```

So we can see that it's easy to get the right result by simply taking the product of the two values in the AST:

```
integer = sequence(
  [
    minus,
    digits
  ],
  ast: &Enum.product/1
)
Ergo.parse(integer, "1234")
%Context{status: :ok, ast: 1234}
Ergo.parse(integer, "-5678")
%Context{status: :ok, ast: -5678}
```
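To see why the product works, the two-element ASTs from the earlier output can be fed to `Enum.product/1` directly. This is a plain Elixir sketch (it needs Elixir 1.12 or later for `Enum.product/1`); `positive` and `negative` are illustrative names:

```elixir
# [sign, magnitude] pairs, as produced by the untransformed sequence.
positive = Enum.product([1, 1234])
negative = Enum.product([-1, 5678])

IO.inspect(positive)  # 1234
IO.inspect(negative)  # -5678
```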

So far so good. We can now parse positive and negative integers.

## Parsing decimals

If we want to parse decimal numbers as well we need to handle the (optional) mantissa, the digits to the right of the decimal point.

We can see that the mantissa is structurally the same as the integer part, a set of digits, but will need to be processed a little differently.

The `m_transform` function below should look familiar. It works the same way as `c_transform`, only instead of multiplying by increasing powers of 10, we're dividing by increasing powers of 10.

```
m_transform = fn ast ->
  ast
  |> Enum.map(fn digit -> digit - 48 end)
  |> Enum.zip(Stream.unfold(0.1, fn n -> {n, n / 10} end))
  |> Enum.map(&Tuple.product/1)
  |> Enum.sum()
end
mantissa = many(digit, ast: m_transform)
Ergo.parse(mantissa, "5")
%Context{status: :ok, ast: 0.5}
Ergo.parse(mantissa, "42")
%Context{status: :ok, ast: 0.42000000000000004}
```

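The result for "42" shows a floating-point precision artifact: 0.4 and 0.02 cannot be represented exactly in binary, so their sum is not exactly 0.42. Even so, you can see the principle the code operates by.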
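One way to avoid accumulating rounding error from summing several small floats is to treat the mantissa digits as a single integer and divide once by the matching power of ten. This is a hedged alternative in plain Elixir, not the method used above; `scaled_mantissa` is a hypothetical name, and `Integer.pow/2` needs Elixir 1.12 or later:

```elixir
# Build the integer value of the digits, then scale by 10^n in one division.
scaled_mantissa = fn ast ->
  value = Enum.reduce(ast, 0, fn code, acc -> acc * 10 + (code - ?0) end)
  value / Integer.pow(10, length(ast))
end

IO.inspect(scaled_mantissa.([?5]))      # 0.5
IO.inspect(scaled_mantissa.([?4, ?2]))  # 0.42
```

A single division still yields a float, so values such as 0.1 remain approximations, but the error no longer grows with the number of digits summed.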

Now to join the two components together. The decimal point and mantissa may be absent, so we'll need `optional` again. We'll also again make use of the `ast:` feature of the `sequence` combinator to process the ASTs and give us the right value.

```
number = sequence([
  integer,
  optional(
    sequence([
      ignore(char(?.)),
      mantissa
    ], ast: &List.first/1)
  )
], ast: &Enum.sum/1)
Ergo.parse(number, "42")
%Context{status: :ok, ast: 42}
Ergo.parse(number, "0.45")
%Context{status: :ok, ast: 0.45}
Ergo.parse(number, "-42")
%Context{status: :ok, ast: -42}
```

All looking good, just one more example:

```
Ergo.parse(number, "-4.2")
%Context{status: :ok, ast: -3.8}
```

Oops! There is a problem with our implementation: we add together the integer and decimal parts. This works for positive numbers, but in the last case -4 + 0.2 = -3.8, not -4.2. When the integer part is negative we need to subtract the decimal part, so we can no longer just use `Enum.sum` to process the result of the top-level sequence. Instead:

```
combine = fn
  [integer, decimal] ->
    if integer >= 0 do
      integer + decimal
    else
      integer - decimal
    end

  ast ->
    Enum.sum(ast)
end

number = sequence([
  integer,
  optional(
    sequence([
      ignore(char(?.)),
      mantissa
    ], ast: &List.first/1)
  )
], ast: combine)
Ergo.parse(number, "-4.2")
%Context{status: :ok, ast: -4.2}
```
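One case worth checking is a negative number whose integer part is zero, such as "-0.5". Assuming the `integer` parser yields `0` there (since -1 * 0 = 0), the sign is lost before `combine` runs. The sketch below reproduces `combine` in plain Elixir to show this, alongside a hypothetical `combine_signed` that keeps the sign separate. Wiring that into the parser would require `integer` to return the sign and magnitude unmultiplied, which is not shown here.

```elixir
# Same logic as `combine` above, reproduced so the sketch is self-contained.
combine = fn
  [integer, decimal] ->
    if integer >= 0, do: integer + decimal, else: integer - decimal

  ast ->
    Enum.sum(ast)
end

# Hypothetical alternative: add the parts first, then apply the sign.
combine_signed = fn sign, whole, decimal -> sign * (whole + decimal) end

IO.inspect(combine.([-4, 0.2]))          # -4.2
IO.inspect(combine.([0, 0.5]))           # 0.5, although the input was "-0.5"
IO.inspect(combine_signed.(-1, 0, 0.5))  # -0.5
IO.inspect(combine_signed.(-1, 4, 0.2))  # -4.2
```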

## Conclusion

Through a series of steps we have built a parser that handles positive and negative integers and decimal numbers. We've seen the use of terminal parsers like `char`, combinator parsers like `optional`, `ignore`, `many`, and `sequence`, and meta parsers like `transform` (and that a transform can often be specified as an `ast:` argument to a combinator parser).

Hopefully this guide will be helpful in thinking about how to build your own parsers.