NimbleParsec v0.1.0
NimbleParsec is a simple and fast library for parser combinators.
Combinators are built during runtime and compile into multiple clauses with binary matching. This provides the following benefits:
- Performance: since it compiles to binary matching, it leverages many Erlang VM optimizations to generate extremely fast parser code with low memory usage
- Composable: this library does not rely on macros for building and composing parsers, therefore they are fully composable. The only macros are defparsec/3 and defparsecp/3, which emit the compiled clauses with binary matching
- No runtime dependency: after compilation, the generated parser clauses have no runtime dependency on NimbleParsec. This makes it possible to compile parsers without imposing a dependency on users of your library
- No footprint: NimbleParsec only needs to be imported in your modules, leaving no footprint on them
Examples
defmodule MyParser do
import NimbleParsec
date =
integer(4)
|> ignore(literal("-"))
|> integer(2)
|> ignore(literal("-"))
|> integer(2)
time =
integer(2)
|> ignore(literal(":"))
|> integer(2)
|> ignore(literal(":"))
|> integer(2)
|> optional(literal("Z"))
defparsec :datetime, date |> ignore(literal("T")) |> concat(time), debug: true
end
MyParser.datetime("2010-04-17T14:12:34Z")
#=> {:ok, [2010, 4, 17, 14, 12, 34, "Z"], "", 1, 21}
If you add debug: true to defparsec/3, it will print the generated
clauses, which are shown below:
defp datetime__0(<<x0, x1, x2, x3, "-", x4, x5, "-", x6, x7, "T",
x8, x9, ":", x10, x11, ":", x12, x13, rest::binary>>,
acc, stack, combinator__line, combinator__column)
when x0 >= 48 and x0 <= 57 and (x1 >= 48 and x1 <= 57) and
(x2 >= 48 and x2 <= 57) and (x3 >= 48 and x3 <= 57) and
(x4 >= 48 and x4 <= 57) and (x5 >= 48 and x5 <= 57) and
(x6 >= 48 and x6 <= 57) and (x7 >= 48 and x7 <= 57) and
(x8 >= 48 and x8 <= 57) and (x9 >= 48 and x9 <= 57) and
(x10 >= 48 and x10 <= 57) and (x11 >= 48 and x11 <= 57) and
(x12 >= 48 and x12 <= 57) and (x13 >= 48 and x13 <= 57) do
datetime__1(
rest,
[(x13 - 48) * 1 + (x12 - 48) * 10, (x11 - 48) * 1 + (x10 - 48) * 10,
(x9 - 48) * 1 + (x8 - 48) * 10, (x7 - 48) * 1 + (x6 - 48) * 10, (x5 - 48) * 1 + (x4 - 48) * 10,
(x3 - 48) * 1 + (x2 - 48) * 10 + (x1 - 48) * 100 + (x0 - 48) * 1000] ++ acc,
stack,
combinator__line,
combinator__column + 19
)
end
defp datetime__0(rest, acc, stack, line, column) do
{:error, "...", rest, line, column}
end
defp datetime__1(<<"Z", rest::binary>>, acc, stack, combinator__line, combinator__column) do
datetime__2(rest, ["Z"] ++ acc, stack, combinator__line, combinator__column + 1)
end
defp datetime__1(rest, acc, stack, line, column) do
datetime__2(rest, acc, stack, line, column)
end
defp datetime__2(rest, acc, _stack, line, column) do
{:ok, acc, rest, line, column}
end
As you can see, it generates highly inlined code, comparable to
hand-written parsers. This gives NimbleParsec an order-of-magnitude
performance gain compared to other parser combinator libraries. Further performance
can be gained by passing the inline: true option to defparsec/3.
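For comparison, a hand-written parser for just the date portion of the example might look like the sketch below. It is plain Elixir with no NimbleParsec dependency; the module name HandWrittenDate and its return shape are hypothetical, chosen only to mirror the single binary-matching clause that defparsec/3 emits above.

```elixir
defmodule HandWrittenDate do
  # Guard for an ASCII digit, the same check as `x >= 48 and x <= 57` above.
  defguardp is_digit(c) when c >= ?0 and c <= ?9

  # One clause matches the whole "YYYY-MM-DD" shape in a single binary pattern.
  def parse(<<y1, y2, y3, y4, ?-, m1, m2, ?-, d1, d2, rest::binary>>)
      when is_digit(y1) and is_digit(y2) and is_digit(y3) and is_digit(y4) and
             is_digit(m1) and is_digit(m2) and is_digit(d1) and is_digit(d2) do
    # Convert each digit byte to its value and combine positionally.
    year = (y1 - ?0) * 1000 + (y2 - ?0) * 100 + (y3 - ?0) * 10 + (y4 - ?0)
    month = (m1 - ?0) * 10 + (m2 - ?0)
    day = (d1 - ?0) * 10 + (d2 - ?0)
    {:ok, [year, month, day], rest}
  end

  # Fallback clause: report failure without consuming any input.
  def parse(rest), do: {:error, "expected a date", rest}
end
```

For instance, HandWrittenDate.parse("2010-04-17T14:12:34Z") returns {:ok, [2010, 4, 17], "T14:12:34Z"}, leaving the time portion for a subsequent step.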
Summary
Functions
Defines a single ascii codepoint in the given ranges
Chooses one of the given combinators
Concatenates two combinators
Defines a public parser combinator with the given name and opts
Defines a private parser combinator
Duplicates the combinator to_duplicate n times
Returns an empty combinator
Ignores the output of combinator given in to_ignore
Defines an integer combinator of exact length or with min and max
length
Adds a label to the combinator to be used in error reports
Defines a literal binary value
Maps over the combinator results with the remote or local function in call
Marks the given combinator as optional
Invokes an already compiled parsec with name name in the
same module
Invokes call to emit the AST that will repeat to_repeat
while the AST code returns true
Invokes call to emit the AST that traverses the to_traverse
combinator results
Reduces over the combinator results with the remote or local function in call
Allows the combinator given in to_repeat to appear zero or more times
Repeats to_repeat until one of the combinators in choices matches
Repeats while the given remote or local function call returns true
Replaces the output of combinator given in to_replace by a single value
Allows the combinator given in to_repeat to appear at least, at most
or exactly a given number of times
Traverses the combinator results with the remote or local function call
Defines a single utf8 codepoint in the given ranges
Types
min_and_max() :: {:min, pos_integer()} | {:max, pos_integer()}
Functions
Defines a single ascii codepoint in the given ranges.
ranges is a list containing one of:
- a min..max range expressing supported codepoints
- a codepoint integer expressing a supported codepoint
- {:not, min..max} expressing unsupported codepoints
- {:not, codepoint} expressing an unsupported codepoint
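To make these rules concrete, here is a plain-Elixir model of how a ranges list could be evaluated against a single codepoint. The module RangeCheck and its accepts?/2 function are hypothetical, and the way entries are combined (inclusive entries OR-ed together, an empty inclusive set accepting everything, {:not, ...} entries acting as exclusions) is an assumption about the intended semantics rather than something taken from NimbleParsec's source.

```elixir
defmodule RangeCheck do
  # Returns true if `codepoint` is accepted by a ranges list such as
  # [?a..?z, ?_, {:not, ?q}]. Assumption: inclusive entries are OR-ed
  # (no inclusive entries means "anything"), {:not, _} entries exclude.
  def accepts?(ranges, codepoint) do
    {excluded, included} = Enum.split_with(ranges, &match?({:not, _}, &1))

    matches_included = included == [] or Enum.any?(included, &member?(&1, codepoint))
    matches_excluded = Enum.any?(excluded, fn {:not, entry} -> member?(entry, codepoint) end)

    matches_included and not matches_excluded
  end

  # An entry is either a min..max range or a single codepoint integer.
  defp member?(%Range{} = range, codepoint), do: codepoint in range
  defp member?(single, codepoint) when is_integer(single), do: codepoint == single
end
```

Under this model, RangeCheck.accepts?([?a..?z, {:not, ?c}], ?c) is false while RangeCheck.accepts?([{:not, ?0..?9}], ?x) is true.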
Examples
defmodule MyParser do
import NimbleParsec
defparsec :digit_and_lowercase,
empty()
|> ascii_char([?0..?9])
|> ascii_char([?a..?z])
end
MyParser.digit_and_lowercase("1a")
#=> {:ok, [?1, ?a], "", 1, 3}
MyParser.digit_and_lowercase("a1")
#=> {:error, "expected a byte in the range ?0..?9, followed by a byte in the range ?a..?z", "a1", 1, 1}
Chooses one of the given combinators.
Expects at least two choices.
Beware! Char combinators
Note that both utf8_char/2 and ascii_char/2 allow multiple ranges to
be given. Therefore, instead of this:
choice([
ascii_char([?a..?z]),
ascii_char([?A..?Z]),
])
One should simply prefer:
ascii_char([?a..?z, ?A..?Z])
The latter is compiled more efficiently by NimbleParsec.
Beware! Always successful combinators
If a combinator that always succeeds is given as a choice, that choice
will always succeed which may lead to unused function warnings since
any further choice won’t ever be attempted. For example, because repeat/2
always succeeds, the literal/2 combinator below it won’t ever run:
choice([
repeat(ascii_char([?0..?9])),
literal("OK")
])
Instead of repeat/2, you may want to use times/3 with the flags :min
and :max.
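The shadowing problem follows from choice being ordered: alternatives are tried in sequence and the first success wins. The sketch below models this with hypothetical mini-parsers (TinyChoice, digits/1, ok_literal/1) that return {:ok, results, rest} or {:error, rest}; it illustrates the idea in plain Elixir and is not NimbleParsec's implementation.

```elixir
defmodule TinyChoice do
  # Ordered choice over tiny parsers of shape
  # fun(binary) -> {:ok, results, rest} | {:error, rest}:
  # the first alternative that succeeds wins; later ones are never tried.
  def choice(parsers, input) do
    Enum.reduce_while(parsers, {:error, input}, fn parser, acc ->
      case parser.(input) do
        {:ok, _results, _rest} = ok -> {:halt, ok}
        {:error, _rest} -> {:cont, acc}
      end
    end)
  end

  # Zero or more digits: like repeat/2, this can never fail.
  def digits(input), do: digits(input, [])
  defp digits(<<c, rest::binary>>, acc) when c in ?0..?9, do: digits(rest, [c | acc])
  defp digits(rest, acc), do: {:ok, Enum.reverse(acc), rest}

  # The literal "OK".
  def ok_literal("OK" <> rest), do: {:ok, ["OK"], rest}
  def ok_literal(rest), do: {:error, rest}
end
```

With digits/1 listed first, choosing on "OK" returns {:ok, [], "OK"}: the zero-digit match succeeds, so ok_literal/1 is never reached. Swapping the order lets the literal match.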
Concatenates two combinators.
Examples
defmodule MyParser do
import NimbleParsec
defparsec :digit_upper_lower_plus,
concat(
concat(ascii_char([?0..?9]), ascii_char([?A..?Z])),
concat(ascii_char([?a..?z]), ascii_char([?+..?+]))
)
end
MyParser.digit_upper_lower_plus("1Az+")
#=> {:ok, [?1, ?A, ?z, ?+], "", 1, 5}
Defines a public parser combinator with the given name and opts.
Beware!
defparsec/3 is executed during compilation. This means you can’t
invoke a function defined in the same module. The following will error
because the date function has not yet been defined:
defmodule MyParser do
import NimbleParsec
def date do
integer(4)
|> ignore(literal("-"))
|> integer(2)
|> ignore(literal("-"))
|> integer(2)
end
defparsec :date, date()
end
This can be solved in different ways. You may define date in another
module and then invoke it. You can also store the parsec in a variable
or a module attribute and use that instead. For example:
defmodule MyParser do
import NimbleParsec
date =
integer(4)
|> ignore(literal("-"))
|> integer(2)
|> ignore(literal("-"))
|> integer(2)
defparsec :date, date
end
Options
- :inline - when true, inlines clauses that work as redirection for other clauses. It is disabled by default because of a bug in Elixir v1.5 and v1.6 where unused functions that are inlined cause a compilation error
- :debug - when true, writes generated clauses to :stderr for debugging
Defines a private parser combinator.
It cannot be invoked directly, only via parsec/2.
Receives the same options as defparsec/3.
duplicate(t(), t(), pos_integer()) :: t()
Duplicates the combinator to_duplicate n times.
Returns an empty combinator.
An empty combinator cannot be compiled on its own.
Ignores the output of combinator given in to_ignore.
Examples
defmodule MyParser do
import NimbleParsec
defparsec :ignorable, literal("T") |> ignore() |> integer(2)
end
MyParser.ignorable("T12")
#=> {:ok, [12], "", 1, 4}
integer(t(), pos_integer() | [min_and_max()]) :: t()
Defines an integer combinator of exact length or with min and max
length.
If you want an integer of unknown size, use integer(min: 1).
Examples
With exact length:
defmodule MyParser do
import NimbleParsec
defparsec :two_digits_integer, integer(2)
end
MyParser.two_digits_integer("123")
#=> {:ok, [12], "3", 1, 3}
MyParser.two_digits_integer("1a3")
#=> {:error, "expected a two digits integer", "1a3", 1, 1}
With min and max:
defmodule MyParser do
import NimbleParsec
defparsec :two_digits_integer, integer(min: 2, max: 4)
end
MyParser.two_digits_integer("123")
#=> {:ok, [123], "", 1, 4}
MyParser.two_digits_integer("1a3")
#=> {:error, "expected a two digits integer", "1a3", 1, 1}
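An integer combinator turns digit bytes into a value the same way the generated clauses at the top of this page do, with arithmetic like (x - 48) * 10. The plain-Elixir sketch below, using the hypothetical module DigitsToInteger, assumes that min and max mean: consume digits greedily up to max, and fail if fewer than min were found.

```elixir
defmodule DigitsToInteger do
  # Consume between `min` and `max` ASCII digits and fold them into an integer.
  def parse(input, min, max) do
    case take(input, max, 0, 0) do
      {count, value, rest} when count >= min -> {:ok, value, rest}
      _ -> {:error, input}
    end
  end

  # Greedily take digits while fewer than `max` have been consumed;
  # each new digit shifts the running value one decimal place left.
  defp take(<<c, rest::binary>>, max, count, acc) when c in ?0..?9 and count < max do
    take(rest, max, count + 1, acc * 10 + (c - ?0))
  end

  defp take(rest, _max, count, acc), do: {count, acc, rest}
end
```

With min 2 and max 4, "12345" yields {:ok, 1234, "5"}, and "1a3" fails because only one digit is available.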
Adds a label to the combinator to be used in error reports.
Examples
defmodule MyParser do
import NimbleParsec
defparsec :digit_and_lowercase,
empty()
|> ascii_char([?0..?9])
|> ascii_char([?a..?z])
|> label("digit followed by lowercase letter")
end
MyParser.digit_and_lowercase("1a")
#=> {:ok, [?1, ?a], "", 1, 3}
MyParser.digit_and_lowercase("a1")
#=> {:error, "expected a digit followed by lowercase letter", "a1", 1, 1}
Defines a literal binary value.
Examples
defmodule MyParser do
import NimbleParsec
defparsec :literal_t, literal("T")
end
MyParser.literal_t("T")
#=> {:ok, ["T"], "", 1, 2}
MyParser.literal_t("not T")
#=> {:error, "expected a literal \"T\"", "not T", 1, 1}
Maps over the combinator results with the remote or local function in call.
call is either a {module, function, args} representing
a remote call or {function, args} representing a local call.
The call is invoked once for each parser result, with that
result prepended to the given args. The args will
be injected at the compile site and therefore must be escapable
via Macro.escape/1.
See traverse/3 for a low level version of this function.
Examples
defmodule MyParser do
import NimbleParsec
defparsec :letters_to_string_chars,
ascii_char([?a..?z])
|> ascii_char([?a..?z])
|> ascii_char([?a..?z])
|> map({Integer, :to_string, []})
end
MyParser.letters_to_string_chars("abc")
#=> {:ok, ["97", "98", "99"], "", 1, 4}
Marks the given combinator as optional.
It is equivalent to choice([optional, empty()]), where optional is the combinator being marked as optional.
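The fallback to empty() is what the datetime example at the top of this page relies on for its optional "Z": the second datetime__1 clause passes the input through unchanged. A hypothetical plain-Elixir model of that behaviour, using an {:ok, results, rest} | {:error, rest} convention as a stand-in for NimbleParsec's internals:

```elixir
defmodule OptionalSketch do
  # A tiny parser for the literal "Z".
  def z_literal("Z" <> rest), do: {:ok, ["Z"], rest}
  def z_literal(rest), do: {:error, rest}

  # Models choice([parser, empty()]): try the parser first; if it fails,
  # fall back to an empty success that produces no results and consumes nothing.
  def optional(parser, input) do
    case parser.(input) do
      {:ok, _results, _rest} = ok -> ok
      {:error, _rest} -> {:ok, [], input}
    end
  end
end
```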
Invokes an already compiled parsec with name name in the
same module.
It is useful for implementing recursive parsers.
It can also be used to trade runtime performance for shorter
compilation times. If you have a parser used over and over again,
you can compile it using defparsecp and rely on it via
this function. The tree size built at compile time will be
reduced, although runtime performance is degraded, as every time
this function is invoked it introduces a stacktrace entry.
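To illustrate why recursion needs a named entry point, the hand-written sketch below parses nested parentheses by calling itself, which is the role parsec/2 plays for a combinator compiled with defparsecp. NestedSketch and nested/1 are hypothetical names, and the code is plain Elixir rather than NimbleParsec.

```elixir
defmodule NestedSketch do
  # Grammar: nested := "(" nested ")" | ""
  # Returns {:ok, depth, rest} or {:error, input}.
  def nested("(" <> rest) do
    # The recursive call is the hand-written equivalent of parsec(:nested).
    with {:ok, depth, ")" <> rest} <- nested(rest) do
      {:ok, depth + 1, rest}
    else
      _ -> {:error, "(" <> rest}
    end
  end

  # The empty alternative: zero depth, nothing consumed.
  def nested(rest), do: {:ok, 0, rest}
end
```

NestedSketch.nested("(())") returns {:ok, 2, ""}, while an unbalanced "(()" is rejected.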
Invokes call to emit the AST that will repeat to_repeat
while the AST code returns true.
call is a {module, function, args} where the AST argument
that represents the binary to be parsed will be prepended to
args. call is invoked at compile time and is useful in
combinators that avoid injecting runtime dependencies.
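Such a compile-time call receives AST, not a runtime binary, and returns AST. The sketch below assumes the emitted expression should evaluate to true while repetition continues; AstSketch and not_quote_ast/1 are hypothetical names, and the result is evaluated here with Code.eval_quoted/2 only to show what the generated code would compute.

```elixir
defmodule AstSketch do
  # Receives the AST standing for the binary being parsed and returns
  # AST that is true unless that binary starts with a double quote.
  def not_quote_ast(binary_ast) do
    quote do
      not match?(<<?", _::binary>>, unquote(binary_ast))
    end
  end
end

# Evaluate the emitted AST with `rest` bound, to inspect its behaviour.
rest_var = Macro.var(:rest, nil)
{true, _} = Code.eval_quoted(AstSketch.not_quote_ast(rest_var), rest: "abc")
{false, _} = Code.eval_quoted(AstSketch.not_quote_ast(rest_var), rest: "\"abc")
```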
Invokes call to emit the AST that traverses the to_traverse
combinator results.
call is a {module, function, args} where the AST argument
that will represent the combinator results will be prepended to
args. call is invoked at compile time and is useful in
combinators that avoid injecting runtime dependencies.
Reduces over the combinator results with the remote or local function in call.
call is either a {module, function, args} representing
a remote call or {function, args} representing a local call.
The parser results to be reduced will be prepended to the
given args. The args will be injected at the compile site
and therefore must be escapable via Macro.escape/1.
See traverse/3 for a low level version of this function.
Examples
defmodule MyParser do
import NimbleParsec
defparsec :letters_to_reduced_chars,
ascii_char([?a..?z])
|> ascii_char([?a..?z])
|> ascii_char([?a..?z])
|> reduce({Enum, :join, ["-"]})
end
MyParser.letters_to_reduced_chars("abc")
#=> {:ok, ["97-98-99"], "", 1, 4}
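The difference between map/3 and reduce/3 lies in how often the call is applied and what it receives. A minimal plain-Elixir model, using the hypothetical module MapReduceSketch and the same {module, function, args} tuples as the two examples:

```elixir
defmodule MapReduceSketch do
  # map: the call runs once per result, with that result prepended to args.
  def map_results(results, {mod, fun, args}) do
    Enum.map(results, fn result -> apply(mod, fun, [result | args]) end)
  end

  # reduce: the call runs once, with the whole results list prepended
  # to args; its return value becomes the single new result.
  def reduce_results(results, {mod, fun, args}) do
    [apply(mod, fun, [results | args])]
  end
end
```

Applied to [?a, ?b, ?c], the map tuple {Integer, :to_string, []} gives ["97", "98", "99"], while the reduce tuple {Enum, :join, ["-"]} gives ["97-98-99"], matching the two examples.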
Allows the combinator given in to_repeat to appear zero or more times.
Beware! Since repeat/2 allows zero entries, it cannot be used inside
choice/2, because it will always succeed and may lead to unused function
warnings since any further choice won’t ever be attempted. For example,
because repeat/2 always succeeds, the literal/2 combinator below it
won’t ever run:
choice([
repeat(ascii_char([?a..?z])),
literal("OK")
])
Instead of repeat/2, you may want to use times/3 with the flags :min
and :max.
Examples
defmodule MyParser do
import NimbleParsec
defparsec :repeat_lower, repeat(ascii_char([?a..?z]))
end
MyParser.repeat_lower("abcd")
#=> {:ok, [?a, ?b, ?c, ?d], "", 1, 5}
MyParser.repeat_lower("1234")
#=> {:ok, [], "1234", 1, 1}
Repeats to_repeat until one of the combinators in choices matches.
Each of the combinators given in choices must be optimizable into
a single pattern, otherwise this function will refuse to compile.
Use repeat_while/3 for a general mechanism for repeating.
Examples
defmodule MyParser do
import NimbleParsec
defparsec :string,
ascii_char([?"])
|> repeat_until(
choice([
~S(\") |> literal() |> replace(?"),
utf8_char([])
]),
[ascii_char([?"])]
)
|> ascii_char([?"])
|> reduce({List, :to_string, []})
end
MyParser.string(~S("string with quotes \" inside"))
#=> {:ok, ["\"string with quotes \" inside\""], "", 1, 31}
Repeats while the given remote or local function call returns true.
call is either a {module, function, args} representing
a remote call or {function, args} representing a local call.
The rest of the binary to be parsed will be prepended to the
given args. The args will be injected at the compile site
and therefore must be escapable via Macro.escape/1.
Examples
defmodule MyParser do
import NimbleParsec
defparsec :string,
ascii_char([?"])
|> repeat_while(
choice([
~S(\") |> literal() |> replace(?"),
utf8_char([])
]),
{:not_quote, []}
)
|> ascii_char([?"])
|> reduce({List, :to_string, []})
defp not_quote(<<?", _::binary>>), do: false
defp not_quote(_), do: true
end
MyParser.string(~S("string with quotes \" inside"))
#=> {:ok, ["\"string with quotes \" inside\""], "", 1, 31}
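Conceptually, repeat_while/3 checks the predicate against the remaining binary before each repetition. The hand-written sketch below (hypothetical module QuotedSketch, plain Elixir, not generated code) follows the same steps as the example: the opening quote, a loop guarded by not_quote/1 that turns the escape sequence \" into a quote, then the closing quote.

```elixir
defmodule QuotedSketch do
  # Opening quote, then loop while not_quote/1 says the rest is not a bare quote.
  def string(<<?", rest::binary>>), do: loop(rest, [?"])
  def string(input), do: {:error, input}

  defp loop(rest, acc) do
    if not_quote(rest) do
      case rest do
        # The escaped quote \" becomes a plain ?" (like replace/2 in the example).
        <<?\\, ?", rest::binary>> -> loop(rest, [?" | acc])
        # Any other UTF-8 codepoint is kept as-is (like utf8_char([])).
        <<c::utf8, rest::binary>> -> loop(rest, [c | acc])
        # Input ran out (or is invalid UTF-8) before the closing quote.
        _ -> {:error, rest}
      end
    else
      # The closing quote: consume it and build the final string.
      <<?", rest::binary>> = rest
      {:ok, [List.to_string(Enum.reverse([?" | acc]))], rest}
    end
  end

  # Same predicate as in the repeat_while/3 example.
  defp not_quote(<<?", _::binary>>), do: false
  defp not_quote(_), do: true
end
```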
Replaces the output of combinator given in to_replace by a single value.
The value will be injected at the compile site
and therefore must be escapable via Macro.escape/1.
Examples
defmodule MyParser do
import NimbleParsec
defparsec :replaceable, literal("T") |> replace("OTHER") |> integer(2)
end
MyParser.replaceable("T12")
#=> {:ok, ["OTHER", 12], "", 1, 4}
times(t(), t(), pos_integer() | [min_and_max()]) :: t()
Allows the combinator given in to_repeat to appear at least, at most
or exactly a given number of times.
Examples
defmodule MyParser do
import NimbleParsec
defparsec :minimum_lower, times(ascii_char([?a..?z]), min: 2)
end
MyParser.minimum_lower("abcd")
#=> {:ok, [?a, ?b, ?c, ?d], "", 1, 5}
MyParser.minimum_lower("ab12")
#=> {:ok, [?a, ?b], "12", 1, 3}
MyParser.minimum_lower("a123")
#=> {:error, "...", "a123", 1, 1}
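A plain-Elixir model of the :min and :max options, assuming greedy matching up to max and failure when fewer than min items were consumed. TimesSketch and lower/3 are hypothetical names.

```elixir
defmodule TimesSketch do
  # Consume lowercase ASCII letters greedily, up to `max` (or without limit
  # when max is :infinity), and succeed only if at least `min` were consumed.
  def lower(input, min, max \\ :infinity) do
    {chars, rest} = take(input, max, [])

    if length(chars) >= min do
      {:ok, chars, rest}
    else
      {:error, input}
    end
  end

  defp take(<<c, rest::binary>>, max, acc)
       when c in ?a..?z and (max == :infinity or length(acc) < max) do
    take(rest, max, [c | acc])
  end

  defp take(rest, _max, acc), do: {Enum.reverse(acc), rest}
end
```

Under this model, TimesSketch.lower("ab12", 2) succeeds with [?a, ?b] and rest "12", while "a123" is rejected because only one letter precedes the digits.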
Traverses the combinator results with the remote or local function call.
call is either a {module, function, args} representing
a remote call or {function, args} representing a local call.
The parser results to be traversed will be prepended to the
given args. The args will be injected at the compile site
and therefore must be escapable via Macro.escape/1.
Notice the results are received in reverse order and must be returned in reverse order.
The number of elements returned does not need to be the same as the number of elements given.
This is a low-level function for changing the parsed result.
On top of this function, other functions are built, such as
map/3 if you want to map over each individual element and
not worry about ordering, reduce/3 to reduce all elements
into a single one, replace/3 if you want to replace the
parsed result by a single value and ignore/2 if you want to
ignore the parsed result.
Examples
defmodule MyParser do
import NimbleParsec
defparsec :letters_to_chars,
ascii_char([?a..?z])
|> ascii_char([?a..?z])
|> ascii_char([?a..?z])
|> traverse({:join_and_wrap, ["-"]})
defp join_and_wrap(args, joiner) do
args |> Enum.join(joiner) |> List.wrap()
end
end
MyParser.letters_to_chars("abc")
#=> {:ok, ["99-98-97"], "", 1, 4}
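The 99-98-97 order in the output comes from results being accumulated by prepending, as in the [...] ++ acc expressions of the generated clauses shown at the top of this page. This plain-Elixir sketch (hypothetical module ReverseAccSketch) reproduces that accumulation and then applies the same join_and_wrap/2 helper as the example:

```elixir
defmodule ReverseAccSketch do
  # Collect lowercase letters by prepending each one to the accumulator,
  # so the most recently parsed result ends up first.
  def letters(input), do: loop(input, [])

  defp loop(<<c, rest::binary>>, acc) when c in ?a..?z, do: loop(rest, [c | acc])
  defp loop(rest, acc), do: {acc, rest}

  # Same helper as in the traverse/3 example: it receives reversed results.
  def join_and_wrap(args, joiner), do: args |> Enum.join(joiner) |> List.wrap()
end
```

ReverseAccSketch.letters("abc") yields {[?c, ?b, ?a], ""}, and joining that list gives ["99-98-97"].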
Defines a single utf8 codepoint in the given ranges.
ranges is a list containing one of:
- a min..max range expressing supported codepoints
- a codepoint integer expressing a supported codepoint
- {:not, min..max} expressing unsupported codepoints
- {:not, codepoint} expressing an unsupported codepoint
Note: currently columns only count codepoints and not graphemes. This means the column count will be off when the input contains grapheme clusters.
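The difference between the two counts shows up with combining characters. The sketch below, using the hypothetical module ColumnSketch, compares a codepoint-based column (the behaviour described in the note) with a grapheme-based one, using String.codepoints/1 and String.length/1 from Elixir's standard library.

```elixir
defmodule ColumnSketch do
  # Column after consuming `consumed` from column 1, counting codepoints.
  def codepoint_column(consumed), do: length(String.codepoints(consumed)) + 1

  # The same, counting grapheme clusters (what a reader perceives as characters).
  def grapheme_column(consumed), do: String.length(consumed) + 1
end
```

For "e\u0301" (an "e" followed by a combining acute accent) the codepoint-based column is 3, while the grapheme-based column is 2; for a precomposed "\u00E9" both agree.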
Examples
defmodule MyParser do
import NimbleParsec
defparsec :digit_and_utf8,
empty()
|> utf8_char([?0..?9])
|> utf8_char([])
end
MyParser.digit_and_utf8("1é")
#=> {:ok, [?1, ?é], "", 1, 3}
MyParser.digit_and_utf8("a1")
#=> {:error, "expected a utf8 codepoint in the range ?0..?9, followed by a utf8 codepoint", "a1", 1, 1}