NimbleParsec (NimbleParsec v1.1.0) View Source
NimbleParsec
is a simple and fast library for text-based parser
combinators.
Combinators are built during runtime and compiled into multiple clauses with binary matching. This provides the following benefits:
Performance: since it compiles to binary matching, it leverages many Erlang VM optimizations to generate extremely fast parser code with low memory usage
Composable: this library does not rely on macros for building and composing parsers, therefore they are fully composable. The only macros are
defparsec/3
anddefparsecp/3
which emit the compiled clauses with binary matchingNo runtime dependency: after compilation, the generated parser clauses have no runtime dependency on
NimbleParsec
. This opens up the possibility to compile parsers and do not impose a dependency on users of your libraryNo footprints:
NimbleParsec
only needs to be imported in your modules. There is no need foruse NimbleParsec
, leaving no footprints on your modules
The goal of this library is to focus on a set of primitives for writing efficient parser combinators. The composition aspect means you should be able to use those primitives to implement higher level combinators.
Note this library does not handle low-level binary parsing. In such cases, we recommend using Elixir's bitstring syntax.
Examples
defmodule MyParser do
import NimbleParsec
date =
integer(4)
|> ignore(string("-"))
|> integer(2)
|> ignore(string("-"))
|> integer(2)
time =
integer(2)
|> ignore(string(":"))
|> integer(2)
|> ignore(string(":"))
|> integer(2)
|> optional(string("Z"))
defparsec :datetime, date |> ignore(string("T")) |> concat(time), debug: true
end
MyParser.datetime("2010-04-17T14:12:34Z")
#=> {:ok, [2010, 4, 17, 14, 12, 34, "Z"], "", %{}, 1, 21}
If you add debug: true
to defparsec/3
, it will print the generated
clauses, which are shown below:
defp datetime__0(<<x0, x1, x2, x3, "-", x4, x5, "-", x6, x7, "T",
x8, x9, ":", x10, x11, ":", x12, x13, rest::binary>>,
acc, stack, comb__context, comb__line, comb__column)
when x0 >= 48 and x0 <= 57 and (x1 >= 48 and x1 <= 57) and
(x2 >= 48 and x2 <= 57) and (x3 >= 48 and x3 <= 57) and
(x4 >= 48 and x4 <= 57) and (x5 >= 48 and x5 <= 57) and
(x6 >= 48 and x6 <= 57) and (x7 >= 48 and x7 <= 57) and
(x8 >= 48 and x8 <= 57) and (x9 >= 48 and x9 <= 57) and
(x10 >= 48 and x10 <= 57) and (x11 >= 48 and x11 <= 57) and
(x12 >= 48 and x12 <= 57) and (x13 >= 48 and x13 <= 57) do
datetime__1(
rest,
[(x13 - 48) * 1 + (x12 - 48) * 10, (x11 - 48) * 1 + (x10 - 48) * 10,
(x9 - 48) * 1 + (x8 - 48) * 10, (x7 - 48) * 1 + (x6 - 48) * 10, (x5 - 48) * 1 + (x4 - 48) * 10,
(x3 - 48) * 1 + (x2 - 48) * 10 + (x1 - 48) * 100 + (x0 - 48) * 1000] ++ acc,
stack,
comb__context,
comb__line,
comb__column + 19
)
end
defp datetime__0(rest, acc, _stack, context, line, column) do
{:error, "...", rest, context, line, column}
end
defp datetime__1(<<"Z", rest::binary>>, acc, stack, comb__context, comb__line, comb__column) do
datetime__2(rest, ["Z"] ++ acc, stack, comb__context, comb__line, comb__column + 1)
end
defp datetime__1(rest, acc, stack, context, line, column) do
datetime__2(rest, acc, stack, context, line, column)
end
defp datetime__2(rest, acc, _stack, context, line, column) do
{:ok, acc, rest, context, line, column}
end
As you can see, it generates highly inlined code, comparable to
hand-written parsers. This gives NimbleParsec
an order of magnitude
performance gains compared to other parser combinators. Further performance
can be gained by giving the inline: true
option to defparsec/3
.
Link to this section Summary
Functions
Defines a single ASCII codepoint in the given ranges.
Defines an ASCII string combinator with an exact length or min
and max
length.
Puts the result of the given combinator as the first element
of a tuple with the byte_offset
as second element.
Chooses one of the given combinators.
Concatenates two combinators.
Inspects the combinator state given to to_debug
with the given opts
.
Defines a combinator with the given name
and opts
.
Defines a combinator with the given name
and opts
.
Defines a parser (and a combinator) with the given name
and opts
.
Defines a private parser (and a combinator) with the given name
and opts
.
Duplicates the combinator to_duplicate
n
times.
Returns an empty combinator.
Defines an end of string combinator.
Marks the given combinator should appear eventually.
Ignores the output of combinator given in to_ignore
.
Defines an integer combinator with of exact length or min
and max
length.
Adds a label to the combinator to be used in error reports.
Puts the result of the given combinator as the first element
of a tuple with the line
as second element.
Checks if a combinator is ahead.
Checks if a combinator is not ahead.
Maps over the combinator results with the remote or local function in call
.
Marks the given combinator as optional
.
Invokes an already compiled combinator with name name
in the
same module.
Traverses the combinator results with the remote or local function call
.
The same as post_traverse/3
but receives the line and offset
from before the wrapped combinators.
Invokes call
to emit the AST that post_traverses the to_post_traverse
combinator results.
The same as quoted_post_traverse/3
but receives the line and offset
from before the wrapped combinators.
Invokes while
to emit the AST that will repeat to_repeat
while the AST code returns {:cont, context}
.
Reduces over the combinator results with the remote or local function in call
.
Allow the combinator given on to_repeat
to appear zero or more times.
Repeats while the given remote or local function while
returns
{:cont, context}
.
Replaces the output of combinator given in to_replace
by a single value.
Defines a string binary value.
Tags the result of the given combinator in to_tag
in a tuple with
tag
as first element.
Allow the combinator given on to_repeat
to appear at least, at most
or exactly a given amout of times.
Unwraps and tags the result of the given combinator in to_tag
in a tuple with
tag
as first element.
Defines a single UTF-8 codepoint in the given ranges.
Defines an ASCII string combinator with of exact length or min
and max
codepoint length.
Wraps the results of the given combinator in to_wrap
in a list.
Link to this section Types
Specs
bin_modifiers() :: :integer | :utf8 | :utf16 | :utf32
Specs
Specs
exclusive_range() :: {:not, Range.t()} | {:not, char()}
Specs
Specs
inclusive_range() :: Range.t() | char()
Specs
Specs
min_and_max() :: {:min, non_neg_integer()} | {:max, pos_integer()}
Specs
range() :: inclusive_range() | exclusive_range()
Specs
t()
Link to this section Functions
Specs
Defines a single ASCII codepoint in the given ranges.
ranges
is a list containing one of:
- a
min..max
range expressing supported codepoints - a
codepoint
integer expressing a supported codepoint {:not, min..max}
expressing not supported codepoints{:not, codepoint}
expressing a not supported codepoint
Examples
defmodule MyParser do
import NimbleParsec
defparsec :digit_and_lowercase,
empty()
|> ascii_char([?0..?9])
|> ascii_char([?a..?z])
end
MyParser.digit_and_lowercase("1a")
#=> {:ok, [?1, ?a], "", %{}, {1, 0}, 2}
MyParser.digit_and_lowercase("a1")
#=> {:error, "expected ASCII character in the range '0' to '9', followed by ASCII character in the range 'a' to 'z'", "a1", %{}, {1, 0}, 0}
Specs
ascii_string(t(), [range()], pos_integer() | [min_and_max()]) :: t()
Defines an ASCII string combinator with an exact length or min
and max
length.
The ranges
specify the allowed characters in the ASCII string.
See ascii_char/2
for more information.
If you want a string of unknown size, use ascii_string(ranges, min: 1)
.
If you want a literal string, use string/2
.
Examples
defmodule MyParser do
import NimbleParsec
defparsec :two_lowercase_letters, ascii_string([?a..?z], 2)
end
MyParser.two_lowercase_letters("abc")
#=> {:ok, ["ab"], "c", %{}, {1, 0}, 2}
Specs
Puts the result of the given combinator as the first element
of a tuple with the byte_offset
as second element.
byte_offset
is a non-negative integer.
Specs
Chooses one of the given combinators.
Expects at leasts two choices.
Beware! Char combinators
Note both utf8_char/2
and ascii_char/2
allow multiple ranges to
be given. Therefore, instead this:
choice([
ascii_char([?a..?z]),
ascii_char([?A..?Z]),
])
One should simply prefer:
ascii_char([?a..?z, ?A..?Z])
As the latter is compiled more efficiently by NimbleParsec
.
Beware! Always successful combinators
If a combinator that always succeeds is given as a choice, that choice
will always succeed which may lead to unused function warnings since
any further choice won't ever be attempted. For example, because repeat/2
always succeeds, the string/2
combinator below it won't ever run:
choice([
repeat(ascii_char([?0..?9])),
string("OK")
])
Instead of repeat/2
, you may want to use times/3
with the flags :min
and :max
.
Specs
Concatenates two combinators.
Examples
defmodule MyParser do
import NimbleParsec
defparsec :digit_upper_lower_plus,
concat(
concat(ascii_char([?0..?9]), ascii_char([?A..?Z])),
concat(ascii_char([?a..?z]), ascii_char([?+..?+]))
)
end
MyParser.digit_upper_lower_plus("1Az+")
#=> {:ok, [?1, ?A, ?z, ?+], "", %{}, {1, 0}, 4}
Specs
Inspects the combinator state given to to_debug
with the given opts
.
Defines a combinator with the given name
and opts
.
It is similar to defparsec/3
except it does not define
an entry-point parsing function, just the combinator function
to be used with parsec/2
.
Defines a combinator with the given name
and opts
.
It is similar to defparsecp/3
except it does not define
an entry-point parsing function, just the combinator function
to be used with parsec/2
.
Defines a parser (and a combinator) with the given name
and opts
.
The parser is a function that receives two arguments, the binary to be parsed and a set of options. You can consult the documentation of the generated parser function for more information.
This function will also define a combinator that can be used as
parsec(name)
when building other parsers. See parsec/2
for
more information on invoking compiled combinators.
Beware!
defparsec/3
is executed during compilation. This means you can't
invoke a function defined in the same module. The following will error
because the date
function has not yet been defined:
defmodule MyParser do
import NimbleParsec
def date do
integer(4)
|> ignore(string("-"))
|> integer(2)
|> ignore(string("-"))
|> integer(2)
end
defparsec :date, date()
end
This can be solved in different ways. You may simply compose a long parser using variables. For example:
defmodule MyParser do
import NimbleParsec
date =
integer(4)
|> ignore(string("-"))
|> integer(2)
|> ignore(string("-"))
|> integer(2)
defparsec :date, date
end
Alternatively, you may define a Helpers
module with many
convenience combinators, and then invoke them in your parser
module:
defmodule MyParser.Helpers do
import NimbleParsec
def date do
integer(4)
|> ignore(string("-"))
|> integer(2)
|> ignore(string("-"))
|> integer(2)
end
end
defmodule MyParser do
import NimbleParsec
import MyParser.Helpers
defparsec :date, date()
end
The approach of using helper modules is the favorite way
of composing parsers in NimbleParsec
.
Options
:inline
- when true, inlines clauses that work as redirection for other clauses. It is disabled by default because of a bug in Elixir v1.5 and v1.6 where unused functions that are inlined cause a compilation error:debug
- when true, writes generated clauses to:stderr
for debugging
Defines a private parser (and a combinator) with the given name
and opts
.
The same as defparsec/3
but the parsing function is private.
Specs
duplicate(t(), t(), non_neg_integer()) :: t()
Duplicates the combinator to_duplicate
n
times.
Specs
empty() :: t()
Returns an empty combinator.
An empty combinator cannot be compiled on its own.
Specs
Defines an end of string combinator.
The end of string does not produce a token and can be parsed multiple times. This function is useful to avoid having to check for an empty remainder after a successful parse.
Examples
defmodule MyParser do
import NimbleParsec
defparsec :letter_pairs, utf8_string([], 2) |> repeat() |> eos()
end
MyParser.letter_pairs("hi")
#=> {:ok, ["hi"], "", %{}, {1, 0}, 2}
MyParser.letter_pairs("hello")
#=> {:error, "expected end of string", "o", %{}, {1, 0}, 4}
Specs
Marks the given combinator should appear eventually.
Any other data before the combinator appears is discarded. If the combinator never appears, then it is an error.
Examples
defmodule MyParser do
import NimbleParsec
hour = integer(min: 1, max: 2)
defparsec :extract_hour, eventually(hour)
end
MyParser.extract_hour("let's meet at 12?")
#=> {:ok, [12], "?", %{}, {1, 0}, 16}
Specs
Ignores the output of combinator given in to_ignore
.
Examples
defmodule MyParser do
import NimbleParsec
defparsec :ignorable, string("T") |> ignore() |> integer(2, 2)
end
MyParser.ignorable("T12")
#=> {:ok, [12], "", %{}, {1, 0}, 2}
Specs
integer(t(), pos_integer() | [min_and_max()]) :: t()
Defines an integer combinator with of exact length or min
and max
length.
If you want an integer of unknown size, use integer(min: 1)
.
This combinator does not parse the sign and is always on base 10.
Examples
With exact length:
defmodule MyParser do
import NimbleParsec
defparsec :two_digits_integer, integer(2)
end
MyParser.two_digits_integer("123")
#=> {:ok, [12], "3", %{}, {1, 0}, 2}
MyParser.two_digits_integer("1a3")
#=> {:error, "expected ASCII character in the range '0' to '9', followed by ASCII character in the range '0' to '9'", "1a3", %{}, {1, 0}, 0}
With min and max:
defmodule MyParser do
import NimbleParsec
defparsec :two_digits_integer, integer(min: 2, max: 4)
end
MyParser.two_digits_integer("123")
#=> {:ok, [123], "", %{}, {1, 0}, 2}
MyParser.two_digits_integer("1a3")
#=> {:error, "expected ASCII character in the range '0' to '9', followed by ASCII character in the range '0' to '9'", "1a3", %{}, {1, 0}, 0}
If the size of the integer has a min and max close to each other, such as from 2 to 4 or from 1 to 2, using choice may emit more efficient code:
choice([integer(4), integer(3), integer(2)])
Note you should start from bigger to smaller.
Specs
Adds a label to the combinator to be used in error reports.
Examples
defmodule MyParser do
import NimbleParsec
defparsec :digit_and_lowercase,
empty()
|> ascii_char([?0..?9])
|> ascii_char([?a..?z])
|> label("digit followed by lowercase letter")
end
MyParser.digit_and_lowercase("1a")
#=> {:ok, [?1, ?a], "", %{}, {1, 0}, 2}
MyParser.digit_and_lowercase("a1")
#=> {:error, "expected a digit followed by lowercase letter", "a1", %{}, {1, 0}, 0}
Specs
Puts the result of the given combinator as the first element
of a tuple with the line
as second element.
line
is a tuple where the first element is the current line
and the second element is the byte offset immediately after
the newline.
Specs
Checks if a combinator is ahead.
If it succeeds, it continues as usual, otherwise it aborts the
closest choice/2
, repeat/2
, etc. If there is no closest
operation to abort, then it errors.
Note a lookahead nevers changes the accumulated output nor the context.
Examples
For example, imagine you want to parse a language that has the keywords "if" and "while" and identifiers made of any letters or number, where keywords and identifiers can be separated by a single white space:
defmodule IfWhileLang do
import NimbleParsec
keyword =
choice([
string("if") |> replace(:if),
string("while") |> replace(:while)
])
identifier =
ascii_string([?a..?z, ?A..?Z, ?0..?9], min: 1)
defparsec :expr, repeat(choice([keyword, identifier]) |> optional(string(" ")))
end
The issue with the implementation above is that the following will parse:
IfWhileLang.expr("iffy")
{:ok, [:if, "fy"], "", %{}, {1, 0}, 4}
However, "iffy" should be treated as a full identifier. We could
solve this by inverting the order of keyword
and identifier
in :expr
but that means "if" itself will be considered an identifier
and not a keyword. To solve this, we need lookaheads.
One option is to check that after the keyword we either have an empty string OR the end of the string:
keyword =
choice([
string("if") |> replace(:if),
string("while") |> replace(:while)
])
|> lookahead(choice([string(" "), eos()]))
However, in this case, a negative lookahead may be clearer, and we can assert that we don't have any identifier character after the keyword:
keyword =
choice([
string("if") |> replace(:if),
string("while") |> replace(:while)
])
|> lookahead_not(ascii_char([?a..?z, ?A..?Z, ?0..?9]))
Now we get the desired result back:
IfWhileLang.expr("iffy")
#=> {:ok, ["iffy"], "", %{}, {1, 0}, 4}
IfWhileLang.expr("if fy")
#=> {:ok, [:if, " ", "fy"], "", %{}, {1, 0}, 5}
Specs
Checks if a combinator is not ahead.
If it succeeds, it aborts the closest choice/2
, repeat/2
, etc.
Otherwise it continues as usual. If there is no closest operation
to abort, then it errors.
Note a lookahead nevers changes the accumulated output nor the context.
For an example, see lookahead/2
.
Specs
Maps over the combinator results with the remote or local function in call
.
call
is either a {module, function, args}
representing
a remote call, a {function, args}
representing a local call
or an atom function
representing {function, []}
.
Each parser result will be invoked individually for the call
.
Each result be prepended to the given args
. The args
will
be injected at the compile site and therefore must be escapable
via Macro.escape/1
.
See post_traverse/3
for a low level version of this function.
Examples
defmodule MyParser do
import NimbleParsec
defparsec :letters_to_string_chars,
ascii_char([?a..?z])
|> ascii_char([?a..?z])
|> ascii_char([?a..?z])
|> map({Integer, :to_string, []})
end
MyParser.letters_to_string_chars("abc")
#=> {:ok, ["97", "98", "99"], "", %{}, {1, 0}, 3}
Specs
Marks the given combinator as optional
.
It is equivalent to choice([optional, empty()])
.
Specs
Invokes an already compiled combinator with name name
in the
same module.
Every parser defined via defparsec/3
or defparsecp/3
can be
used as combinators. However, the defparsec/3
and defparsecp/3
functions also define an entry-point parsing function, as implied
by their names. If you want to define a combinator with the sole
purpose of using it in combinator, use defcombinatorp/3
instead.
Use cases
parsec/2
is useful to implement recursive definitions.
Note while parsec/2
can be used to compose smaller combinators,
the favorite mechanism for doing composition is via regular functions
and not via parsec/2
. Let's see a practical example. Imagine
that you have this module:
defmodule MyParser do
import NimbleParsec
date =
integer(4)
|> ignore(string("-"))
|> integer(2)
|> ignore(string("-"))
|> integer(2)
time =
integer(2)
|> ignore(string(":"))
|> integer(2)
|> ignore(string(":"))
|> integer(2)
|> optional(string("Z"))
defparsec :datetime, date |> ignore(string("T")) |> concat(time), debug: true
end
Now imagine that you want to break date
and time
apart
into helper functions, as you use them in other occasions.
Generally speaking, you should NOT do this:
defmodule MyParser do
import NimbleParsec
defcombinatorp :date,
integer(4)
|> ignore(string("-"))
|> integer(2)
|> ignore(string("-"))
|> integer(2)
defcombinatorp :time,
integer(2)
|> ignore(string(":"))
|> integer(2)
|> ignore(string(":"))
|> integer(2)
|> optional(string("Z"))
defparsec :datetime,
parsec(:date) |> ignore(string("T")) |> concat(parsec(:time))
end
The reason why the above is not recommended is because each
parsec/2
combinator ends-up adding a stacktrace entry during
parsing, which affects the ability of NimbleParsec
to optimize
code. If the goal is to compose combinators, you can do so
with modules and functions:
defmodule MyParser.Helpers do
import NimbleParsec
def date do
integer(4)
|> ignore(string("-"))
|> integer(2)
|> ignore(string("-"))
|> integer(2)
end
def time do
integer(2)
|> ignore(string(":"))
|> integer(2)
|> ignore(string(":"))
|> integer(2)
|> optional(string("Z"))
end
end
defmodule MyParser do
import NimbleParsec
import MyParser.Helpers
defparsec :datetime,
date() |> ignore(string("T")) |> concat(time())
end
The implementation above will be able to compile to the most efficient format as possible without forcing new stacktrace entries.
The only situation where you should use parsec/2
for composition
is when a large parser is used over and over again in a way
compilation times are high. In this sense, you can use parsec/2
to improve compilation time at the cost of runtime performance.
By using parsec/2
, the tree size built at compile time will be
reduced although runtime performance is degraded as every time
this function is invoked it introduces a stacktrace entry.
Examples
A very limited but recursive XML parser could be written as follows:
defmodule SimpleXML do
import NimbleParsec
tag = ascii_string([?a..?z, ?A..?Z], min: 1)
text = ascii_string([not: ?<], min: 1)
opening_tag =
ignore(string("<"))
|> concat(tag)
|> ignore(string(">"))
closing_tag =
ignore(string("</"))
|> concat(tag)
|> ignore(string(">"))
defparsec :xml,
opening_tag
|> repeat(lookahead_not(string("</")) |> choice([parsec(:xml), text]))
|> concat(closing_tag)
|> wrap()
end
SimpleXML.xml("<foo>bar</foo>")
#=> {:ok, [["foo", "bar", "foo"]], "", %{}, {1, 0}, 14}
In the example above, defparsec/3
has defined the entry-point
parsing function as well as a combinator which we have invoked
with parsec(:xml)
.
In many cases, however, you want to define recursive combinators
without the entry-point parsing function. We can do this by
replacing defparsec/3
by defcombinatorp
:
defcombinatorp :xml,
opening_tag
|> repeat(lookahead_not(string("</")) |> choice([parsec(:xml), text]))
|> concat(closing_tag)
|> wrap()
Note that now you can no longer invoke SimpleXML.xml(xml)
as
there is no associating parsing function. Eventually you will
have to define a defparsec/3
or defparsecp/3
, that invokes
the combinator above via parsec/3
.
Specs
Traverses the combinator results with the remote or local function call
.
call
is either a {module, function, args}
representing
a remote call, a {function, args}
representing a local call
or an atom function
representing {function, []}
.
The function given in call
will receive 5 additional arguments.
The rest of the parsed binary, the parser results to be post_traversed,
the parser context, the current line and the current offset will
be prepended to the given args
. The args
will be injected at
the compile site and therefore must be escapable via Macro.escape/1
.
The line and offset will represent the location after the combinators.
To retrieve the position before the combinators, use pre_traverse/3
.
The call
must return a tuple {acc, context}
with list of results
to be added to the accumulator as first argument and a context as
second argument. It may also return {:error, reason}
to stop
processing. Notice the received results are in reverse order and
must be returned in reverse order too.
The number of elements returned does not need to be the same as the number of elements given.
This is a low-level function for changing the parsed result.
On top of this function, other functions are built, such as
map/3
if you want to map over each individual element and
not worry about ordering, reduce/3
to reduce all elements
into a single one, replace/3
if you want to replace the
parsed result by a single value and ignore/3
if you want to
ignore the parsed result.
Examples
defmodule MyParser do
import NimbleParsec
defparsec :letters_to_chars,
ascii_char([?a..?z])
|> ascii_char([?a..?z])
|> ascii_char([?a..?z])
|> post_traverse({:join_and_wrap, ["-"]})
defp join_and_wrap(_rest, args, context, _line, _offset, joiner) do
{args |> Enum.join(joiner) |> List.wrap(), context}
end
end
MyParser.letters_to_chars("abc")
#=> {:ok, ["99-98-97"], "", %{}, {1, 0}, 3}
Specs
The same as post_traverse/3
but receives the line and offset
from before the wrapped combinators.
post_traverse/3
should be preferred as it keeps less stack
information. Use pre_traverse/3
only if you have to access
the line and offset from before the given combinators.
quoted_post_traverse(combinator \\ empty(), to_post_traverse, call)
View SourceSpecs
Invokes call
to emit the AST that post_traverses the to_post_traverse
combinator results.
call
is a {module, function, args}
and it will receive 5
additional arguments. The AST representation of the rest of the
parsed binary, the parser results, context, line and offset will
be prepended to args
. call
is invoked at compile time and is
useful in combinators that avoid injecting runtime dependencies.
The line and offset will represent the location after the combinators.
To retrieve the position before the combinators, use quoted_pre_traverse/3
.
The call
must return a list of results to be added to
the accumulator. Notice the received results are in reverse
order and must be returned in reverse order too.
The number of elements returned does not need to be the same as the number of elements given.
This function must be used only when you want to emit code that
has no runtime dependencies in other modules. In most cases,
using post_traverse/3
is better, since it doesn't work on ASTs
and instead works at runtime.
Specs
The same as quoted_post_traverse/3
but receives the line and offset
from before the wrapped combinators.
quoted_post_traverse/3
should be preferred as it keeps less stack
information. Use quoted_pre_traverse/3
only if you have to access
the line and offset from before the given combinators.
Specs
Invokes while
to emit the AST that will repeat to_repeat
while the AST code returns {:cont, context}
.
In case repetition should stop, while
must return {:halt, context}
.
while
is a {module, function, args}
and it will receive 4
additional arguments. The AST representations of the binary to be
parsed, context, line and offset will be prepended to args
. while
is invoked at compile time and is useful in combinators that avoid
injecting runtime dependencies.
Specs
Reduces over the combinator results with the remote or local function in call
.
call
is either a {module, function, args}
representing
a remote call, a {function, args}
representing a local call
or an atom function
representing {function, []}
.
The parser results to be reduced will be prepended to the
given args
. The args
will be injected at the compile site
and therefore must be escapable via Macro.escape/1
.
See post_traverse/3
for a low level version of this function.
Examples
defmodule MyParser do
import NimbleParsec
defparsec :letters_to_reduced_chars,
ascii_char([?a..?z])
|> ascii_char([?a..?z])
|> ascii_char([?a..?z])
|> reduce({Enum, :join, ["-"]})
end
MyParser.letters_to_reduced_chars("abc")
#=> {:ok, ["97-98-99"], "", %{}, {1, 0}, 3}
Specs
Allow the combinator given on to_repeat
to appear zero or more times.
Beware! Since repeat/2
allows zero entries, it cannot be used inside
choice/2
, because it will always succeed and may lead to unused function
warnings since any further choice won't ever be attempted. For example,
because repeat/2
always succeeds, the string/2
combinator below it
won't ever run:
choice([
repeat(ascii_char([?a..?z])),
string("OK")
])
Instead of repeat/2
, you may want to use times/3
with the flags :min
and :max
.
Also beware! If you attempt to repeat a combinator that can match nothing,
like optional/2
, repeat/2
will not terminate. For example, consider
this combinator:
repeat(optional(utf8_char([?a])))
This combinator will never terminate because repeat/2
chooses the empty
option of optional/2
every time. Since the goal of the parser above is
to parse 0 or more ?a
characters, it can be represented by
repeat(utf8_char([?a]))
, because repeat/2
allows 0 or more matches.
Examples
defmodule MyParser do
import NimbleParsec
defparsec :repeat_lower, repeat(ascii_char([?a..?z]))
end
MyParser.repeat_lower("abcd")
#=> {:ok, [?a, ?b, ?c, ?d], "", %{}, {1, 0}, 4}
MyParser.repeat_lower("1234")
#=> {:ok, [], "1234", %{}, {1, 0}, 0}
Specs
Repeats while the given remote or local function while
returns
{:cont, context}
.
If the combinator to_repeat
stops matching, then the whole repeat
loop stops successfully, hence it is important to assert the terminated
value after repeating.
In case repetition should stop, while
must return {:halt, context}
.
while
is either a {module, function, args}
representing
a remote call, a {function, args}
representing a local call
or an atom function
representing {function, []}
.
The function given in while
will receive 4 additional arguments.
The rest
of the binary to be parsed, the parser context, the
current line and the current offset will be prepended to the
given args
. The args
will be injected at the compile site
and therefore must be escapable via Macro.escape/1
.
Examples
defmodule MyParser do
import NimbleParsec
defparsec :string_with_quotes,
ascii_char([?"])
|> repeat_while(
choice([
~S(\") |> string() |> replace(?"),
utf8_char([])
]),
{:not_quote, []}
)
|> ascii_char([?"])
|> reduce({List, :to_string, []})
defp not_quote(<<?", _::binary>>, context, _, _), do: {:halt, context}
defp not_quote(_, context, _, _), do: {:cont, context}
end
MyParser.string_with_quotes(~S("string with quotes \" inside"))
{:ok, ["\"string with quotes \" inside\""], "", %{}, {1, 0}, 30}
Note you can use lookahead/2
and lookahead_not/2
with
repeat/2
(instead of repeat_while/3
) to write a combinator
that repeats while a combinator matches (or does not match).
For example, the same combinator above could be written as:
defmodule MyParser do
import NimbleParsec
defparsec :string_with_quotes,
ascii_char([?"])
|> repeat(
lookahead_not(ascii_char([?"]))
|> choice([
~S(\") |> string() |> replace(?"),
utf8_char([])
])
)
|> reduce({List, :to_string, []})
end
MyParser.string_with_quotes(~S("string with quotes \" inside"))
{:ok, ["\"string with quotes \" inside\""], "", %{}, {1, 0}, 30}
However, repeat_while
is still useful when the condition to
repeat comes from the context passed around.
Specs
Replaces the output of combinator given in to_replace
by a single value.
The value
will be injected at the compile site
and therefore must be escapable via Macro.escape/1
.
Examples
defmodule MyParser do
import NimbleParsec
defparsec :replaceable, string("T") |> replace("OTHER") |> integer(2, 2)
end
MyParser.replaceable("T12")
#=> {:ok, ["OTHER", 12], "", %{}, {1, 0}, 2}
Specs
Defines a string binary value.
Examples
defmodule MyParser do
import NimbleParsec
defparsec :string_t, string("T")
end
MyParser.string_t("T")
#=> {:ok, ["T"], "", %{}, {1, 0}, 1}
MyParser.string_t("not T")
#=> {:error, "expected a string \"T\"", "not T", %{}, {1, 0}, 0}
Specs
Tags the result of the given combinator in to_tag
in a tuple with
tag
as first element.
Examples
defmodule MyParser do
import NimbleParsec
defparsec :integer, integer(min: 1) |> tag(:integer)
end
MyParser.integer("1234")
#=> {:ok, [integer: [1234]], "", %{}, {1, 0}, 4}
Notice, however, that the integer result is wrapped in a list, because
the parser is expected to emit multiple tokens. When you are sure that
only a single token is emitted, you should use unwrap_and_tag/3
.
Specs
times(t(), t(), pos_integer() | [min_and_max()]) :: t()
Allow the combinator given on to_repeat
to appear at least, at most
or exactly a given amout of times.
Examples
defmodule MyParser do
import NimbleParsec
defparsec :minimum_lower, times(ascii_char([?a..?z]), min: 2)
end
MyParser.minimum_lower("abcd")
#=> {:ok, [?a, ?b, ?c, ?d], "", %{}, {1, 0}, 4}
MyParser.minimum_lower("ab12")
#=> {:ok, [?a, ?b], "12", %{}, {1, 0}, 2}
MyParser.minimum_lower("a123")
#=> {:ok, [], "a123", %{}, {1, 0}, 0}
Specs
Unwraps and tags the result of the given combinator in to_tag
in a tuple with
tag
as first element.
Examples
defmodule MyParser do
import NimbleParsec
defparsec :integer, integer(min: 1) |> unwrap_and_tag(:integer)
end
MyParser.integer("1234")
#=> {:ok, [integer: 1234], "", %{}, {1, 0}, 4}
In case the combinator emits more than one token, an error will be raised.
See tag/3
for more information.
Specs
Defines a single UTF-8 codepoint in the given ranges.
ranges
is a list containing one of:
- a
min..max
range expressing supported codepoints - a
codepoint
integer expressing a supported codepoint {:not, min..max}
expressing not supported codepoints{:not, codepoint}
expressing a not supported codepoint
Examples
defmodule MyParser do
import NimbleParsec
defparsec :digit_and_utf8,
empty()
|> utf8_char([?0..?9])
|> utf8_char([])
end
MyParser.digit_and_utf8("1é")
#=> {:ok, [?1, ?é], "", %{}, {1, 0}, 2}
MyParser.digit_and_utf8("a1")
#=> {:error, "expected utf8 codepoint in the range '0' to '9', followed by utf8 codepoint", "a1", %{}, {1, 0}, 0}
Specs
utf8_string(t(), [range()], pos_integer() | [min_and_max()]) :: t()
Defines an ASCII string combinator with of exact length or min
and max
codepoint length.
The ranges
specify the allowed characters in the ASCII string.
See ascii_char/2
for more information.
If you want a string of unknown size, use utf8_string(ranges, min: 1)
.
If you want a literal string, use string/2
.
Examples
defmodule MyParser do
import NimbleParsec
defparsec :two_letters, utf8_string([], 2)
end
MyParser.two_letters("áé")
#=> {:ok, ["áé"], "", %{}, {1, 0}, 3}
Specs
Wraps the results of the given combinator in to_wrap
in a list.