expug v0.9.2 Expug.TokenizerTools
Builds tokenizers.
    defmodule MyTokenizer do
      import Expug.TokenizerTools

      def tokenizer(source) do
        run(source, [], &document/1)
      end

      def document(state) do
        state
        |> discard(~r/^doctype /, :doctype_prelude)
        |> eat(~r/^[a-z0-9]+/, :doctype_value)
      end
    end
The state

Expug.TokenizerTools.State is a struct built from the source and opts given to run/3.
%{ tokens: [], source: "...", position: 0, options: ... }
run/3 creates the state and invokes a function you give it.
source = "doctype html"
run(source, [], &document/1)
eat/3 tries to match the given regexp against the source at position pos.

    eat(state, ~r/^"/, :open_quote)

If it matches, it returns a new state: a new token is added (:open_quote in this case), and the position pos is advanced.

If it fails to match, it throws {:parse_error, pos, [:open_quote]}. Roughly, this translates to “parse error at position pos; expected to find :open_quote”.
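Because failures are thrown rather than returned, they can be intercepted with Elixir's try/catch. A minimal sketch, assuming state was built by run/3:

```elixir
# Sketch only: eat/3 throws a tuple on failure, so wrap it if you
# want to handle the error yourself instead of letting run/3 do it.
try do
  eat(state, ~r/^"/, :open_quote)
catch
  {:parse_error, pos, expected} ->
    IO.puts("parse error at position #{pos}; expected #{inspect(expected)}")
end
```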
Mixing and matching

eat/3 will normally be wrapped into functions for most token types.
    def doctype(state) do
      state
      |> discard(~r/^doctype/, :doctype_prelude)
      |> whitespace()
      |> eat(~r/^[a-z0-9]+/, :doctype_value)
    end

    def whitespace(state) do
      state
      |> eat(~r/^[ ]+/, :whitespace, nil)
    end
one_of/3, optional/2, and many_of/2 can then be used to compose these functions.
state
|> one_of([ &doctype/1, &foobar/1 ])
|> optional(&doctype/1)
|> many_of(&doctype/1)
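As a sketch of how these combinators compose in practice (attribute/1, attributes/1, and the token names below are illustrative, not part of Expug):

```elixir
# Illustrative sketch; the function and token names are hypothetical.
# A run of key=value pairs, each optionally followed by whitespace.
def attributes(state) do
  state
  |> many_of(&attribute/1)
end

def attribute(state) do
  state
  |> eat(~r/^[a-z]+/, :attribute_key)
  |> discard(~r/^ *= */, :equal)
  |> eat(~r/^[a-z0-9]+/, :attribute_value)
  |> optional(fn s -> discard(s, ~r/^ +/, :whitespace) end)
end
```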
Summary

Functions

- Like eat/4, but instead of creating a token, it appends to the last token
- Converts numeric positions into {line, col} tuples
- Consumes a token, but doesn’t push it to the State
- Consumes a token
- Consumes a token
- Consumes a token
- Turns a State into a final result
- Extracts the last parse errors that happened
- Checks many of a certain token
- Checks many of a certain token, and lets you provide a different tail
- Finds any one of the given token-eater functions
- An optional argument
- Checks many of a certain token
- Runs; catches parse errors and throws them properly
- Gets rid of the :parse_error hints in the document
- Creates a token with a given token_name

Functions
Like eat/4, but instead of creating a token, it appends to the last token. Useful alongside start_empty().
state
|> start_empty(:quoted_string)
|> append(~r/^"/)
|> append(~r/[^"]+/)
|> append(~r/^"/)
Converts numeric positions into {line, col} tuples.
    iex> source = "div\n  body"
    iex> doc = [
    ...>   {0, :indent, ""},
    ...>   {0, :element_name, "div"},
    ...>   {4, :indent, "  "},
    ...>   {6, :element_name, "body"}
    ...> ]
    iex> Expug.TokenizerTools.convert_positions(doc, source)
    [
      {{1, 1}, :indent, ""},
      {{1, 1}, :element_name, "div"},
      {{2, 1}, :indent, "  "},
      {{2, 3}, :element_name, "body"}
    ]
Consumes a token, but doesn’t push it to the State.
state
|> eat(~r/[a-z]+/, :key)
|> discard(~r/ *= */, :equal)
|> eat(~r/[a-z]+/, :value)
Consumes a token. See eat/4.
Consumes a token.
state
|> eat(~r/[a-z]+/, :key)
|> discard(~r/ *= */, :equal)
|> eat(~r/[a-z]+/, :value)
Consumes a token.

    eat state, ~r/.../, :document

Returns a State. Available parameters are:

- state - assumed to be a state map (given by run/3).
- expr - regexp expression.
- token_name (atom, optional) - token name.
- reducer (function, optional) - a function.
Reducers

If reducer is a function, tokens is transformed using that function.
eat state, ~r/.../, :document, &[{&3, :document, &2} | &1]
# &1 == tokens in current State
# &2 == matched String
# &3 == position
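For example, a custom reducer can transform the matched string before storing it. A sketch, with an illustrative :tag_name token:

```elixir
# Sketch: a reducer that downcases the match before prepending the
# token, following the (tokens, matched, position) argument order above.
state
|> eat(~r/^[A-Za-z]+/, :tag_name, fn tokens, match, pos ->
  [{pos, :tag_name, String.downcase(match)} | tokens]
end)
```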
Also see

discard/3 will consume a token, but not push it to the State.
state
|> discard(~r/ +/, :whitespace) # discard it
Turns a State into a final result.

Returns either {:ok, doc} or {:parse_error, %{type, position, expected}}.

Guards against unexpected end-of-file.
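A minimal sketch of handling both outcomes, assuming finalize/1 is called on the state your tokenizer returns (the error-map shape follows the description above):

```elixir
# Sketch: pattern-matching on finalize/1's result.
case finalize(state) do
  {:ok, doc} ->
    convert_positions(doc, state.source)

  {:parse_error, %{position: pos, expected: expected}} ->
    {:error, {:parse_error, pos, expected}}
end
```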
Extracts the last parse errors that happened.

In case of failure, run/3 will check the last parse errors that happened. Returns a list of atoms of the expected tokens.
Checks many of a certain token.
Checks many of a certain token, and lets you provide a different tail.
Finds any one of the given token-eater functions.
state |> one_of([ &brackets/1, &braces/1, &parens/1 ])
An optional argument.
state |> optional(&text/1)
Checks many of a certain token. Syntactic sugar for optional(s, many_of(s, ...)).
Runs; catches parse errors and throws them properly.
Gets rid of the :parse_error hints in the document.
Creates a token with a given token_name.

This is functionally the same as |> eat(~r//, :token_name), but using start_empty() can make your code more readable.
state
|> start_empty(:quoted_string)
|> append(~r/^"/)
|> append(~r/[^"]+/)
|> append(~r/^"/)