Expug.TokenizerTools (expug v0.9.2)

Builds tokenizers.

defmodule MyTokenizer do
  import Expug.TokenizerTools

  def tokenizer(source) do
    run(source, [], &document/1)
  end

  def document(state) do
    state
    |> discard(~r/^doctype /, :doctype_prelude)
    |> eat(~r/^[a-z0-9]+/, :doctype_value)
  end
end

The state

Expug.TokenizerTools.State is a struct built from the source and opts given to run/3.

%{ tokens: [], source: "...", position: 0, options: ... }

run/3 creates the state and invokes a function you give it.

source = "doctype html"
run(source, [], &document/1)

eat/3 tries to match the given regexp against the source at position pos. If it matches, it returns a new state: a new token is added (:open_quote in this case) and the position pos is advanced.

eat(state, ~r/^"/, :open_quote)

If it fails to match, it’ll throw a {:parse_error, pos, [:open_quote]}. Roughly this translates to “parse error in position pos, expected to find :open_quote”.
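To make the match-advance-or-throw behavior concrete, here is a self-contained sketch of it (an illustration only, not Expug's actual implementation; the module name is made up):

```elixir
defmodule EatSketch do
  # Match `expr` against the source at the current position.
  # On success, push {position, token_name, match} and advance;
  # on failure, throw {:parse_error, position, [token_name]}.
  def eat(%{source: src, position: pos} = state, expr, token_name) do
    rest = String.slice(src, pos, String.length(src))

    case Regex.run(expr, rest) do
      [match | _] ->
        %{state |
          tokens: [{pos, token_name, match} | state.tokens],
          position: pos + String.length(match)}

      nil ->
        throw {:parse_error, pos, [token_name]}
    end
  end
end

state = %{tokens: [], source: ~s("hello"), position: 0, options: []}
EatSketch.eat(state, ~r/^"/, :open_quote)
# position advances to 1; tokens now hold {0, :open_quote, "\""}
```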

Mixing and matching

eat/3 is normally wrapped into a small function for each token type.

def doctype(state) do
  state
  |> discard(~r/^doctype/, :doctype_prelude)
  |> whitespace()
  |> eat(~r/^[a-z0-9]+/, :doctype_value)
end

def whitespace(state) do
  state
  # nil reducer: consume the match without pushing a token
  |> eat(~r/^[ \t]+/, :whitespace, nil)
end

one_of/3, optional/2, many_of/2 can then be used to compose these functions.

state
|> one_of([ &doctype/1, &foobar/1 ])
|> optional(&doctype/1)
|> many_of(&doctype/1)
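These combinators build on the {:parse_error, ...} throws described earlier. As a hedged, self-contained illustration (the names and details here are mine, not Expug's internals), one_of/3 can be thought of as trying each function in order and only failing when every alternative fails:

```elixir
defmodule OneOfSketch do
  # Try each token-eater in order; return the first success.
  # If all of them throw {:parse_error, _, _}, throw one that
  # lists everything we expected to find.
  def one_of(state, funs, expected \\ [])

  def one_of(%{position: pos}, [], expected) do
    throw {:parse_error, pos, expected}
  end

  def one_of(state, [fun | rest], expected) do
    try do
      fun.(state)
    catch
      {:parse_error, _, exp} -> one_of(state, rest, expected ++ exp)
    end
  end
end

doctype = fn %{position: pos} -> throw {:parse_error, pos, [:doctype]} end
text = fn state -> Map.put(state, :matched, :text) end

OneOfSketch.one_of(%{position: 0}, [doctype, text])
# => %{position: 0, matched: :text}
```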

Summary

Functions

append/2 - Like eat/4, but instead of creating a token, it appends to the last token

convert_positions/2 - Converts numeric positions into {line, col} tuples

discard/3 - Consumes a token, but doesn’t push it to the State

eat/3 - Consumes a token

eat/4 - Consumes a token

finish/1 - Turns a State into a final result

get_parse_errors/1 - Extracts the last parse errors that happened

many_of/2 - Checks many of a certain token

many_of/3 - Checks many of a certain token, and lets you provide a different tail

one_of/3 - Finds any one of the given token-eater functions

optional/2 - Runs a token-eater function optionally

optional_many_of/2 - Checks many of a certain token

run/3 - Runs a tokenizer function; catches parse errors and reports them properly

scrub_parse_errors/1 - Gets rid of the :parse_error hints in the document

start_empty/2 - Creates an empty token with a given token_name

Functions

Like eat/4, but instead of creating a token, it appends to the last token.

Useful alongside start_empty().

state
|> start_empty(:quoted_string)
|> append(~r/^"/)
|> append(~r/[^"]+/)
|> append(~r/^"/)
convert_positions(doc, source)

Converts numeric positions into {line, col} tuples.

iex> source = "div\n  body"
iex> doc = [
...>   { 0, :indent, "" },
...>   { 0, :element_name, "div" },
...>   { 4, :indent, "  " },
...>   { 6, :element_name, "body" }
...> ]
iex> Expug.TokenizerTools.convert_positions(doc, source)
[
  { {1, 1}, :indent, "" },
  { {1, 1}, :element_name, "div" },
  { {2, 1}, :indent, "  " },
  { {2, 3}, :element_name, "body" }
]
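The conversion amounts to counting newlines before each offset. A self-contained sketch of that arithmetic (an illustration; Expug's actual implementation may differ):

```elixir
defmodule PositionSketch do
  # Convert a 0-based source offset into a 1-based {line, col} tuple.
  def to_line_col(source, pos) do
    before = String.slice(source, 0, pos)
    lines = String.split(before, "\n")
    line = length(lines)
    col = String.length(List.last(lines)) + 1
    {line, col}
  end
end

PositionSketch.to_line_col("div\n  body", 6)
# => {2, 3}
```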
discard(state, expr, token_name)

Consumes a token, but doesn’t push it to the State.

state
|> eat(~r/[a-z]+/, :key)
|> discard(~r/ *= */, :equal)
|> eat(~r/[a-z]+/, :value)

eat(state, expr)

Consumes a token.

See eat/4.

eat(state, expr, token_name)

Consumes a token.

state
|> eat(~r/[a-z]+/, :key)
|> discard(~r/ *= */, :equal)
|> eat(~r/[a-z]+/, :value)
eat(state, expr, token_name, fun)

Consumes a token.

eat state, ~r/.../, :document

Returns a State. Available parameters are:

  • state - the state map (given by run/3).
  • expr - a regular expression to match.
  • token_name (atom, optional) - the name of the token to create.
  • fun (function, optional) - a reducer function (see below).

Reducers

If reducer is a function, tokens is transformed using that function.

eat state, ~r/.../, :document, &[{&3, :document, &2} | &1]

# &1 == tokens in current State
# &2 == matched String
# &3 == position
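The capture syntax above can be hard to read; the same reducer can also be written as a named function (the module and function names here are illustrative, not part of Expug):

```elixir
defmodule MyReducers do
  # Same shape as &[{&3, :document, &2} | &1]:
  # prepend a {position, :document, match} token to the token list.
  def push_document(tokens, match, pos) do
    [{pos, :document, match} | tokens]
  end
end

# eat(state, ~r/.../, :document, &MyReducers.push_document/3)
```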

Also see

discard/3 will consume a token, but not push it to the State.

state
|> discard(~r/ +/, :whitespace)  # discard it

finish(state)

Turns a State into a final result.

Returns either {:ok, doc} or {:parse_error, %{type, position, expected}}. Guards against unexpected end-of-file.

Extracts the last parse errors that happened.

In case of failure, run/3 will check the last parse errors that happened. Returns a list of atoms of the expected tokens.

many_of(state, head)

Checks many of a certain token.

many_of(state, head, tail)

Checks many of a certain token, and lets you provide a different tail.

one_of(state, funs, expected \\ [])

Finds any one of the given token-eater functions.

state |> one_of([ &brackets/1, &braces/1, &parens/1 ])

optional(state, fun)

Runs a token-eater function optionally; if fun fails to match, the state is returned unchanged.

state |> optional(&text/1)
optional_many_of(state, head)

Checks many of a certain token.

Syntactic sugar for optional(s, many_of(s, ...)).

run(source, opts, fun)

Runs a tokenizer function against source; catches parse errors and reports them properly.

Gets rid of the :parse_error hints in the document.

start_empty(state, token_name)

Creates an empty token with a given token_name.

This is functionally the same as |> eat(~r//, :token_name), but using start_empty() can make your code more readable.

state
|> start_empty(:quoted_string)
|> append(~r/^"/)
|> append(~r/[^"]+/)
|> append(~r/^"/)