expug v0.9.2 Expug.Tokenizer
Tokenizes a Pug template into a list of tokens. The main entry point is tokenize/1.
iex> Expug.Tokenizer.tokenize("title= name")
[
{{1, 8}, :buffered_text, "name"},
{{1, 1}, :element_name, "title"},
{{1, 1}, :indent, 0}
]
Note that the tokens are in reverse order! Prepending to the head of a list is cheaper than appending to its end, so the tokenizer builds the list that way.
This output is then consumed by Expug.Compiler, which turns the tokens into an Abstract Syntax Tree.
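To illustrate why the list is built in reverse, here is a minimal sketch in plain Elixir. It does not call Expug; the module name `ReverseSketch` and its functions are hypothetical, and the token tuples are copied from the example above. Prepending with `[token | acc]` takes constant time, while `acc ++ [token]` copies the whole list on every step, so a consumer that wants source order can simply reverse once at the end.

```elixir
defmodule ReverseSketch do
  # Hypothetical helper, not part of Expug: collects tokens by prepending
  # each new token to the head of the accumulator.
  def collect(tokens_in_source_order) do
    Enum.reduce(tokens_in_source_order, [], fn token, acc -> [token | acc] end)
  end

  # Restores source order with a single pass over the reversed list.
  def in_source_order(reversed_tokens), do: Enum.reverse(reversed_tokens)
end

# Tokens for "title= name", listed in the order they appear in the source.
source_order = [
  {{1, 1}, :indent, 0},
  {{1, 1}, :element_name, "title"},
  {{1, 8}, :buffered_text, "name"}
]

reversed = ReverseSketch.collect(source_order)
# `reversed` now starts with the :buffered_text token, like the output above.

restored = ReverseSketch.in_source_order(reversed)
# `restored` is equal to `source_order` again.
```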
Token types
div.blue#box
:indent - 0
:element_name - "div"
:element_class - "blue"
:element_id - "box"

div(name="en")
:attribute_open - "("
:attribute_key - "name"
:attribute_value - "\"en\""
:attribute_close - ")"

div= hello
:buffered_text - hello

div!= hello
:unescaped_text - hello

div hello
:raw_text - "hello"

| Hello there
:raw_text - "Hello there"

= Hello there
:buffered_text - "Hello there"

- foo = bar
:statement - foo = bar

doctype html5
:doctype - html5

-# comment
  more comments
:line_comment - comment
:subindent - more comments

// comment
  more comments
:html_comment - comment
:subindent - more comments
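
As a rough illustration of how a consumer might read these tokens, the sketch below pattern-matches on token tuples for `div.blue#box`. The module `TokenSketch` and the function `summarize/1` are hypothetical and not part of Expug, and the positions in the sample list are placeholders rather than real tokenizer output.

```elixir
defmodule TokenSketch do
  # Hypothetical helper: folds a token list (in source order) into a map
  # describing one element. Token types it does not know are ignored.
  def summarize(tokens) do
    Enum.reduce(tokens, %{name: nil, id: nil, classes: []}, fn
      {_pos, :element_name, name}, acc -> %{acc | name: name}
      {_pos, :element_id, id}, acc -> %{acc | id: id}
      {_pos, :element_class, class}, acc -> %{acc | classes: acc.classes ++ [class]}
      _other, acc -> acc
    end)
  end
end

# Placeholder positions; only the token types and values come from the
# token list for div.blue#box.
tokens = [
  {{1, 1}, :indent, 0},
  {{1, 1}, :element_name, "div"},
  {{1, 5}, :element_class, "blue"},
  {{1, 10}, :element_id, "box"}
]

summary = TokenSketch.summarize(tokens)
# summary.name is "div", summary.classes is ["blue"], summary.id is "box"
```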
Also see
Expug.TokenizerTools has the functions used by this tokenizer.
Expug.Compiler uses the output of this tokenizer to build an AST.
Expug.ExpressionTokenizer is used to tokenize expressions.
Functions
Matches foo='val' or foo
Matches foo='val' bar='val'
Matches an optional comma in between attributes.
div(id=a class=b)
div(id=a, class=b)
Matches [name='foo' ...]
Matches doctype html.
Matches an entire document.
Matches div.foo[id="name"]= Hello world
Matches .foo
Matches .foo or #id (just one)
Matches .foo.bar#baz
Matches div, div.foo, div.foo.bar#baz, etc.
Matches div.foo.bar#baz
Matches #id
Matches title in title= hello
Matches an HTML element, text node, or, you know… the basic statements. I don’t know what to call this.
Returns the next indentation level after some newlines.
Infers the last indentation level based on doc.
iex> source = "-#\n  span"
iex> doc = [{0, :indent, 0}]
iex> Expug.Tokenizer.get_next_indent(%{tokens: doc, source: source, position: 2}, 0)
2
Returns the next indentation level after some newlines.
iex> source = "-#\n  span"
iex> Expug.Tokenizer.get_next_indent(%{tokens: [], source: source, position: 2}, 0)
2
iex> source = "-#\n\n\n  span"
iex> Expug.Tokenizer.get_next_indent(%{tokens: [], source: source, position: 2}, 0)
2
Matches an indentation. Gives a token that looks like {_, :indent, 2} where the last number is the number of spaces/tabs.
Doesn’t really care if you use spaces or tabs; a tab is treated like a single space.
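The indentation rules described above can be sketched in plain Elixir. This is not Expug's implementation: `IndentSketch`, `width/1`, and `next_indent/2` are hypothetical names, and the sketch assumes only what the text states, namely that each leading space or tab counts as one and that blank lines are skipped when looking for the next indentation level.

```elixir
defmodule IndentSketch do
  # Counts leading spaces and tabs; a tab counts as one, like a space.
  def width(line) do
    [prefix] = Regex.run(~r/^[ \t]*/, line)
    String.length(prefix)
  end

  # Starting at byte offset `position`, skips the rest of the current
  # line and any blank lines, then returns the width of the next
  # non-blank line. Returns 0 if there is no such line.
  def next_indent(source, position) do
    rest = binary_part(source, position, byte_size(source) - position)

    rest
    |> String.split("\n")
    |> Enum.drop(1)
    |> Enum.find("", fn line -> String.trim(line) != "" end)
    |> width()
  end
end

IndentSketch.width("\t\tspan")                 # 2
IndentSketch.next_indent("-#\n  span", 2)      # 2
IndentSketch.next_indent("-#\n\n\n  span", 2)  # 2
```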
Matches any number of blank newlines. Whitespace on those lines is accounted for.
Matches =
Matches text
Matches !=
Tokenizes a string.
Returns a list of tokens. Each token is in the format {position, token, value}.
Matches whitespace; no tokens emitted
Matches whitespace or newline; no tokens emitted