expug v0.9.2 Expug.Tokenizer

Tokenizes a Pug template into a list of tokens. The main entry point is tokenize/1.

iex> Expug.Tokenizer.tokenize("title= name")
[
  {{1, 8}, :buffered_text, "name"},
  {{1, 1}, :element_name, "title"},
  {{1, 1}, :indent, 0}
]

Note that the tokens are in reverse order! Prepending to a list is cheaper than appending to its end, so the tokenizer adds each new token to the front.
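A minimal sketch of why the order comes out reversed. It is not Expug's code, and the module name TokenOrderSketch is hypothetical. Prepending with [token | acc] takes constant time, and a single Enum.reverse/1 at the end restores source order.

```elixir
defmodule TokenOrderSketch do
  # Hypothetical helper, not part of Expug: builds a token list the way the
  # docs describe, by prepending each new token to the accumulator.
  def build(tokens_in_source_order) do
    Enum.reduce(tokens_in_source_order, [], fn token, acc ->
      # Prepending is O(1); appending with acc ++ [token] would be O(n).
      [token | acc]
    end)
  end

  # Restores source order with one O(n) pass.
  def in_source_order(reversed_tokens), do: Enum.reverse(reversed_tokens)
end

# The reversed list from the tokenize/1 example above.
reversed = [
  {{1, 8}, :buffered_text, "name"},
  {{1, 1}, :element_name, "title"},
  {{1, 1}, :indent, 0}
]

TokenOrderSketch.in_source_order(reversed)
# => [{{1, 1}, :indent, 0}, {{1, 1}, :element_name, "title"}, {{1, 8}, :buffered_text, "name"}]
```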

This output is then consumed by Expug.Compiler, which turns it into an abstract syntax tree (AST).

Token types

div.blue#box

:indent - 0
:element_name - "div"
:element_class - "blue"
:element_id - "box"

div(name="en")

:attribute_open - "("
:attribute_key - "name"
:attribute_value - "\"en\""
:attribute_close - ")"

div= hello

:buffered_text - "hello"

div!= hello

:unescaped_text - "hello"

div hello

:raw_text - "hello"

| Hello there

:raw_text - "Hello there"

= Hello there

:buffered_text - "Hello there"

- foo = bar

:statement - "foo = bar"

doctype html5

:doctype - "html5"

-# comment
  more comments

:line_comment - comment
:subindent - more comments

// comment
  more comments

:html_comment - comment
:subindent - more comments
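Most of the line-level token types above are picked out by a line's leading marker. A rough sketch of that dispatch using binary pattern matching follows. LineKindSketch and its :element fallback are hypothetical, and the real tokenizer also deals with elements, positions, and nested blocks.

```elixir
defmodule LineKindSketch do
  # Hypothetical classifier, not Expug's implementation. It looks only at
  # the leading marker of a single line and returns {token_type, rest}.
  # Clause order matters: "-#" must be tried before the plain "-" clause.
  def classify("-#" <> rest), do: {:line_comment, String.trim(rest)}
  def classify("//" <> rest), do: {:html_comment, String.trim(rest)}
  def classify("!=" <> rest), do: {:unescaped_text, String.trim(rest)}
  def classify("=" <> rest), do: {:buffered_text, String.trim(rest)}
  def classify("-" <> rest), do: {:statement, String.trim(rest)}
  def classify("|" <> rest), do: {:raw_text, String.trim(rest)}
  def classify("doctype " <> rest), do: {:doctype, String.trim(rest)}
  # Anything else would begin an element such as div, .foo or #id;
  # :element is a placeholder here, not a real Expug token type.
  def classify(line), do: {:element, line}
end

LineKindSketch.classify("- foo = bar")
# => {:statement, "foo = bar"}
```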

Also see

Expug.TokenizerTools has the functions used by this tokenizer.
Expug.Compiler uses the output of this tokenizer to build an AST.
Expug.ExpressionTokenizer is used to tokenize expressions.

Functions

attribute(state)

Matches foo='val' or foo

attribute_brace(state)

attribute_bracket(state)

attribute_equal(state)

attribute_key(state)

attribute_key_value(state)

attribute_list(state)

Matches foo='val' bar='val'

attribute_paren(state)

attribute_separator(state)

Matches an optional comma between attributes, so both of these forms are accepted:

div(id=a class=b)
div(id=a, class=b)
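Both forms should produce the same attribute pairs. Below is a minimal regex-based sketch. It assumes values are bare words or quoted strings and leaves anything more complex, such as nested expressions, out of scope. AttrListSketch is a hypothetical name, not part of Expug.

```elixir
defmodule AttrListSketch do
  # Hypothetical sketch, not Expug's tokenizer. Splits the inside of an
  # attribute list into {key, value} pairs; values are bare words or quoted
  # strings (quotes are kept), and a comma between pairs is optional.
  def parse(inner) do
    pair_regex()
    |> Regex.scan(inner, capture: :all_but_first)
    |> Enum.map(fn [key, value] -> {key, value} end)
  end

  # key = value, followed by optional whitespace and an optional comma.
  defp pair_regex, do: ~r/([\w-]+)\s*=\s*("[^"]*"|'[^']*'|[^\s,)]+)\s*,?/
end

AttrListSketch.parse("id=a class=b")
# => [{"id", "a"}, {"class", "b"}]
AttrListSketch.parse("id=a, class=b")
# => [{"id", "a"}, {"class", "b"}]
```

Keeping the quotes in the value mirrors the token table above, where name="en" yields an :attribute_value of "\"en\"".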

attribute_value(state)

attributes_block(state)

Matches [name='foo' ...]

block_text(state)

buffered_text(state)

doctype(state)

Matches doctype html.

document(state)

Matches an entire document.

element(state)

Matches div.foo[id="name"]= Hello world

element_class(state)

Matches .foo

element_class_or_id(state)

Matches .foo or #id (just one)

element_class_or_id_list(state)

Matches .foo.bar#baz

element_descriptor(state)

Matches div, div.foo, div.foo.bar#baz, etc.

element_descriptor_full(state)

Matches div.foo.bar#baz
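The descriptor rules above can be sketched with two regular expressions: one for the optional tag name and one for each .class or #id part. DescriptorSketch is a hypothetical module. Unlike tokenize/1, it returns tokens in source order and without positions.

```elixir
defmodule DescriptorSketch do
  # Hypothetical sketch, not Expug's implementation: splits an element
  # descriptor such as "div.foo.bar#baz" into tokens in source order.
  def tokenize(descriptor) do
    # An optional leading tag name (may be the empty string).
    [name | _] = Regex.run(~r/^[\w-]*/, descriptor)

    # Then any mix of .class and #id parts.
    parts =
      ~r/([.#])([\w-]+)/
      |> Regex.scan(descriptor, capture: :all_but_first)
      |> Enum.map(fn
        [".", class] -> {:element_class, class}
        ["#", id] -> {:element_id, id}
      end)

    if name == "", do: parts, else: [{:element_name, name} | parts]
  end
end

DescriptorSketch.tokenize("div.blue#box")
# => [{:element_name, "div"}, {:element_class, "blue"}, {:element_id, "box"}]
```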

element_id(state)

Matches #id

element_name(state)

Matches title in title= hello

element_or_text(state)

Matches an HTML element, a text node, or one of the other basic statements.

get_indent(list)

get_next_indent(state)

Returns the next indentation level after some newlines. Infers the most recent indentation level from doc, the list of tokens emitted so far.

iex> source = "-#\n  span"
iex> doc = [{0, :indent, 0}]
iex> Expug.Tokenizer.get_next_indent(%{tokens: doc, source: source, position: 2})
2

get_next_indent(state, level)

Returns the next indentation level after some newlines.

iex> source = "-#\n  span"
iex> Expug.Tokenizer.get_next_indent(%{tokens: [], source: source, position: 2}, 0)
2

iex> source = "-#\n\n\n  span"
iex> Expug.Tokenizer.get_next_indent(%{tokens: [], source: source, position: 2}, 0)
2
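The results in these examples can be approximated by skipping the rest of the current line, passing over blank or whitespace-only lines, and counting the leading spaces or tabs of the next line with content. NextIndentSketch is hypothetical. It ignores the level argument and returns nil when no further content exists, which is an assumption rather than documented Expug behavior.

```elixir
defmodule NextIndentSketch do
  # Hypothetical sketch, not Expug's implementation. Starting at a byte
  # offset that sits on a line break, it skips the rest of the current line,
  # passes over blank or whitespace-only lines, and returns the indentation
  # of the next line with content (nil if there is none).
  def next_indent(source, position) do
    source
    |> binary_part(position, byte_size(source) - position)
    |> String.split("\n")
    # The first element is whatever remains of the current line.
    |> Enum.drop(1)
    |> Enum.find_value(fn line ->
      if String.trim(line) != "", do: leading_width(line)
    end)
  end

  # Counts leading spaces and tabs; each tab counts as one column.
  defp leading_width(line) do
    line
    |> String.graphemes()
    |> Enum.take_while(&(&1 in [" ", "\t"]))
    |> length()
  end
end

NextIndentSketch.next_indent("-#\n\n\n  span", 2)
# => 2
```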

html_comment(state)

indent(state)

Matches an indentation. Gives a token that looks like {_, :indent, 2} where the last number is the number of spaces/tabs.

It doesn’t matter whether you indent with spaces or tabs; each tab counts as a single space.
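A small sketch of that counting rule, emitting one indent tuple per non-blank line. IndentSketch is hypothetical, and it puts the line number where Expug would put a position.

```elixir
defmodule IndentSketch do
  # Hypothetical sketch, not Expug's implementation: emits one
  # {line_number, :indent, width} tuple per non-blank line, in source order.
  # Spaces and tabs are each counted as a single column.
  def indents(source) do
    source
    |> String.split("\n")
    |> Enum.with_index(1)
    |> Enum.reject(fn {line, _} -> String.trim(line) == "" end)
    |> Enum.map(fn {line, line_no} ->
      [leading] = Regex.run(~r/^[ \t]*/, line)
      {line_no, :indent, byte_size(leading)}
    end)
  end
end

IndentSketch.indents("ul\n\t\tli\n  li")
# => [{1, :indent, 0}, {2, :indent, 2}, {3, :indent, 2}]
```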

line_comment(state)

multiline_buffered_text(state)

multiline_statement(state)

multiline_unescaped_text(state)

newlines(state)

Matches any number of blank lines, including lines that contain only whitespace.

one_line_buffered_text(state)

one_line_statement(state)

one_line_unescaped_text(state)

optional_whitespace(state)

optional_whitespace_or_newline(state)

raw_text(state)

sole_buffered_text(state)

Matches =

sole_raw_text(state)

Matches text

sole_unescaped_text(state)

Matches !=

statement(state)

subindent(state, level)

subindent_block(state)

tokenize(source, opts \\ [])

Tokenizes a string and returns a list of tokens in reverse order. Each token is a tuple {position, type, value}, where position is a {line, column} pair.
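In the tokenize/1 example at the top of this page, "name" in "title= name" is reported at {1, 8}, so positions are 1-based line and column numbers. Below is a hypothetical helper, not part of Expug, that derives such a pair from a 0-based byte offset. Columns are counted in bytes, which matches characters only for ASCII input.

```elixir
defmodule PositionSketch do
  # Hypothetical helper, not part of Expug: converts a 0-based byte offset
  # into a 1-based {line, column} pair like the positions in tokenize/1 output.
  def line_col(source, offset) do
    before = binary_part(source, 0, offset)
    lines = String.split(before, "\n")
    # Column is counted in bytes, so it matches characters only for ASCII.
    {length(lines), byte_size(List.last(lines)) + 1}
  end
end

PositionSketch.line_col("title= name", 7)
# => {1, 8}
```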

unescaped_text(state)

whitespace(state)

Matches whitespace; no tokens emitted

whitespace_or_newline(state)

Matches whitespace or newline; no tokens emitted