expug v0.9.2 Expug.Tokenizer

Tokenizes a Pug template into a list of tokens. The main entry point is tokenize/1.

iex> Expug.Tokenizer.tokenize("title= name")
[
  {{1, 8}, :buffered_text, "name"},
  {{1, 1}, :element_name, "title"},
  {{1, 1}, :indent, 0}
]

Note that the tokens are in reverse order! Prepending to an Elixir list takes constant time, while appending to the end means copying the whole list, so each new token is added to the front.

This output is consumed next by Expug.Compiler, which turns the tokens into an abstract syntax tree.
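To illustrate why the list comes out reversed, here is a minimal sketch of building a token list by prepending. The module PugSketch.TokenList and its functions are hypothetical and not part of Expug; only the {position, type, value} token shape comes from the example above.

```elixir
defmodule PugSketch.TokenList do
  # Hypothetical sketch, not Expug's implementation.

  # Prepends a {position, type, value} token: O(1) on a linked list.
  def push(tokens, position, type, value), do: [{position, type, value} | tokens]

  # Restores source order in a single O(n) pass, if a consumer needs it.
  def in_source_order(tokens), do: Enum.reverse(tokens)
end

tokens =
  []
  |> PugSketch.TokenList.push({1, 1}, :indent, 0)
  |> PugSketch.TokenList.push({1, 1}, :element_name, "title")
  |> PugSketch.TokenList.push({1, 8}, :buffered_text, "name")

# tokens is now newest-first, matching the tokenize/1 output shown above:
# [{{1, 8}, :buffered_text, "name"}, {{1, 1}, :element_name, "title"}, {{1, 1}, :indent, 0}]
```

Reversing once at the end (or letting the consumer read the list newest-first) keeps the whole pass linear.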

Token types

div.blue#box
  • :indent - 0
  • :element_name - "div"
  • :element_class - "blue"
  • :element_id - "box"
div(name="en")
  • :attribute_open - "("
  • :attribute_key - "name"
  • :attribute_value - "\"en\""
  • :attribute_close - ")"
div= hello
  • :buffered_text - "hello"
div!= hello
  • :unescaped_text - "hello"
div hello
  • :raw_text - "hello"
| Hello there
  • :raw_text - "Hello there"
= Hello there
  • :buffered_text - "Hello there"
- foo = bar
  • :statement - "foo = bar"
doctype html5
  • :doctype - "html5"
-# comment
  more comments
  • :line_comment - "comment"
  • :subindent - "more comments"
// comment
  more comments
  • :html_comment - "comment"
  • :subindent - "more comments"
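As a rough illustration of how the leading characters of a line select one of the token types listed above, the sketch below classifies lines that start with a Pug sigil. PugSketch.LineKind and the :element_or_text fallback are hypothetical names; Expug's real tokenizer works through the whole source with its own matchers rather than line by line.

```elixir
defmodule PugSketch.LineKind do
  # Hypothetical sketch. Order matters: "!=" must be tried before "=",
  # and "-#" before "-", so that the longer sigil wins.
  @prefixes [
    {"!=", :unescaped_text},
    {"=", :buffered_text},
    {"-#", :line_comment},
    {"-", :statement},
    {"//", :html_comment},
    {"|", :raw_text},
    {"doctype ", :doctype}
  ]

  # Returns {token_type, value}; anything without a known sigil is
  # reported as {:element_or_text, line}.
  def classify(line) do
    Enum.find_value(@prefixes, {:element_or_text, line}, fn {prefix, type} ->
      if String.starts_with?(line, prefix) do
        value = line |> String.replace_prefix(prefix, "") |> String.trim()
        {type, value}
      end
    end)
  end
end

PugSketch.LineKind.classify("- foo = bar")
# => {:statement, "foo = bar"}
```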

Also see

Functions

Matches foo='val' or foo

attribute_bracket(state)

attribute_key_value(state)

Matches foo='val' bar='val'

attribute_separator(state)

Matches an optional comma in between attributes.

div(id=a class=b)
div(id=a, class=b)
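A minimal sketch of an optional attribute separator, assuming simple values without embedded spaces or commas. PugSketch.Attributes.parse/1 is hypothetical; it only shows that both forms above can yield the same key/value pairs, with values kept as raw strings the way :attribute_value keeps "\"en\"".

```elixir
defmodule PugSketch.Attributes do
  # Hypothetical sketch: parses the text between "(" and ")" into
  # {key, raw_value} pairs. A comma after each value is optional.
  # Values containing spaces or commas are NOT supported here.
  def parse(inner) do
    ~r/([\w-]+)=([^\s,]+)\s*,?\s*/
    |> Regex.scan(inner, capture: :all_but_first)
    |> Enum.map(fn [key, value] -> {key, value} end)
  end
end

PugSketch.Attributes.parse("id=a class=b")
# => [{"id", "a"}, {"class", "b"}]
PugSketch.Attributes.parse("id=a, class=b")
# => [{"id", "a"}, {"class", "b"}]
```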

Matches [name='foo' ...]

Matches doctype html.
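A minimal sketch of recognizing a doctype line, assuming it appears alone on its line. PugSketch.Doctype.match/1 is a hypothetical helper, not Expug's doctype matcher; it returns the :doctype token type together with its value.

```elixir
defmodule PugSketch.Doctype do
  # Hypothetical sketch: matches a whole line of the form "doctype <name>"
  # and returns the :doctype token type with its value.
  def match(line) do
    case Regex.run(~r/\Adoctype\s+(\S+)\s*\z/, line, capture: :all_but_first) do
      [name] -> {:ok, {:doctype, name}}
      nil -> :nomatch
    end
  end
end

PugSketch.Doctype.match("doctype html")
# => {:ok, {:doctype, "html"}}
```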

Matches an entire document.

Matches div.foo[id="name"]= Hello world

Matches .foo

element_class_or_id(state)

Matches .foo or #id (just one)

element_class_or_id_list(state)

Matches .foo.bar#baz

element_descriptor(state)

Matches div, div.foo, div.foo.bar#baz, etc.

element_descriptor_full(state)

Matches div.foo.bar#baz
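The descriptor matchers above can be approximated with two regular expressions: one for the optional tag name and one for the repeated .class and #id parts. PugSketch.Descriptor is hypothetical and omits the {line, column} positions that Expug attaches to each token; it returns tokens in source order rather than reversed.

```elixir
defmodule PugSketch.Descriptor do
  # Hypothetical sketch: splits "div.foo.bar#baz" into :element_name,
  # :element_class and :element_id tokens (positions omitted).
  def tokenize(descriptor) do
    # Optional tag name, then any number of .class / #id parts.
    # Raises MatchError if the descriptor contains anything else.
    [_, name, rest] = Regex.run(~r/\A([\w-]*)((?:[.#][\w-]+)*)\z/, descriptor)

    name_token = if name == "", do: [], else: [{:element_name, name}]

    parts =
      ~r/([.#])([\w-]+)/
      |> Regex.scan(rest, capture: :all_but_first)
      |> Enum.map(fn
        [".", class] -> {:element_class, class}
        ["#", id] -> {:element_id, id}
      end)

    name_token ++ parts
  end
end

PugSketch.Descriptor.tokenize("div.blue#box")
# => [{:element_name, "div"}, {:element_class, "blue"}, {:element_id, "box"}]
```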

Matches #id

Matches title in title= hello

Matches an HTML element, text node, or, you know… the basic statements. I don’t know what to call this.

Returns the next indentation level after some newlines. Infers the last indentation level from the existing token list (doc in the example below).

iex> source = "-#\n  span"
iex> doc = [{0, :indent, 0}]
iex> Expug.Tokenizer.get_next_indent(%{tokens: doc, source: source, position: 2}, 0)
2

get_next_indent(state, level)

Returns the next indentation level after some newlines.

iex> source = "-#\n  span"
iex> Expug.Tokenizer.get_next_indent(%{tokens: [], source: source, position: 2}, 0)
2

iex> source = "-#\n\n\n  span"
iex> Expug.Tokenizer.get_next_indent(%{tokens: [], source: source, position: 2}, 0)
2
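A minimal sketch of the same idea as the get_next_indent examples above, working on a plain string and byte offset instead of a state map. PugSketch.NextIndent.next_indent/2 is hypothetical; returning 0 when no non-blank line follows is an assumption, not documented Expug behavior.

```elixir
defmodule PugSketch.NextIndent do
  # Hypothetical sketch: from a byte offset, skip the rest of the current
  # line and any blank lines, then count the leading spaces/tabs of the
  # next non-blank line (each tab counts as one).
  def next_indent(source, position) do
    source
    |> binary_part(position, byte_size(source) - position)
    |> String.split("\n")
    # The first element is whatever remains of the current line.
    |> Enum.drop(1)
    # Assumption: fall back to "" (indent 0) when no non-blank line follows.
    |> Enum.find("", fn line -> String.trim(line) != "" end)
    |> leading_indent()
  end

  defp leading_indent(line) do
    line
    |> String.graphemes()
    |> Enum.take_while(&(&1 in [" ", "\t"]))
    |> length()
  end
end

PugSketch.NextIndent.next_indent("-#\n\n\n  span", 2)
# => 2
```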

Matches an indentation. Gives a token that looks like {_, :indent, 2} where the last number is the number of spaces/tabs.

Doesn’t really care if you use spaces or tabs; a tab is treated like a single space.
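A minimal sketch of producing an :indent token for one line, with each tab counted as a single column as described above. PugSketch.Indent.indent_token/2 is hypothetical; the {line, 1} position mirrors the {{1, 1}, :indent, 0} token in the tokenize example.

```elixir
defmodule PugSketch.Indent do
  # Hypothetical sketch: reads the leading spaces/tabs of a line and
  # returns {position, :indent, level}; a tab counts as one column.
  def indent_token(line, line_number) do
    # \A[ \t]* can match the empty string, so Regex.run always returns a list.
    [leading] = Regex.run(~r/\A[ \t]*/, line)
    {{line_number, 1}, :indent, String.length(leading)}
  end
end

PugSketch.Indent.indent_token("\t  span", 3)
# => {{3, 1}, :indent, 3}
```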

multiline_buffered_text(state)

multiline_statement(state)

multiline_unescaped_text(state)

Matches any number of blank newlines. Whitespace on otherwise blank lines is accounted for.
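A minimal sketch of consuming blank lines, including lines that contain only spaces or tabs. PugSketch.Newlines.skip/2 is hypothetical; it stops at the start of the next non-blank line so that its indentation can still be read from there.

```elixir
defmodule PugSketch.Newlines do
  # Hypothetical sketch: from a byte offset, consume one or more newlines
  # plus any spaces/tabs on the blank lines between them. Returns the
  # offset of the next non-blank line, or :nomatch if no newline follows.
  def skip(source, position) do
    rest = binary_part(source, position, byte_size(source) - position)

    case Regex.run(~r/\A(?:[ \t]*\n)+/, rest) do
      [matched] -> {:ok, position + byte_size(matched)}
      nil -> :nomatch
    end
  end
end

PugSketch.Newlines.skip("div\n\n  \n  span", 3)
# => {:ok, 8}, the offset of "  span"
```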

one_line_buffered_text(state)

one_line_statement(state)

one_line_unescaped_text(state)

optional_whitespace(state)

optional_whitespace_or_newline(state)

sole_buffered_text(state)

Matches =

Matches text

sole_unescaped_text(state)

Matches !=

tokenize(source, opts \\ [])

Tokenizes a string. Returns a list of tokens in reverse order. Each token is in the format {position, token, value}, where position is a {line, column} tuple.
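The {1, 8} in the tokenize example at the top of this page is a 1-based {line, column} pair. The hypothetical helper PugSketch.Position.from_offset/2 shows how such a pair can be derived from a zero-based byte offset; it assumes columns count characters, which these docs do not state.

```elixir
defmodule PugSketch.Position do
  # Hypothetical helper: converts a zero-based byte offset into the
  # 1-based {line, column} pair used as the first element of a token.
  # Assumes the offset falls on a character boundary.
  def from_offset(source, offset) do
    before = binary_part(source, 0, offset)
    lines = String.split(before, "\n")
    {length(lines), String.length(List.last(lines)) + 1}
  end
end

PugSketch.Position.from_offset("title= name", 7)
# => {1, 8}, the position of the :buffered_text token "name"
```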

Matches whitespace; no tokens emitted

whitespace_or_newline(state)

Matches whitespace or newline; no tokens emitted
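Both whitespace matchers consume input without emitting tokens. The hypothetical PugSketch.Whitespace.skip/3 sketches the difference with a :newlines option; that option name is illustrative and is not an Expug parameter.

```elixir
defmodule PugSketch.Whitespace do
  # Hypothetical sketch: advance past whitespace without emitting tokens.
  # With newlines: true, "\n" is skipped as well (like whitespace_or_newline);
  # otherwise only spaces and tabs are skipped (like whitespace).
  def skip(source, position, opts \\ []) do
    pattern =
      if Keyword.get(opts, :newlines, false),
        do: ~r/\A[ \t\n]*/,
        else: ~r/\A[ \t]*/

    rest = binary_part(source, position, byte_size(source) - position)
    # The pattern can match the empty string, so Regex.run always succeeds.
    [matched] = Regex.run(pattern, rest)
    position + byte_size(matched)
  end
end

PugSketch.Whitespace.skip("div  \n  span", 3)
# => 5
PugSketch.Whitespace.skip("div  \n  span", 3, newlines: true)
# => 8
```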