Module otpcl_parse

OTPCL's parser.

Description

Unlike most Erlang-derived languages, OTPCL's parser is not based on leex/yecc; rather, it's written by hand. (As a side note: the author has no idea exactly what sort of parser OTPCL's parser actually is, though "recursive descent" sounds approximately right, given that it's recursive and it descends; someone who actually went to college is welcome to try to make sense of this module and provide a better explanation of what sort of parser it implements.) The parser is (as far as the author can surmise) linear and relatively efficient, albeit only because it "cheats" by punting some things to the interpreter. Notably, the parser treats numbers as atoms, so the interpreter is required to reparse atoms if it wants to interpret them as numbers.
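To make the "numbers are atoms" point concrete, here is a minimal sketch of how an interpreter might reparse an atom as a number. The module name atom_number_sketch and the function reparse/1 are hypothetical and are not part of OTPCL; the real interpreter may well do this differently.

```erlang
-module(atom_number_sketch).
-export([reparse/1]).

%% Hypothetical helper (not part of OTPCL): try to read an atom as an
%% integer first, then as a float; if neither consumes the whole text,
%% return the atom unchanged.
reparse(Atom) when is_atom(Atom) ->
    Text = atom_to_list(Atom),
    case string:to_integer(Text) of
        {Int, []} ->
            Int;
        _ ->
            case string:to_float(Text) of
                {Float, []} -> Float;
                _ -> Atom
            end
    end.
```

The integer attempt comes first because string:to_float/1 rejects text without a decimal point, while string:to_integer/1 would only partially consume something like "3.14" and leave a remainder.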

Syntax

A "free" character is a character that is neither escaped (i.e. immediately preceded by a backslash character, provided that backslash character is itself "free") nor already part of a lower-level construct.
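The backslash half of this definition can be sketched as a single pass over the characters. The module free_char_sketch and its mark/1 function are hypothetical illustrations, not the parser's actual code, and the sketch ignores the "already part of a lower-level construct" half of the definition.

```erlang
-module(free_char_sketch).
-export([mark/1]).

%% Hypothetical illustration: label every character as free or escaped.
%% A free backslash escapes the character immediately after it, so an
%% escaped backslash does not escape anything itself.
mark(Text) -> mark(Text, []).

mark([], Acc) ->
    lists:reverse(Acc);
mark([$\\, C | Rest], Acc) ->
    %% The backslash here is free, so C is escaped.
    mark(Rest, [{C, escaped}, {$\\, free} | Acc]);
mark([C | Rest], Acc) ->
    mark(Rest, [{C, free} | Acc]).
```

For the three-character input backslash, backslash, semicolon, the second backslash comes out escaped and the semicolon comes out free, which matches the "provided that backslash character is itself free" clause.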

A program is a list of statements separated by contiguous sequences of free vertical whitespace characters or semicolons.

A statement is a list of words separated by contiguous sequences of free horizontal whitespace characters (escaped vertical whitespace characters are considered to be horizontal whitespace characters). Statements may be treated as "commands" in certain contexts (e.g. commands are specifically the top-level children of a program).

A word is a braced string, double-quoted string, backquoted charlist, single-quoted atom, braced variable, unquoted variable, function call, list, tuple, comment, pipe, or unquoted atom.

A braced string is a free opening curly brace, followed by zero or more characters and/or braced strings, followed by a free closing curly brace. That is: a braced string can be inside a braced string (and curly braces not intended to begin/end an inner braced string should be escaped with an immediately-preceding backslash).
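The nesting rule amounts to counting brace depth while skipping escaped characters. The following is a rough sketch under that reading; the module braced_sketch, the function take/1, and the error atoms are all hypothetical, and whether the real parser keeps backslashes in the string body is not stated here (this sketch keeps them verbatim).

```erlang
-module(braced_sketch).
-export([take/1]).

%% Hypothetical illustration: given text that starts with an opening
%% curly brace, return {ok, Body, Rest}, where Body is everything up to
%% the matching free closing brace.
take([${ | Rest]) -> take(Rest, 1, []);
take(_) -> {error, no_opening_brace}.

take([], _Depth, _Acc) ->
    {error, unterminated};
take([$\\, C | Rest], Depth, Acc) ->
    %% An escaped character never opens or closes a braced string.
    take(Rest, Depth, [C, $\\ | Acc]);
take([${ | Rest], Depth, Acc) ->
    take(Rest, Depth + 1, [${ | Acc]);
take([$} | Rest], 1, Acc) ->
    %% Depth 1 means this brace closes the outermost braced string.
    {ok, lists:reverse(Acc), Rest};
take([$} | Rest], Depth, Acc) ->
    take(Rest, Depth - 1, [$} | Acc]);
take([C | Rest], Depth, Acc) ->
    take(Rest, Depth, [C | Acc]).
```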

A double-quoted string is a free double-quote, followed by zero or more characters, followed by a free double-quote.

A backquoted charlist is a free backquote, followed by zero or more characters, followed by a free backquote.

A single-quoted atom is a free single-quote, followed by zero or more characters, followed by a free single-quote.

A braced variable is a free dollar-sign, followed by a braced string.

An unquoted variable is a free dollar-sign, followed by a contiguous sequence of characters, terminated by the next free whitespace, semicolon, or (when expected by the parser) closing parenthesis, square bracket, angle bracket, or curly brace. Unquoted variables may not contain free opening parentheses, square brackets, angle brackets, or curly braces; if encountered, the parser will immediately return an error (this may change in the future).

A function call is a free opening square bracket, followed by a statement, followed by a free closing square bracket. It is currently an error for a function call to contain anything other than exactly one statement (this may change in the future).

A list is a free opening parenthesis, followed by a statement (note: the statement is treated purely as a list of words), followed by a free closing parenthesis. It is currently an error for a list to contain more than one statement (this will change in the future).

A tuple is a free opening angle bracket, followed by a statement (note: the statement is treated purely as a list of words), followed by a free closing angle bracket. It is currently an error for a tuple to contain more than one statement (this will change in the future).

A comment is a free octothorpe, followed by a contiguous sequence of characters, terminated by the next vertical whitespace character. A comment terminates the statement in which it is encountered.

A pipe is a free pipe character, followed optionally by a contiguous sequence of characters, terminated by the next free whitespace. The pipe itself is parsed as an unquoted atom, which becomes the first word in a new statement.

An unquoted atom is a contiguous sequence of characters, terminated by the next free whitespace, semicolon, or (when expected by the parser) closing parenthesis, square bracket, angle bracket, or curly brace. Unquoted atoms may not contain free opening parentheses, square brackets, angle brackets, or curly braces; if encountered, the parser will immediately return an error (this may change in the future).
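As a rough illustration of the termination and error rules for unquoted atoms, the sketch below reads characters until a free whitespace character or semicolon and rejects free opening brackets. The module unquoted_sketch, the function take/1, and the {unexpected, Char} error shape are hypothetical; the sketch also leaves out the "when expected by the parser" handling of closing brackets, which depends on parser context.

```erlang
-module(unquoted_sketch).
-export([take/1]).

%% Hypothetical illustration: split text into an unquoted atom's
%% characters and the remaining input.
take(Text) -> take(Text, []).

take([], Acc) ->
    {ok, lists:reverse(Acc), []};
take([$\\, C | Rest], Acc) ->
    %% Escaped characters are never terminators or errors.
    take(Rest, [C, $\\ | Acc]);
take([C | _] = Rest, Acc)
  when C =:= $\s; C =:= $\t; C =:= $\n; C =:= $\r; C =:= $; ->
    %% Free whitespace or a semicolon ends the atom; leave it in Rest.
    {ok, lists:reverse(Acc), Rest};
take([C | _], _Acc)
  when C =:= $(; C =:= $[; C =:= $<; C =:= ${ ->
    %% Free opening brackets are not allowed inside an unquoted atom.
    {error, {unexpected, C}};
take([C | Rest], Acc) ->
    take(Rest, [C | Acc]).
```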

Output

OTPCL's parser does not emit the same exact structures as Erlang's parser (that is: it does not generate Erlang-compatible parse trees). This was probably a mistake (and may very well change, notably because it'd presumably make OTPCL compilation easier by just piggybacking on the existing Erlang-oriented infrastructure), but it works well enough for now.

Tokens

The lexer makes no attempt to actually classify different types of characters (unlike Erlang's lexer); thus, each "token" is simply {Char, Pos={F,L,C}}, where Char is a character code point and Pos is the position of that character (that is, Char came from column C of line L of file F).
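A scanner of this kind can be sketched in a few lines. The module scan_sketch below is a hypothetical stand-in, not the source of otpcl_parse:scan/1; in particular, it assumes that a newline moves the position to column 0 of the next line, and it handles only strings (not binaries).

```erlang
-module(scan_sketch).
-export([scan/1, scan/2]).

%% Hypothetical illustration: pair every character with its position,
%% starting from column 0 of line 0 of file nofile.
scan(Text) -> scan(Text, {nofile, 0, 0}).

scan(Text, Pos) -> scan(Text, Pos, []).

scan([], _Pos, Acc) ->
    lists:reverse(Acc);
scan([Char | Rest], Pos, Acc) ->
    scan(Rest, next_pos(Char, Pos), [{Char, Pos} | Acc]).

%% Assumption: a newline starts a new line at column 0; every other
%% character advances the column by one.
next_pos($\n, {F, L, _C}) -> {F, L + 1, 0};
next_pos(_Char, {F, L, C}) -> {F, L, C + 1}.
```

Because no character is classified at this stage, the newline itself is just another token; only its effect on the following positions is special.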

Trees

The syntax tree the parser emits is a recursive 3-element tuple of the form {parsed, Type, Branches}, where Type is an atom and Branches is a list of either tokens or trees. By default (i.e. when calling parse/1), the root of the tree will be a program, with command and/or comment branches (pipes are also parsed at this level, but the parser converts those to commands).
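Since every node is either a {parsed, Type, Branches} tuple or a {Char, Pos} token, a tree can be walked with two function clauses. The module tree_sketch is a hypothetical helper, not part of OTPCL; the type names that appear in a real tree, beyond program, command and comment, are not listed here, so any others used in examples are placeholders.

```erlang
-module(tree_sketch).
-export([text/1, types/1]).

%% Hypothetical helper: recover the characters covered by a tree by
%% concatenating the characters of all tokens beneath it.
text({parsed, _Type, Branches}) ->
    lists:flatmap(fun text/1, Branches);
text({Char, {_File, _Line, _Col}}) ->
    [Char].

%% Hypothetical helper: list the node types in pre-order.
types({parsed, Type, Branches}) ->
    [Type | lists:flatmap(fun types/1, Branches)];
types({_Char, _Pos}) ->
    [].
```

For a hand-built tree such as {parsed, program, [{parsed, command, [{parsed, word, Tokens}]}]} (where word is a placeholder type name), text/1 returns the characters of Tokens and types/1 returns [program, command, word].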

Data Types

column_no()

column_no() = integer()

filename()

filename() = any()

level()

level() = atom()

line_no()

line_no() = integer()

parse_error()

parse_error() = {error, reason(), level(), [token()], [tree()]}

parse_success()

parse_success() = {ok, tree(), [token()]}

position()

position() = {filename(), line_no(), column_no()}

reason()

reason() = atom() | {atom(), any()}

str_or_bin()

str_or_bin() = string() | binary()

token()

token() = {char(), position()}

tree()

tree() = {parsed, level(), [tree()] | [token()]}

Function Index

initpos/0: Column 0 of row 0 of file nofile.
initpos/1: Column 0 of row 0 of file File.
parse/1: Like parse/2, but defaulting to program as the toplevel parse tree element.
parse/2: Attempts to parse either a string or token list.
scan/1: Converts a string into a list of tokens.
scan/2: Converts a string into a list of tokens, starting at the specified position.

Function Details

initpos/0

initpos() -> position()

Column 0 of row 0 of file nofile.

initpos/1

initpos(Filename::any()) -> position()

Column 0 of row 0 of file File.

parse/1

parse(Input::str_or_bin()) -> parse_success() | parse_error()

Like parse/2, but defaulting to program as the toplevel parse tree element.

parse/2

parse(Lvls::[level(), ...], Input::str_or_bin()) -> parse_success() | parse_error()

Attempts to parse either a string or token list. Returns either a success response {ok, Tree, Rem} (where Tree is an OTPCL parse tree and Rem is whatever characters were left over) or an error response (see parse_error()).
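A caller will typically pattern-match on the two result shapes given by the parse_success() and parse_error() types. The sketch below does only that; the module parse_result_sketch, the function describe/1, and the atoms complete, partial and failed are hypothetical and not part of OTPCL.

```erlang
-module(parse_result_sketch).
-export([describe/1]).

%% Hypothetical helper: summarise a parse_success() or parse_error()
%% value without inspecting the tree itself.
describe({ok, {parsed, Type, _Branches}, []}) ->
    %% Nothing was left over after parsing.
    {complete, Type};
describe({ok, {parsed, Type, _Branches}, Rem}) ->
    %% Rem holds the tokens the parser did not consume.
    {partial, Type, length(Rem)};
describe({error, Reason, Level, _Tokens, _Trees}) ->
    {failed, Level, Reason}.
```

With OTPCL available, a call along the lines of parse_result_sketch:describe(otpcl_parse:parse("foo bar")) would presumably yield {complete, program}, though that has not been verified here.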

scan/1

scan(Txt::str_or_bin()) -> [token()]

Converts a string into a list of tokens.

scan/2

scan(Txt::str_or_bin(), Pos::position()) -> [token()]

Converts a string into a list of tokens, starting at the specified position.


Generated by EDoc