OTPCL's parser.
Unlike the parsers of most Erlang-derived languages, OTPCL's parser is not based on leex/yecc; rather, it's written by hand. (As a side note: the author has no idea exactly what sort of parser OTPCL's parser actually is, though "recursive descent" sounds approximately right, given that it's recursive and it descends; someone who actually went to college is welcome to make sense of this module and provide a better explanation of what sort of parser it implements.) As far as the author can surmise, the parser is linear and relatively efficient, albeit only because it "cheats" by punting some things to the interpreter. Notably, the parser treats numbers as atoms, so the interpreter must reparse atoms if it wants to interpret them as numbers.
A "free" character is a character that is neither escaped (i.e. immediately preceded by a backslash character, provided that backslash character is itself "free") nor already part of a lower-level construct.
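To make the escaping rule concrete, here is a minimal sketch, assuming only backslash escaping matters (it ignores the "already part of a lower-level construct" clause). The module `otpcl_free_sketch` and its function `mark_free/1` are hypothetical and not part of `otpcl_parse`; they tag each character with whether it is free.

```erlang
-module(otpcl_free_sketch).
-export([mark_free/1]).

%% Hypothetical helper (not part of otpcl_parse): returns [{Char, IsFree}].
%% A backslash escapes the next character only if the backslash is itself
%% free. Lower-level constructs (strings, lists, ...) are NOT considered.
mark_free(String) ->
    mark_free(String, false, []).

mark_free([], _Escaping, Acc) ->
    lists:reverse(Acc);
mark_free([C | Rest], true, Acc) ->
    %% The previous character was a free backslash, so C is escaped.
    mark_free(Rest, false, [{C, false} | Acc]);
mark_free([C | Rest], false, Acc) ->
    %% C is free; if it is a backslash, it escapes the next character.
    mark_free(Rest, C =:= $\\, [{C, true} | Acc]).
```

For example, in `\\{` the first backslash is free, the second is escaped (and therefore escapes nothing), and the brace is free.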
A program is a list of statements separated by contiguous sequences of free vertical whitespace characters or semicolons.
A statement is a list of words separated by contiguous sequences of free horizontal whitespace characters (escaped vertical whitespace characters are considered to be horizontal whitespace characters). Statements may be treated as "commands" in certain contexts (e.g. commands are specifically the top-level children of a program).
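The program/statement/word layering can be sketched as follows. This is a deliberately simplified, hypothetical module (`otpcl_split_sketch`, not part of `otpcl_parse`): it honors only backslash escapes, treats newline and `;` as statement separators and space/tab as word separators, and ignores braces, quotes, brackets, comments, and pipes. Dropping the escaping backslash from the word is also an assumption made here for readability.

```erlang
-module(otpcl_split_sketch).
-export([statements/1]).

%% Hypothetical, simplified splitter: Program -> [[Word]].
statements(Program) ->
    split(Program, [], [], []).

%% split(Input, CurrentWordRev, CurrentStatementRev, StatementsRev)
split([], W, S, Acc) ->
    lists:reverse(close_stmt(W, S, Acc));
split([$\\, $\n | Rest], W, S, Acc) ->
    %% An escaped newline counts as horizontal whitespace.
    split(Rest, [], close_word(W, S), Acc);
split([$\\, C | Rest], W, S, Acc) ->
    %% Any other escaped character is taken literally into the word.
    split(Rest, [C | W], S, Acc);
split([C | Rest], W, S, Acc) when C =:= $\n; C =:= $; ->
    %% Free vertical whitespace or semicolon ends the statement.
    split(Rest, [], [], close_stmt(W, S, Acc));
split([C | Rest], W, S, Acc) when C =:= $\s; C =:= $\t ->
    %% Free horizontal whitespace ends the word.
    split(Rest, [], close_word(W, S), Acc);
split([C | Rest], W, S, Acc) ->
    split(Rest, [C | W], S, Acc).

close_word([], S) -> S;
close_word(W, S) -> [lists:reverse(W) | S].

close_stmt(W, S, Acc) ->
    case close_word(W, S) of
        [] -> Acc;                         % skip empty statements
        S1 -> [lists:reverse(S1) | Acc]
    end.
```

With this sketch, `"set x 1; print x\n"` splits into the statements `["set", "x", "1"]` and `["print", "x"]`.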
A word is a braced string, double-quoted string, backquoted charlist, single-quoted atom, braced variable, unquoted variable, function call, list, tuple, comment, pipe, or unquoted atom.
A braced string is a free opening curly brace, followed by zero or more characters and/or braced strings, followed by a free closing curly brace. That is: a braced string can be inside a braced string (and curly braces not intended to begin/end an inner braced string should be escaped with an immediately-preceding backslash).
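Because braced strings nest, finding the end of one means tracking depth while skipping escaped braces. The following is a minimal sketch of that idea; the module `otpcl_brace_sketch` and function `take_braced/1` are hypothetical, and keeping escape sequences verbatim in the result is an assumption (the real parser's handling of escapes is not described here).

```erlang
-module(otpcl_brace_sketch).
-export([take_braced/1]).

%% Hypothetical helper: Input must start with a free "{". Returns
%% {ok, Inner, Rest}, where Inner is the text up to the matching free "}"
%% (nested braced strings included verbatim), or {error, unterminated}.
take_braced([${ | Rest]) ->
    take(Rest, 1, []).

take([], _Depth, _Acc) ->
    {error, unterminated};
take([$\\, C | Rest], Depth, Acc) ->
    %% An escaped character never opens or closes a braced string.
    take(Rest, Depth, [C, $\\ | Acc]);
take([$} | Rest], 1, Acc) ->
    %% Matching close of the outermost brace.
    {ok, lists:reverse(Acc), Rest};
take([$} | Rest], Depth, Acc) ->
    take(Rest, Depth - 1, [$} | Acc]);
take([${ | Rest], Depth, Acc) ->
    take(Rest, Depth + 1, [${ | Acc]);
take([C | Rest], Depth, Acc) ->
    take(Rest, Depth, [C | Acc]).
```

For instance, `{a {b} c} rest` yields the inner text `a {b} c`, while `{a \} b}` treats the escaped brace as ordinary content.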
A double-quoted string is a free double-quote, followed by zero or more characters, followed by a free double-quote.
A backquoted charlist is a free backquote, followed by zero or more characters, followed by a free backquote.
A single-quoted atom is a free single-quote, followed by zero or more characters, followed by a free single-quote.
A braced variable is a free dollar-sign, followed by a braced string.
An unquoted variable is a free dollar-sign, followed by a contiguous sequence of characters, terminated by the next free whitespace, semicolon, or (when expected by the parser) closing parenthesis, square bracket, angle bracket, or curly brace. Unquoted variables may not contain free opening parentheses, square brackets, angle brackets, or curly braces; if encountered, the parser will immediately return an error (this may change in the future).
A function call is a free opening square bracket, followed by a statement, followed by a free closing square bracket. It is currently an error for a function call to contain more or fewer than one statement (this may change in the future).
A list is a free opening parenthesis, followed by a statement (note: the statement is treated purely as a list of words), followed by a free closing parenthesis. It is currently an error for a list to contain more than one statement (this will change in the future).
A tuple is a free opening angle bracket, followed by a statement (note: the statement is treated purely as a list of words), followed by a free closing angle bracket. It is currently an error for a tuple to contain more than one statement (this will change in the future).
A comment is a free octothorpe, followed by a contiguous sequence of characters, terminated by the next vertical whitespace character. A comment terminates the statement in which it is encountered.
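A comment can be cut off at the first vertical whitespace character. The sketch below assumes `\n`, `\r`, `\v`, and `\f` are the vertical whitespace characters (the exact set the real parser uses is not stated here); `otpcl_comment_sketch:take_comment/1` is a hypothetical helper. The terminating character is left in the remainder, since it also ends the enclosing statement.

```erlang
-module(otpcl_comment_sketch).
-export([take_comment/1]).

%% Hypothetical helper: Input must start with a free "#". Returns
%% {CommentText, Rest}, where Rest begins at the terminating vertical
%% whitespace character (ASSUMED to be one of \n \r \v \f), if any.
take_comment([$# | Rest]) ->
    lists:splitwith(fun(C) -> not lists:member(C, "\n\r\v\f") end, Rest).
```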
A pipe is a free pipe character, followed optionally by a contiguous sequence of characters, terminated by the next free whitespace. The pipe itself is parsed as an unquoted atom, which becomes the first word in a new statement.
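At the word level, the pipe rule amounts to "every pipe word starts a new statement and is that statement's first word". Here is a minimal sketch, assuming words have already been split into strings and that any word beginning with `|` (such as `|` or `|>`) is a pipe; `otpcl_pipe_sketch:split_pipes/1` is hypothetical and says nothing about what a pipe does at runtime.

```erlang
-module(otpcl_pipe_sketch).
-export([split_pipes/1]).

%% Hypothetical helper: [Word] -> [[Word]]. Each word that begins with "|"
%% starts a new statement, with the pipe word as its first word.
split_pipes(Words) ->
    split(Words, [], []).

split([], Cur, Acc) ->
    lists:reverse([lists:reverse(Cur) | Acc]);
split([[$| | _] = Pipe | Rest], Cur, Acc) ->
    %% Close the current statement and open a new one headed by the pipe.
    split(Rest, [Pipe], [lists:reverse(Cur) | Acc]);
split([Word | Rest], Cur, Acc) ->
    split(Rest, [Word | Cur], Acc).
```

So the words `foo bar | baz |> quux 1` become three statements: `foo bar`, `| baz`, and `|> quux 1`.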
An unquoted atom is a contiguous sequence of characters, terminated by the next free whitespace, semicolon, or (when expected by the parser) closing parenthesis, square bracket, angle bracket, or curly brace. Unquoted atoms may not contain free opening parentheses, square brackets, angle brackets, or curly braces; if encountered, the parser will immediately return an error (this may change in the future).
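The termination and error rules shared by unquoted atoms and unquoted variables can be sketched like this. The `Closers` argument is a hypothetical stand-in for "when expected by the parser"; `otpcl_atom_sketch:take_unquoted/2` is not part of `otpcl_parse`, and it returns raw characters rather than a parse tree. Note that `42` comes back as ordinary characters, matching the statement that numbers are parsed as atoms.

```erlang
-module(otpcl_atom_sketch).
-export([take_unquoted/2]).

%% Hypothetical helper: reads an unquoted atom from the front of Input.
%% Closers lists the closing characters the caller currently expects
%% (e.g. ")" inside a list). Returns {ok, Chars, Rest} or
%% {error, {unexpected, Char}} on a free opening bracket.
take_unquoted(Input, Closers) ->
    take(Input, Closers, []).

take([], _Closers, Acc) ->
    {ok, lists:reverse(Acc), []};
take([$\\, C | Rest], Closers, Acc) ->
    %% Escaped characters are taken literally.
    take(Rest, Closers, [C | Acc]);
take([C | Rest] = Input, Closers, Acc) ->
    case stops(C, Closers) of
        true ->
            {ok, lists:reverse(Acc), Input};
        false ->
            case lists:member(C, "([<{") of
                true  -> {error, {unexpected, C}};
                false -> take(Rest, Closers, [C | Acc])
            end
    end.

%% Free whitespace and semicolons always terminate; closers only when expected.
stops(C, Closers) ->
    lists:member(C, " \t\r\n;") orelse lists:member(C, Closers).
```

In this sketch, `42)` read inside a list stops before the `)`, whereas the same `)` outside any list is simply part of the atom.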
OTPCL's parser does not emit the same exact structures as Erlang's parser (that is: it does not generate Erlang-compatible parse trees). This was probably a mistake (and may very well change, notably because it'd presumably make OTPCL compilation easier by just piggybacking on the existing Erlang-oriented infrastructure), but it works well enough for now.
The lexer makes no attempt to actually classify different types of characters (unlike Erlang's lexer); thus, each "token" is simply {Char, Pos={F,L,C}}, where Char is a character code point and Pos is the position of that character (that is, Char came from column C of line L of file F).
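A scanner that produces this token shape can be sketched in a few lines. This is a hypothetical re-implementation (`otpcl_scan_sketch:scan/2`), not the real `otpcl_parse:scan/2`, and it ASSUMES that a newline advances the line and resets the column to 0 while every other character advances the column; the real module may count positions differently.

```erlang
-module(otpcl_scan_sketch).
-export([scan/2]).

%% Hypothetical scanner producing tokens of the form {Char, {F, L, C}}.
%% ASSUMPTION: "\n" moves to the next line at column 0; anything else
%% moves one column to the right.
scan(Txt, Pos) when is_binary(Txt) ->
    scan(unicode:characters_to_list(Txt), Pos);
scan(Txt, {F, L, C}) ->
    scan(Txt, F, L, C, []).

scan([], _F, _L, _C, Acc) ->
    lists:reverse(Acc);
scan([$\n | Rest], F, L, C, Acc) ->
    scan(Rest, F, L + 1, 0, [{$\n, {F, L, C}} | Acc]);
scan([Ch | Rest], F, L, C, Acc) ->
    scan(Rest, F, L, C + 1, [{Ch, {F, L, C}} | Acc]).
```

Under these assumptions, scanning `<<"a\nb">>` from `{nofile, 0, 0}` places `a` at column 0 of line 0 and `b` at column 0 of line 1.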
Parse trees take the form {parsed, Type, Branches}, where Type is an atom and Branches is a list of either tokens or trees. By default (i.e. when calling parse/1), the root of the tree will be a program, with command and/or comment branches (pipes are also parsed at this level, but the parser converts those to commands).
column_no() = integer()
filename() = any()
level() = atom()
line_no() = integer()
parse_error() = {error, reason(), level(), [token()], [tree()]}
parse_success() = {ok, tree(), [token()]}
position() = {filename(), line_no(), column_no()}
reason() = atom() | {atom(), any()}
str_or_bin() = string() | binary()
token() = {char(), position()}
tree() = {parsed, level(), [tree()] | [token()]}
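Because tree() is recursive and its leaves are token() pairs, a traversal only needs two cases. The sketch below restates the relevant types locally and defines a hypothetical `otpcl_tree_sketch:tree_chars/1` (not part of `otpcl_parse`) that collects, in order, every character stored in a tree's tokens. Whether separators consumed by the parser appear in those tokens is not documented, so the result is not guaranteed to reproduce the source text.

```erlang
-module(otpcl_tree_sketch).
-export([tree_chars/1]).

%% Local restatements of the documented types.
-type position() :: {any(), integer(), integer()}.
-type token() :: {char(), position()}.
-type tree() :: {parsed, atom(), [tree()] | [token()]}.

%% Hypothetical helper: depth-first, left-to-right walk over a tree(),
%% returning the code points of all tokens it contains.
-spec tree_chars(tree()) -> string().
tree_chars({parsed, _Level, Branches}) ->
    lists:flatmap(fun branch_chars/1, Branches).

%% A branch is either a subtree (3-tuple) or a token (2-tuple).
branch_chars({parsed, _, _} = Subtree) -> tree_chars(Subtree);
branch_chars({Char, {_F, _L, _C}}) -> [Char].
```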
initpos/0 | Column 0 of row 0 of file nofile.
initpos/1 | Column 0 of row 0 of file Filename.
parse/1 | Like parse/2, but defaulting to program as the toplevel parse tree element.
parse/2 | Attempts to parse either a string or token list.
scan/1 | Converts a string into a list of tokens.
scan/2 | Converts a string into a list of tokens, starting at the specified position.
initpos() -> position()
Column 0 of row 0 of file nofile.
initpos(Filename::any()) -> position()
Column 0 of row 0 of file Filename.
parse(Input::str_or_bin()) -> parse_success() | parse_error()
Like parse/2, but defaulting to program as the toplevel parse tree element.
parse(Lvls::[level(), ...], Input::str_or_bin()) -> parse_success() | parse_error()
Attempts to parse either a string or token list. Returns either a success response {ok, Tree, Rem} (where Tree is an OTPCL parse tree and Rem is whatever characters were left over, as a list of tokens) or a parse_error() of the form {error, reason(), level(), [token()], [tree()]}.
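Callers typically branch on the two documented return shapes. The sketch below is a hypothetical helper (`otpcl_result_sketch:describe/1`, not part of `otpcl_parse`) that pattern-matches parse_success() and parse_error() and produces a one-line summary; it relies only on the tuple shapes given in the type definitions.

```erlang
-module(otpcl_result_sketch).
-export([describe/1]).

%% Hypothetical helper summarizing the documented return shapes:
%%   parse_success() = {ok, tree(), [token()]}
%%   parse_error()   = {error, reason(), level(), [token()], [tree()]}
describe({ok, {parsed, Level, _Branches}, Rem}) ->
    lists:flatten(io_lib:format("parsed a ~p; ~b token(s) left over",
                                [Level, length(Rem)]));
describe({error, Reason, Level, _Tokens, _Trees}) ->
    lists:flatten(io_lib:format("failed at level ~p: ~p", [Level, Reason])).
```

In a shell with OTPCL loaded, one could pass the result of `otpcl_parse:parse("foo bar")` straight to `describe/1`; the exact output depends on the real parser.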
scan(Txt::str_or_bin()) -> [token()]
Converts a string into a list of tokens.
scan(Txt::str_or_bin(), Pos::position()) -> [token()]
Converts a string into a list of tokens, starting at the specified position.
Generated by EDoc