smalto/grammar
Grammar and rule types for defining syntax highlighting grammars.
A Grammar describes how to tokenize a particular language using an
ordered list of regex-based Rule values. Grammars can extend other
grammars via single inheritance, where child rules take priority over
parent rules.
Rules may contain nested grammars via the Inside type, either as
inline sub-grammars or references to other languages. This follows
the Prism.js grammar model where inside is used for recursive
tokenization of matched regions.
Types
A language grammar definition.
- name: the language name (e.g. "python", "javascript").
- extends: optional parent language name; child rules override parent rules with the same token name.
- rules: ordered list of rules, tried first-to-last during tokenization.
pub type Grammar {
Grammar(
name: String,
extends: option.Option(String),
rules: List(Rule),
)
}
Constructors
- Grammar(name: String, extends: option.Option(String), rules: List(Rule))
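As a minimal sketch of the type above, a small grammar can be built directly with the constructors. The language name "toy", the patterns, and the function name toy_grammar are made up for illustration, and which token names the engine accepts is not specified here.

```gleam
import gleam/option.{None}
import smalto/grammar

// Hypothetical grammar for an imaginary "toy" language.
pub fn toy_grammar() -> grammar.Grammar {
  grammar.Grammar(
    name: "toy",
    // No parent grammar to inherit from.
    extends: None,
    // Rules are tried first-to-last, so strings are listed before keywords.
    rules: [
      grammar.Rule(
        token: "string",
        pattern: "\"[^\"]*\"",
        greedy: True,
        inside: None,
      ),
      grammar.Rule(
        token: "keyword",
        pattern: "\\b(?:let|fn)\\b",
        greedy: False,
        inside: None,
      ),
    ],
  )
}
```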
Specifies how a rule’s matched text should be recursively tokenized.
pub type Inside {
InlineGrammar(List(Rule))
LanguageRef(String)
}
Constructors
- InlineGrammar(List(Rule)): an inline grammar, a list of rules applied directly to the matched text. This is the most common form in Prism.js grammars.
- LanguageRef(String): a reference to another language’s grammar, resolved at tokenization time via a lookup function.
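The two variants can be told apart with a case expression. The helper below, rules_for, is hypothetical and is not the engine's actual code; it only sketches how a consumer might turn either variant into a rule list, assuming the same kind of lookup function that resolve accepts.

```gleam
import smalto/grammar

// Hypothetical helper: pick the rules used to re-tokenize matched text.
pub fn rules_for(
  inside: grammar.Inside,
  lookup: fn(String) -> grammar.Grammar,
) -> List(grammar.Rule) {
  case inside {
    // Inline rules are used as given.
    grammar.InlineGrammar(rules) -> rules
    // A language reference is looked up by name, then flattened.
    grammar.LanguageRef(name) -> grammar.resolve(lookup(name), lookup)
  }
}
```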
A single pattern rule within a grammar.
- token: the token name (e.g. "keyword", "string"), mapped to a Token variant by the engine.
- pattern: a PCRE regex pattern string.
- greedy: when True, the pattern matches against the original full source text rather than individual text fragments, preventing false matches inside strings or comments.
- inside: optional nested grammar for recursive tokenization of the matched text.
pub type Rule {
Rule(
token: String,
pattern: String,
greedy: Bool,
inside: option.Option(Inside),
)
}
Constructors
- Rule(token: String, pattern: String, greedy: Bool, inside: option.Option(Inside))
Values
pub fn greedy_rule(token: String, pattern: String) -> Rule
Create a greedy rule with no nested tokenization.
Greedy rules match against the original full source text rather than individual text fragments, preventing false matches inside previously matched regions such as strings or comments.
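A short usage sketch, assuming a language with `//` line comments; the pattern and the function name line_comment are illustrative only.

```gleam
import smalto/grammar

// Hypothetical rule: a line comment that should win over later rules
// even when its text would otherwise match them.
pub fn line_comment() -> grammar.Rule {
  grammar.greedy_rule("comment", "//.*")
}
```

Going by the description above, this should be equivalent to writing Rule(token: "comment", pattern: "//.*", greedy: True, inside: None) by hand.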
pub fn greedy_rule_with_inside(
token: String,
pattern: String,
inside: List(Rule),
) -> Rule
Create a greedy rule with an inline grammar for recursive tokenization.
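For example, a string literal whose escape sequences get their own token could be sketched as follows. The token name "escape" is an assumption; whether the engine maps it to a Token variant is not stated here.

```gleam
import smalto/grammar

// Hypothetical rule: a double-quoted string, with backslash escapes
// re-tokenized by a one-rule inline grammar.
pub fn string_with_escapes() -> grammar.Rule {
  grammar.greedy_rule_with_inside(
    "string",
    // Regex: "(?:\\.|[^"\\])*"
    "\"(?:\\\\.|[^\"\\\\])*\"",
    [grammar.rule("escape", "\\\\.")],
  )
}
```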
pub fn nested_rule(
token: String,
pattern: String,
language: String,
) -> Rule
Create a rule with a language reference for cross-language tokenization.
The matched text is re-tokenized using the grammar identified by the language name, resolved via the engine’s lookup function.
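A possible use is embedding one language in another, such as script content inside markup. In this sketch the token name "script" and the pattern are assumptions, and a grammar named "javascript" is assumed to be available to the lookup function.

```gleam
import smalto/grammar

// Hypothetical rule: hand the body of a <script> element to the
// "javascript" grammar. Lookarounds keep the tags out of the match.
pub fn script_body() -> grammar.Rule {
  grammar.nested_rule(
    "script",
    "(?<=<script>)[\\s\\S]*?(?=</script>)",
    "javascript",
  )
}
```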
pub fn resolve(
grammar: Grammar,
lookup: fn(String) -> Grammar,
) -> List(Rule)
Resolve a grammar’s inheritance chain into a flat list of rules.
If the grammar extends a parent, the parent is resolved recursively via
lookup, and the two rule lists are merged: child rules come first, and
any parent rule whose token name matches a child rule is removed.
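The merge can be illustrated with a two-level chain. The grammars, patterns, and the lookup function below are hypothetical. Because lookup returns a Grammar rather than a Result, this sketch falls back to an empty grammar for unknown names.

```gleam
import gleam/list
import gleam/option.{None, Some}
import smalto/grammar

// Hypothetical parent grammar with two rules.
fn javascript() -> grammar.Grammar {
  grammar.Grammar(name: "javascript", extends: None, rules: [
    grammar.rule("keyword", "\\b(?:var|function)\\b"),
    grammar.greedy_rule("string", "\"[^\"]*\""),
  ])
}

// Hypothetical child grammar: redefines "keyword" and adds "builtin".
fn typescript() -> grammar.Grammar {
  grammar.Grammar(name: "typescript", extends: Some("javascript"), rules: [
    grammar.rule("keyword", "\\b(?:var|function|interface)\\b"),
    grammar.rule("builtin", "\\b(?:string|number)\\b"),
  ])
}

// Hypothetical lookup: unknown names yield an empty grammar.
fn lookup(name: String) -> grammar.Grammar {
  case name {
    "javascript" -> javascript()
    _ -> grammar.Grammar(name: name, extends: None, rules: [])
  }
}

// Token names of the flattened rule list, in order.
pub fn resolved_tokens() -> List(String) {
  grammar.resolve(typescript(), lookup)
  |> list.map(fn(r: grammar.Rule) { r.token })
}
```

Following the merge rule stated above, resolved_tokens() should list "keyword" and "builtin" from the child first, then "string" from the parent; the parent's "keyword" rule is dropped because the child defines a rule with the same token name.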
pub fn rule(token: String, pattern: String) -> Rule
Create a rule with greedy set to False and no nested tokenization.