Nasty.Language.Catalan.Tokenizer (Nasty v0.3.0)
Tokenizer for Catalan text using NimbleParsec.
Catalan-Specific Features
- Interpunct (l·l): Kept as single token
- Apostrophe contractions: l', d', s', n', m', t'
- Article contractions: del, al, pel
- Catalan diacritics: à, è, é, í, ï, ò, ó, ú, ü, ç
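The features listed above can be illustrated with a small, dependency-free sketch. It is not Nasty's NimbleParsec grammar: the module name `CatalanSketch`, the function `split/1`, and the regular expression are all hypothetical. It also assumes that elided clitics such as l' and d' are emitted as separate tokens, which the documentation does not state. It leaves article contractions (del, al, pel) as ordinary words.

```elixir
defmodule CatalanSketch do
  # Hypothetical illustration only; not the library's implementation.
  # Alternatives, tried in order at each position:
  #   1. an elided clitic (l', d', s', n', m', t'), straight or curly apostrophe
  #   2. a word, optionally joined by interpuncts (col·lecció stays whole)
  #   3. a run of digits
  #   4. any other single non-space character (punctuation)
  def split(text) do
    ~r/[ldsnmt]['’]|\p{L}+(?:·\p{L}+)*|\d+|[^\s\p{L}\d]/iu
    |> Regex.scan(text)
    |> List.flatten()
  end
end

CatalanSketch.split("L'àvia viu a la col·lecció d'art.")
# => ["L'", "àvia", "viu", "a", "la", "col·lecció", "d'", "art", "."]
```

The `u` flag makes `\p{L}` match accented letters such as à, ó and ç, so diacritics need no special handling in this sketch.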
Functions
@spec parse_text(binary(), keyword()) :: {:ok, [term()], rest, context, line, byte_offset} | {:error, reason, rest, context, line, byte_offset} when line: {pos_integer(), byte_offset}, byte_offset: non_neg_integer(), rest: binary(), reason: String.t(), context: map()
Parses the given binary as parse_text.
Returns {:ok, [token], rest, context, position, byte_offset} or
{:error, reason, rest, context, position, byte_offset}, where position
(named line in the spec above) describes the location of the parse_text (start position) as {line, offset_to_start_of_line}.
The column where the error occurred can be inferred from byte_offset - offset_to_start_of_line.
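The column arithmetic described above can be sketched as follows. `PositionSketch` and `column/1` are hypothetical names, and the tuple is written by hand for illustration rather than produced by the tokenizer.

```elixir
defmodule PositionSketch do
  # Hypothetical helper; accepts any 6-tuple of the documented shape,
  # {:ok | :error, value, rest, context, {line, offset_to_start_of_line}, byte_offset}.
  def column({_tag, _value, _rest, _context, {_line, line_start}, byte_offset}) do
    # Zero-based byte column within the current line.
    byte_offset - line_start
  end
end

# Hand-written example: line 2 starts at byte 10, parsing stopped at byte 14.
result = {:error, "expected token", "rest", %{}, {2, 10}, 14}
PositionSketch.column(result)
# => 4
```

Because both offsets are measured in bytes, the result is a byte column. Catalan letters with diacritics (à, ç, and so on) occupy two bytes each in UTF-8, so the byte column can be larger than the character column.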
Options
- :byte_offset - the byte offset for the whole binary, defaults to 0
- :line - the line and the byte offset into that line, defaults to {1, byte_offset}
- :context - the initial context value. It will be converted to a map
@spec tokenize(String.t(), keyword()) :: {:ok, [Nasty.AST.Token.t()]} | {:error, term()}