Nasty.Data.CoNLLU (Nasty v0.3.0)
Parser for the CoNLL-U format used by Universal Dependencies.
CoNLL-U Format
CoNLL-U is a tab-separated format with 10 columns:
- ID - Word index
- FORM - Word form
- LEMMA - Lemma
- UPOS - Universal POS tag
- XPOS - Language-specific POS tag
- FEATS - Morphological features
- HEAD - ID of the token's syntactic head (0 for the root)
- DEPREL - Dependency relation
- DEPS - Enhanced dependencies
- MISC - Miscellaneous annotations
Lines starting with # are comments (sentence-level metadata). Blank lines separate sentences.
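The layout rules above can be sketched in a few lines of Elixir. This is only an illustration of the format, not Nasty's actual parser: the module `ConlluSketch` and its functions are hypothetical, and the sketch ignores multiword-token ranges and empty nodes.

```elixir
defmodule ConlluSketch do
  # Column names in file order (the atom keys are an assumption of this sketch).
  @columns ~w(id form lemma upos xpos feats head deprel deps misc)a

  # Group the lines of a document into sentence blocks, using blank lines
  # as separators. Returns a list of lists of lines.
  def split_sentences(text) do
    text
    |> String.split(~r/\r?\n/)
    |> Enum.chunk_by(&(String.trim(&1) == ""))
    |> Enum.reject(fn [first | _] -> String.trim(first) == "" end)
  end

  # Separate comment lines (starting with "#") from token lines.
  def split_block(lines) do
    Enum.split_with(lines, &String.starts_with?(&1, "#"))
  end

  # Split one token line into its 10 tab-separated columns.
  def split_fields(line) do
    case String.split(line, "\t") do
      fields when length(fields) == 10 ->
        {:ok, @columns |> Enum.zip(fields) |> Map.new()}

      fields ->
        {:error, {:expected_10_columns, length(fields)}}
    end
  end
end

text =
  "# sent_id = 1\n1\tThe\tthe\tDET\tDT\t_\t2\tdet\t_\t_\n\n" <>
    "# sent_id = 2\n1\tHi\thi\tINTJ\tUH\t_\t0\troot\t_\t_\n"

[first_block, _second_block] = ConlluSketch.split_sentences(text)
{_comments, [token_line | _]} = ConlluSketch.split_block(first_block)
{:ok, fields} = ConlluSketch.split_fields(token_line)
```

Every column is still a raw string at this point; converting `id` and `head` to integers or `upos` to an atom would be a separate step.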
Examples
alias Nasty.Data.CoNLLU

# Parse a file
{:ok, sentences} = CoNLLU.parse_file("en_ewt-ud-train.conllu")
# Parse a string
conllu_text = """
# sent_id = 1
# text = The cat sat.
1\tThe\tthe\tDET\t...
2\tcat\tcat\tNOUN\t...
3\tsat\tsit\tVERB\t...
"""
{:ok, sentences} = CoNLLU.parse_string(conllu_text)
Types
@type token() :: %{
  id: pos_integer(),
  form: String.t(),
  lemma: String.t(),
  upos: atom(),
  xpos: String.t() | nil,
  feats: map(),
  head: non_neg_integer(),
  deprel: String.t(),
  deps: String.t() | nil,
  misc: map()
}
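The type declares `feats` and `misc` as maps, while in a CoNLL-U file both columns are `|`-separated `Key=Value` lists, with `_` marking an empty column. A minimal sketch of that conversion might look as follows. The module `FeatsSketch` is hypothetical, and keeping keys and values as strings is an assumption; Nasty may represent them differently.

```elixir
defmodule FeatsSketch do
  # "_" marks an empty column in CoNLL-U.
  def parse("_"), do: %{}

  def parse(column) when is_binary(column) do
    column
    |> String.split("|", trim: true)
    |> Map.new(fn pair ->
      case String.split(pair, "=", parts: 2) do
        [key, value] -> {key, value}
        # Assumption: a bare entry without "=" is stored as a boolean flag.
        [key] -> {key, true}
      end
    end)
  end
end

FeatsSketch.parse("Number=Sing|Person=3")
# => %{"Number" => "Sing", "Person" => "3"}
```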
Functions
Convert parsed sentences back to CoNLL-U format.
Parameters
- sentences - List of sentence maps
Returns
- CoNLL-U formatted string
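To illustrate what the conversion back to text involves, the sketch below renders a single map shaped like `token()` as one tab-separated line. It is not the library's implementation: `FormatSketch` is a hypothetical module, and the choices to upcase the `upos` atom, write `nil` and empty maps as `_`, and sort features by key are all assumptions.

```elixir
defmodule FormatSketch do
  # Render a key/value map as "K=V|K=V", or "_" when it is empty.
  def pairs(map) when map_size(map) == 0, do: "_"

  def pairs(map) do
    map
    |> Enum.sort_by(fn {key, _value} -> to_string(key) end)
    |> Enum.map_join("|", fn {key, value} -> "#{key}=#{value}" end)
  end

  # Render one token map as a 10-column CoNLL-U line.
  def token_line(token) do
    [
      Integer.to_string(token.id),
      token.form,
      token.lemma,
      token.upos |> Atom.to_string() |> String.upcase(),
      token.xpos || "_",
      pairs(token.feats),
      Integer.to_string(token.head),
      token.deprel,
      token.deps || "_",
      pairs(token.misc)
    ]
    |> Enum.join("\t")
  end
end

# The lowercase :noun atom is an assumption about how upos is stored.
token = %{
  id: 2, form: "cat", lemma: "cat", upos: :noun, xpos: "NN",
  feats: %{"Number" => "Sing"}, head: 3, deprel: "nsubj",
  deps: nil, misc: %{}
}

FormatSketch.token_line(token)
# => "2\tcat\tcat\tNOUN\tNN\tNumber=Sing\t3\tnsubj\t_\t_"
```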
Parse a CoNLL-U file.
Parameters
- path - Path to the .conllu file
Returns
- {:ok, sentences} - List of parsed sentences
- {:error, reason} - Parse error
Parse a CoNLL-U formatted string.
Parameters
- content - CoNLL-U formatted text
Returns
- {:ok, sentences} - List of parsed sentences
- {:error, reason} - Parse error