Nasty → Natural Abstract Syntax Tree Yeoman


A comprehensive NLP library for Elixir that treats natural language with the same rigor as programming languages.

Nasty provides a complete grammatical Abstract Syntax Tree (AST) for multiple natural languages (English, Spanish, and Catalan), with a full NLP pipeline from tokenization to text summarization.

  • Tokenization - NimbleParsec-based text segmentation
  • POS Tagging - Rule-based + Statistical (HMM with Viterbi) + Neural (BiLSTM-CRF)
  • Morphological Analysis - Lemmatization and features
  • Phrase Structure Parsing - NP, VP, PP, and relative clauses
  • Complex Sentences - Coordination, subordination
  • Dependency Extraction - Universal Dependencies relations
  • Named Entity Recognition - Person, place, organization
  • Semantic Role Labeling - Predicate-argument structure (who did what to whom)
  • Coreference Resolution - Link mentions across sentences
  • Text Summarization - Extractive summarization with MMR
  • Question Answering - Extractive QA for factoid questions
  • Text Classification - Multinomial Naive Bayes classifier with multiple feature types
  • Information Extraction - Relation extraction, event extraction, and template-based extraction
  • Statistical Models - HMM POS tagger with 95% accuracy
  • Neural Models - BiLSTM-CRF with 97-98% accuracy using Axon/EXLA
  • Code Interoperability - Bidirectional NL ↔ Code conversion (Natural language commands to Elixir code and vice versa)
  • AST Rendering - Convert AST back to natural language text
  • Translation - AST-based translation with morphological agreement and word order transformations
  • AST Utilities - Traversal, queries, validation, and transformations
  • Visualization - Export to DOT/Graphviz and JSON formats
  • Multi-Language Support - English, Spanish, and Catalan with language-agnostic architecture

Quick Start

# Run the complete demo
mix run demo.exs

# Or try specific examples
mix run examples/catalan_example.exs
mix run examples/roundtrip_translation.exs
mix run examples/multilingual_pipeline.exs

New to Nasty? Start with the Getting Started Guide for a beginner-friendly tutorial.

alias Nasty.Language.English

# Simple example
text = "John Smith works at Google in New York."

{:ok, tokens} = English.tokenize(text)
{:ok, tagged} = English.tag_pos(tokens)
{:ok, document} = English.parse(tagged)

# Extract entities
alias Nasty.Language.English.EntityRecognizer
entities = EntityRecognizer.recognize(tagged)
# => [%Entity{type: :person, text: "John Smith"}, 
#     %Entity{type: :org, text: "Google"}, ...]

# Extract dependencies
alias Nasty.Language.English.DependencyExtractor
sentences = document.paragraphs |> Enum.flat_map(& &1.sentences)
deps = Enum.flat_map(sentences, &DependencyExtractor.extract/1)

# Semantic role labeling
{:ok, document_with_srl} = Nasty.Language.English.parse(tagged, semantic_roles: true)
# Access semantic frames
frames = document_with_srl.semantic_frames
# => [%SemanticFrame{predicate: "works", roles: [%Role{type: :agent, text: "John Smith"}, ...]}]

# Coreference resolution
{:ok, document_with_coref} = Nasty.Language.English.parse(tagged, coreference: true)
# Access coreference chains
chains = document_with_coref.coref_chains
# => [%CorefChain{representative: "John Smith", mentions: ["John Smith", "he"], ...}]

# Summarize
summary = English.summarize(document, ratio: 0.3)  # Summary is ~30% of the original
# or
summary = English.summarize(document, max_sentences: 3)  # Fixed count

# MMR (Maximal Marginal Relevance) for reduced redundancy
summary_mmr = English.summarize(document, max_sentences: 3, method: :mmr, mmr_lambda: 0.5)

# Question answering
{:ok, answers} = English.answer_question(document, "Who works at Google?")
# => [%Answer{text: "John Smith", confidence: 0.85, ...}]

# Statistical POS tagging (auto-loads from priv/models/)
{:ok, tokens_hmm} = English.tag_pos(tokens, model: :hmm)

# Neural POS tagging (97-98% accuracy)
{:ok, tokens_neural} = English.tag_pos(tokens, model: :neural)

# Or ensemble mode (combines neural + statistical + rule-based)
{:ok, tokens_ensemble} = English.tag_pos(tokens, model: :ensemble)

# Text classification
# Train a sentiment classifier
training_data = [
  {positive_doc1, :positive},
  {positive_doc2, :positive},
  {negative_doc1, :negative},
  {negative_doc2, :negative}
]
model = English.train_classifier(training_data, features: [:bow, :lexical])

# Classify new documents
{:ok, predictions} = English.classify(test_doc, model)
# => [%Classification{class: :positive, confidence: 0.85, ...}, ...]

# Information extraction
# Extract relations between entities
{:ok, relations} = English.extract_relations(document)
# => [%Relation{type: :works_at, subject: person, object: org, confidence: 0.8}]

# Extract events with participants
{:ok, events} = English.extract_events(document)
# => [%Event{type: :business_acquisition, trigger: "acquired", participants: %{agent: ..., patient: ...}}]

# Template-based extraction
templates = [TemplateExtractor.employment_template()]
{:ok, results} = English.extract_templates(document, templates)
# => [%{template: "employment", slots: %{employee: "John", employer: "Google"}, confidence: 0.85}]

Architecture

graph LR
    A[Text] --> B[Tokenization]
    B --> C[POS Tagging]
    C --> D[Phrase Parsing]
    D --> E[Sentence Parsing]
    E --> F[Document AST]
    F --> G[Dependencies]
    F --> H[Entities]
    F --> I[Summarization]
    F --> J[Translation]
    F --> K[More...]
    
    style F fill:#e1f5ff
    style A fill:#fff3e0

Complete Pipeline

  1. Tokenization (English.Tokenizer) → Split text into tokens
  2. POS Tagging (English.POSTagger) → Assign grammatical categories
  3. Morphology (English.Morphology) → Lemmatization and features
  4. Phrase Parsing (English.PhraseParser) → Build NP, VP, PP structures
  5. Sentence Parsing (English.SentenceParser) → Detect clauses and structure
  6. Dependency Extraction (English.DependencyExtractor) → Grammatical relations
  7. Entity Recognition (English.EntityRecognizer) → Named entities
  8. Semantic Role Labeling (English.SemanticRoleLabeler) → Predicate-argument structure
  9. Coreference Resolution (English.CoreferenceResolver) → Link mentions
  10. Summarization (English.Summarizer) → Extract key sentences
  11. Question Answering (English.QuestionAnalyzer, English.AnswerExtractor) → Answer questions
  12. Text Classification (English.FeatureExtractor, English.TextClassifier) → Train and classify documents
  13. Information Extraction (English.RelationExtractor, English.EventExtractor, English.TemplateExtractor) → Extract structured information
  14. AST Rendering (Rendering.Text) → Convert AST back to natural language
  15. AST Utilities (Utils.Traversal, Utils.Query, Utils.Validator, Utils.Transform) → Traverse, query, validate, and transform trees
  16. Visualization (Rendering.Visualization, Rendering.PrettyPrint) → Export to DOT/JSON and debug output
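
Chained together, the core of the pipeline is only a few calls. A minimal sketch using the API from the Quick Start (the sample sentence is illustrative):

```elixir
alias Nasty.Language.English

# Steps 1-3: raw text -> tokens -> tagged tokens -> document AST
{:ok, tokens} = English.tokenize("The cat sat on the mat.")
{:ok, tagged} = English.tag_pos(tokens)
{:ok, document} = English.parse(tagged)

# Step 14: round-trip the document AST back to natural language
{:ok, text} = Nasty.Rendering.Text.render(document)
```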

Features

Phrase Structures

  • Noun Phrases (NP): Det? Adj* Noun PP* RelClause*
  • Verb Phrases (VP): Aux* Verb NP? PP* Adv*
  • Prepositional Phrases (PP): Prep NP
  • Relative Clauses: RelPron/RelAdv Clause

Sentence Types

  • Simple, Compound, Complex sentences
  • Coordination (and, or, but)
  • Subordination (because, although, if)
  • Relative clauses (who, which, that)

Dependencies (Universal Dependencies)

  • Core arguments: nsubj, obj, iobj
  • Modifiers: amod, advmod, det, case
  • Clausal: acl, advcl, mark
  • Coordination: conj, cc

Entity Types

  • Person, Organization, Place (GPE)
  • With confidence scores and multi-word support

Multi-Language Support

Nasty provides a language-agnostic architecture using Elixir behaviours, enabling support for multiple natural languages:

Supported Languages

  • English (Nasty.Language.English) - Fully implemented
  • Spanish (Nasty.Language.Spanish) - Fully implemented
    • Spanish-specific tokenization (¿?, ¡!, contractions del/al, accented characters)
    • Spanish morphology (verb conjugations, gender/number agreement)
    • Complete NLP pipeline (tokenization → parsing → summarization)
  • Catalan (Nasty.Language.Catalan) - Fully implemented (Phases 1-7)
    • Catalan-specific tokenization (interpunct l·l, apostrophe contractions, 10 diacritics)
    • Catalan morphology (3 verb classes, irregular verbs, gender/number agreement)
    • Full parsing pipeline (phrase/sentence parsing, dependency extraction, NER)

Usage

alias Nasty.Language.Spanish

# Spanish text processing
text = "El gato duerme en el sofá."
{:ok, tokens} = Spanish.tokenize(text)
{:ok, tagged} = Spanish.tag_pos(tokens)
{:ok, document} = Spanish.parse(tagged)

# Works identically to English
summary = Spanish.summarize(document, ratio: 0.3)
{:ok, entities} = Spanish.extract_entities(document)

# Catalan text processing
alias Nasty.Language.Catalan

text_ca = "El gat dorm al sofà."
{:ok, tokens_ca} = Catalan.tokenize(text_ca)
{:ok, tagged_ca} = Catalan.tag_pos(tokens_ca)
{:ok, document_ca} = Catalan.parse(tagged_ca)

# Extract entities (Catalan-specific lexicons)
alias Nasty.Language.Catalan.EntityRecognizer
{:ok, entities_ca} = EntityRecognizer.recognize(tagged_ca)

# Translate between languages (AST-based)
alias Nasty.Translation.Translator

# English to Spanish
{:ok, tokens_en} = English.tokenize("The quick cat runs.")
{:ok, tagged_en} = English.tag_pos(tokens_en)
{:ok, doc_en} = English.parse(tagged_en)
{:ok, doc_es} = Translator.translate_document(doc_en, :es)
{:ok, text_es} = Nasty.render(doc_es)
# => "El gato rápido corre."

# Spanish to English
{:ok, tokens_es} = Spanish.tokenize("La casa grande.")
{:ok, tagged_es} = Spanish.tag_pos(tokens_es)
{:ok, doc_es} = Spanish.parse(tagged_es)
{:ok, doc_en} = Translator.translate_document(doc_es, :en)
{:ok, text_en} = Nasty.render(doc_en)
# => "The big house."

Language Registry

All languages are registered in Nasty.Language.Registry and can be accessed dynamically:

# Auto-detect language
{:ok, lang} = Nasty.Language.Registry.detect_language("¿Cómo estás?")
# => :es

# Get language module
{:ok, Spanish} = Nasty.Language.Registry.get(:es)
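
Because every language module implements the same behaviour, the registry enables fully dynamic pipelines: detect the language, look up its module, and call the shared callbacks. A sketch using the Registry API above (the sample text is illustrative):

```elixir
alias Nasty.Language.Registry

text = "El gat dorm al sofà."

# Detect the language, then dispatch through the shared behaviour
{:ok, lang} = Registry.detect_language(text)
{:ok, module} = Registry.get(lang)

{:ok, tokens} = module.tokenize(text)
{:ok, tagged} = module.tag_pos(tokens)
{:ok, document} = module.parse(tagged)
```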

See complete language-specific examples:

  • examples/spanish_example.exs - Spanish NLP pipeline demonstration
  • examples/catalan_example.exs - Catalan tokenization, morphology, and parsing
  • examples/roundtrip_translation.exs - Translation quality analysis with roundtrip testing
  • examples/multilingual_pipeline.exs - Side-by-side comparison of English/Spanish/Catalan

Text Summarization

  • Extractive summarization - Select important sentences from document
  • Multiple scoring features:
    • Position weight (early sentences score higher)
    • Entity density (sentences with named entities)
    • Discourse markers ("in conclusion", "importantly", etc.)
    • Keyword frequency (TF scoring)
    • Sentence length (prefer moderate length)
    • Coreference participation (sentences in coref chains)
  • Selection methods:
    • :greedy - Top-N by score (default)
    • :mmr - Maximal Marginal Relevance (reduces redundancy)
  • Flexible options: compression ratio or fixed sentence count
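
The two selection methods can be compared directly. A sketch assuming `document` is a parsed document AST as produced in the Quick Start; the comment on `mmr_lambda` reflects the standard MMR trade-off:

```elixir
alias Nasty.Language.English

# Greedy selection: top-N sentences by combined feature score
greedy = English.summarize(document, max_sentences: 3)

# MMR selection: mmr_lambda near 1.0 weights relevance, near 0.0
# weights diversity; 0.5 balances the two
mmr = English.summarize(document, max_sentences: 3, method: :mmr, mmr_lambda: 0.5)
```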

Question Answering

  • Extractive QA - Extract answer spans from documents
  • Question classification:
    • WHO (person entities)
    • WHAT (things, organizations)
    • WHEN (temporal expressions)
    • WHERE (locations)
    • WHY (reasons, clauses)
    • HOW (manner, quantity)
    • YES/NO (boolean questions)
  • Answer extraction strategies:
    • Keyword matching with lemmatization
    • Entity type filtering (person, organization, location)
    • Temporal expression recognition
    • Confidence scoring and ranking
  • Multiple answer support with confidence scores
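
Different question types route through different extraction strategies. A sketch using the `answer_question/2` call from the Quick Start, again assuming a parsed `document`; the sample questions are illustrative:

```elixir
alias Nasty.Language.English

# WHERE-questions draw on location entities; WHEN-questions on
# temporal expressions (per the classification above)
{:ok, where_answers} = English.answer_question(document, "Where is Google based?")
{:ok, when_answers} = English.answer_question(document, "When did John start?")

# Answers are ranked; keep the most confident span
best = Enum.max_by(where_answers, & &1.confidence)
```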

Text Classification

  • Multinomial Naive Bayes - Probabilistic classifier with Laplace smoothing
  • Multiple feature types:
    • :bow - Bag of words (lemmatized, stop word filtering)
    • :ngrams - Word sequences (bigrams, trigrams, etc.)
    • :pos_patterns - POS tag sequences
    • :syntactic - Sentence structure statistics
    • :entities - Named entity distributions
    • :lexical - Vocabulary richness and text statistics
  • Training and prediction:
    • Train on labeled documents: {document, class} tuples
    • Multi-class classification support
    • Confidence scores and probability distributions
  • Model evaluation:
    • Accuracy, precision, recall, F1 metrics
    • Per-class performance breakdowns
  • Use cases:
    • Sentiment analysis (positive/negative reviews)
    • Spam detection (spam/ham classification)
    • Topic classification (sports, tech, politics, etc.)
    • Formality detection (formal/informal text)
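
Since the classifier trains on `{document, class}` tuples, raw strings must go through the standard pipeline first. A sketch using the documented `train_classifier/2` and `classify/2` calls; the sample reviews are illustrative:

```elixir
alias Nasty.Language.English

# Helper: run a raw string through the pipeline to get a document AST
parse = fn text ->
  {:ok, tokens} = English.tokenize(text)
  {:ok, tagged} = English.tag_pos(tokens)
  {:ok, document} = English.parse(tagged)
  document
end

training_data = [
  {parse.("I love this library, it works great."), :positive},
  {parse.("Terrible documentation and confusing errors."), :negative}
]

model = English.train_classifier(training_data, features: [:bow, :lexical])
{:ok, predictions} = English.classify(parse.("Great experience overall."), model)
```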

Information Extraction

  • Relation Extraction - Extract semantic relationships between entities

    • Supported relations:
      • Employment: works_at, employed_by, member_of
      • Organization: founded, acquired_by, subsidiary_of
      • Location: located_in, based_in, headquarters_in
      • Temporal: occurred_on, founded_in
    • Pattern-based extraction using verb patterns and prepositions
    • Confidence scoring (0.5-0.8 based on pattern strength)
    • Integrates with NER and dependency parsing
  • Event Extraction - Identify events with triggers and participants

    • Event types:
      • Business: business_acquisition, business_merger, company_founding, product_launch
      • Employment: employment_start, employment_end
      • Communication: announcement, meeting
      • Other: movement, transaction
    • Verb and nominalization triggers
    • Participant extraction using semantic role labeling
    • Temporal expression linking
    • Confidence scoring (0.7-0.8)
  • Template-Based Extraction - Structured information using custom templates

    • Define extraction templates with typed slots
    • Pre-defined templates: employment, acquisition, location
    • Flexible pattern matching
    • Required/optional slot support
    • Confidence based on slot fill rate
  • API Functions:

    # Extract relations
    {:ok, relations} = English.extract_relations(document, min_confidence: 0.6)
    
    # Extract events
    {:ok, events} = English.extract_events(document, max_events: 10)
    
    # Template extraction
    templates = [TemplateExtractor.employment_template()]
    {:ok, results} = English.extract_templates(document, templates)

Code Interoperability

Convert between natural language and Elixir code bidirectionally:

  • NL → Code Generation - Convert natural language commands to executable Elixir code

    • List operations: "Sort the numbers" → Enum.sort(numbers)
    • Filtering: "Filter users where age > 18" → Enum.filter(users, fn item -> item > 18 end)
    • Mapping: "Map the list" → Enum.map(list, fn item -> item end)
    • Arithmetic: "X plus Y" → x + y
    • Assignments: "X is 5" → x = 5
    • Conditionals: "If X then Y" → if x, do: y
  • Code → NL Explanation - Generate natural language explanations from code

    • Enum.sort(numbers) → "sort numbers"
    • x = a + b → "X is a plus b"
    • if x > 5, do: :ok → "If x is greater than 5, then :ok"
    • Pipeline support: list |> Enum.map(&(&1 * 2)) |> Enum.sum() → "map list to each element times 2, then sum list"
  • API Functions:

    # Natural language → Code
    {:ok, code} = English.to_code("Sort the numbers")
    # => "Enum.sort(numbers)"
    
    # Code → Natural language
    {:ok, explanation} = English.explain_code("Enum.filter(users, fn u -> u.age > 18 end)")
    # => "filter users where u u age is greater than 18"
    
    # Get intent without generating code
    {:ok, intent} = English.recognize_intent("Filter the users")
    # => %Intent{type: :action, action: "filter", target: "users", confidence: 0.95}
    
    # Optional: Enhance with Ragex for context-aware suggestions
    {:ok, code} = English.to_code("Sort the list", enhance_with_ragex: true)
  • Example Scripts:

    • examples/code_generation.exs - Natural language to code demos
    • examples/code_explanation.exs - Code to natural language demos

AST Rendering & Utilities

Convert AST back to text, traverse and query trees, validate structures, and export visualizations:

  • Text Rendering - Regenerate natural language from AST

    alias Nasty.Rendering.Text
    
    # Render AST to text
    {:ok, text} = Text.render(document)
    # => "The cat sat on the mat."
    
    # Custom rendering options
    {:ok, text} = Text.render(document, 
      capitalize_sentences: false,
      add_punctuation: false,
      paragraph_separator: "\n\n"
    )
  • AST Traversal - Walk the tree with visitor pattern

    alias Nasty.Utils.Traversal
    
    # Count all tokens
    token_count = Traversal.reduce(document, 0, fn
      %Token{}, acc -> acc + 1
      _, acc -> acc
    end)
    
    # Collect all nouns
    nouns = Traversal.collect(document, fn
      %Token{pos_tag: :noun} -> true
      _ -> false
    end)
    
    # Transform tree (lowercase all text)
    lowercased = Traversal.map(document, fn
      %Token{} = token -> %{token | text: String.downcase(token.text)}
      node -> node
    end)
  • AST Queries - High-level query API

    alias Nasty.Utils.Query
    
    # Find all noun phrases
    noun_phrases = Query.find_all(document, :noun_phrase)
    
    # Find tokens by POS tag
    verbs = Query.find_by_pos(document, :verb)
    
    # Extract entities
    people = Query.extract_entities(document, type: :PERSON)
    
    # Find sentence subject
    subject = Query.find_subject(sentence)
    
    # Count nodes
    token_count = Query.count(document, :token)
  • Pretty Printing - Debug AST structures

    alias Nasty.Rendering.PrettyPrint
    
    # Indented output
    IO.puts(PrettyPrint.print(document, color: true))
    
    # Tree-style output with box characters
    IO.puts(PrettyPrint.tree(document))
    
    # Statistics
    IO.puts(PrettyPrint.stats(document))
    # => AST Statistics:
    #      Paragraphs: 3
    #      Sentences: 12
    #      Tokens: 127
  • Visualization - Export for graphical rendering

    alias Nasty.Rendering.Visualization
    
    # Export to DOT format (Graphviz)
    dot = Visualization.to_dot(document, type: :parse_tree)
    File.write("tree.dot", dot)
    # Then: dot -Tpng tree.dot -o tree.png
    
    # Dependency graph
    deps_dot = Visualization.to_dot(sentence, type: :dependencies)
    
    # Entity graph
    entity_dot = Visualization.to_dot(document, type: :entities)
    
    # JSON export for d3.js
    json = Visualization.to_json(document)
  • Validation - Ensure AST integrity

    alias Nasty.Utils.Validator
    
    # Validate structure
    {:ok, document} = Validator.validate(document)
    
    # Check spans
    :ok = Validator.validate_spans(document)
    
    # Check language consistency
    :ok = Validator.validate_language(document)
  • Transformations - Modify AST structures

    alias Nasty.Utils.Transform
    
    # Normalize case
    lowercased = Transform.normalize_case(document, :lower)
    
    # Remove punctuation
    no_punct = Transform.remove_punctuation(document)
    
    # Remove stop words
    no_stops = Transform.remove_stop_words(document)
    
    # Lemmatize all tokens
    lemmatized = Transform.lemmatize(document)
    
    # Apply pipeline of transformations
    processed = Transform.pipeline(document, [
      &Transform.normalize_case(&1, :lower),
      &Transform.remove_punctuation/1,
      &Transform.remove_stop_words/1
    ])

Testing

# Run all tests
mix test

# Run specific module tests
mix test test/language/english/tokenizer_test.exs
mix test test/language/english/phrase_parser_test.exs
mix test test/language/english/dependency_extractor_test.exs

Documentation

Comprehensive documentation is available in the docs/ directory:

Getting Started

Core Documentation

  • PLAN.md - Original vision and architectural design
  • TODO.md - Unimplemented features and future enhancements
  • PARSING_GUIDE.md - Complete parsing algorithms reference (tokenization, POS tagging, morphology, phrase/sentence parsing, dependencies)
  • ARCHITECTURE.md - System architecture and design patterns
  • USER_GUIDE.md - User guide with examples and API reference
  • API.md - Complete API reference for all modules
  • AST_REFERENCE.md - Complete AST node reference
  • PERFORMANCE.md - Benchmarks, optimization tips, and performance considerations

Language-Specific Documentation

Statistical & Neural Models

Nasty includes comprehensive statistical and neural network models for state-of-the-art NLP:

Statistical Models

Sequence Labeling

  • HMM POS Tagger: Hidden Markov Model with Viterbi decoding (~95% accuracy)
  • CRF (Conditional Random Fields): Feature-based sequence labeling
    • Named Entity Recognition
    • POS tagging
    • Chunking and segmentation
    • Forward-backward algorithm for training
    • Viterbi decoding for prediction
    • Multiple optimization methods (SGD, Momentum, AdaGrad)

Parsing

  • PCFG (Probabilistic Context-Free Grammar): Statistical phrase structure parsing
    • CYK algorithm for efficient parsing
    • Grammar learning from treebanks
    • Chomsky Normal Form (CNF) conversion
    • Smoothing and probability estimation
    • Beam search for pruning

Classification

  • Naive Bayes Classifier: Fast text classification
    • Multiple feature types (BOW, n-grams, POS patterns)
    • Laplace smoothing
    • Multi-class support

Neural Models

  • BiLSTM-CRF: Bidirectional LSTM with CRF for sequence tagging (97-98% accuracy)
  • Axon/EXLA: Pure Elixir neural networks with GPU acceleration
  • Pre-trained embeddings: Support for GloVe, FastText
  • Training infrastructure: Train custom models on your own data
  • Evaluation metrics: Accuracy, precision, recall, F1, confusion matrices

Transformer Models (Bumblebee Integration)

  • Pre-trained Models: BERT, RoBERTa, DistilBERT, XLM-RoBERTa via Hugging Face
  • Fine-tuning: Full fine-tuning pipeline for POS tagging and NER (98-99% accuracy)
  • Zero-shot Classification: Classify without training using NLI models (70-85% accuracy)
  • Model Quantization: INT8 quantization for 4x compression and 2-3x speedup
  • Multilingual Support: XLM-RoBERTa for cross-lingual transfer
  • Mix Tasks: CLI tools for model management, fine-tuning, and inference

See the Statistical Models guide for the complete reference, the Neural Models guide for architecture details, the Training Neural guide for training custom models, the Pretrained Models guide for transformer usage, the Zero Shot guide for zero-shot classification, and the Quantization guide for model optimization.

Quick Start: Model Management

# List available models
mix nasty.models list

# Train HMM POS tagger (fast, 95% accuracy)
mix nasty.train.pos \
  --corpus data/UD_English-EWT/en_ewt-ud-train.conllu \
  --test data/UD_English-EWT/en_ewt-ud-test.conllu \
  --output priv/models/en/pos_hmm_v1.model

# Train neural POS tagger (slower, 97-98% accuracy)
mix nasty.train.neural_pos \
  --corpus data/UD_English-EWT/en_ewt-ud-train.conllu \
  --output priv/models/en/pos_neural_v1.axon \
  --epochs 10 \
  --batch-size 32

# Train CRF for NER
mix nasty.train.crf \
  --corpus data/train.conllu \
  --test data/test.conllu \
  --output priv/models/en/ner_crf.model \
  --task ner \
  --iterations 100

# Train PCFG parser
mix nasty.train.pcfg \
  --corpus data/en_ewt-ud-train.conllu \
  --test data/en_ewt-ud-test.conllu \
  --output priv/models/en/pcfg.model \
  --smoothing 0.001

# Evaluate models
mix nasty.eval.pos \
  --model priv/models/en/pos_hmm_v1.model \
  --test data/UD_English-EWT/en_ewt-ud-test.conllu \
  --baseline

mix nasty.eval \
  --model priv/models/en/ner_crf.model \
  --test data/test.conllu \
  --type crf \
  --task ner

mix nasty.eval \
  --model priv/models/en/pcfg.model \
  --test data/test.conllu \
  --type pcfg

Future Enhancements

  • [x] Statistical models for improved accuracy (HMM POS tagger - done!)
  • [x] Neural models (BiLSTM-CRF POS tagger with 97-98% accuracy - done!)
  • [x] PCFG parser for phrase structure (done!)
  • [x] CRF for named entity recognition (done!)
  • [x] Semantic role labeling (rule-based SRL - done!)
  • [x] Coreference resolution (heuristic-based - done!)
  • [x] Question answering (extractive QA - done!)
  • [x] Information extraction (relations, events, templates - done!)
  • [x] Code ↔ NL bidirectional conversion (done!)
  • [x] Pre-trained transformers (BERT, RoBERTa via Bumblebee - done!)
  • [x] Fine-tuning infrastructure for POS tagging and NER (done!)
  • [x] Zero-shot classification using NLI models (done!)
  • [x] Model quantization (INT8 with 4x compression) (done!)
  • [x] Integration of PCFG/CRF with main pipeline (done!)
  • [x] Multi-language support - Spanish and Catalan complete
  • [ ] Advanced coreference (neural models)

License

MIT License — see LICENSE file for details.


Built with ❤️ using Elixir and NimbleParsec