Nasty.Language.Spanish (Nasty v0.3.0)
Spanish language implementation.
Provides a full NLP pipeline for Spanish text:
- Tokenization (NimbleParsec-based with Spanish punctuation)
- POS tagging (rule-based with Universal Dependencies tags)
- Morphological analysis (lemmatization + features)
- Parsing (phrase and sentence structure)
- Semantic analysis (NER, coreference, SRL)
- NLP operations (summarization, QA, classification)
Functions
@spec answer_question(Nasty.AST.Document.t(), String.t(), keyword()) :: {:ok, [Nasty.AST.Answer.t()]} | {:error, term()}
Answers a question based on a Spanish document.
Options
- :max_answers - Maximum number of answers to return (default: 3)
- :min_confidence - Minimum confidence threshold (default: 0.3)
- :max_answer_length - Maximum answer length in tokens (default: 20)
Examples
iex> {:ok, answers} = Spanish.answer_question(document, "¿Quién fundó Google?")
iex> is_list(answers)
true
@spec classify(Nasty.AST.Document.t(), Nasty.AST.ClassificationModel.t(), keyword()) :: {:ok, [Nasty.AST.Classification.t()]} | {:error, term()}
Classifies a Spanish document using a trained model.
Returns classifications sorted by confidence.
Examples
iex> {:ok, classifications} = Spanish.classify(document, model)
iex> [top | _rest] = classifications
iex> is_atom(top.class)
true
@spec extract_features(Nasty.AST.Document.t(), keyword()) :: map()
Extracts classification features from a Spanish document.
Options
- :features - Feature types (default: [:bow, :ngrams])
- :ngram_size - N-gram size (default: 2)
- :min_frequency - Minimum frequency (default: 1)
Examples
iex> features = Spanish.extract_features(document)
iex> is_map(features)
true
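To make the :ngram_size and :min_frequency options concrete, here is a minimal sketch of bag-of-words and n-gram counting over a plain token list. FeatureSketch and its extract/2 function are hypothetical names for illustration only; they are not part of Nasty, and the library's actual feature map may be shaped differently.

```elixir
defmodule FeatureSketch do
  # Hypothetical helper, NOT part of Nasty: counts unigrams (:bow) and
  # n-grams over a plain list of token strings.
  def extract(tokens, opts \\ []) do
    ngram_size = Keyword.get(opts, :ngram_size, 2)
    min_frequency = Keyword.get(opts, :min_frequency, 1)

    # Bag of words: how often each token occurs.
    bow = Enum.frequencies(tokens)

    # N-grams: sliding window of `ngram_size` tokens, joined with spaces.
    ngrams =
      tokens
      |> Enum.chunk_every(ngram_size, 1, :discard)
      |> Enum.map(&Enum.join(&1, " "))
      |> Enum.frequencies()

    %{
      bow: keep_frequent(bow, min_frequency),
      ngrams: keep_frequent(ngrams, min_frequency)
    }
  end

  # Drops features that occur fewer than `min_frequency` times.
  defp keep_frequent(counts, min_frequency) do
    counts
    |> Enum.filter(fn {_feature, count} -> count >= min_frequency end)
    |> Map.new()
  end
end

tokens = ~w(el gato come y el gato duerme)
features = FeatureSketch.extract(tokens, min_frequency: 2)
IO.inspect(features)
```

With min_frequency: 2, only "el", "gato", and the bigram "el gato" survive, since every other feature occurs once.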
@spec label_semantic_roles(Nasty.AST.Document.t()) :: {:ok, [Nasty.AST.Semantic.Frame.t()]} | {:error, term()}
Performs semantic role labeling on a Spanish document.
Extracts predicate-argument structure for all sentences.
Examples
iex> {:ok, frames} = Spanish.label_semantic_roles(document)
iex> is_list(frames)
true
@spec resolve_coreference(Nasty.AST.Document.t()) :: {:ok, [Nasty.AST.Semantic.CorefChain.t()]} | {:error, term()}
Performs coreference resolution on a Spanish document.
Links mentions (pronouns, proper names, definite NPs) into coreference chains.
Examples
iex> {:ok, chains} = Spanish.resolve_coreference(document)
iex> is_list(chains)
true
@spec summarize(Nasty.AST.Document.t(), keyword()) :: [Nasty.AST.Sentence.t()]
Summarizes a Spanish document by extracting important sentences.
Options
- :ratio - Compression ratio, 0.0 to 1.0 (default: 0.3)
- :max_sentences - Maximum number of sentences in summary
- :min_sentence_length - Minimum sentence length (in tokens)
- :method - Selection method: :greedy or :mmr (default: :greedy)
- :mmr_lambda - MMR diversity parameter, 0 to 1 (default: 0.5)
Examples
iex> summary = Spanish.summarize(document, max_sentences: 3)
iex> is_list(summary)
true
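The difference between the :greedy and :mmr methods can be illustrated with a small sketch of Maximal Marginal Relevance, where each pick trades relevance against similarity to sentences already chosen. This is an illustration under stated assumptions, not Nasty's implementation: MMRSketch, the {id, relevance, tokens} tuple shape, and the use of Jaccard overlap as the similarity measure are all hypothetical.

```elixir
defmodule MMRSketch do
  # Hypothetical helper, NOT Nasty's implementation.
  # `candidates` is a list of {id, relevance, tokens} tuples.
  # Returns up to `k` candidates in the order they were selected.
  def select(candidates, k, lambda \\ 0.5) do
    do_select(candidates, [], k, lambda)
  end

  defp do_select([], selected, _k, _lambda), do: Enum.reverse(selected)

  defp do_select(_remaining, selected, k, _lambda) when length(selected) >= k,
    do: Enum.reverse(selected)

  defp do_select(remaining, selected, k, lambda) do
    best =
      Enum.max_by(remaining, fn {_id, relevance, tokens} ->
        # Redundancy: highest similarity to any already selected candidate.
        redundancy =
          Enum.reduce(selected, 0.0, fn {_, _, chosen}, acc ->
            max(acc, jaccard(tokens, chosen))
          end)

        # MMR score: lambda weights relevance, (1 - lambda) penalizes overlap.
        lambda * relevance - (1 - lambda) * redundancy
      end)

    do_select(List.delete(remaining, best), [best | selected], k, lambda)
  end

  # Jaccard overlap between two token lists (assumed similarity measure).
  defp jaccard(a, b) do
    a = MapSet.new(a)
    b = MapSet.new(b)
    union = MapSet.size(MapSet.union(a, b))
    if union == 0, do: 0.0, else: MapSet.size(MapSet.intersection(a, b)) / union
  end
end

candidates = [
  {1, 0.9, ~w(madrid es la capital de españa)},
  {2, 0.85, ~w(la capital de españa es madrid)},
  {3, 0.5, ~w(barcelona tiene playa)}
]

# lambda = 0.5 balances relevance and diversity; lambda = 1.0 ignores overlap.
IO.inspect(MMRSketch.select(candidates, 2, 0.5) |> Enum.map(&elem(&1, 0)))
IO.inspect(MMRSketch.select(candidates, 2, 1.0) |> Enum.map(&elem(&1, 0)))
```

With lambda set to 0.5, the second pick skips sentence 2 because it repeats the words of sentence 1, and selects sentence 3 instead. With lambda set to 1.0, the selection reduces to ranking by relevance alone.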
@spec train_classifier([{Nasty.AST.Document.t(), atom()}], keyword()) :: Nasty.AST.ClassificationModel.t()
Trains a text classifier on labeled Spanish documents.
Options
- :features - Feature types to extract (default: [:bow])
- :smoothing - Smoothing parameter (default: 1.0)
- :min_frequency - Minimum feature frequency (default: 2)
Examples
iex> training_data = [{spam_doc, :spam}, {ham_doc, :ham}]
iex> model = Spanish.train_classifier(training_data)
iex> model.algorithm
:naive_bayes
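Since the trained model reports :naive_bayes as its algorithm, the :smoothing option plausibly corresponds to additive (Laplace) smoothing of per-class feature counts. The sketch below shows that formula under this assumption; SmoothingSketch and word_probability/4 are hypothetical names, and the library may compute its probabilities differently.

```elixir
defmodule SmoothingSketch do
  # Hypothetical helper, NOT part of Nasty: additive (Laplace) smoothed
  # estimate of P(word | class) for a multinomial Naive Bayes model.
  #
  #   (count(word, class) + smoothing) / (total_count(class) + smoothing * |V|)
  def word_probability(word, class_counts, vocabulary_size, smoothing \\ 1.0) do
    count = Map.get(class_counts, word, 0)
    total = class_counts |> Map.values() |> Enum.sum()
    (count + smoothing) / (total + smoothing * vocabulary_size)
  end
end

# Word counts observed in the :spam class; the vocabulary has 4 distinct words.
spam_counts = %{"gratis" => 3, "oferta" => 1}

IO.inspect(SmoothingSketch.word_probability("gratis", spam_counts, 4))
IO.inspect(SmoothingSketch.word_probability("hola", spam_counts, 4))
```

A word never seen in the class ("hola" here) still receives a small non-zero probability, which keeps a single unseen word from zeroing out the whole class score.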