Nasty.Language.Spanish (Nasty v0.3.0)

Spanish language implementation.

Provides a full NLP pipeline for Spanish text:

  1. Tokenization (NimbleParsec-based with Spanish punctuation)
  2. POS tagging (rule-based with Universal Dependencies tags)
  3. Morphological analysis (lemmatization + features)
  4. Parsing (phrase and sentence structure)
  5. Semantic analysis (NER, coreference, SRL)
  6. NLP operations (summarization, QA, classification)

Summary

Functions

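answer_question(document, question_text, opts \\ [])

Answers a question based on a Spanish document.

classify(document, model, opts \\ [])

Classifies a Spanish document using a trained model.

extract_features(document, opts \\ [])

Extracts classification features from a Spanish document.

label_semantic_roles(document)

Performs semantic role labeling on a Spanish document.

resolve_coreference(document)

Performs coreference resolution on a Spanish document.

summarize(document, opts \\ [])

Summarizes a Spanish document by extracting important sentences.

train_classifier(training_data, opts \\ [])

Trains a text classifier on labeled Spanish documents.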

Functions

answer_question(document, question_text, opts \\ [])

@spec answer_question(Nasty.AST.Document.t(), String.t(), keyword()) ::
  {:ok, [Nasty.AST.Answer.t()]} | {:error, term()}

Answers a question based on a Spanish document.

Options

  • :max_answers - Maximum number of answers to return (default: 3)
  • :min_confidence - Minimum confidence threshold (default: 0.3)
  • :max_answer_length - Maximum answer length in tokens (default: 20)
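
As a rough illustration of how :min_confidence and :max_answers might interact, the sketch below filters and truncates a list of candidate answers. AnswerFilterSketch, the plain maps, and the :confidence field are hypothetical stand-ins rather than Nasty's implementation or the real Nasty.AST.Answer struct, and the descending sort is an assumption.

```elixir
defmodule AnswerFilterSketch do
  # Hypothetical helper, not part of Nasty: applies the documented
  # :max_answers and :min_confidence defaults to candidate maps.
  def select(candidates, opts \\ []) do
    max_answers = Keyword.get(opts, :max_answers, 3)
    min_confidence = Keyword.get(opts, :min_confidence, 0.3)

    candidates
    # Drop candidates below the confidence threshold.
    |> Enum.filter(&(&1.confidence >= min_confidence))
    # Assumed ordering: most confident answer first.
    |> Enum.sort_by(& &1.confidence, :desc)
    # Keep at most :max_answers results.
    |> Enum.take(max_answers)
  end
end

candidates = [
  %{text: "Larry Page", confidence: 0.82},
  %{text: "Stanford", confidence: 0.12},
  %{text: "Sergey Brin", confidence: 0.79}
]

AnswerFilterSketch.select(candidates, max_answers: 1)
```

With the defaults, the 0.12 candidate falls below the threshold, so only two answers remain.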

Examples

iex> {:ok, answers} = Spanish.answer_question(document, "¿Quién fundó Google?")
iex> is_list(answers)
true

classify(document, model, opts \\ [])

Classifies a Spanish document using a trained model.

Returns classifications sorted by confidence, highest first.

Examples

iex> {:ok, classifications} = Spanish.classify(document, model)
iex> [top | _rest] = classifications
iex> is_atom(top.class)
true

extract_features(document, opts \\ [])

@spec extract_features(
  Nasty.AST.Document.t(),
  keyword()
) :: map()

Extracts classification features from a Spanish document.

Options

  • :features - Feature types (default: [:bow, :ngrams])
  • :ngram_size - N-gram size (default: 2)
  • :min_frequency - Minimum frequency (default: 1)
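
To make the three options concrete, here is a minimal sketch of bag-of-words and n-gram counting over a plain token list. FeatureSketch is a hypothetical module that takes strings instead of a Nasty.AST.Document.t(); the shape of the returned map is an assumption and may differ from what extract_features/2 returns.

```elixir
defmodule FeatureSketch do
  # Hypothetical stand-in for feature extraction: counts unigrams (:bow)
  # and contiguous n-grams, then drops features below :min_frequency.
  def extract(tokens, opts \\ []) do
    types = Keyword.get(opts, :features, [:bow, :ngrams])
    n = Keyword.get(opts, :ngram_size, 2)
    min_frequency = Keyword.get(opts, :min_frequency, 1)

    bow = if :bow in types, do: Enum.frequencies(tokens), else: %{}

    ngrams =
      if :ngrams in types do
        tokens
        # Sliding window of size n, step 1, discarding short tails.
        |> Enum.chunk_every(n, 1, :discard)
        |> Enum.map(&Enum.join(&1, " "))
        |> Enum.frequencies()
      else
        %{}
      end

    Map.new(%{bow: bow, ngrams: ngrams}, fn {type, counts} ->
      kept =
        counts
        |> Enum.filter(fn {_feature, count} -> count >= min_frequency end)
        |> Map.new()

      {type, kept}
    end)
  end
end

tokens = ~w(el gato come el pescado)
FeatureSketch.extract(tokens, min_frequency: 2)
```

With min_frequency: 2, only "el" survives, because every bigram in this sentence occurs once.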

Examples

iex> features = Spanish.extract_features(document)
iex> is_map(features)
true

label_semantic_roles(document)

@spec label_semantic_roles(Nasty.AST.Document.t()) ::
  {:ok, [Nasty.AST.Semantic.Frame.t()]} | {:error, term()}

Performs semantic role labeling on a Spanish document.

Extracts predicate-argument structure for all sentences.

Examples

iex> {:ok, frames} = Spanish.label_semantic_roles(document)
iex> is_list(frames)
true

resolve_coreference(document)

@spec resolve_coreference(Nasty.AST.Document.t()) ::
  {:ok, [Nasty.AST.Semantic.CorefChain.t()]} | {:error, term()}

Performs coreference resolution on a Spanish document.

Links mentions (pronouns, proper names, definite NPs) into coreference chains.

Examples

iex> {:ok, chains} = Spanish.resolve_coreference(document)
iex> is_list(chains)
true

summarize(document, opts \\ [])

@spec summarize(
  Nasty.AST.Document.t(),
  keyword()
) :: [Nasty.AST.Sentence.t()]

Summarizes a Spanish document by extracting important sentences.

Options

  • :ratio - Compression ratio, 0.0 to 1.0 (default: 0.3)
  • :max_sentences - Maximum number of sentences in summary
  • :min_sentence_length - Minimum sentence length (in tokens)
  • :method - Selection method: :greedy or :mmr (default: :greedy)
  • :mmr_lambda - MMR diversity parameter, 0-1 (default: 0.5)
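
The :mmr method and :mmr_lambda refer to maximal marginal relevance, which trades relevance against redundancy with sentences already chosen. The sketch below shows the standard MMR selection loop under stated assumptions: MMRSketch, the {id, tokens, relevance} tuples, and the Jaccard similarity are all hypothetical, and Nasty's own scoring may differ.

```elixir
defmodule MMRSketch do
  # Jaccard overlap between two token lists (assumed similarity measure).
  def similarity(a, b) do
    a = MapSet.new(a)
    b = MapSet.new(b)
    union = MapSet.size(MapSet.union(a, b))
    if union == 0, do: 0.0, else: MapSet.size(MapSet.intersection(a, b)) / union
  end

  # candidates: list of {id, tokens, relevance}. Returns ids in pick order.
  def select(candidates, k, lambda \\ 0.5) do
    candidates
    |> pick([], k, lambda)
    |> Enum.reverse()
    |> Enum.map(fn {id, _tokens, _relevance} -> id end)
  end

  defp pick(remaining, selected, k, _lambda) when remaining == [] or k <= 0, do: selected

  defp pick(remaining, selected, k, lambda) do
    best =
      Enum.max_by(remaining, fn {_id, tokens, relevance} ->
        # Redundancy is the highest similarity to any sentence already picked.
        redundancy =
          Enum.reduce(selected, 0.0, fn {_sid, chosen, _rel}, acc ->
            max(acc, similarity(tokens, chosen))
          end)

        lambda * relevance - (1 - lambda) * redundancy
      end)

    pick(List.delete(remaining, best), [best | selected], k - 1, lambda)
  end
end

candidates = [
  {1, ~w(madrid es la capital de españa), 0.9},
  {2, ~w(la capital de españa es madrid), 0.85},
  {3, ~w(barcelona tiene playa), 0.6}
]

MMRSketch.select(candidates, 2, 0.5)
```

At lambda 0.5 the second pick is sentence 3, because sentence 2 repeats the vocabulary of sentence 1. At lambda 1.0 redundancy is ignored and the picks are purely by relevance.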

Examples

iex> summary = Spanish.summarize(document, max_sentences: 3)
iex> is_list(summary)
true

train_classifier(training_data, opts \\ [])

@spec train_classifier(
  [{Nasty.AST.Document.t(), atom()}],
  keyword()
) :: Nasty.AST.ClassificationModel.t()

Trains a text classifier on labeled Spanish documents.

Options

  • :features - Feature types to extract (default: [:bow])
  • :smoothing - Smoothing parameter (default: 1.0)
  • :min_frequency - Minimum feature frequency (default: 2)
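
The example below reports :naive_bayes as the model's algorithm, so :smoothing plausibly acts as an additive (Laplace) smoothing constant. The following is a minimal multinomial Naive Bayes sketch under that assumption. NaiveBayesSketch works on plain token lists, is not Nasty's implementation, and omits :min_frequency pruning for brevity.

```elixir
defmodule NaiveBayesSketch do
  # examples: list of {tokens, label}. Collects class priors, per-class
  # token counts, and the vocabulary.
  def train(examples, smoothing \\ 1.0) do
    counts =
      Enum.reduce(examples, %{}, fn {tokens, label}, acc ->
        freqs = Enum.frequencies(tokens)

        Map.update(acc, label, freqs, fn existing ->
          Map.merge(existing, freqs, fn _token, a, b -> a + b end)
        end)
      end)

    vocab = examples |> Enum.flat_map(&elem(&1, 0)) |> MapSet.new()
    total = length(examples)

    priors =
      examples
      |> Enum.frequencies_by(&elem(&1, 1))
      |> Map.new(fn {label, n} -> {label, n / total} end)

    %{priors: priors, counts: counts, vocab: vocab, smoothing: smoothing}
  end

  # Picks the class with the highest log-probability, where
  # P(token | class) = (count + smoothing) / (class_total + smoothing * |vocab|).
  def classify(model, tokens) do
    vocab_size = MapSet.size(model.vocab)

    model.priors
    |> Enum.map(fn {label, prior} ->
      class_counts = Map.fetch!(model.counts, label)
      class_total = class_counts |> Map.values() |> Enum.sum()
      denominator = class_total + model.smoothing * vocab_size

      log_likelihood =
        tokens
        |> Enum.map(fn token ->
          :math.log((Map.get(class_counts, token, 0) + model.smoothing) / denominator)
        end)
        |> Enum.sum()

      {label, :math.log(prior) + log_likelihood}
    end)
    |> Enum.max_by(fn {_label, score} -> score end)
    |> elem(0)
  end
end

training = [
  {~w(oferta gratis premio), :spam},
  {~w(gratis dinero ahora), :spam},
  {~w(reunión mañana oficina), :ham}
]

model = NaiveBayesSketch.train(training)
NaiveBayesSketch.classify(model, ~w(premio gratis))
```

Smoothing keeps unseen tokens from driving a class probability to zero, which is why a non-zero default is a sensible choice for small training sets.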

Examples

iex> training_data = [{spam_doc, :spam}, {ham_doc, :ham}]
iex> model = Spanish.train_classifier(training_data)
iex> model.algorithm
:naive_bayes