Nasty.Language.Spanish.FeatureExtractor (Nasty v0.3.0)

Extracts linguistic features from Spanish text for machine learning (ML) applications.

Provides feature vectors for:

  • Text classification
  • Similarity computation
  • Information retrieval
  • Clustering

Features

  • Lexical: word counts, n-grams, TF-IDF
  • Syntactic: POS tags, phrase structures
  • Semantic: entities, sentiment indicators
  • Statistical: sentence length, type-token ratio
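To make the statistical features concrete, here is a minimal sketch of how type-token ratio and average sentence length could be computed on raw text. The `StatSketch` module and its functions are hypothetical and are not part of Nasty; the regex-based splitting is an assumption standing in for the library's real tokenizer and sentence splitter, so the numbers it produces may differ from what `extract/1` reports.

```elixir
defmodule StatSketch do
  # Hypothetical helper, not part of Nasty.
  # Lowercases the text and splits on any run of characters that are not
  # Unicode letters or digits, so "¿", "¡" and punctuation act as separators.
  def tokens(text) do
    text
    |> String.downcase()
    |> String.split(~r/[^\p{L}\p{N}]+/u, trim: true)
  end

  # Type-token ratio: distinct tokens divided by total tokens.
  # Returns 0.0 for text with no tokens.
  def type_token_ratio(text) do
    case tokens(text) do
      [] -> 0.0
      toks -> length(Enum.uniq(toks)) / length(toks)
    end
  end

  # Average number of tokens per sentence, using a naive split on ".", "!"
  # and "?". Sentences without any tokens are ignored.
  def avg_sentence_length(text) do
    sentences =
      text
      |> String.split(~r/[.!?]+/u, trim: true)
      |> Enum.map(&tokens/1)
      |> Enum.reject(&(&1 == []))

    case sentences do
      [] -> 0.0
      _ -> Enum.sum(Enum.map(sentences, &length/1)) / length(sentences)
    end
  end
end
```

A repeated word lowers the ratio: in "El gato ve al gato." there are five tokens but only four distinct ones, so the sketch yields 4 / 5.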

Example

iex> doc = parse("El gato se sentó en la alfombra")
iex> features = FeatureExtractor.extract(doc)
%{
  word_count: 7,
  sentence_count: 1,
  avg_sentence_length: 7.0,
  noun_count: 2,
  verb_count: 1,
  entities: [:animal, :furniture],
  ...
}

Functions

extract(doc)

@spec extract(Nasty.AST.Document.t()) :: map()

Extracts all features from a Spanish document.

Returns a map from feature names (atoms such as :word_count) to their values.
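Because the result is a plain map, downstream code for similarity computation or clustering typically has to turn it into a fixed-order numeric vector first. The sketch below shows one way to do that. The `VectorSketch` module is hypothetical and not part of Nasty; the key list is an assumption taken from the fields visible in the example above, and non-numeric features such as `entities` are simply left out.

```elixir
defmodule VectorSketch do
  # Hypothetical helper, not part of Nasty.
  # Assumed numeric keys, taken from the example output; the real map may
  # contain more fields.
  @keys [:word_count, :sentence_count, :avg_sentence_length, :noun_count, :verb_count]

  # Builds a list of floats in a fixed key order. Missing keys count as 0.
  def to_vector(features) do
    Enum.map(@keys, fn key -> Map.get(features, key, 0) * 1.0 end)
  end

  # Cosine similarity between two equal-length vectors.
  # Returns 0.0 when either vector has zero length.
  def cosine(a, b) do
    dot = a |> Enum.zip(b) |> Enum.map(fn {x, y} -> x * y end) |> Enum.sum()
    norm = fn v -> v |> Enum.map(&(&1 * &1)) |> Enum.sum() |> :math.sqrt() end
    denom = norm.(a) * norm.(b)

    if denom == 0.0, do: 0.0, else: dot / denom
  end
end

# Hand-written map shaped like the example output, used here instead of a
# real call to extract/1.
features = %{
  word_count: 7,
  sentence_count: 1,
  avg_sentence_length: 7.0,
  noun_count: 2,
  verb_count: 1,
  entities: [:animal, :furniture]
}

vector = VectorSketch.to_vector(features)
similarity = VectorSketch.cosine(vector, vector)
```

Raw counts dominate cosine similarity when documents differ greatly in length, so in practice the vector would usually be normalized or scaled per feature before comparison.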