Nasty.Language.Spanish.FeatureExtractor (Nasty v0.3.0)

Extracts linguistic features from Spanish text for ML applications.

Provides feature vectors for:

Text classification
Similarity computation
Information retrieval
Clustering
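As a rough illustration of the similarity use case, the sketch below compares two feature maps with cosine similarity. The FeatureSimilarity module is hypothetical and not part of Nasty. It assumes feature maps shaped like the output of extract/1, with numeric values under atom keys, and it skips non-numeric entries such as entity lists.

```elixir
defmodule FeatureSimilarity do
  @moduledoc """
  Hypothetical helper (not part of Nasty): cosine similarity between two
  feature maps. Only numeric values are compared; non-numeric features
  (e.g. entity lists) are dropped before the computation.
  """

  @spec cosine(map(), map()) :: float()
  def cosine(a, b) do
    a = numeric_only(a)
    b = numeric_only(b)

    # Dot product: keys missing from `b` contribute 0.
    dot =
      a
      |> Enum.map(fn {k, v} -> v * Map.get(b, k, 0) end)
      |> Enum.sum()

    norm_a = norm(a)
    norm_b = norm(b)

    # Guard against division by zero for empty or all-zero vectors.
    if norm_a == 0 or norm_b == 0, do: 0.0, else: dot / (norm_a * norm_b)
  end

  # Keep only entries whose value is an integer or float.
  defp numeric_only(map) do
    map |> Enum.filter(fn {_k, v} -> is_number(v) end) |> Map.new()
  end

  # Euclidean (L2) norm of the map's values.
  defp norm(map) do
    map |> Map.values() |> Enum.map(&(&1 * &1)) |> Enum.sum() |> :math.sqrt()
  end
end
```

For example, FeatureSimilarity.cosine(FeatureExtractor.extract(doc_a), FeatureExtractor.extract(doc_b)), with doc_a and doc_b as hypothetical parsed documents, would give a score between 0 and 1 when all numeric features are non-negative counts.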

Features

Lexical: word counts, n-grams, TF-IDF
Syntactic: POS tags, phrase structures
Semantic: entities, sentiment indicators
Statistical: sentence length, type-token ratio
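The statistical and n-gram features above can be approximated without the AST. This is a minimal sketch under stated assumptions: the hypothetical StatFeatures module takes sentences that are already tokenized into lists of word strings, lowercases words before counting distinct types, and does not claim to mirror how Nasty computes these values internally.

```elixir
defmodule StatFeatures do
  @moduledoc """
  Hypothetical sketch (not Nasty's implementation) of statistical and
  n-gram features, computed from pre-tokenized sentences: a list of
  sentences, each a list of word strings.
  """

  @spec statistical([[String.t()]]) :: map()
  def statistical(sentences) do
    words = List.flatten(sentences)
    word_count = length(words)
    sentence_count = length(sentences)

    # Type-token ratio: distinct lowercased word forms / total tokens.
    types = words |> Enum.map(&String.downcase/1) |> Enum.uniq() |> length()

    %{
      word_count: word_count,
      sentence_count: sentence_count,
      avg_sentence_length: safe_div(word_count, sentence_count),
      type_token_ratio: safe_div(types, word_count)
    }
  end

  @doc "Counts contiguous n-grams within each sentence (never across sentences)."
  @spec ngrams([[String.t()]], pos_integer()) :: map()
  def ngrams(sentences, n) do
    sentences
    # Sliding window of size n, step 1; incomplete windows are discarded.
    |> Enum.flat_map(&Enum.chunk_every(&1, n, 1, :discard))
    |> Enum.frequencies()
  end

  # Returns 0.0 instead of raising on an empty input.
  defp safe_div(_num, 0), do: 0.0
  defp safe_div(num, den), do: num / den
end
```

Applied to the tokens of "El gato se sentó en la alfombra", statistical/1 yields a word count of 7, one sentence, an average sentence length of 7.0 and a type-token ratio of 1.0, since no word repeats.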

Example

iex> doc = parse("El gato se sentó en la alfombra")
iex> features = FeatureExtractor.extract(doc)
%{
  word_count: 7,
  sentence_count: 1,
  avg_sentence_length: 7.0,
  noun_count: 2,
  verb_count: 1,
  entities: [:animal, :furniture],
  ...
}

Functions

extract(doc)

@spec extract(Nasty.AST.Document.t()) :: map()

Extracts all features from a Spanish document.

Returns a map from feature-name atoms (for example, :word_count) to their values.
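Since ML libraries usually expect fixed-length numeric vectors rather than maps, a caller might flatten the result of extract/1 along a fixed key order. The FeatureVector module and its default key list are hypothetical, chosen from the keys in the example output above. Missing or non-numeric features fall back to 0.0 so that every vector has the same length.

```elixir
defmodule FeatureVector do
  @moduledoc """
  Hypothetical helper (not part of Nasty): turns a feature map into a
  fixed-order list of floats.
  """

  # Assumed feature keys, taken from the example output in the docs.
  @keys [:word_count, :sentence_count, :avg_sentence_length, :noun_count, :verb_count]

  @spec to_vector(map(), [atom()]) :: [float()]
  def to_vector(features, keys \\ @keys) do
    Enum.map(keys, fn key ->
      case Map.get(features, key) do
        # Coerce integers to floats so the vector is homogeneous.
        v when is_number(v) -> v * 1.0
        # Missing keys and non-numeric values (e.g. entity lists) become 0.0.
        _ -> 0.0
      end
    end)
  end
end
```

Fixing the key order up front keeps feature positions stable across documents, which matters when vectors from different texts are fed to the same classifier or clustering step.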