Nasty.Language.Spanish.FeatureExtractor (Nasty v0.3.0)
Extracts linguistic features from Spanish text for ML applications.
Provides feature vectors for:
- Text classification
- Similarity computation
- Information retrieval
- Clustering
Features
- Lexical: word counts, n-grams, TF-IDF
- Syntactic: POS tags, phrase structures
- Semantic: entities, sentiment indicators
- Statistical: sentence length, type-token ratio
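To make the statistical features concrete, here is a minimal sketch of how type-token ratio and average sentence length could be computed from raw text. It is an illustration under stated assumptions, not Nasty's implementation: the `StatSketch` module and its functions are hypothetical, and the regex tokenizer is a naive stand-in for the library's own parser.

```elixir
defmodule StatSketch do
  # Hypothetical helper module, not part of Nasty.

  # Naive tokenizer: lowercases the text and keeps runs of Unicode
  # letters and digits, so accented Spanish words stay in one piece.
  def tokens(text) do
    ~r/[\p{L}\p{N}]+/u
    |> Regex.scan(String.downcase(text))
    |> List.flatten()
  end

  # Type-token ratio: distinct tokens divided by total tokens.
  # Returns 0.0 for text with no tokens.
  def type_token_ratio(text) do
    case tokens(text) do
      [] -> 0.0
      toks -> length(Enum.uniq(toks)) / length(toks)
    end
  end

  # Average number of tokens per sentence, splitting sentences
  # on runs of ".", "!" or "?" (an assumption for this sketch).
  def avg_sentence_length(text) do
    sentences =
      text
      |> String.split(~r/[.!?]+/u, trim: true)
      |> Enum.map(&tokens/1)
      |> Enum.reject(&(&1 == []))

    case sentences do
      [] -> 0.0
      _ -> Enum.sum(Enum.map(sentences, &length/1)) / length(sentences)
    end
  end
end
```

With this sketch, `StatSketch.type_token_ratio("El gato ve al gato.")` yields `0.8`, since four of the five tokens are distinct.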
Example
iex> doc = parse("El gato se sentó en la alfombra")
iex> features = FeatureExtractor.extract(doc)
%{
  word_count: 7,
  sentence_count: 1,
  avg_sentence_length: 7.0,
  noun_count: 2,
  verb_count: 1,
  entities: [:animal, :furniture],
  ...
}
Functions
@spec extract(Nasty.AST.Document.t()) :: map()
Extracts all features from a Spanish document.
Returns a map of feature names to values.
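Because the result is a plain map, downstream code has to choose which entries become numeric vector components. The sketch below shows one possible way to do that and to compare two documents by cosine similarity. The `FeatureVectorSketch` module, its function names, and the key selection are assumptions made for illustration; the keys are taken from the example output above, and the actual map may contain others.

```elixir
defmodule FeatureVectorSketch do
  # Hypothetical helper module, not part of Nasty.

  # Numeric keys borrowed from the documented example output.
  @keys [:word_count, :sentence_count, :avg_sentence_length, :noun_count, :verb_count]

  # Builds a fixed-order list of floats from a feature map.
  # Missing or non-numeric values (such as the entities list) become 0.0.
  def to_vector(features, keys \\ @keys) do
    Enum.map(keys, fn key ->
      case Map.get(features, key, 0) do
        n when is_number(n) -> n * 1.0
        _ -> 0.0
      end
    end)
  end

  # Cosine similarity of two equal-length vectors.
  # Returns 0.0 when either vector has zero magnitude.
  def cosine(a, b) do
    dot = a |> Enum.zip(b) |> Enum.map(fn {x, y} -> x * y end) |> Enum.sum()
    norm = fn v -> v |> Enum.map(&(&1 * &1)) |> Enum.sum() |> :math.sqrt() end

    case norm.(a) * norm.(b) do
      denom when denom == 0 -> 0.0
      denom -> dot / denom
    end
  end
end
```

A fixed key order keeps vectors from different documents aligned. In practice the components would likely need scaling before comparison, since raw counts such as `word_count` otherwise dominate the similarity.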