Lexicon-based sentiment scoring.
Scores a piece of text by looking up each token in a polarity lexicon
(a map from String.t/0 to a numeric score), summing the matched
scores, and optionally adjusting for nearby negators and intensifiers.
This module is the deterministic engine underneath Text.Sentiment.
Most callers use the higher-level facade; this one is exposed for
callers who want to plug in a custom lexicon (an industry-specific
vocabulary, a non-AFINN translation, an emoji lexicon, etc.).
Score semantics
The default English lexicon (AFINN-165) uses integer scores in
-5..+5, with 0 reserved for neutral terms. Sums of these are
unbounded, so the engine returns the raw sum together with a normalised
compound score in [-1.0, +1.0] derived via the formula
    compound = sum / sqrt(sum² + α)
with α = 15 (matching VADER's normalisation). This tames the
range without saturating too quickly: a sum of 5 yields about
+0.79, a sum of 15 yields about +0.97, and a sum of 0
yields exactly 0.0.
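The arithmetic above can be checked with a few lines of Elixir. This is a minimal sketch: the module name CompoundSketch and its compound/2 function are hypothetical and only restate the formula; they are not part of Text.Sentiment.

```elixir
defmodule CompoundSketch do
  # Hypothetical helper, not part of the library.
  # Normalises a raw sum using sum / sqrt(sum² + alpha).
  def compound(sum, alpha \\ 15) do
    sum / :math.sqrt(sum * sum + alpha)
  end
end

for sum <- [0, 5, 15, -5] do
  rounded = Float.round(CompoundSketch.compound(sum), 2)
  IO.puts("sum=#{sum} compound=#{rounded}")
end
# sum=0 compound=0.0
# sum=5 compound=0.79
# sum=15 compound=0.97
# sum=-5 compound=-0.79
```

The function is odd in its argument, so negative sums mirror the positive ones.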
Negation and intensifier handling
Two simple, well-understood adjustments are applied during scoring:
Negation: when one of the configured negation tokens ("not", "never", "no", etc.) appears in the :negation_window tokens immediately preceding a polarity-bearing token, that token's score is multiplied by -0.74 (the VADER scalar). This is a deliberate over-correction that captures the intuition that negation usually flips polarity but rarely with full magnitude.
Intensifiers: when one of the configured intensifier tokens ("very", "extremely", etc.) immediately precedes a polarity-bearing token, that token's score is multiplied by 1.293. Diminishers ("slightly", "barely") similarly multiply by 0.707. Both scalars come from VADER and are tunable via :intensifier_boost and :diminisher_factor.
These rules are deliberately limited — they don't handle multi-word negation, sarcasm, or domain-specific reversals. For higher-quality multilingual sentiment, see the planned Bumblebee-backed adapter.
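The two adjustments can be illustrated with a small standalone sketch. It is not the library's implementation: the module AdjustSketch, its shortened word lists, and the choice to apply the intensifier, diminisher, and negation multipliers in sequence when more than one matches are all assumptions made for illustration. Tokenisation is plain String.split/1 here.

```elixir
defmodule AdjustSketch do
  # Hypothetical sketch, NOT the library's implementation.
  # Word lists are shortened; scalars are the documented defaults.
  @negators ~w(not never no)
  @intensifiers ~w(very extremely)
  @diminishers ~w(slightly barely)

  # Sums lexicon scores over `tokens`, adjusting each matched score.
  def adjusted_sum(tokens, lexicon, window \\ 3) do
    tokens
    |> Enum.with_index()
    |> Enum.reduce(0.0, fn {token, index}, acc ->
      case Map.fetch(lexicon, token) do
        {:ok, score} -> acc + adjust(score, tokens, index, window)
        :error -> acc
      end
    end)
  end

  defp adjust(score, tokens, index, window) do
    # Token immediately before the match (nil at the start of the text).
    previous = if index > 0, do: Enum.at(tokens, index - 1)
    # Up to `window` tokens before the match.
    start = max(index - window, 0)
    preceding = Enum.slice(tokens, start, index - start)

    # Assumption: multipliers stack when several rules match.
    score
    |> scale(previous in @intensifiers, 1.293)
    |> scale(previous in @diminishers, 0.707)
    |> scale(Enum.any?(preceding, &(&1 in @negators)), -0.74)
  end

  defp scale(score, true, factor), do: score * factor
  defp scale(score, false, _factor), do: score
end

lexicon = %{"good" => 3, "bad" => -3}

for text <- ["not a bad outcome", "very good", "slightly bad"] do
  sum = AdjustSketch.adjusted_sum(String.split(text), lexicon)
  IO.puts("#{text}: #{Float.round(sum, 3)}")
end
```

With these inputs the adjusted sums come out at roughly 2.22 (-3 × -0.74), 3.879 (3 × 1.293), and -2.121 (-3 × 0.707).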
Types
A polarity lexicon: token → numeric score.
@type result() :: %{
  sum: float(),
  compound: float(),
  label: :positive | :negative | :neutral,
  tokens: non_neg_integer(),
  matched: non_neg_integer()
}
The structured result returned by score/3.
Functions
Scores text against lexicon.
Arguments
text is a UTF-8 string.
lexicon is a map from token to numeric score. Tokens are matched after the same case-folding the engine applies to text (lowercase by default; see :fold_case).
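The text above does not say whether the engine also folds the lexicon's own keys, so the safest reading is that a custom lexicon should already be stored in the folded form. The sketch below lowercases the keys up front. It uses only standard-library calls, and String.split/1 stands in for the real tokenizer as an assumption.

```elixir
# A custom lexicon with inconsistent key casing.
raw_lexicon = %{"Good" => 3, "BAD" => -3, "great" => 4}

# Lowercase every key so lookups line up with lowercased tokens.
folded =
  Map.new(raw_lexicon, fn {token, score} ->
    {String.downcase(token), score}
  end)

# Stand-in for the engine's tokenise-then-fold step (assumption).
tokens = "This is GOOD" |> String.downcase() |> String.split()

matched = Enum.filter(tokens, &Map.has_key?(folded, &1))
IO.inspect(matched)
# ["good"]
```

If two keys collapse to the same lowercase form, Map.new/2 keeps only one of them, so a lexicon that genuinely depends on case is better left unfolded and used with fold_case: false.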
Options
:tokenizer - a one-arg function from string to token list. Defaults to &Text.Segment.words/1.
:fold_case - true (default) lowercases tokens before lookup. Set false if your lexicon is case-sensitive.
:negators - a list of tokens that, when seen in the :negation_window tokens preceding a polarity-bearing token, flip its score. Defaults to a small set of English negators ("not", "never", "no", "none", "nobody", "nor", "neither", "cannot", "can't", "don't", "isn't", "won't", "wasn't").
:intensifiers - a list of tokens that, when immediately preceding a polarity-bearing token, boost its score. Defaults to a small set of English intensifiers.
:diminishers - a list of tokens that, when immediately preceding a polarity-bearing token, dampen its score. Defaults to a small set of English diminishers.
:negation_window - how many preceding tokens to scan for a negator. Defaults to 3.
:negation_scalar - multiplier applied when a negator is found. Defaults to -0.74.
:intensifier_boost - multiplier applied when an intensifier is found. Defaults to 1.293.
:diminisher_factor - multiplier applied when a diminisher is found. Defaults to 0.707.
:positive_threshold, :negative_threshold - compound-score cutoffs for the :label field. Defaults to 0.05 and -0.05.
Returns
A result/0 map with the following keys:
:sum - the raw sum of matched (and adjusted) lexicon scores.
:compound - the normalised score in [-1.0, +1.0].
:label - :positive, :negative, or :neutral based on the threshold cutoffs.
:tokens - total token count after tokenisation.
:matched - number of tokens that hit the lexicon.
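How :label follows from :compound can be sketched as below. The module LabelSketch is hypothetical. The documentation does not state whether a score exactly at a cutoff counts as neutral, so the inclusive comparison used here (as in VADER) is an assumption.

```elixir
defmodule LabelSketch do
  # Hypothetical helper, not part of the library.
  # Assumption: comparisons are inclusive at the cutoffs.
  def label(compound, positive \\ 0.05, negative \\ -0.05) do
    cond do
      compound >= positive -> :positive
      compound <= negative -> :negative
      true -> :neutral
    end
  end
end

IO.inspect(LabelSketch.label(0.61))
# :positive
IO.inspect(LabelSketch.label(-0.5))
# :negative
IO.inspect(LabelSketch.label(0.0))
# :neutral
```

Widening the band, for example with cutoffs of 0.2 and -0.2, makes the classifier report :neutral for weakly polar text.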
Examples
iex> lexicon = %{"good" => 3, "bad" => -3, "great" => 4}
iex> result = Text.Sentiment.Lexicon.score("This is a good day", lexicon)
iex> result.label
:positive
iex> lexicon = %{"good" => 3, "bad" => -3}
iex> result = Text.Sentiment.Lexicon.score("not a bad outcome", lexicon)
iex> result.label
:positive