# `Text.Sentiment.Lexicon`
[🔗](https://github.com/kipcole9/text/blob/v0.5.0/lib/sentiment/lexicon.ex#L1)

Lexicon-based sentiment scoring.

Scores a piece of text by looking up each token in a polarity lexicon
(a map from `t:String.t/0` to a numeric score), summing the matched
scores, and optionally adjusting for nearby negators and intensifiers.

This module is the deterministic engine underneath `Text.Sentiment`.
Most callers use the higher-level facade; this one is exposed for
callers who want to plug in a custom lexicon (an industry-specific
vocabulary, a non-AFINN translation, an emoji lexicon, etc.).

### Score semantics

The default English lexicon (AFINN-165) uses integer scores in
`-5..+5`, with `0` reserved for neutral terms. Sums of these are
unbounded; the engine returns the raw sum plus a normalised
compound score in `[-1.0, +1.0]` derived via the formula

    compound = sum / sqrt(sum² + α)

with `α = 15` (matching VADER's normalisation). This tames the
range without saturating too quickly: a sum of `5` yields about
`+0.79`, a sum of `15` yields about `+0.97`, and a sum of `0`
yields exactly `0.0`.
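
The normalisation is small enough to sketch directly (a standalone illustration of the formula above, not the module's internals; the module name is hypothetical):

```elixir
# Standalone sketch of the compound normalisation: sum / sqrt(sum² + α).
defmodule CompoundSketch do
  # α = 15, matching VADER's normalisation constant.
  @alpha 15

  def compound(sum, alpha \\ @alpha) do
    sum / :math.sqrt(sum * sum + alpha)
  end
end

CompoundSketch.compound(0)   # => 0.0
CompoundSketch.compound(5)   # ≈ 0.79
CompoundSketch.compound(15)  # ≈ 0.97
```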

### Negation and intensifier handling

Two simple, well-understood adjustments are applied during scoring:

* **Negation**: when one of the configured negation tokens (`"not"`,
  `"never"`, `"no"`, etc.) appears in the `:negation_window` tokens
  immediately preceding a polarity-bearing token, that token's score
  is multiplied by `-0.74` (the VADER scalar). This is a deliberate
  over-correction that captures the intuition that negation usually
  flips polarity but rarely with full magnitude.

* **Intensifiers**: when one of the configured intensifier tokens
  (`"very"`, `"extremely"`, etc.) immediately precedes a
  polarity-bearing token, that token's score is multiplied by `1.293`.
  Diminishers (`"slightly"`, `"barely"`) similarly multiply by
  `0.707`. Both scalars come from VADER and are tunable via
  `:intensifier_boost` and `:diminisher_factor`.

These rules are deliberately limited — they don't handle multi-word
negation, sarcasm, or domain-specific reversals. For higher-quality
multilingual sentiment, see the planned Bumblebee-backed adapter.
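
The window-based adjustments above can be sketched as follows. This is an illustration, not the library's internals; the module and function names are hypothetical, and the negator/intensifier lists are trimmed for brevity:

```elixir
# Hypothetical sketch of the windowed negation / intensifier rules.
defmodule AdjustSketch do
  @negators ~w(not never no)
  @intensifiers ~w(very extremely)
  @negation_scalar -0.74
  @intensifier_boost 1.293

  # Adjusts the lexicon score of the token at `index`, scanning up to
  # `window` preceding tokens for a negator and checking the token
  # immediately before for an intensifier.
  def adjust(score, tokens, index, window \\ 3) do
    preceding = Enum.slice(tokens, max(index - window, 0), min(index, window))

    score
    |> maybe_negate(preceding)
    |> maybe_boost(List.last(preceding))
  end

  defp maybe_negate(score, preceding) do
    if Enum.any?(preceding, &(&1 in @negators)),
      do: score * @negation_scalar,
      else: score
  end

  defp maybe_boost(score, prev) when prev in @intensifiers, do: score * @intensifier_boost
  defp maybe_boost(score, _prev), do: score
end

AdjustSketch.adjust(-3, ~w(not a bad), 2)  # -3 * -0.74 ≈ 2.22
AdjustSketch.adjust(3, ~w(very good), 1)   # 3 * 1.293 ≈ 3.879
```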

# `lexicon`

```elixir
@type lexicon() :: %{required(String.t()) => number()}
```

A polarity lexicon: token → numeric score.
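
For example, a hypothetical custom lexicon mixing words and emoji (as suggested in the overview) is just a plain map:

```elixir
# A hypothetical domain lexicon mixing words and emoji.
emoji_lexicon = %{
  "love" => 3,
  "hate" => -3,
  "🎉" => 4,
  "💀" => -2
}
```

Any map with string keys and numeric values satisfies the type.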

# `result`

```elixir
@type result() :: %{
  sum: float(),
  compound: float(),
  label: :positive | :negative | :neutral,
  tokens: non_neg_integer(),
  matched: non_neg_integer()
}
```

The structured result returned by `score/3`.

# `score`

```elixir
@spec score(String.t(), lexicon(), keyword()) :: result()
```

Scores `text` against `lexicon`.

### Arguments

* `text` is a UTF-8 string.

* `lexicon` is a map from token to numeric score. Tokens are matched
  after the same case-folding the engine applies to `text` (lowercase
  by default; see `:fold_case`).

### Options

* `:tokenizer` — a one-arg function from string to token list.
  Defaults to `&Text.Segment.words/1`.

* `:fold_case` — `true` (default) lowercases tokens before lookup.
  Set `false` if your lexicon is case-sensitive.

* `:negators` — a list of tokens that, when seen in the
  `:negation_window` tokens preceding a polarity-bearing token, flip
  its score. Defaults to a small set of English negators
  (`"not"`, `"never"`, `"no"`, `"none"`, `"nobody"`, `"nor"`,
  `"neither"`, `"cannot"`, `"can't"`, `"don't"`, `"isn't"`, `"won't"`,
  `"wasn't"`).

* `:intensifiers` — a list of tokens that, when immediately
  preceding a polarity-bearing token, boost its score. Defaults to a
  small set of English intensifiers.

* `:diminishers` — a list of tokens that, when immediately preceding
  a polarity-bearing token, dampen its score. Defaults to a small
  set of English diminishers.

* `:negation_window` — how many preceding tokens to scan for a
  negator. Defaults to `3`.

* `:negation_scalar` — multiplier applied when a negator is found.
  Defaults to `-0.74`.

* `:intensifier_boost` — multiplier applied when an intensifier is
  found. Defaults to `1.293`.

* `:diminisher_factor` — multiplier applied when a diminisher is
  found. Defaults to `0.707`.

* `:positive_threshold`, `:negative_threshold` — compound-score
  cutoffs for the `:label` field. Default to `0.05` and `-0.05`
  respectively.
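
The threshold cutoffs translate a compound score into a label roughly as follows (a sketch; whether the comparisons are inclusive is an assumption of this illustration, and the module name is hypothetical):

```elixir
# Sketch of threshold-based labelling with the documented default cutoffs.
defmodule LabelSketch do
  def label(compound, positive_threshold \\ 0.05, negative_threshold \\ -0.05) do
    cond do
      compound >= positive_threshold -> :positive
      compound <= negative_threshold -> :negative
      true -> :neutral
    end
  end
end

LabelSketch.label(0.497)  # => :positive
LabelSketch.label(-0.3)   # => :negative
LabelSketch.label(0.0)    # => :neutral
```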

### Returns

A `t:result/0` map with:

* `:sum` — the raw sum of matched (and adjusted) lexicon scores.

* `:compound` — the normalised score in `[-1.0, +1.0]`.

* `:label` — `:positive`, `:negative`, or `:neutral` based on the
  threshold cutoffs.

* `:tokens` — total token count after tokenisation.

* `:matched` — number of tokens that hit the lexicon.

### Examples

    iex> lexicon = %{"good" => 3, "bad" => -3, "great" => 4}
    iex> result = Text.Sentiment.Lexicon.score("This is a good day", lexicon)
    iex> result.label
    :positive

    iex> lexicon = %{"good" => 3, "bad" => -3}
    iex> result = Text.Sentiment.Lexicon.score("not a bad outcome", lexicon)
    iex> result.label
    :positive

---

*Consult [api-reference.md](api-reference.md) for a complete listing*
