Kanta.PoFiles.Services.StaleDetection (kanta v0.5.1)

Service for detecting stale translation messages and finding potential replacements.

A message is "stale" when it exists in the database but is missing from ALL locale PO files. This service identifies these stale messages system-wide and uses fuzzy matching (Jaro distance algorithm) to suggest active messages that could replace them.

Stale Detection Strategy

The service treats messages globally rather than per locale:

  1. Extracts all message keys from PO files across all locales
  2. Compares database messages against this global set of active keys
  3. Messages not found in ANY locale's PO files are marked as stale
  4. Fuzzy matching finds similar active messages within the same domain/context

This global approach ensures cross-locale consistency and simplifies migration when message keys change across the entire application.
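The four steps above boil down to a set difference. The following is a minimal sketch of steps 1-3, not the service's actual implementation: the StaleSketch module, the in-memory po_keys_by_locale map, and the use of msgid alone as the message key are all assumptions made for illustration (the real service reads PO files and the database, and its keys may also include domain and context).

```elixir
# Hypothetical stand-in for the detection steps; all names here are illustrative.
defmodule StaleSketch do
  # Step 1: union of the message keys found in every locale's PO files.
  def active_keys(po_keys_by_locale) do
    po_keys_by_locale
    |> Map.values()
    |> Enum.reduce(MapSet.new(), fn keys, acc ->
      MapSet.union(acc, MapSet.new(keys))
    end)
  end

  # Steps 2-3: IDs of database messages whose key appears in no locale.
  def stale_ids(db_messages, po_keys_by_locale) do
    active = active_keys(po_keys_by_locale)

    db_messages
    |> Enum.reject(fn msg -> MapSet.member?(active, msg.msgid) end)
    |> MapSet.new(fn msg -> msg.id end)
  end
end

# Assumed sample data: "Sign out" exists only in the "pl" locale.
po_keys_by_locale = %{
  "en" => ["Hello", "Sign in"],
  "pl" => ["Hello", "Sign out"]
}

db_messages = [
  %{id: 1, msgid: "Hello"},
  %{id: 2, msgid: "Sign out"},
  %{id: 3, msgid: "Log in"}
]

stale = StaleSketch.stale_ids(db_messages, po_keys_by_locale)
```

In this sample, message 2 stays active even though only one locale contains it, while message 3 is stale because no locale does.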

Fuzzy Matching

For each stale message, the service searches for similar active messages using String.jaro_distance, which returns a similarity score from 0.0 (no match) to 1.0 (identical). Candidates are limited to the stale message's domain and context, and a match counts as a viable replacement only if its score clears the threshold (default 0.8).
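The scoped matching described above might look roughly like the sketch below. The FuzzySketch module, the map shapes, and the choice of `>=` for the threshold comparison are assumptions; only String.jaro_distance and the 0.8 default come from the documentation.

```elixir
# Hypothetical sketch of domain/context-scoped fuzzy matching.
defmodule FuzzySketch do
  # Returns {message, score} for the best-scoring candidate, or nil when no
  # candidate in the same domain and context reaches the threshold.
  def best_match(stale, active_messages, threshold \\ 0.8) do
    scored =
      active_messages
      |> Enum.filter(fn m ->
        m.domain == stale.domain and m.context == stale.context
      end)
      |> Enum.map(fn m -> {m, String.jaro_distance(stale.msgid, m.msgid)} end)
      # Assumption: a score equal to the threshold is accepted.
      |> Enum.filter(fn {_m, score} -> score >= threshold end)

    case scored do
      [] -> nil
      candidates -> Enum.max_by(candidates, fn {_m, score} -> score end)
    end
  end
end

stale = %{id: 3, domain: "default", context: nil, msgid: "Sign in to your account"}

active = [
  %{id: 10, domain: "default", context: nil, msgid: "Sign in to your accounts"},
  # Identical text, but a different domain, so it is never considered.
  %{id: 11, domain: "errors", context: nil, msgid: "Sign in to your account"},
  %{id: 12, domain: "default", context: nil, msgid: "Delete"}
]

best = FuzzySketch.best_match(stale, active)
```

Here the candidate with id 10 is selected: the message with id 11 is filtered out by domain before scoring, and "Delete" scores well below the threshold.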

Usage

alias Kanta.PoFiles.Services.StaleDetection

# Detect stale messages with default settings
StaleDetection.call()

# Use a custom PO file path and a stricter threshold
StaleDetection.call(base_path: "/path/to/gettext", fuzzy_threshold: 0.9)

Returns

{:ok, %Result{}}, where the result struct contains:

  • stale_message_ids - MapSet of stale message IDs
  • fuzzy_matches_map - Map of stale_message_id => %FuzzyMatch{}
  • stale_count - Total number of stale messages
  • mergeable_count - Number of stale messages with fuzzy matches above threshold

Each FuzzyMatch struct contains:

  • stale_message_id - ID of the stale message
  • matched_message_id - ID of the active message that matches
  • matched_msgid - The msgid string of the matched message
  • similarity_score - Similarity score from String.jaro_distance (0.0-1.0, higher means more similar)
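As an illustration of how a caller might consume these fields, the sketch below splits stale IDs into those with a suggested replacement and those without. It uses plain maps as stand-ins for the real %Result{} and %FuzzyMatch{} structs, the values are made up for the example, and it assumes fuzzy_matches_map holds an entry only for stale messages that found a match.

```elixir
# Hypothetical stand-in shaped like the documented result; values are illustrative.
result = %{
  stale_message_ids: MapSet.new([3, 7]),
  fuzzy_matches_map: %{
    3 => %{
      stale_message_id: 3,
      matched_message_id: 10,
      matched_msgid: "Sign in",
      similarity_score: 0.93
    }
  },
  stale_count: 2,
  mergeable_count: 1
}

# Stale messages with a suggested replacement versus those with none.
{mergeable, unmatched} =
  Enum.split_with(result.stale_message_ids, fn id ->
    Map.has_key?(result.fuzzy_matches_map, id)
  end)

for id <- mergeable do
  match = Map.fetch!(result.fuzzy_matches_map, id)
  IO.puts("stale message #{id} could merge into #{match.matched_message_id} (#{match.matched_msgid})")
end

IO.puts("#{length(unmatched)} stale message(s) have no suggested replacement")
```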

Functions

call(opts \\ [])

Identifies stale translation messages system-wide.

A message is considered "stale" if it doesn't exist in ANY locale's PO files.

Options

  • :base_path - Base directory to search for PO files (optional)
  • :fuzzy_threshold - Similarity threshold 0.0-1.0 (default: 0.8)

Returns

{:ok, %Result{}} - A struct containing:

  • :stale_message_ids - MapSet of message IDs that are stale
  • :fuzzy_matches_map - Map of stale_message_id => %FuzzyMatch{}
  • :stale_count - Total number of stale messages
  • :mergeable_count - Number of stale messages with fuzzy matches above threshold