Nasty.Lexical.WordNet.Loader (Nasty v0.3.0)


Loads WordNet data from WN-LMF (Lexical Markup Framework) JSON files.

Parses Open English WordNet and Open Multilingual WordNet JSON files and populates ETS storage with synsets, lemmas, and relations.

WN-LMF Format

The WN-LMF format has two main sections:

  1. Lexical Entries - Words with their senses
  2. Synsets - Synonym sets with definitions, examples, and relations
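As a rough illustration of how the two sections relate, the sketch below models a decoded document as plain Elixir maps and groups lemmas by the synset their senses point to. The key names ("entries", "senses", "synsets", and so on) are assumptions chosen for readability; they are not confirmed to match the exact WN-LMF schema the loader expects.

```elixir
# Illustrative shape only: every key name here is an assumption,
# not the exact WN-LMF schema used by the loader.
document = %{
  "entries" => [
    %{"id" => "w1", "lemma" => "dog", "senses" => [%{"synset" => "s1"}]},
    %{"id" => "w2", "lemma" => "domestic dog", "senses" => [%{"synset" => "s1"}]}
  ],
  "synsets" => [
    %{
      "id" => "s1",
      "definition" => "a domesticated canine",
      "examples" => ["the dog barked"],
      # The target synset "s2" is omitted from this sample for brevity.
      "relations" => [%{"type" => "hypernym", "target" => "s2"}]
    }
  ]
}

# Each sense links a lexical entry to one synset, so flatten to
# {synset_id, lemma} pairs first.
pairs =
  for entry <- document["entries"],
      sense <- entry["senses"],
      do: {sense["synset"], entry["lemma"]}

# Group the lemmas under the synset they belong to.
lemmas_by_synset = Enum.group_by(pairs, &elem(&1, 0), &elem(&1, 1))
```

Here both entries share the synset "s1", which is how a synonym set emerges from separate lexical entries.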

Example

alias Nasty.Lexical.WordNet.Loader

# Load English WordNet
Loader.load_from_file("priv/wordnet/oewn-2025.json", :en)

# Load Spanish WordNet
Loader.load_from_file("priv/wordnet/omw-es.json", :es)

Performance

  • Parsing: ~1-2 seconds for the full OEWN (about 120K synsets)
  • ETS loading: ~1 second
  • Total: ~2-3 seconds per language
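To check these figures on a given machine, a load can be wrapped in Erlang's `:timer.tc/1`. The sketch below is a minimal, hypothetical helper (`LoadTimer` is not part of Nasty) that accepts any zero-arity function, so it runs here with a stand-in result instead of a real WordNet file.

```elixir
defmodule LoadTimer do
  # Hypothetical helper, not part of Nasty: runs `fun` and returns
  # {elapsed_milliseconds, result}.
  def measure(fun) when is_function(fun, 0) do
    # :timer.tc/1 returns {elapsed_microseconds, return_value}.
    {micros, result} = :timer.tc(fun)
    {div(micros, 1000), result}
  end
end

# With Nasty available, the real call would be passed instead, e.g.:
# LoadTimer.measure(fn ->
#   Nasty.Lexical.WordNet.Loader.load_from_file("priv/wordnet/oewn-2025.json", :en)
# end)
{ms, result} =
  LoadTimer.measure(fn -> {:ok, %{synsets: 0, lemmas: 0, relations: 0}} end)

IO.puts("load took #{ms} ms")
```

The measured time covers parsing and ETS loading together, since both happen inside the single load call.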

Summary

Functions

load_from_file(file_path, language, opts \\ [])

Loads WordNet data from a JSON file.

load_from_json(json_string, language, opts \\ [])

Loads WordNet data from a JSON string.

Types

load_error()

@type load_error() :: {:error, term()}

load_result()

@type load_result() ::
  {:ok, %{synsets: integer(), lemmas: integer(), relations: integer()}}

Functions

load_from_file(file_path, language, opts \\ [])

@spec load_from_file(String.t(), atom(), keyword()) :: load_result() | load_error()

Loads WordNet data from a JSON file.

Parameters

  • file_path - Path to a WN-LMF JSON file
  • language - Language code as an atom (:en, :es, :ca, etc.)
  • opts - Keyword list of options
    • :clear - Clear existing data before loading (default: false)
    • :validate - Validate data integrity (default: true)

Returns

  • {:ok, stats} with counts of loaded items
  • {:error, reason} on failure
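Because the function returns a tagged tuple, callers will typically pattern match on both shapes. The sketch below shows one way to do that; `LoadReport` is a hypothetical helper, not part of Nasty, and it relies only on the `load_result()` and `load_error()` types documented above.

```elixir
defmodule LoadReport do
  # Hypothetical helper, not part of Nasty: turns a loader result
  # into a human-readable line.
  def describe({:ok, %{synsets: synsets, lemmas: lemmas, relations: relations}}) do
    "loaded #{synsets} synsets, #{lemmas} lemmas, #{relations} relations"
  end

  def describe({:error, reason}) do
    # `reason` is typed as term(), so inspect/1 is the safe way to print it.
    "load failed: #{inspect(reason)}"
  end
end

# With Nasty available, the result of a real load could be piped in, e.g.:
# "priv/wordnet/oewn-2025.json"
# |> Nasty.Lexical.WordNet.Loader.load_from_file(:en, clear: true)
# |> LoadReport.describe()
IO.puts(LoadReport.describe({:ok, %{synsets: 2, lemmas: 3, relations: 1}}))
IO.puts(LoadReport.describe({:error, :enoent}))
```

The `:enoent` reason is only an example value; the documentation does not specify which error terms the loader returns.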

load_from_json(json_string, language, opts \\ [])

@spec load_from_json(String.t(), atom(), keyword()) :: load_result() | load_error()

Loads WordNet data from a JSON string.

Useful for testing or loading from external sources.