Nasty.Lexical.WordNet.Storage (Nasty v0.3.0)

ETS-based in-memory storage for WordNet data with fast lookups.

This module manages ETS tables for synsets, lemmas, and relations, with multiple indexes for efficient queries. It uses lazy loading to minimize memory footprint and startup time.

Storage Strategy

ETS Tables

  1. :wordnet_synsets_{lang} - Main synset storage

    • Key: synset_id
    • Value: Synset struct
    • Type: :set
  2. :wordnet_lemmas_{lang} - Lemma storage

    • Key: {word, pos, synset_id}
    • Value: Lemma struct
    • Type: :bag (multiple lemmas per word)
  3. :wordnet_word_index_{lang} - Word to synsets index

    • Key: {word, pos}
    • Value: [synset_ids]
    • Type: :bag
  4. :wordnet_relations_{lang} - Relation storage

    • Key: {type, source_id}
    • Value: target_id
    • Type: :bag (multiple relations per source)
  5. :wordnet_ili_index - Interlingual index (shared across languages)

    • Key: ili_id
    • Value: {lang, synset_id}
    • Type: :bag
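The difference between the :set and :bag table types listed above can be sketched with plain :ets calls. This is a minimal illustration, not the module's actual code: the TableLayoutSketch module, its table_name/2 helper, and the :layout_demo language atom are all hypothetical, and the row shapes are assumed from the key/value descriptions above.

```elixir
defmodule TableLayoutSketch do
  # Hypothetical helper: builds a per-language name such as :wordnet_synsets_en.
  def table_name(kind, lang), do: :"wordnet_#{kind}_#{lang}"

  # Creates one :set table and one :bag table, mirroring entries 1 and 4 above.
  def create(lang) do
    synsets = :ets.new(table_name(:synsets, lang), [:set, :named_table, :public])
    relations = :ets.new(table_name(:relations, lang), [:bag, :named_table, :public])
    {synsets, relations}
  end
end

{synsets, relations} = TableLayoutSketch.create(:layout_demo)

# :set keeps one object per key, so the second insert overwrites the first.
:ets.insert(synsets, {"s1", %{definition: "first"}})
:ets.insert(synsets, {"s1", %{definition: "second"}})

# :bag keeps every distinct object stored under the same key.
:ets.insert(relations, {{:hypernym, "s1"}, "s2"})
:ets.insert(relations, {{:hypernym, "s1"}, "s3"})

set_rows = :ets.lookup(synsets, "s1")
bag_rows = :ets.lookup(relations, {:hypernym, "s1"})
```

Here set_rows holds a single row while bag_rows holds both relation rows, which is why the tables that map one key to many values use :bag.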

Performance

  • Synset lookup by ID: O(1)
  • Lemmas by word: O(1)
  • Relations by source: O(1)
  • Memory: ~200MB for the full Open English WordNet (OEWN), ~50MB for Spanish, ~40MB for Catalan
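To check memory figures like these on your own data, the size of an ETS table can be read with :ets.info/2. The following is a rough sketch on a throwaway table with placeholder rows; the table name and row contents are made up for illustration and say nothing about the real WordNet footprint.

```elixir
# Build a throwaway table with some rows so there is something to measure.
table = :ets.new(:memory_sketch, [:set])

for i <- 1..1_000 do
  :ets.insert(table, {"synset-#{i}", %{definition: "placeholder #{i}"}})
end

# :ets.info/2 reports memory in machine words; multiply by the word size for bytes.
words = :ets.info(table, :memory)
bytes = words * :erlang.system_info(:wordsize)
rows = :ets.info(table, :size)

IO.puts("#{rows} rows use about #{Float.round(bytes / 1_048_576, 2)} MB")
```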

Example

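# Alias the module so the calls below can use the short name
alias Nasty.Lexical.WordNet.Storage

# Initialize storage
Storage.init(:en)

# Store a synset (`synset` is a previously built Synset struct)
Storage.put_synset(synset, :en)

# Retrieve a synset by ID, or all synsets for a word and part of speech
synset = Storage.get_synset(synset_id, :en)
synsets = Storage.get_synsets_for_word("dog", :noun, :en)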

Summary

Functions

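clear(language)

Clears all data for a language.

get_all_relations(source_id, language)

Gets all relations (of any type) from a source synset.

get_by_ili(ili_id, target_lang)

Finds synsets by Interlingual Index (ILI) across languages.

get_lemmas(word, pos \\ nil, language)

Gets all lemmas for a word (optionally filtered by POS).

get_relations(source_id, rel_type, language)

Gets all target synset IDs for a given source and relation type.

get_synset(synset_id, language)

Retrieves a synset by ID.

get_synset_ids_for_word(word, pos \\ nil, language)

Gets all synset IDs for a word (fast index lookup).

get_synsets_for_word(word, pos \\ nil, language)

Gets all synsets for a word.

init(language)

Initializes ETS tables for a language.

loaded?(language)

Checks if a language's WordNet data is loaded.

put_lemma(lemma, language)

Stores a lemma and updates the word index.

put_relation(relation, language)

Stores a relation between two synsets.

put_synset(synset, language)

Stores a synset in the database.

stats(language)

Returns statistics about loaded WordNet data.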

Types

language()

@type language() :: atom()

table_name()

@type table_name() :: atom()

Functions

clear(language)

@spec clear(language()) :: :ok

Clears all data for a language.

Useful for reloading or testing.
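One plausible way to clear data is to empty each table while leaving the table itself in place. This is a sketch under that assumption; the real function may instead delete and recreate the tables. ClearSketch and the table name are hypothetical.

```elixir
defmodule ClearSketch do
  # Assumption: clearing means emptying each table while keeping the table itself.
  def clear(tables) do
    Enum.each(tables, &:ets.delete_all_objects/1)
    :ok
  end
end

table = :ets.new(:clear_sketch, [:set])
:ets.insert(table, {"s1", :data})

result = ClearSketch.clear([table])
remaining = :ets.info(table, :size)
```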

get_all_relations(source_id, language)

@spec get_all_relations(String.t(), language()) :: [
  {Nasty.Lexical.WordNet.Relation.relation_type(), String.t()}
]

Gets all relations (of any type) from a source synset.
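A lookup across all relation types could be written with :ets.match/2, assuming relation rows have the shape {{type, source_id}, target_id} described under Storage Strategy. AllRelationsSketch, the table name, and the synset IDs are hypothetical.

```elixir
defmodule AllRelationsSketch do
  # Assumption: rows look like {{type, source_id}, target_id}.
  def get_all_relations(table, source_id) do
    # Bind the relation type to $1 and the target to $2 for this source.
    table
    |> :ets.match({{:"$1", source_id}, :"$2"})
    |> Enum.map(fn [type, target_id] -> {type, target_id} end)
  end
end

table = :ets.new(:all_relations_sketch, [:bag])
:ets.insert(table, {{:hypernym, "dog-n"}, "canine-n"})
:ets.insert(table, {{:hyponym, "dog-n"}, "puppy-n"})
:ets.insert(table, {{:hypernym, "cat-n"}, "feline-n"})

relations = AllRelationsSketch.get_all_relations(table, "dog-n")
```

Because the key is only partially bound, a match like this scans the table rather than doing a direct key lookup. An implementation that wants key lookups could instead issue one lookup per known relation type.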

get_by_ili(ili_id, target_lang)

@spec get_by_ili(String.t(), language() | :all) ::
  [{language(), String.t()}] | [Nasty.Lexical.WordNet.Synset.t()]

Finds synsets by Interlingual Index (ILI) across languages.

Returns synsets from specified language(s) that share the same ILI.
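The ILI lookup can be pictured as a :bag lookup followed by an optional language filter. This sketch covers only the [{language, synset_id}] return shape from the spec, assumes rows of the form {ili_id, {lang, synset_id}}, and uses made-up IDs. IliSketch is a hypothetical module.

```elixir
defmodule IliSketch do
  # Assumption: rows look like {ili_id, {lang, synset_id}} in a shared :bag table.
  def get_by_ili(table, ili_id, target_lang) do
    pairs = for {_ili, pair} <- :ets.lookup(table, ili_id), do: pair

    case target_lang do
      :all -> pairs
      lang -> Enum.filter(pairs, fn {l, _id} -> l == lang end)
    end
  end
end

table = :ets.new(:ili_sketch, [:bag])
:ets.insert(table, {"ili-1", {:en, "en-dog-n"}})
:ets.insert(table, {"ili-1", {:es, "es-perro-n"}})

all = IliSketch.get_by_ili(table, "ili-1", :all)
only_es = IliSketch.get_by_ili(table, "ili-1", :es)
```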

get_lemmas(word, pos \\ nil, language)

Gets all lemmas for a word (optionally filtered by POS).

get_relations(source_id, rel_type, language)

Gets all target synset IDs for a given source and relation type.
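With the {type, source_id} key described under Storage Strategy, a typed relation lookup reduces to a single :ets.lookup/2 call. This is a sketch under that assumption. RelationsSketch is hypothetical and takes the table directly instead of a language.

```elixir
defmodule RelationsSketch do
  # Assumption: rows look like {{rel_type, source_id}, target_id} in a :bag table.
  def get_relations(table, source_id, rel_type) do
    for {_key, target_id} <- :ets.lookup(table, {rel_type, source_id}), do: target_id
  end
end

table = :ets.new(:relations_sketch, [:bag])
:ets.insert(table, {{:hypernym, "dog-n"}, "canine-n"})
:ets.insert(table, {{:hypernym, "dog-n"}, "pet-n"})
:ets.insert(table, {{:hyponym, "dog-n"}, "puppy-n"})

hypernyms = RelationsSketch.get_relations(table, "dog-n", :hypernym)
```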

get_synset(synset_id, language)

@spec get_synset(String.t(), language()) :: Nasty.Lexical.WordNet.Synset.t() | nil

Retrieves a synset by ID.

get_synset_ids_for_word(word, pos \\ nil, language)

@spec get_synset_ids_for_word(
  String.t(),
  Nasty.Lexical.WordNet.Synset.pos_tag() | nil,
  language()
) :: [
  String.t()
]

Gets all synset IDs for a word (fast index lookup).

get_synsets_for_word(word, pos \\ nil, language)

@spec get_synsets_for_word(
  String.t(),
  Nasty.Lexical.WordNet.Synset.pos_tag() | nil,
  language()
) :: [
  Nasty.Lexical.WordNet.Synset.t()
]

Gets all synsets for a word.

Convenience function combining index lookup with synset retrieval.
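The two-step lookup described here (index first, then synset table) might look like the following. It is a sketch that assumes index rows of the form {{word, pos}, synset_id} and handles only an explicit POS, not the nil default. WordLookupSketch, the table names, and the sample data are hypothetical.

```elixir
defmodule WordLookupSketch do
  # Assumption: the index holds {{word, pos}, synset_id} rows in a :bag table
  # and the synset table holds {synset_id, synset} rows in a :set table.
  def get_synset_ids_for_word(index, word, pos) do
    for {_key, id} <- :ets.lookup(index, {word, pos}), do: id
  end

  def get_synsets_for_word(index, synsets, word, pos) do
    index
    |> get_synset_ids_for_word(word, pos)
    |> Enum.flat_map(fn id ->
      # Skip IDs that have no stored synset instead of returning nil entries.
      case :ets.lookup(synsets, id) do
        [{^id, synset}] -> [synset]
        [] -> []
      end
    end)
  end
end

index = :ets.new(:word_index_sketch, [:bag])
synsets = :ets.new(:synsets_sketch, [:set])
:ets.insert(synsets, {"dog-1", %{definition: "a domesticated canine"}})
:ets.insert(index, {{"dog", :noun}, "dog-1"})
:ets.insert(index, {{"dog", :noun}, "dog-missing"})

found = WordLookupSketch.get_synsets_for_word(index, synsets, "dog", :noun)
```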

init(language)

@spec init(language()) :: :ok

Initializes ETS tables for a language.

Creates all necessary tables if they don't exist. Safe to call multiple times.

Examples

iex> Storage.init(:en)
:ok

iex> Storage.init(:es)
:ok
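Idempotent initialization of named tables is commonly done by checking :ets.whereis/1 before calling :ets.new/2. The sketch below assumes that approach. InitSketch, its table list, and the :init_demo language atom are hypothetical and may not match the module's internals.

```elixir
defmodule InitSketch do
  # Hypothetical table kinds and types, taken from the Storage Strategy list.
  @tables [synsets: :set, lemmas: :bag, word_index: :bag, relations: :bag]

  def init(lang) do
    for {kind, type} <- @tables do
      name = :"wordnet_#{kind}_#{lang}"

      # :ets.whereis/1 returns :undefined when no table has this name yet.
      if :ets.whereis(name) == :undefined do
        :ets.new(name, [type, :named_table, :public])
      end
    end

    :ok
  end
end

first = InitSketch.init(:init_demo)
second = InitSketch.init(:init_demo)
```

Note that an ETS table belongs to the process that created it and is deleted when that process exits, so long-lived storage is usually initialized from a long-lived process.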

loaded?(language)

@spec loaded?(language()) :: boolean()

Checks if a language's WordNet data is loaded.

put_lemma(lemma, language)

@spec put_lemma(Nasty.Lexical.WordNet.Lemma.t(), language()) :: :ok

Stores a lemma and updates the word index.

put_relation(relation, language)

@spec put_relation(Nasty.Lexical.WordNet.Relation.t(), language()) :: :ok

Stores a relation between two synsets.

put_synset(synset, language)

@spec put_synset(Nasty.Lexical.WordNet.Synset.t(), language()) :: :ok

Stores a synset in the database.

Also updates the ILI index if the synset has an ILI.
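The conditional ILI update can be sketched as two inserts, the second one guarded. This example uses plain maps with :id and an optional :ili key in place of the real Synset struct, whose field names are not shown here. PutSynsetSketch and the table names are hypothetical.

```elixir
defmodule PutSynsetSketch do
  # Assumption: a synset is a map with :id and an optional :ili key.
  def put_synset(synsets, ili_index, lang, synset) do
    :ets.insert(synsets, {synset.id, synset})

    # Only synsets that carry an ILI are added to the shared index.
    case Map.get(synset, :ili) do
      nil -> :ok
      ili -> :ets.insert(ili_index, {ili, {lang, synset.id}})
    end

    :ok
  end
end

synsets = :ets.new(:put_synsets_sketch, [:set])
ili_index = :ets.new(:put_ili_sketch, [:bag])

:ok = PutSynsetSketch.put_synset(synsets, ili_index, :en, %{id: "s1", ili: "ili-1"})
:ok = PutSynsetSketch.put_synset(synsets, ili_index, :en, %{id: "s2"})
```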

stats(language)

@spec stats(language()) :: %{
  synsets: non_neg_integer(),
  lemmas: non_neg_integer(),
  relations: non_neg_integer()
}

Returns statistics about loaded WordNet data.
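A map of this shape can be assembled from :ets.info/2, assuming each count is simply the number of rows in the corresponding table. StatsSketch and the table names are hypothetical, and the real function may count differently.

```elixir
defmodule StatsSketch do
  # Assumption: each count is the number of rows in the corresponding table.
  def stats(synsets, lemmas, relations) do
    %{
      synsets: :ets.info(synsets, :size),
      lemmas: :ets.info(lemmas, :size),
      relations: :ets.info(relations, :size)
    }
  end
end

synsets = :ets.new(:stats_synsets_sketch, [:set])
lemmas = :ets.new(:stats_lemmas_sketch, [:bag])
relations = :ets.new(:stats_relations_sketch, [:bag])

:ets.insert(synsets, {"s1", %{}})
:ets.insert(lemmas, {{"dog", :noun, "s1"}, %{}})
:ets.insert(relations, {{:hypernym, "s1"}, "s2"})
:ets.insert(relations, {{:hypernym, "s1"}, "s3"})

counts = StatsSketch.stats(synsets, lemmas, relations)
```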