ExFinalFusion (ex_final_fusion v0.1.2)

ExFinalFusion is an Elixir binding to the Rust crate finalfusion. finalfusion is a file format for word embeddings, along with an associated set of libraries and utilities.

From the crate documentation: finalfusion supports a variety of formats:

  • Vocabulary
    • Subwords
    • No subwords
  • Storage
    • Array
    • Memory-mapped
    • Quantized
  • Format
    • finalfusion
    • fastText
    • floret
    • GloVe
    • word2vec

Moreover, finalfusion provides:

  • Similarity queries
  • Analogy queries
  • Quantizing embeddings through reductive
  • Conversion to the following formats:
    • finalfusion
    • word2vec
    • GloVe

finalfusion file format

Project page

Train embeddings

Where to get models:

Summary

Types

read_type()

Specifies which function is used to parse the embeddings file. More information is available in the Rust crate documentation.

search_options()

Options passed to the functions that search for embeddings.

similarity_type()

Specifies how the similarity value is calculated when returning similarities. This only changes the returned value; the search itself always uses cosine similarity.

Functions

analogy_masked/8

Similar to the analogy query, but also allows the queried words to be removed from the results.

embedding/2

Returns the embedding of a word.

embedding_batch/2

Returns a list of embeddings, one for each word in the given list.

embedding_similarity/3

Returns words that are similar to the query vector.

idx/2

Returns the index of a word.

mean_embedding_batch/2

Returns the average embedding for the provided words and the fraction of words that were included in the calculation.

metadata/1

Returns the metadata as a map or nil.

read/2

Functions Available on the Embeddings Module

word_similarity/3

Returns words that are similar to the query word.

words/1

Returns a list of words included in the embeddings.

Types

@type read_type() ::
  :floret_text
  | :embeddings
  | :mmap_embeddings
  | :fasttext
  | :fasttext_lossy
  | :text
  | :text_lossy
  | :text_dims
  | :text_dims_lossy
  | :word2vec_binary
  | :word2vec_binary_lossy
  | :fifu
  | :word2vec
  | :floret

Specifies which function is used to parse the embeddings file. More information is available in the Rust crate documentation.

search_options()

@type search_options() :: [
  limit: integer(),
  batch_size: integer(),
  similarity_type: similarity_type(),
  skip: [String.t()]
]

Options passed to the functions that search for embeddings.

Default options:

  • limit: 1
  • batch_size: None (all at once, but this is memory intensive)
  • similarity_type: :cosine_similarity
  • skip: []

similarity_type()

@type similarity_type() ::
  :cosine_similarity
  | :angular_similarity
  | :euclidean_similarity
  | :euclidean_distance

Specifies how the similarity value is calculated when returning similarities. This only changes the returned value; the search itself always uses cosine similarity.
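As an illustration of how one cosine value can be reported in four ways, here is a minimal sketch in plain Elixir. The module name SimilaritySketch is hypothetical, and the formulas are the standard identities for unit-length vectors; they are an assumption about what the crate computes, not a copy of its implementation.

```elixir
defmodule SimilaritySketch do
  # Hypothetical helper, not part of ExFinalFusion.
  # Assumes `cos` is the cosine similarity of two unit-length vectors.

  def convert(cos, :cosine_similarity), do: cos

  # Angle between the vectors, rescaled so that 1.0 means identical direction.
  def convert(cos, :angular_similarity),
    do: 1.0 - :math.acos(clamp(cos)) / :math.pi()

  # For unit vectors, |a - b|^2 = 2 - 2 * cos.
  def convert(cos, :euclidean_distance),
    do: :math.sqrt(2.0 - 2.0 * clamp(cos))

  # The distance between unit vectors lies in [0, 2]; map it onto [0, 1].
  def convert(cos, :euclidean_similarity),
    do: 1.0 - convert(cos, :euclidean_distance) / 2.0

  # Guards against rounding noise pushing the cosine outside [-1, 1].
  defp clamp(cos), do: cos |> max(-1.0) |> min(1.0)
end
```

Because every variant is a monotonic function of the cosine, the order of the results does not change with the chosen type; only the reported number does.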

Functions

analogy(ref, word1, word2, word3, search_options \\ [])

@spec analogy(reference(), String.t(), String.t(), String.t(), search_options()) ::
  {:ok, [{String.t(), float()}]}

Returns the calculated analogy.

This function returns words that are close in vector space for the analogy query "word1 is to word2 as word3 is to ?". More concretely, it searches for embeddings that are similar to embedding(word2) - embedding(word1) + embedding(word3).
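To make the vector arithmetic behind an analogy query concrete, the following toy sketch builds the query vector from three embeddings and ranks a tiny vocabulary by cosine similarity. AnalogySketch and the four-word vocabulary are hypothetical; the real work is done natively by the Rust crate, so treat this only as an illustration of the idea.

```elixir
defmodule AnalogySketch do
  # Hypothetical toy code, not part of ExFinalFusion.

  # Element-wise word2 - word1 + word3.
  def query_vector(e1, e2, e3) do
    [e1, e2, e3]
    |> Enum.zip()
    |> Enum.map(fn {a, b, c} -> b - a + c end)
  end

  def cosine(u, v) do
    dot = u |> Enum.zip(v) |> Enum.map(fn {a, b} -> a * b end) |> Enum.sum()
    dot / (norm(u) * norm(v))
  end

  # Ranks every word not listed in `skip` by cosine similarity to the query.
  def rank(vocab, query, skip \\ []) do
    vocab
    |> Enum.reject(fn {word, _vec} -> word in skip end)
    |> Enum.map(fn {word, vec} -> {word, cosine(vec, query)} end)
    |> Enum.sort_by(fn {_word, sim} -> sim end, :desc)
  end

  defp norm(v), do: v |> Enum.map(&(&1 * &1)) |> Enum.sum() |> :math.sqrt()
end

# Made-up two-dimensional embeddings.
vocab = %{
  "man" => [1.0, 0.0],
  "woman" => [1.0, 1.0],
  "king" => [3.0, 0.0],
  "queen" => [3.0, 1.0],
  "apple" => [0.0, 1.0]
}

query = AnalogySketch.query_vector(vocab["man"], vocab["woman"], vocab["king"])
[{best, _sim} | _rest] = AnalogySketch.rank(vocab, query, ["man", "woman", "king"])
```

Excluding the three query words from the ranking mirrors what the skip option and the hide flags of analogy_masked are for: the query words themselves tend to sit very close to the query vector.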

analogy_masked(ref, word_1, hide_1, word_2, hide_2, word_3, hide_3, search_options \\ [])

@spec analogy_masked(
  reference(),
  String.t(),
  bool(),
  String.t(),
  bool(),
  String.t(),
  bool(),
  search_options()
) :: {:ok, [{String.t(), float()}]}

This function is similar to the analogy query, but each hide flag controls whether the corresponding query word is removed from the results.

@spec dims(reference()) :: [integer()]

See ExFinalFusion.Native.dims/1.

embedding(ref, word)

@spec embedding(reference(), String.t()) :: {:ok, [float()]}

Returns the embedding of a word.

embedding_batch(ref, list_of_words)

@spec embedding_batch(reference(), [String.t()]) :: {:ok, [[float()]]}

Returns a list of embeddings, one for each word in the given list.

embedding_similarity(ref, query, search_params \\ [])

@spec embedding_similarity(reference(), [float()], Keyword.t()) ::
  {:ok, [{String.t(), float()}]}

Returns words that are similar to the query vector.

@spec idx(reference(), String.t()) ::
  nil | {:word, [integer()]} | {:subword, [integer()]}

Returns the index of a word.

@spec len(reference()) :: integer()

See ExFinalFusion.Native.len/1.

mean_embedding_batch(ref, words)

@spec mean_embedding_batch(reference(), [String.t()]) :: {:ok, [[float()]], float()}

Returns the average embedding for the provided words and the fraction of words that were included in the calculation.
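The idea of a mean embedding together with a coverage fraction can be sketched in plain Elixir. MeanSketch and its lookup map are hypothetical, and the sketch assumes that words without an embedding are simply skipped; how the library treats such words is not stated here.

```elixir
defmodule MeanSketch do
  # Hypothetical toy code, not part of ExFinalFusion.
  # `lookup` maps a word to its embedding (a list of floats).
  # Returns {mean_vector, fraction_of_words_found}.
  def mean_embedding(lookup, words) do
    # Keep only the embeddings of words present in the lookup.
    found = for w <- words, vec = Map.get(lookup, w), do: vec

    case found do
      [] ->
        {[], 0.0}

      vecs ->
        n = length(vecs)

        # Average each dimension across the found embeddings.
        mean =
          vecs
          |> Enum.zip()
          |> Enum.map(fn column -> Enum.sum(Tuple.to_list(column)) / n end)

        {mean, n / length(words)}
    end
  end
end

lookup = %{"a" => [1.0, 3.0], "b" => [3.0, 5.0]}
{mean, fraction} = MeanSketch.mean_embedding(lookup, ["a", "b", "zzz"])
```

Here two of the three words have an embedding, so the fraction is 2/3 and the mean is taken over those two vectors only.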

@spec metadata(reference()) :: map() | nil

Returns the metadata as a map or nil.

read(path, model_type)

@spec read(String.t(), read_type()) :: reference()

Reads the embeddings file at path using the parser selected by model_type.

Functions Available on the Embeddings Module

  • :floret_text
  • :embeddings
  • :mmap_embeddings
  • :fasttext
  • :fasttext_lossy
  • :text
  • :text_lossy
  • :text_dims
  • :text_dims_lossy
  • :word2vec_binary
  • :word2vec_binary_lossy

Aliases

  • :fifu = :embeddings
  • :word2vec = :word2vec_binary
  • :floret = :floret_text

@spec vocab_len(reference()) :: integer()

See ExFinalFusion.Native.vocab_len/1.

word_similarity(ref, word, search_params \\ [])

@spec word_similarity(reference(), String.t(), search_options()) ::
  {:ok, [{String.t(), float()}]}

Returns words that are similar to the query word.
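A typical call sequence, going only by the specs on this page, might look like the sketch below. UsageSketch and the path argument are hypothetical, and the code assumes the functions live in the ExFinalFusion module and that read/2 returns the reference directly, as its spec suggests.

```elixir
defmodule UsageSketch do
  # Hypothetical wrapper; `path` stands for wherever your model file lives.
  def nearest(path, word) do
    # :fifu is listed as an alias for :embeddings.
    ref = ExFinalFusion.read(path, :fifu)

    # Ask for the five nearest neighbours, reported as cosine similarity.
    {:ok, results} =
      ExFinalFusion.word_similarity(ref, word,
        limit: 5,
        similarity_type: :cosine_similarity
      )

    # Per the spec, a list of {word, similarity} tuples.
    results
  end
end
```

Wrapping the calls in a function keeps the reference local; for repeated queries you would probably read the file once and pass the reference around instead.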

@spec words(reference()) :: [String.t()]

Returns a list of words included in the embeddings.

@spec words_len(reference()) :: integer()

See ExFinalFusion.Native.words_len/1.