ExFinalFusion (ex_final_fusion v0.1.2)
ExFinalFusion is an Elixir binding to the Rust crate finalfusion. finalfusion is a file format for word embeddings, along with an associated set of libraries and utilities.
From the crate documentation: finalfusion supports a variety of formats:
- Vocabulary
  - Subwords
  - No subwords
- Storage
  - Array
  - Memory-mapped
  - Quantized
- Format
  - finalfusion
  - fastText
  - floret
  - GloVe
  - word2vec
Moreover, finalfusion provides:
- Similarity queries
- Analogy queries
- Quantizing embeddings through reductive
- Conversion to the following formats:
  - finalfusion
  - word2vec
  - GloVe
Where to get models:
Types
read_type()
@type read_type() ::
:floret_text
| :embeddings
| :mmap_embeddings
| :fasttext
| :fasttext_lossy
| :text
| :text_lossy
| :text_dims
| :text_dims_lossy
| :word2vec_binary
| :word2vec_binary_lossy
| :fifu
| :word2vec
| :floret
Specifies which function is used to parse the embeddings file. See the Rust crate documentation for details on each reader.
search_options()
@type search_options() :: [ limit: integer(), batch_size: integer(), similarity_type: similarity_type(), skip: [String.t()] ]
Options passed to the functions that search for embeddings.
Default options:
- limit: 1
- batch_size: None (everything is processed at once, which is memory-intensive)
- similarity_type: :cosine_similarity
- skip: []
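As an illustration, the options can be passed as a keyword list. The sketch below assumes the module is named ExFinalFusion (as in the page title) and that ref is a reference obtained from read/2. The helper module SearchOptionsSketch and its function names are hypothetical and exist only for this example.

```elixir
defmodule SearchOptionsSketch do
  # Hypothetical helper: builds a search_options() keyword list that
  # overrides every default listed above.
  def options(skip_words \\ []) do
    [
      limit: 5,
      batch_size: 10_000,
      similarity_type: :angular_similarity,
      skip: skip_words
    ]
  end

  # `ref` is assumed to come from ExFinalFusion.read/2.
  # Per the spec of word_similarity/3, this returns {:ok, [{word, score}]}.
  def neighbours(ref, word, skip_words \\ []) do
    ExFinalFusion.word_similarity(ref, word, options(skip_words))
  end
end
```

Any key that is left out falls back to the default listed above, so `limit: 5` alone is also a valid options list.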
similarity_type()
@type similarity_type() ::
:cosine_similarity
| :angular_similarity
| :euclidean_similarity
| :euclidean_distance
Specifies which similarity measure is reported in the results. This only changes the returned value: the search itself always uses cosine similarity.
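Because the search always uses cosine similarity, the other measures can be thought of as conversions of the cosine value. The sketch below shows the standard conversions for unit-length vectors. Whether the binding applies exactly these formulas is an assumption, and SimilaritySketch is a hypothetical module that is not part of ExFinalFusion.

```elixir
defmodule SimilaritySketch do
  # Converts a cosine similarity between two unit vectors into another measure.
  def convert(cos, :cosine_similarity), do: cos

  def convert(cos, :angular_similarity) do
    # The angle between the vectors, rescaled so that 1.0 means "same direction".
    1.0 - :math.acos(clamp(cos)) / :math.pi()
  end

  def convert(cos, :euclidean_distance) do
    # Law of cosines for two unit vectors: d^2 = 2 - 2 * cos.
    :math.sqrt(2.0 - 2.0 * clamp(cos))
  end

  def convert(cos, :euclidean_similarity) do
    # The distance between unit vectors lies in [0, 2]; map it to [0, 1].
    1.0 - convert(cos, :euclidean_distance) / 2.0
  end

  # Guards against floating-point drift slightly outside [-1, 1].
  defp clamp(x), do: x |> max(-1.0) |> min(1.0)
end

IO.inspect(SimilaritySketch.convert(0.5, :euclidean_distance))
# => 1.0
```

All four measures are monotonic in the cosine value, which is why the ranking of results does not depend on the chosen similarity_type.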
Functions
analogy(ref, word1, word2, word3, search_options \\ [])
@spec analogy(reference(), String.t(), String.t(), String.t(), search_options()) :: {:ok, [{String.t(), float()}]}
Returns the calculated analogy.
This function returns words that are close in vector space to the answer of the analogy query "word1 is to word2 as word3 is to ?". More concretely, it searches for embeddings that are similar to:
embedding(word2) - embedding(word1) + embedding(word3)
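To make the arithmetic behind an analogy query concrete, here is a toy sketch of the usual formulation embedding(word2) - embedding(word1) + embedding(word3). The two-dimensional vectors are invented for illustration and AnalogySketch is a hypothetical module. A real query would then search the vocabulary for the words closest to the resulting vector.

```elixir
defmodule AnalogySketch do
  # Toy vectors, made up so that the arithmetic is easy to follow.
  @toy %{
    "man" => [1.0, 0.0],
    "woman" => [1.0, 1.0],
    "king" => [3.0, 0.0],
    "queen" => [3.0, 1.0]
  }

  # Builds the query vector embedding(word2) - embedding(word1) + embedding(word3).
  def query_vector(word1, word2, word3) do
    [word1, word2, word3]
    |> Enum.map(&Map.fetch!(@toy, &1))
    |> Enum.zip()
    |> Enum.map(fn {a, b, c} -> b - a + c end)
  end
end

IO.inspect(AnalogySketch.query_vector("man", "woman", "king"))
# => [3.0, 1.0], which is the toy vector for "queen"
```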
analogy_masked(ref, word_1, hide_1, word_2, hide_2, word_3, hide_3, search_options \\ [])
@spec analogy_masked( reference(), String.t(), bool(), String.t(), bool(), String.t(), bool(), search_options() ) :: {:ok, [{String.t(), float()}]}
This function is similar to analogy/5, but it also lets you control, through the hide_1, hide_2, and hide_3 flags, whether each queried word is removed from the results.
dims(ref)
embedding(ref, word)
Returns the embedding of a word.
embedding_batch(ref, list_of_words)
Returns the embeddings for a list of words.
embedding_similarity(ref, query, search_params \\ [])
Returns words that are similar to the query vector.
idx(ref, word)
Returns the index of a word.
len(ref)
mean_embedding_batch(ref, words)
Returns the average embedding of the provided words, together with the fraction of those words that were included in the calculation.
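The idea of a mean embedding plus a coverage fraction can be sketched in plain Elixir. This is an assumption-laden illustration and not the binding's implementation. Here vocab is a hypothetical map from word to vector, unknown words are simply left out, and the result is returned as a {mean, fraction} tuple. The actual return shape of mean_embedding_batch/2 is not stated in this documentation.

```elixir
defmodule MeanEmbeddingSketch do
  # `vocab` is a hypothetical map of word => embedding (a list of floats).
  def mean_embedding(vocab, words) do
    # Keep only the vectors of words that are present in the vocabulary.
    found = for w <- words, Map.has_key?(vocab, w), do: Map.fetch!(vocab, w)

    case found do
      [] ->
        # Nothing could be averaged.
        {nil, 0.0}

      vectors ->
        n = length(vectors)

        # Average each dimension across the found vectors.
        mean =
          vectors
          |> Enum.zip()
          |> Enum.map(fn dims -> Enum.sum(Tuple.to_list(dims)) / n end)

        {mean, n / length(words)}
    end
  end
end

vocab = %{"a" => [1.0, 3.0], "b" => [3.0, 5.0]}
IO.inspect(MeanEmbeddingSketch.mean_embedding(vocab, ["a", "b", "zzz"]))
```

In this example two of the three words are known, so the mean is [2.0, 4.0] and the fraction is two thirds.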
metadata(ref)
Returns the metadata as a map or nil.
read(path, model_type)
Reads an embeddings file from path. model_type selects which of the read functions available on Embeddings is used:
- :floret_text
- :embeddings
- :mmap_embeddings
- :fasttext
- :fasttext_lossy
- :text
- :text_lossy
- :text_dims
- :text_dims_lossy
- :word2vec_binary
- :word2vec_binary_lossy
Aliases:
- :fifu = :embeddings
- :word2vec = :word2vec_binary
- :floret = :floret_text
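A minimal sketch of how the aliases relate to the full type names, and of what a call to read/2 might look like. The module name ExFinalFusion is taken from the page title. ReadSketch, canonical/1, and load/2 are hypothetical names for this example, the binding resolves aliases on its own, and the return value of read/2 is not documented here, so it is passed through unchanged.

```elixir
defmodule ReadSketch do
  # Alias table copied from the list above.
  @aliases %{fifu: :embeddings, word2vec: :word2vec_binary, floret: :floret_text}

  # Hypothetical helper: maps an alias to the read type it stands for and
  # leaves every other type untouched.
  def canonical(type), do: Map.get(@aliases, type, type)

  # Thin wrapper around read/2; `path` must point to a real embeddings file.
  def load(path, type \\ :fifu) do
    ExFinalFusion.read(path, canonical(type))
  end
end

IO.inspect(ReadSketch.canonical(:word2vec))
# => :word2vec_binary
```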
vocab_len(ref)
word_similarity(ref, word, search_params \\ [])
@spec word_similarity(reference(), String.t(), search_options()) :: {:ok, [{String.t(), float()}]}
Returns words that are similar to the query word.
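Conceptually, a word similarity query ranks the rest of the vocabulary by cosine similarity to the query word's embedding. The brute-force sketch below illustrates that idea in plain Elixir. It is not how the Rust crate is implemented, and NearestSketch, the toy vocab map, and its vectors are all hypothetical.

```elixir
defmodule NearestSketch do
  # Cosine similarity between two vectors given as lists of floats.
  def cosine(a, b) do
    dot = a |> Enum.zip(b) |> Enum.map(fn {x, y} -> x * y end) |> Enum.sum()
    dot / (norm(a) * norm(b))
  end

  defp norm(v), do: v |> Enum.map(&(&1 * &1)) |> Enum.sum() |> :math.sqrt()

  # Returns up to `limit` {word, score} pairs, best match first.
  # The query word itself is excluded from the candidates.
  def similar(vocab, word, limit \\ 1) do
    query = Map.fetch!(vocab, word)

    vocab
    |> Map.delete(word)
    |> Enum.map(fn {w, v} -> {w, cosine(query, v)} end)
    |> Enum.sort_by(fn {_w, score} -> score end, :desc)
    |> Enum.take(limit)
  end
end

vocab = %{"cat" => [1.0, 0.0], "kitten" => [0.9, 0.1], "car" => [0.0, 1.0]}
IO.inspect(NearestSketch.similar(vocab, "cat", 2))
```

The {word, score} pairs mirror the shape in the spec of word_similarity/3, and the default limit of 1 mirrors the default search options.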
words(ref)
Returns a list of words included in the embeddings.