content_indexer v0.2.5 ContentIndexer.TfIdf.Calculate
## Summary
Calculates the tf-idf weights for a document of tokens against a corpus of tokenized documents.
https://en.wikipedia.org/wiki/Tf-idf
**What is Tf-Idf**
tf–idf, short for term frequency–inverse document frequency, is a numerical statistic that is
intended to reflect how important a word is to a document in a collection or corpus. It is often
used as a weighting factor in information retrieval and text mining.
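
The definition above can be illustrated with a minimal sketch of one common tf-idf formulation: relative term frequency multiplied by the natural log of corpus size over document frequency. The module `TfIdfSketch` and its functions are hypothetical names used only for illustration; they are not part of content_indexer, and the library's own weighting may use a different variant (for example, a different log base or smoothing).

```elixir
defmodule TfIdfSketch do
  # Hypothetical helper module for illustration; not part of content_indexer.

  # Term frequency: occurrences of `term` divided by the number of tokens.
  # Assumes `tokens` is a non-empty list.
  def tf(term, tokens) do
    Enum.count(tokens, &(&1 == term)) / length(tokens)
  end

  # Inverse document frequency: natural log of
  # (number of documents / number of documents containing `term`).
  # Assumes `term` occurs in at least one document of `corpus`.
  def idf(term, corpus) do
    containing = Enum.count(corpus, fn tokens -> term in tokens end)
    :math.log(length(corpus) / containing)
  end

  # The tf-idf weight is the product of the two factors.
  def tf_idf(term, tokens, corpus) do
    tf(term, tokens) * idf(term, corpus)
  end
end

corpus = [["bread", "butter"], ["bread", "jam"], ["bread", "cheese"]]

# "bread" occurs in every document, so its idf (and weight) is 0.0.
bread_weight = TfIdfSketch.tf_idf("bread", hd(corpus), corpus)

# "butter" occurs in one of three documents: 0.5 * log(3), roughly 0.549.
butter_weight = TfIdfSketch.tf_idf("butter", hd(corpus), corpus)
```

Under this formulation a term that appears in every document carries no weight, while a term concentrated in few documents scores higher.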
See `ContentIndexer.TfIdf.IndexProcessTest` for how to run this against a folder of documents.
Functions
tf_idf(document_name, tokens)
Retrieves the current set of weights, i.e. the state.
## Parameters
- document_name: String - document name
- tokens: List of tokens each being a String
## Example
    iex> ContentIndexer.TfIdf.Calculate.tf_idf("test_file.md", ["bread","butter"])
    {:ok, [
      {"test_file_1.md", [{"butter", 0}, {"bread", -0.234}]}
    ]}
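
As a sketch of how a caller might consume this result, the snippet below pattern-matches on the `{:ok, list}` tuple and orders each document's terms from highest to lowest weight. It assumes the return value has exactly the shape shown in the example above, and the literal `result` stands in for a real call to `tf_idf/2`. `Enum.sort_by/3` with `:desc` requires Elixir 1.10 or later.

```elixir
# Stand-in for the value returned by tf_idf/2, copied from the example above.
result = {:ok, [{"test_file_1.md", [{"butter", 0}, {"bread", -0.234}]}]}

# Unwrap the tuple; this raises a MatchError if the call did not return {:ok, _}.
{:ok, weighted_documents} = result

# Sort each document's {term, weight} pairs by weight, highest first.
sorted =
  for {document_name, weights} <- weighted_documents do
    {document_name, Enum.sort_by(weights, fn {_term, weight} -> weight end, :desc)}
  end
```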