content_indexer v0.2.5 ContentIndexer.TfIdf.Calculate

## Summary

Calculates the tf-idf weights for a document of tokens against a corpus of tokenized documents.

https://en.wikipedia.org/wiki/Tf-idf

**What is Tf-Idf**

tf–idf, short for term frequency–inverse document frequency, is a numerical statistic that is
intended to reflect how important a word is to a document in a collection or corpus. It is often
used as a weighting factor in information retrieval and text mining.
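
As a rough illustration of the statistic described above, here is a minimal sketch of one common tf-idf formulation: term frequency as a share of the document's tokens, multiplied by the natural log of corpus size over the number of documents containing the term. The module `TfIdfSketch` and its functions are hypothetical and not part of content_indexer; the library's own weighting formula may differ.

```elixir
defmodule TfIdfSketch do
  # Hypothetical module for illustration only; not part of content_indexer.

  # Term frequency: occurrences of `term` divided by the number of tokens.
  # An empty document yields 0.0 rather than dividing by zero.
  def tf(_term, []), do: 0.0

  def tf(term, tokens) do
    Enum.count(tokens, &(&1 == term)) / length(tokens)
  end

  # Inverse document frequency: ln(corpus size / documents containing `term`).
  # A term that appears in no document yields 0.0 (an assumption of this sketch).
  def idf(term, corpus) do
    containing = Enum.count(corpus, fn tokens -> term in tokens end)

    if containing == 0 do
      0.0
    else
      :math.log(length(corpus) / containing)
    end
  end

  # The weight of `term` in `tokens`, relative to `corpus`.
  def tf_idf(term, tokens, corpus) do
    tf(term, tokens) * idf(term, corpus)
  end
end

corpus = [["bread", "butter"], ["bread", "jam"]]

# "bread" appears in every document, so its idf (and weight) is 0.0.
TfIdfSketch.tf_idf("bread", ["bread", "butter"], corpus)

# "butter" appears in only one of the two documents, so it gets a positive weight.
TfIdfSketch.tf_idf("butter", ["bread", "butter"], corpus)
```

Under this formulation, a term shared by every document carries no weight, while a term concentrated in few documents is weighted more heavily.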

See `ContentIndexer.TfIdf.IndexProcessTest` for how to run this against a folder of documents.

## Functions

tf_idf(document_name, tokens)

Retrieves the current set of weights, i.e. the state.

## Parameters

- document_name: String - document name
- tokens: List of tokens, each a String

## Example

    iex> ContentIndexer.TfIdf.Calculate.tf_idf("test_file.md", ["bread", "butter"])
    {:ok, [
      {"test_file_1.md", [{"butter", 0}, {"bread", -0.234}]}
    ]}
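
To work with a result of this shape, one option is to pattern match on the `{:ok, documents}` tuple and sort each document's tokens by weight. The sketch below assumes the return value always looks like the example above; `WeightsSketch.rank/1` is a hypothetical helper, not a function provided by content_indexer, and it needs Elixir 1.10 or later for the `:desc` sorter.

```elixir
defmodule WeightsSketch do
  # Hypothetical helper for illustration only; not part of content_indexer.
  # Takes a result shaped like {:ok, [{document_name, [{token, weight}]}]}
  # and returns each document with its tokens sorted from highest to lowest weight.
  def rank({:ok, documents}) do
    for {name, weights} <- documents do
      {name, Enum.sort_by(weights, fn {_token, weight} -> weight end, :desc)}
    end
  end
end

# The literal below copies the example result rather than calling the library.
result = {:ok, [{"test_file_1.md", [{"bread", -0.234}, {"butter", 0}]}]}
WeightsSketch.rank(result)
```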