Crawly.Pipelines.DuplicatesFilter (Crawly v0.12.0)
Filters out duplicated items based on the provided item_id.
Stores identifier values in state under the :duplicates_filter key.
Options
If no unique item identifier is provided, this pipeline does nothing.
- :item_id, required: Designates the field used to check for duplicates. Falls back to the global config :item_id.
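As a rough illustration of the two ways the identifier can be supplied, the configuration fragment below sketches a per-pipeline option alongside the global fallback. The :title field is a hypothetical item key chosen for the example, and the fragment assumes the tuple form of pipeline declaration in the application config.

```elixir
# config/config.exs (sketch; :title is a hypothetical item field)
import Config

config :crawly,
  # Global fallback, used when the pipeline is declared without :item_id
  item_id: :title,
  pipelines: [
    # Per-pipeline option, passed to the pipeline as its opts
    {Crawly.Pipelines.DuplicatesFilter, item_id: :title}
  ]
```

Setting the option on the pipeline tuple keeps the deduplication key next to the pipeline that uses it; the global setting is only consulted when that option is absent.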
Example Usage
  iex> item = %{my: "item"}
  iex> {_unchanged, new_state} = DuplicatesFilter.run(item, %{}, item_id: :my)
  # Rerunning the same item with the updated state will drop the item
  iex> DuplicatesFilter.run(item, new_state, item_id: :my)
  {false, %{
    duplicates_filter: %{"item" => true}
  }}