Crawly.Pipelines.DuplicatesFilter (Crawly v0.17.2)
Filters out duplicated items based on the provided item_id.
Stores identifier values in state under the :duplicates_filter key.
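The behaviour described above can be sketched as a small standalone module. This is a minimal illustration under stated assumptions, not Crawly's actual source: the module name DuplicatesFilterSketch is hypothetical, and the fallback to the global :item_id setting is left out for brevity.

```elixir
defmodule DuplicatesFilterSketch do
  # Hypothetical stand-in for illustration only; not Crawly's implementation.

  # Returns {item, new_state} when the identifier has not been seen before,
  # {false, state} when it has, and {item, state} unchanged when the item
  # has no value under the configured :item_id field.
  def run(item, state, opts \\ []) do
    case Map.get(item, Keyword.get(opts, :item_id)) do
      nil ->
        # No identifier available: pass the item through untouched.
        {item, state}

      id ->
        seen = Map.get(state, :duplicates_filter, %{})

        if Map.has_key?(seen, id) do
          # Identifier already recorded: drop the item.
          {false, state}
        else
          # First occurrence: record the identifier and keep the item.
          {item, Map.put(state, :duplicates_filter, Map.put(seen, id, true))}
        end
    end
  end
end
```

Keeping the seen identifiers in a map under a single state key means that a lookup and an insert are both one map operation, and the state passed along to later pipelines stays a plain map.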
Options
If the item's unique identifier is not provided, this pipeline does nothing.
:item_id, required: Designates the field used to check for duplicates. Falls back to the global config :item_id.
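As a rough illustration of passing the option, the pipeline can be listed in the application config as a {module, options} tuple, which is how Crawly pipelines generally receive their options. The field name :title below is a hypothetical placeholder; substitute whichever field uniquely identifies your items.

```elixir
import Config

# Sketch of a config/config.exs entry. :title is an assumed field name
# chosen for illustration, not something the pipeline requires.
config :crawly,
  pipelines: [
    {Crawly.Pipelines.DuplicatesFilter, item_id: :title}
  ]
```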
Example Usage
iex> alias Crawly.Pipelines.DuplicatesFilter
iex> item = %{my: "item"}
iex> {_unchanged, new_state} = DuplicatesFilter.run(item, %{}, item_id: :my)
# Rerunning the item through the pipeline with the updated state will drop the item
iex> DuplicatesFilter.run(item, new_state, item_id: :my)
{false, %{
  duplicates_filter: %{"item" => true}
}}