Crawly.Pipelines.DuplicatesFilter (Crawly v0.17.0)

Filters out duplicated items based on the provided item_id.

Stores identifier values in state under the :duplicates_filter key.
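The bookkeeping described above can be sketched as a small standalone module. This is an illustration only, not Crawly's actual source: the module name `DedupSketch` is hypothetical, and the sketch assumes the same `run(item, state, opts)` shape and `{item | false, state}` return value shown in the usage example.

```elixir
defmodule DedupSketch do
  # Hypothetical stand-in for the pipeline, written for illustration.
  # Returns {item, new_state} for a first-seen item and {false, state}
  # for a duplicate.
  def run(item, state, opts \\ []) do
    # Look up the identifier value in the item; nil if no field is given
    # or the item lacks that field.
    case Map.get(item, Keyword.get(opts, :item_id)) do
      nil ->
        # No identifier available: pass the item through untouched.
        {item, state}

      id ->
        seen = Map.get(state, :duplicates_filter, %{})

        if Map.has_key?(seen, id) do
          # Identifier already recorded: drop the item.
          {false, state}
        else
          # Record the identifier under :duplicates_filter and keep the item.
          {item, Map.put(state, :duplicates_filter, Map.put(seen, id, true))}
        end
    end
  end
end
```

Because seen identifiers live only in the state map, a duplicate is detected only when the state returned by an earlier call is passed into the next one.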

Options

If no unique item identifier is provided, this pipeline does nothing and items pass through unchanged.

  • :item_id, required: Designates a field to be used to check for duplicates. Falls back to global config :item_id.
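As a configuration sketch, the option can be passed alongside the module in the pipelines list, or supplied through the global `:item_id` setting that the option falls back to. The field name `:title` is a placeholder chosen for illustration, and the exact placement of the global key is an assumption that should be checked against the configuration reference for your Crawly version.

```elixir
# config/config.exs
import Config

config :crawly,
  # Assumed global fallback, used when the pipeline is given no :item_id.
  item_id: :title,
  pipelines: [
    # Per-pipeline option; takes precedence over the global setting.
    {Crawly.Pipelines.DuplicatesFilter, item_id: :title}
  ]
```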

Example Usage

  iex> item = %{my: "item"}
  iex> {_unchanged, new_state} = DuplicatesFilter.run(item, %{}, item_id: :my)

  # Rerunning the item through the pipeline with the updated state will drop the item
  iex> DuplicatesFilter.run(item, new_state, item_id: :my)
  {false, %{
    duplicates_filter: %{"item" => true}
  }}