Runbox.Deduplicator (runbox v1.3.0)

Decide if a message is a duplicate by comparing it to messages already seen.

Only the messages with the highest timestamp are remembered as "seen". If an incoming message has a lower timestamp than the "seen" messages, it is considered "old". If it has exactly the same timestamp, it is compared for equality with every "seen" message, and if it matches one of them, it is considered "duplicate". In any other case, the message is considered "new".
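The rule above can be sketched as a single pure function. This is only an illustration, not the library's implementation: the module name `DedupRuleSketch`, the function `classify/3`, and the `{latest_timestamp, seen_messages}` state shape are all assumptions, since the real `t()` is opaque. The sketch also assumes that a new message with the same timestamp is added to "seen", and that a strictly newer message replaces it.

```elixir
defmodule DedupRuleSketch do
  # Hypothetical state shape: {latest_timestamp, messages_seen_at_that_timestamp}.
  # The real Runbox.Deduplicator.t() is opaque, so this shape is an assumption.
  def classify(msg, ts, {latest_ts, seen}) do
    cond do
      # Nothing seen yet, or strictly newer: the message is new and
      # becomes the only remembered message.
      latest_ts == nil or ts > latest_ts -> {:new, {ts, [msg]}}
      # Lower timestamp than the remembered messages: old.
      ts < latest_ts -> {:old, {latest_ts, seen}}
      # Same timestamp and equal to a remembered message: duplicate.
      msg in seen -> {:duplicate, {latest_ts, seen}}
      # Same timestamp but not seen before: new, remembered alongside the others.
      true -> {:new, {latest_ts, [msg | seen]}}
    end
  end
end
```

The `nil` check comes first because Elixir's term ordering places every integer below every atom, so comparing a timestamp against `nil` directly would classify the first message as old.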

Summary

Functions

deduplicate(msg, state)

Decides if a given message is a duplicate or not.

new(stream, extract_timestamp)

Initializes a deduplicator.

Types

@opaque t()

Functions

deduplicate(msg, state)

@spec deduplicate(any(), t()) :: {msg_condition, t()}
when msg_condition: :new | :old | :duplicate

Decides if a given message is a duplicate or not.

Returns a tuple {msg_condition, deduplicator_state}, where msg_condition is one of:

  • :new - message is new and was not yet seen.

  • :old - message is older than the messages already seen.

  • :duplicate - message is equal to one of the latest messages already seen.
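Because each call returns an updated state, the state has to be threaded through successive calls. A minimal sketch of that pattern follows. The module `DedupUsageSketch` and the function `keep_new/3` are hypothetical helpers, and `dedup_fun` stands for any function that follows the contract documented above.

```elixir
defmodule DedupUsageSketch do
  # Keeps only the messages classified as :new, passing the state returned
  # by each call into the next one. `dedup_fun` is assumed to follow the
  # documented contract: (msg, state) -> {:new | :old | :duplicate, state}.
  def keep_new(msgs, state, dedup_fun) do
    {kept, final_state} =
      Enum.reduce(msgs, {[], state}, fn msg, {acc, st} ->
        case dedup_fun.(msg, st) do
          {:new, st2} -> {[msg | acc], st2}
          {_old_or_duplicate, st2} -> {acc, st2}
        end
      end)

    # Messages were prepended, so restore the original order.
    {Enum.reverse(kept), final_state}
  end
end
```

With runbox available as a dependency, `dedup_fun` would presumably be `&Runbox.Deduplicator.deduplicate/2`, and the initial state would come from `Runbox.Deduplicator.new/2`.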

new(stream, extract_timestamp)

@spec new(Enumerable.t(), (any() -> non_neg_integer())) :: t() | no_return()

Initializes a deduplicator.

stream is an enumerable (possibly empty) of messages ordered by descending timestamp, from newest to oldest. It initializes the "seen" messages from messages that were already processed.

extract_timestamp is a function which returns a timestamp for a given message.
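One way such an initialization could work is sketched below. It is not the library's code: `DedupInitSketch.init/2` and the `{latest_timestamp, seen_messages}` return shape are assumptions made for illustration. Since the stream is ordered newest first, the messages that share the highest timestamp form its leading run.

```elixir
defmodule DedupInitSketch do
  # Returns a hypothetical {latest_timestamp, seen_messages} pair.
  # The real Runbox.Deduplicator.t() is opaque, so this shape is an assumption.
  def init(stream, extract_timestamp) do
    case Enum.take(stream, 1) do
      [] ->
        # Empty stream: nothing has been seen yet.
        {nil, []}

      [newest] ->
        latest_ts = extract_timestamp.(newest)
        # Take only the leading messages that share the highest timestamp;
        # everything after them is older and does not need to be remembered.
        seen = Enum.take_while(stream, &(extract_timestamp.(&1) == latest_ts))
        {latest_ts, seen}
    end
  end
end
```

This sketch enumerates the input twice, which is fine for lists but would re-run a lazy stream backed by an external resource. Both passes stop early, so only the head of the stream is consumed.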