Crawly.Models.Job (Crawly v0.17.0)

The Crawly.Models.Job module defines a struct and functions for managing and updating information about a web scraping job.

Struct

The Crawly.Models.Job struct has the following fields:

  • id: a binary representing the unique ID of the job.
  • spider_name: a binary representing the name of the spider used for the job.
  • start: a DateTime.t representing the time the job started.
  • end: a DateTime.t representing the time the job ended. This is nil until the job is completed.
  • scraped_items: an integer representing the number of items scraped during the job.
  • stop_reason: a binary representing the reason the job was stopped, if applicable. This is nil until the job is stopped.
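As a rough illustration of the fields above, the sketch below defines a stand-in struct with the same field names and derives a job's duration from start and end. JobSketch and duration_seconds/1 are hypothetical names, and the default values are assumptions; this is not Crawly's source.

```elixir
defmodule JobSketch do
  # Hypothetical stand-in for Crawly.Models.Job; field names follow the
  # list above, the default values are assumptions.
  defstruct id: nil,
            spider_name: nil,
            start: nil,
            end: nil,
            scraped_items: 0,
            stop_reason: nil

  # A job that has not completed has no end time, so no duration yet.
  def duration_seconds(%JobSketch{end: nil}), do: nil

  # DateTime.diff/2 returns the difference in seconds by default.
  def duration_seconds(%JobSketch{start: started, end: ended}) do
    DateTime.diff(ended, started)
  end
end

# struct!/2 builds the struct at runtime, so this also works when the
# module and its usage live in the same script file.
job =
  struct!(JobSketch,
    id: "my-crawl-id",
    start: ~U[2023-03-31 16:00:00Z],
    end: ~U[2023-03-31 16:05:00Z]
  )

IO.inspect(JobSketch.duration_seconds(job))
# prints 300
```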

Functions

  • new(crawl_id, spider_name): creates a new job with the given crawl_id and spider_name and stores it in a SimpleStorage instance.
  • update(crawl_id, total_scraped, stop_reason): updates the job identified by crawl_id with the total number of scraped items (total_scraped) and the reason it stopped (stop_reason), then stores it in the SimpleStorage instance.
  • get(crawl_id): retrieves the job with the given crawl_id from the SimpleStorage instance.
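The create, update, and retrieve cycle described above can be sketched with an in-memory store. The module below is a hypothetical stand-in built on Agent, not Crawly's SimpleStorage code: the name JobStoreSketch, the :not_found reason, the use of plain maps instead of the struct, and the choice to set end inside update/3 are all assumptions made for illustration.

```elixir
defmodule JobStoreSketch do
  # Hypothetical in-memory stand-in for the storage-backed functions.
  use Agent

  # Starts the store with an empty map of crawl_id => job.
  def start_link(_opts \\ []) do
    Agent.start_link(fn -> %{} end, name: __MODULE__)
  end

  # Creates a job record and stores it under crawl_id. Returns :ok.
  def new(crawl_id, spider_name) do
    job = %{
      id: crawl_id,
      spider_name: spider_name,
      start: DateTime.utc_now(),
      end: nil,
      scraped_items: 0,
      stop_reason: nil
    }

    Agent.update(__MODULE__, &Map.put(&1, crawl_id, job))
  end

  # Looks up a job; the :not_found reason is an assumption.
  def get(crawl_id) do
    case Agent.get(__MODULE__, &Map.fetch(&1, crawl_id)) do
      {:ok, job} -> {:ok, job}
      :error -> {:error, :not_found}
    end
  end

  # Records the final item count and stop reason. Setting :end here is an
  # assumption about when a job is considered completed.
  def update(crawl_id, total_scraped, stop_reason) do
    Agent.get_and_update(__MODULE__, fn jobs ->
      case Map.fetch(jobs, crawl_id) do
        {:ok, job} ->
          updated = %{
            job
            | scraped_items: total_scraped,
              stop_reason: stop_reason,
              end: DateTime.utc_now()
          }

          {:ok, Map.put(jobs, crawl_id, updated)}

        :error ->
          {{:error, :not_found}, jobs}
      end
    end)
  end
end
```

After JobStoreSketch.start_link(), calling new/2, then update/3, then get/1 with the same crawl_id returns the job with its updated scraped_items and stop_reason, while update/3 on an unknown ID returns an error tuple and leaves the store unchanged.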

Summary

Functions

get(crawl_id)

Retrieves the job with the given crawl_id from the SimpleStorage instance.

list()

Lists all registered jobs.

new(crawl_id, spider_name)

Creates a new job with the given crawl_id and spider_name, and stores it in the SimpleStorage instance.

update(crawl_id, total_scraped, stop_reason)

Updates the job with the given crawl_id in the SimpleStorage instance with the provided total_scraped and stop_reason.

Types

Specs

t() :: %Crawly.Models.Job{
  end: DateTime.t(),
  id: binary(),
  scraped_items: integer(),
  spider_name: binary(),
  start: DateTime.t(),
  stop_reason: binary()
}

Functions

get(crawl_id)

Specs

get(term()) :: {:error, term()} | {:ok, t()}

Retrieves the job with the given crawl_id from the SimpleStorage instance.

## Examples

  iex> Crawly.Models.Job.get("my-crawl-id")
  {:ok, %Crawly.Models.Job{
    id: "my-crawl-id",
    spider_name: "my-spider",
    start: ~U[2023-03-31 16:00:00Z],
    end: nil,
    scraped_items: 0,
    stop_reason: nil
  }}

## Parameters

  • crawl_id: a binary representing the unique ID of the job.

## Returns

  • {:ok, job}: the job exists in the SimpleStorage instance; job is the stored Crawly.Models.Job struct.
  • {:error, reason}: the job could not be retrieved, for example because no job with that crawl_id exists in the SimpleStorage instance.
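
Callers typically pattern match on the two return shapes. The helper below is a minimal sketch of that, assuming only the tuple shapes and field names documented here; JobStatusSketch and describe/1 are hypothetical names and not part of Crawly. It matches on plain map keys, so it accepts the struct as well as any map with the same fields.

```elixir
defmodule JobStatusSketch do
  # Hypothetical helper: turns a value shaped like get/1's result into a
  # short human-readable status line.

  # end is nil until the job is completed, so the job is still running.
  def describe({:ok, %{end: nil, scraped_items: count}}) do
    "running, #{count} items so far"
  end

  # The job has an end time; report why it stopped.
  def describe({:ok, %{stop_reason: reason, scraped_items: count}}) do
    "stopped (#{inspect(reason)}), #{count} items"
  end

  # The lookup itself failed.
  def describe({:error, reason}) do
    "lookup failed: #{inspect(reason)}"
  end
end
```

With Crawly running, this could be used as Crawly.Models.Job.get("my-crawl-id") |> JobStatusSketch.describe().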

list()

Specs

list() :: [term()]

Lists all registered jobs.

new(crawl_id, spider_name)

Specs

new(term(), atom()) :: :ok | {:error, term()}

Creates a new job with the given crawl_id and spider_name, and stores it in the SimpleStorage instance.

## Examples

  iex> Crawly.Models.Job.new("my-crawl-id", "my-spider")
  :ok

## Parameters

  • crawl_id: a binary representing the unique ID of the job.
  • spider_name: the name of the spider used for the job. The spec above types this argument as atom(), while the struct stores spider_name as a binary.

## Returns

  • :ok: if the job was created and stored successfully.
  • {:error, reason}: if an error occurred while trying to create or store the job.

update(crawl_id, total_scraped, stop_reason)

Specs

update(String.t(), integer(), term()) :: :ok | {:error, any()}

Updates the job with the given crawl_id in the SimpleStorage instance with the provided total_scraped and stop_reason.

## Examples

  iex> Crawly.Models.Job.update("my-crawl-id", 100, :finished)
  :ok

## Parameters

  • crawl_id: a binary representing the unique ID of the job.
  • total_scraped: an integer representing the total number of items scraped during the job.
  • stop_reason: a term representing the reason why the job was stopped, if applicable.

## Returns

  • :ok: if the job was successfully updated in the SimpleStorage instance.
  • {:error, reason}: the job could not be updated, for example because no job with that crawl_id exists in the SimpleStorage instance.