Crawly.Models.Job (Crawly v0.17.2)
The Crawly.Models.Job module defines a struct and functions for managing and updating information about a web scraping job.
Struct
The Crawly.Models.Job struct has the following fields:
- `id`: a binary representing the unique ID of the job.
- `spider_name`: a binary representing the name of the spider used for the job.
- `start`: a `DateTime.t` representing the time the job started.
- `end`: a `DateTime.t` representing the time the job ended. This is `nil` until the job is completed.
- `scraped_items`: an integer representing the number of items scraped during the job.
- `stop_reason`: a binary representing the reason the job was stopped, if applicable. This is `nil` until the job is stopped.
Functions
- `new(crawl_id, spider_name)`: creates a new job with the given `crawl_id` and `spider_name` and stores it in a SimpleStorage instance.
- `update(crawl_id, total_scraped, stop_reason)`: updates the job with the given `crawl_id` with the total number of items `total_scraped` and the reason `stop_reason`, and stores it in the SimpleStorage instance.
- `get(crawl_id)`: retrieves the job with the given `crawl_id` from the SimpleStorage instance.
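Taken together, these functions cover a job's lifecycle: register it when a crawl starts, update it when the crawl stops, and read it back later. A minimal sketch, based on the examples in this page (the crawl id, spider name, and counts are illustrative, and a running Crawly application with its SimpleStorage backend is assumed):

```elixir
# Register a job when a crawl starts.
:ok = Crawly.Models.Job.new("my-crawl-id", "my-spider")

# When the crawl stops, record the item count and the stop reason.
:ok = Crawly.Models.Job.update("my-crawl-id", 100, :finished)

# Later, inspect the stored job.
{:ok, job} = Crawly.Models.Job.get("my-crawl-id")
job.scraped_items
# => 100
```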
Summary
Functions
Retrieves the job with the given `crawl_id` from the SimpleStorage instance.
List all registered jobs.
Creates a new job with the given `crawl_id` and `spider_name`, and stores it in the SimpleStorage instance.
Updates the job with the given `crawl_id` in the SimpleStorage instance with the provided `total_scraped` and `stop_reason`.
Types
Specs
t() :: %Crawly.Models.Job{
  end: DateTime.t(),
  id: binary(),
  scraped_items: integer(),
  spider_name: binary(),
  start: DateTime.t(),
  stop_reason: binary()
}
Functions
get(crawl_id)
Retrieves the job with the given `crawl_id` from the SimpleStorage instance.
## Examples
iex> Crawly.Models.Job.get("my-crawl-id")
{:ok, %Crawly.Models.Job{
id: "my-crawl-id",
spider_name: "my-spider",
start: ~U[2023-03-31 16:00:00Z],
end: nil,
scraped_items: 0,
stop_reason: nil
}}
## Parameters
- `crawl_id`: a binary representing the unique ID of the job.
## Returns
- `{:ok, job}`: a tuple containing the retrieved job as a `Crawly.Models.Job` struct if it exists in the SimpleStorage instance.
- `{:error, reason}`: a tuple containing the reason why the job could not be retrieved, such as if it does not exist in the SimpleStorage instance.
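Because `get/1` returns a tagged tuple, callers typically pattern match on both outcomes. A small sketch (the crawl id is illustrative):

```elixir
case Crawly.Models.Job.get("my-crawl-id") do
  {:ok, job} ->
    # The job exists; its fields are described in the Struct section above.
    IO.puts("#{job.spider_name} scraped #{job.scraped_items} items")

  {:error, reason} ->
    # For example, no job was registered under this crawl id.
    IO.puts("Could not fetch job: #{inspect(reason)}")
end
```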
list()
Specs
list() :: [term()]
List all registered jobs.
new(crawl_id, spider_name)
Creates a new job with the given `crawl_id` and `spider_name`, and stores it in the SimpleStorage instance.
## Examples
iex> Crawly.Models.Job.new("my-crawl-id", "my-spider")
:ok
## Parameters
- `crawl_id`: a binary representing the unique ID of the job.
- `spider_name`: a binary representing the name of the spider used for the job.
## Returns
- `:ok`: if the job was created and stored successfully.
- `{:error, reason}`: if an error occurred while trying to create or store the job.
update(crawl_id, total_scraped, stop_reason)
Updates the job with the given `crawl_id` in the SimpleStorage instance with the provided `total_scraped` and `stop_reason`.
## Examples
iex> Crawly.Models.Job.update("my-crawl-id", 100, :finished)
:ok
## Parameters
- `crawl_id`: a binary representing the unique ID of the job.
- `total_scraped`: an integer representing the total number of items scraped during the job.
- `stop_reason`: a term representing the reason why the job was stopped, if applicable.
## Returns
- `:ok`: if the job was successfully updated in the SimpleStorage instance.
- `{:error, reason}`: a tuple containing the reason why the job could not be updated, such as if it does not exist in the SimpleStorage instance.