gollum v0.3.3 Gollum.Cache

Caches the robots.txt files from different hosts in memory.

Add this module to your supervision tree. Use it to fetch robots.txt files and cache the results automatically. It also ensures that two identical requests don't run at the same time.
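As a rough sketch of what adding the module to a supervision tree might look like, the snippet below lists the cache as a child of an application supervisor. The names MyApp.Application, MyApp.Supervisor and the "MyCrawler" user agent are hypothetical; the options are the ones documented under start_link further down.

```elixir
defmodule MyApp.Application do
  # Hypothetical application module; only Gollum.Cache comes from the library.
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # The {module, arg} tuple makes the supervisor call
      # Gollum.Cache.child_spec/1 with the keyword list below.
      {Gollum.Cache, refresh_secs: 3_600, user_agent: "MyCrawler"}
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end
```

Because no name option is given here, the cache would register under the default name, Gollum.Cache.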

Summary

Functions

child_spec(init_arg)
Returns a specification to start this module under a supervisor.

fetch(host, opts \\ [])
Fetches the robots.txt from a host and stores it in the cache.
It only performs the HTTP request if the cache holds no data for the host, if the cached data is too old (as set by the refresh_secs option of start_link/1), or if the force flag is set. This function is useful if you know beforehand which hosts you need to request.

get(host, opts \\ [])
Gets the Gollum.Host struct for the specified host from the cache.

init(init_arg)
Invoked when the server is started. GenServer.start_link/3 or GenServer.start/3 will block until it returns.

start_link(opts \\ [])
Starts up the cache.

Functions

child_spec(init_arg)

Returns a specification to start this module under a supervisor.

See Supervisor.

fetch(host, opts \\ [])
fetch(binary(), keyword()) :: :ok | {:error, term()}

Fetches the robots.txt from a host and stores it in the cache.
It only performs the HTTP request if the cache holds no data for the host, if the cached data is too old (as set by the refresh_secs option of start_link/1), or if the force flag is set. This function is useful if you know beforehand which hosts you need to request.

Options

  • name - The name of the GenServer. Default value is Gollum.Cache.

  • async - Whether this call is async. If the call is async, :ok is always returned. The default value is false.

  • force - If the cache has already fetched from the host, this flag determines whether it should force a refresh. Default is false.
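The options above might be combined as in the following sketch. It assumes a cache is already running under the default name and that hosts are passed as binaries such as "https://example.com"; the exact host format is an assumption, not something this page states.

```elixir
# Assumption: hosts are given as binaries like "https://example.com".
hosts = ["https://example.com", "https://example.org"]

# Warm the cache for hosts known in advance. With async: true the call
# always returns :ok, so failures are not reported to the caller.
Enum.each(hosts, fn host ->
  :ok = Gollum.Cache.fetch(host, async: true)
end)

# Synchronous call that forces a refresh even if cached data exists.
case Gollum.Cache.fetch("https://example.com", force: true) do
  :ok -> IO.puts("robots.txt cached")
  {:error, reason} -> IO.inspect(reason, label: "fetch failed")
end
```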

get(host, opts \\ [])
get(binary(), keyword()) :: Gollum.Host.t() | nil

Gets the Gollum.Host struct for the specified host from the cache.

Options

  • name - The name of the GenServer. Default value is Gollum.Cache.
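Since the spec allows a nil result, callers probably want to handle both cases. The sketch below assumes nil means that nothing is cached for the host yet and uses a placeholder host.

```elixir
# Assumption: placeholder host; a cache is running under the default name.
host = "https://example.com"

case Gollum.Cache.get(host) do
  nil ->
    # Assumed meaning: no robots.txt data is cached for this host.
    IO.puts("no cached robots.txt for #{host}")

  %Gollum.Host{} = robots ->
    IO.inspect(robots, label: "cached robots.txt")
end
```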

init(init_arg)

Invoked when the server is started. GenServer.start_link/3 or GenServer.start/3 will block until it returns.

init_arg is the argument term (second argument) passed to start_link/3.

Returning {:ok, state} will cause start_link/3 to return {:ok, pid} and the process to enter its loop.

Returning {:ok, state, timeout} is similar to {:ok, state}, except that it also sets a timeout. See the "Timeouts" section in the GenServer module documentation for more information.

Returning {:ok, state, :hibernate} is similar to {:ok, state} except the process is hibernated before entering the loop. See handle_call/3 for more information on hibernation.

Returning {:ok, state, {:continue, continue}} is similar to {:ok, state} except that immediately after entering the loop the handle_continue/2 callback will be invoked with the value continue as first argument.

Returning :ignore will cause start_link/3 to return :ignore and the process will exit normally without entering the loop or calling terminate/2. If used when part of a supervision tree the parent supervisor will not fail to start nor immediately try to restart the GenServer. The remainder of the supervision tree will be started and so the GenServer should not be required by other processes. It can be started later with Supervisor.restart_child/2 as the child specification is saved in the parent supervisor. The main use cases for this are:

  • The GenServer is disabled by configuration but might be enabled later.
  • An error occurred and it will be handled by a different mechanism than the Supervisor. Likely this approach involves calling Supervisor.restart_child/2 after a delay to attempt a restart.

Returning {:stop, reason} will cause start_link/3 to return {:error, reason} and the process to exit with reason reason without entering the loop or calling terminate/2.

Callback implementation for GenServer.init/1.
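To make some of these return values concrete, here is a minimal generic GenServer. It is not Gollum.Cache's actual init/1; the module name ConfigurableServer and the enabled option are hypothetical. It returns :ignore when disabled and otherwise continues into handle_continue/2.

```elixir
defmodule ConfigurableServer do
  # Hypothetical module used only to illustrate init/1 return values.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    if Keyword.get(opts, :enabled, true) do
      # Enter the loop, then immediately run handle_continue(:warm_up, state).
      {:ok, %{opts: opts, warm: false}, {:continue, :warm_up}}
    else
      # Disabled by configuration: start_link/1 returns :ignore.
      :ignore
    end
  end

  @impl true
  def handle_continue(:warm_up, state) do
    {:noreply, %{state | warm: true}}
  end
end
```

Calling ConfigurableServer.start_link(enabled: false) returns :ignore, while start_link(enabled: true) returns {:ok, pid} and the continue callback runs before any other message is handled.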

start_link(opts \\ [])
start_link(keyword()) :: {:ok, pid()} | {:error, term()}

Starts up the cache.

Options

  • name - The name of the GenServer. Default value is Gollum.Cache.

  • refresh_secs - The number of seconds until the robots.txt will be refetched from the host. Defaults to 86_400, which is 1 day.

  • lazy_refresh - If this flag is set to true, the file will only be refetched from the host if needed. Otherwise, the file will be refreshed at the interval specified by refresh_secs. Defaults to false.

  • user_agent - The user agent to use when performing the GET request. Default is "Gollum".
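For illustration, the cache could also be started directly with these options, for example in a script or an IEx session. The name :robots_cache, the "MyCrawler" user agent and the host are hypothetical values chosen for this sketch.

```elixir
# Assumption: :robots_cache is a hypothetical name chosen for this example.
{:ok, _pid} =
  Gollum.Cache.start_link(
    name: :robots_cache,
    # Refetch after six hours instead of the default of one day.
    refresh_secs: 6 * 60 * 60,
    # Only refetch when needed rather than on a fixed interval.
    lazy_refresh: true,
    user_agent: "MyCrawler"
  )

# Later calls must pass the same name to reach this cache.
result = Gollum.Cache.fetch("https://example.com", name: :robots_cache)
robots = Gollum.Cache.get("https://example.com", name: :robots_cache)
IO.inspect({result, robots})
```

A GenServer name can only be registered once, so if a cache is already running under the default name, a second instance needs a distinct name like the one above.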