gollum v0.3.3 Gollum.Cache
Caches the robots.txt files from different hosts in memory.
Add this module to your supervision tree. Use it to fetch robots.txt files and cache the results automatically. It also ensures that two identical requests don't run at the same time.
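As a minimal sketch of adding the cache to a supervision tree: MyApp.Application, MyApp.Supervisor and the option values below are hypothetical placeholders. It also assumes no other Gollum.Cache process is already registered under the same name.

```elixir
defmodule MyApp.Application do
  # Hypothetical application module; only Gollum.Cache comes from this library.
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # The options are handed to Gollum.Cache.start_link/1 through child_spec/1.
      # The values here are illustrative, not recommendations.
      {Gollum.Cache, refresh_secs: 3_600, user_agent: "MyCrawler"}
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end
```

Listing the bare module name (Gollum.Cache) as the child instead of the tuple should start the cache with its default options.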
Summary
Functions
child_spec(init_arg)
Returns a specification to start this module under a supervisor.
fetch(host, opts \\ [])
Fetches the robots.txt from a host and stores it in the cache. The HTTP request is only performed if the cache holds no current data for the host, the cached data is too old (as set by the refresh_secs option in start_link/1), or the force flag is set. This function is useful if you know beforehand which hosts you need to request.
get(host, opts \\ [])
Gets the Gollum.Host struct for the specified host from the cache.
init(init_arg)
Invoked when the server is started. GenServer.start_link/3 or GenServer.start/3 will block until it returns.
start_link(opts \\ [])
Starts up the cache.
Functions
child_spec(init_arg)
Returns a specification to start this module under a supervisor.
See Supervisor.
fetch(host, opts \\ [])
Fetches the robots.txt from a host and stores it in the cache.
The HTTP request is only performed if the cache holds no current data for the host, the cached data is too old (as set by the refresh_secs option in start_link/1), or the force flag is set. This function is useful if you know beforehand which hosts you need to request.
Options
- name - The name of the GenServer. Default value is Gollum.Cache.
- async - Whether this call is async. If the call is async, :ok is always returned. The default value is false.
- force - If the cache has already fetched from the host, this flag determines whether it should force a refresh. Default is false.
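The options above could be used along these lines. This is a sketch only: it assumes a cache is already running under the default name, and the host strings (including their exact format) are illustrative assumptions.

```elixir
# Assumes a Gollum.Cache process is already running under its default name.
# The host strings below are placeholders.
hosts = ["https://example.com", "https://example.org"]

# Warm the cache for hosts known in advance. With async: true the call
# is documented to always return :ok.
Enum.each(hosts, fn host ->
  :ok = Gollum.Cache.fetch(host, async: true)
end)

# Request a refresh even if the host has already been fetched.
Gollum.Cache.fetch("https://example.com", force: true)
```

The return value of a synchronous call is not described in this section, so the last line does not match on it.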
get(host, opts \\ [])
get(binary(), keyword()) :: Gollum.Host.t() | nil
Gets the Gollum.Host struct for the specified host from the cache.
Options
- name - The name of the GenServer. Default value is Gollum.Cache.
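Since the spec allows nil, callers will usually want to handle both outcomes. A minimal sketch, assuming a running cache under the default name and a placeholder host string:

```elixir
# Assumes a Gollum.Cache process is already running under its default name.
case Gollum.Cache.get("https://example.com") do
  nil ->
    # The cache returned no entry for this host.
    :not_cached

  %Gollum.Host{} = host ->
    # A Gollum.Host struct, as stated in the spec above.
    host
end
```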
init(init_arg)
Invoked when the server is started. GenServer.start_link/3 or GenServer.start/3 will block until it returns.
init_arg is the argument term (second argument) passed to start_link/3.
Returning {:ok, state} will cause start_link/3 to return {:ok, pid} and the process to enter its loop.
Returning {:ok, state, timeout} is similar to {:ok, state}, except that it also sets a timeout. See the "Timeouts" section in the GenServer module documentation for more information.
Returning {:ok, state, :hibernate} is similar to {:ok, state} except the process is hibernated before entering the loop. See handle_call/3 for more information on hibernation.
Returning {:ok, state, {:continue, continue}} is similar to {:ok, state} except that immediately after entering the loop the handle_continue/2 callback will be invoked with the value continue as first argument.
Returning :ignore will cause start_link/3 to return :ignore and the process will exit normally without entering the loop or calling terminate/2. If used when part of a supervision tree the parent supervisor will not fail to start nor immediately try to restart the GenServer. The remainder of the supervision tree will be started and so the GenServer should not be required by other processes. It can be started later with Supervisor.restart_child/2 as the child specification is saved in the parent supervisor. The main use cases for this are:
- The GenServer is disabled by configuration but might be enabled later.
- An error occurred and it will be handled by a different mechanism than the Supervisor. Likely this approach involves calling Supervisor.restart_child/2 after a delay to attempt a restart.
Returning {:stop, reason} will cause start_link/3 to return {:error, reason} and the process to exit with reason reason without entering the loop or calling terminate/2.
Callback implementation for GenServer.init/1.
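To illustrate the return values described above, here is a small generic GenServer. It is a hypothetical sketch and says nothing about how Gollum.Cache implements init/1. The module name InitSketch and the :enabled and :bad_config options are invented for the example.

```elixir
defmodule InitSketch do
  # Hypothetical GenServer used only to demonstrate init/1 return values.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    cond do
      # Disabled by configuration: start_link/1 returns :ignore.
      opts[:enabled] == false -> :ignore
      # Refuse to start: start_link/1 returns {:error, :bad_config}.
      opts[:bad_config] -> {:stop, :bad_config}
      # Normal start; follow-up work is deferred to handle_continue/2.
      true -> {:ok, %{ready: false}, {:continue, :warm_up}}
    end
  end

  @impl true
  def handle_continue(:warm_up, state), do: {:noreply, %{state | ready: true}}

  @impl true
  def handle_call(:ready?, _from, state), do: {:reply, state.ready, state}
end
```

Because handle_continue/2 runs immediately after the process enters its loop, a later GenServer.call(pid, :ready?) sees the updated state. Note that when init/1 returns {:stop, reason}, a linked caller also receives an exit signal, so it needs to trap exits if it wants to inspect the {:error, reason} result itself.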
start_link(opts \\ [])
Starts up the cache.
Options
- name - The name of the GenServer. Default value is Gollum.Cache.
- refresh_secs - The number of seconds until the robots.txt will be refetched from the host. Defaults to 86_400, which is 1 day.
- lazy_refresh - If this flag is set to true, the file will only be refetched from the host if needed. Otherwise, the file will be refreshed at the interval specified by refresh_secs. Defaults to false.
- user_agent - The user agent to use when performing the GET request. Default is "Gollum".
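A sketch of starting the cache directly with non-default options. The name MyApp.RobotsCache, the option values and the host string are hypothetical. Starting the process this way links it to the caller, so a supervisor is usually the better home for it.

```elixir
# All values below are illustrative placeholders.
{:ok, _pid} =
  Gollum.Cache.start_link(
    name: MyApp.RobotsCache,
    # Treat cached files as stale after 12 hours.
    refresh_secs: 43_200,
    # Only refetch when the data is actually needed.
    lazy_refresh: true,
    user_agent: "MyCrawler"
  )

# A cache started under a custom name must be addressed by that name.
Gollum.Cache.fetch("https://example.com", name: MyApp.RobotsCache)
Gollum.Cache.get("https://example.com", name: MyApp.RobotsCache)
```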