cets_discovery behaviour (cets v0.2.0)

Node discovery logic.

Joins tables together when a new node appears.

Things that make discovery logic harder:

- The table list is dynamic (though eventually all the tables are added to it).

- Creating an Erlang distribution connection is async, but net_adm:ping/1 is blocking.

- net_adm:ping/1 could block for an unknown number of seconds (though the net_kernel default setup timeout is 7 seconds).

- Resolving a node name could take a long time (5 seconds in tests), and how long it blocks is unpredictable.

- Tables should be joined one by one to avoid OOM.

- Backend:get_nodes/1 could take a long time.

- cets_discovery:get_tables/1, cets_discovery:add_table/2 should be fast.

- The most important net_kernel flags for us to consider are:

* dist_auto_connect=never

* connect_all

* prevent_overlapping_partitions

These flags change the way the discovery logic behaves. The module also does not try to connect to hidden nodes.
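Since a blocking ping is one of the difficulties listed above, one common way to keep a server process responsive is to run the ping in a separate monitored process. This is a minimal sketch, not the module's actual implementation: `ping_sketch`, `ping_async/1` and `await_ping/2` are hypothetical names, and the ping itself uses `net_adm:ping/1` from OTP.

```erlang
-module(ping_sketch).
-export([ping_async/1, await_ping/2]).

%% Hypothetical helper, not part of cets_discovery.
%% Runs the blocking ping in a separate monitored process, so the caller
%% is free to handle other messages while the ping is in progress.
-spec ping_async(node()) -> {pid(), reference()}.
ping_async(Node) ->
    spawn_monitor(fun() -> exit({ping_result, net_adm:ping(Node)}) end).

%% Waits for the worker's exit reason, which carries the ping result.
%% Returns pong | pang, or timeout if the worker is still running.
-spec await_ping({pid(), reference()}, timeout()) -> pong | pang | timeout.
await_ping({_Pid, Ref}, Timeout) ->
    receive
        {'DOWN', Ref, process, _, {ping_result, Result}} ->
            Result;
        {'DOWN', Ref, process, _, _OtherReason} ->
            %% The worker crashed; treat the node as unreachable.
            pang
    after Timeout ->
        erlang:demonitor(Ref, [flush]),
        timeout
    end.
```

Inside a gen_server the `'DOWN'` message would normally be handled in `handle_info/2` instead of a blocking `receive`.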

Retry logic considerations:

- Backend:get_nodes/1 could return an error during startup, so we have to retry fast.

- There are two periods of operation for this module:

* startup phase, usually first 5 minutes.

* regular operation phase, after the startup phase.

- We don't need to call get_nodes to check for updates too often in the regular operation phase.
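The retry considerations above can be sketched as a function from the phase and the retry type to a delay. This is only an illustration: the module name, the function names and every interval except the 5 minute startup length are assumptions, since this page does not state the real values.

```erlang
-module(retry_sketch).
-export([phase/1, retry_timeout/2]).

%% The startup phase is "usually the first 5 minutes" per the text above.
-define(STARTUP_MS, 300000).

%% Picks the phase from the time since start, in milliseconds.
-spec phase(non_neg_integer()) -> initial | regular.
phase(UptimeMs) when UptimeMs < ?STARTUP_MS -> initial;
phase(_UptimeMs) -> regular.

%% Delay before the next get_nodes call, in milliseconds.
%% All numbers below are illustrative assumptions.
-spec retry_timeout(initial | regular,
                    initial | after_error | regular | after_nodedown) ->
    pos_integer().
retry_timeout(initial, _RetryType) -> 1000;        % retry fast during startup
retry_timeout(regular, after_error) -> 5000;       % the backend failed, try again soon
retry_timeout(regular, after_nodedown) -> 5000;    % the node list may have changed
retry_timeout(regular, _RetryType) -> 60000.       % nothing urgent, poll rarely
```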

Summary

Types

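backend_state(): Backend state.
from(): gen_server's caller.
get_nodes_result(): Result of get_nodes/1 call.
join_result(): Join result information.
milliseconds(): Number of milliseconds.
opts(): Backend could define its own options.
retry_type(): Retry logic type.
server(): Discovery server process.
start_result(): Result of start_link/1.
system_info(): Discovery status.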

Functions

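add_table(Server, Table): Adds a table to be tracked and joined.
delete_table(Server, Table): Deletes a table from being tracked or joined.
get_tables(Server): Gets a list of the tracked tables.
info(Server): Gets information for each tracked table.
start(Opts): Starts a discovery process.
start_link(Opts): Starts a discovery process with a link.
system_info(Server): Gets discovery process status.
wait_for_get_nodes(Server, Timeout): Waits for the current get_nodes call to return.
wait_for_ready(Server, Timeout): Blocks until the initial discovery is done.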

Types

-type backend_state() :: term().
Backend state.
-type from() :: {pid(), reference()}.
gen_server's caller.
-type get_nodes_result() :: {ok, [node()]} | {error, term()}.
Result of get_nodes/1 call.
-type join_result() ::
    #{node := node(),
      table := atom(),
      what := join_result | pid_not_found,
      result => ok | {error, _},
      reason => term()}.
Join result information.
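As an illustration of how join_result() maps might be consumed, the sketch below picks out the entries that look like failures. The module and function names are hypothetical, and treating pid_not_found as a failure is an assumption based only on the type above.

```erlang
-module(join_result_sketch).
-export([failed_joins/1]).

%% Hypothetical helper: keeps only the results that did not join cleanly.
-spec failed_joins([map()]) -> [map()].
failed_joins(Results) ->
    [R || R <- Results, is_failure(R)].

%% Assumption: a missing pid or an {error, _} result counts as a failure.
is_failure(#{what := pid_not_found}) -> true;
is_failure(#{what := join_result, result := {error, _}}) -> true;
is_failure(_) -> false.
```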
-type milliseconds() :: integer().
Number of milliseconds.
-type opts() :: #{name := atom(), _ := _}.
Backend could define its own options.
-type retry_type() :: initial | after_error | regular | after_nodedown.
Retry logic type.
-type server() :: pid() | atom().
Discovery server process.
-type start_result() :: {ok, pid()} | {error, term()}.
Result of start_link/1.
-type state() ::
    #{phase := initial | regular,
      results := [join_result()],
      nodes := ordsets:ordset(node()),
      unavailable_nodes := ordsets:ordset(node()),
      tables := [atom()],
      backend_module := module(),
      backend_state := state(),
      get_nodes_status := not_running | running,
      should_retry_get_nodes := boolean(),
      last_get_nodes_result := not_called_yet | get_nodes_result(),
      last_get_nodes_retry_type := retry_type(),
      join_status := not_running | running,
      should_retry_join := boolean(),
      timer_ref := reference() | undefined,
      pending_wait_for_ready := [gen_server:from()],
      pending_wait_for_get_nodes := [gen_server:from()],
      nodeup_timestamps := #{node() => milliseconds()},
      nodedown_timestamps := #{node() => milliseconds()},
      node_start_timestamps := #{node() => milliseconds()},
      start_time := milliseconds()}.
-type system_info() :: map().
Discovery status.

Callbacks
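The callback list is not shown on this page. The sketch below is a minimal backend written under two assumptions that the text here does not confirm: that the behaviour expects an init/1 callback returning a backend_state(), and a get_nodes/1 callback returning {get_nodes_result(), backend_state()}. The module name and the `nodes` option are hypothetical.

```erlang
-module(static_backend_sketch).
%% -behaviour(cets_discovery).  % enable when compiling against cets
-export([init/1, get_nodes/1]).

%% Assumed callback: builds the backend state from the options map.
%% `nodes` is a hypothetical backend-specific option holding a fixed node list.
-spec init(map()) -> map().
init(Opts) ->
    #{nodes => maps:get(nodes, Opts, [])}.

%% Assumed callback: returns the node list together with the unchanged state.
-spec get_nodes(map()) -> {{ok, [node()]} | {error, term()}, map()}.
get_nodes(State = #{nodes := Nodes}) ->
    {{ok, Nodes}, State}.
```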

Functions

add_table(Server, Table)

-spec add_table(server(), cets:table_name()) -> ok.
Adds a table to be tracked and joined.

delete_table(Server, Table)

-spec delete_table(server(), cets:table_name()) -> ok.
Deletes a table from being tracked or joined.

get_tables(Server)

-spec get_tables(server()) -> {ok, [cets:table_name()]}.
Gets a list of the tracked tables.

info(Server)

-spec info(server()) -> [cets:info()].
Gets information for each tracked table.

start(Opts)

-spec start(opts()) -> start_result().
Starts a discovery process.

start_link(Opts)

-spec start_link(opts()) -> start_result().
Starts a discovery process with a link.

system_info(Server)

-spec system_info(server()) -> system_info().
Gets discovery process status.

wait_for_get_nodes(Server, Timeout)

-spec wait_for_get_nodes(server(), timeout()) -> ok.

Waits for the current get_nodes call to return.

Returns immediately if no get_nodes call is running. If the should_retry_get_nodes flag is set, it also waits for the next get_nodes call. Unlike wait_for_ready, it does not wait for unavailable nodes to return pang.

wait_for_ready(Server, Timeout)

-spec wait_for_ready(server(), timeout()) -> ok.

Blocks until the initial discovery is done.

This call also waits until the data is loaded from the remote nodes.
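A minimal usage sketch that ties the functions above together, following the specs on this page. It needs the cets application on the code path at run time. The wrapper module, the process name `my_discovery` and the `backend_module` option key are assumptions; opts() only guarantees the `name` key.

```erlang
-module(discovery_usage_sketch).
-export([start_and_wait/2]).

%% Starts a linked discovery process, registers one table for tracking and
%% blocks until the initial discovery is done (or the 5 second timeout hits).
%% Returns the server pid and the list of tracked tables.
-spec start_and_wait(module(), atom()) -> {pid(), [atom()]}.
start_and_wait(BackendModule, Table) ->
    %% `backend_module` is an assumed option key, not confirmed by this page.
    Opts = #{name => my_discovery, backend_module => BackendModule},
    {ok, Pid} = cets_discovery:start_link(Opts),
    ok = cets_discovery:add_table(Pid, Table),
    ok = cets_discovery:wait_for_ready(Pid, 5000),
    {ok, Tables} = cets_discovery:get_tables(Pid),
    {Pid, Tables}.
```

If only the node list is of interest, `wait_for_get_nodes/2` could be called in place of `wait_for_ready/2`, since it does not wait for unavailable nodes.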