View Source cets_discovery behaviour (cets v0.3.0)

Node discovery logic.

Joins table together when a new node appears.

Things that make discovery logic harder:

- A table list is dynamic (but eventually we add all the tables into it).

- Creating Erlang distribution connection is async, but it net_kernel:ping/1 is blocking.

- net_kernel:ping/1 could block for unknown number of seconds (but net_kernel default timeout is 7 seconds).

- Resolving nodename could take a lot of time (5 seconds in tests). It is unpredictable blocking.

- join tables should be one by one to avoid OOM.

- Backend:get_nodes/1 could take a long time.

- cets_discovery:get_tables/1, cets_discovery:add_table/2 should be fast.

- The most important net_kernel flags for us to consider are:

* dist_auto_connect=never

* connect_all

* prevent_overlapping_partitions

These flags change the way the discovery logic behaves. Also the module would not try to connect to the hidden nodes.

Retry logic considerations:

- Backend:get_nodes/1 could return an error during startup, so we have to retry fast.

- There are two periods of operation for this module:

* startup phase, usually first 5 minutes.

* regular operation phase, after the startup phase.

- We don't need to check for the updated get_nodes too often in the regular operation phase.

Summary

Types

Backend state.

gen_server's caller.

Result of get_nodes/2 call.

Join result information.

Number of milliseconds.

Backend could define its own options.

Retry logic type.

Discovery server process.

Discovery status.

Functions

Adds a table to be tracked and joined.

Deletes a table from being tracked or joined.

Gets a list of the tracked tables.

Gets information for each tracked table.

Starts a discovery process.

Starts a discovery process with a link.

Gets discovery process status.

Waits for the current get_nodes call to return.

Blocks until the initial discovery is done.

Types

-type backend_state() :: term().

Backend state.

-type from() :: {pid(), reference()}.

gen_server's caller.

-type get_nodes_result() :: {ok, [node()]} | {error, term()}.

Result of get_nodes/2 call.

-type join_result() ::
          #{node := node(),
            table := atom(),
            what := join_result | pid_not_found,
            result => ok | {error, _},
            reason => term()}.

Join result information.

-type milliseconds() :: integer().

Number of milliseconds.

-type opts() :: #{name := atom(), _ := _}.

Backend could define its own options.

-type retry_type() :: initial | after_error | regular | after_nodedown.

Retry logic type.

-type server() :: pid() | atom().

Discovery server process.

-type start_result() :: {ok, pid()} | {error, term()}.

Result of start_link/1.

-type state() ::
          #{phase := initial | regular,
            results := [join_result()],
            nodes := ordsets:ordset(node()),
            unavailable_nodes := ordsets:ordset(node()),
            tables := [atom()],
            backend_module := module(),
            backend_state := state(),
            get_nodes_status := not_running | running,
            should_retry_get_nodes := boolean(),
            last_get_nodes_result := not_called_yet | get_nodes_result(),
            last_get_nodes_retry_type := retry_type(),
            join_status := not_running | running,
            should_retry_join := boolean(),
            timer_ref := reference() | undefined,
            pending_wait_for_ready := [gen_server:from()],
            pending_wait_for_get_nodes := [gen_server:from()],
            nodeup_timestamps := #{node() => milliseconds()},
            nodedown_timestamps := #{node() => milliseconds()},
            node_start_timestamps := #{node() => milliseconds()},
            start_time := milliseconds()}.
-type system_info() :: map().

Discovery status.

Callbacks

Functions

Link to this function

add_table(Server, Table)

View Source
-spec add_table(server(), cets:table_name()) -> ok.

Adds a table to be tracked and joined.

Link to this function

delete_table(Server, Table)

View Source
-spec delete_table(server(), cets:table_name()) -> ok.

Deletes a table from being tracked or joined.

-spec get_tables(server()) -> {ok, [cets:table_name()]}.

Gets a list of the tracked tables.

-spec info(server()) -> [cets:info()].

Gets information for each tracked table.

-spec start(opts()) -> start_result().

Starts a discovery process.

-spec start_link(opts()) -> start_result().

Starts a discovery process with a link.

-spec system_info(server()) -> system_info().

Gets discovery process status.

Link to this function

wait_for_get_nodes(Server, Timeout)

View Source
-spec wait_for_get_nodes(server(), timeout()) -> ok.

Waits for the current get_nodes call to return.

Just returns if there is no gen_nodes call running. Waits for another get_nodes, if should_retry_get_nodes flag is set. It is different from wait_for_ready, because it does not wait for unavailable nodes to return pang.

Link to this function

wait_for_ready(Server, Timeout)

View Source
-spec wait_for_ready(server(), timeout()) -> ok.

Blocks until the initial discovery is done.

This call would also wait till the data is loaded from the remote nodes.