View Source Metrics Reporting
Note: This page is being updated for OSS release. Please be patient. Current contents describes how we are handling our antikythera instance's and gears' metrics data in ACCESS.
- In antikythera core and gears, statistical metrics can be generated.
- Generated metrics are buffered, aggregated, then flushed to stable storage every minute.
- By default, antikythera uses Elasticsearch as its metrics storage, utilizing its search/aggregation feature and visualization with Kibana.
- All metrics are stored in
metrics-YYYY.MM.DD
indices underantikythera
subspace.h - All metrics documents come with following predefined fields:
@timestamp
node_id
otp_app_name
antikythera
or gear name
epool_id
gear-<gear_name>
ortenant-<tenant_id>
- Exists only in executor pool related metrics
- Metrics are stored for 1 month. Older indices will be automatically deleted.
Auto-collected metrics
- The following metrics are automatically gathered by antikythera:
- Metrics about web/g2g requests
- Number of processed requests (
web/g2g_request_count_*
) - Time distribution of response times (
web/g2g_response_time_ms_*
) - Executor pool checkout failure count (
web_timeout_in_epool_checkout_sum
) - Number of working processes for web request handling in an executor pool (
epool_working_action_runner_*
)
- Number of processed requests (
- Metrics about websocket interactions
- Number of active connections (
epool_websocket_connections_*
) - Number of rejected attempts to establish websocket connections (
epool_websocket_rejected_count
) - Number of received/sent websocket frames (
websocket_frames_received/sent
)
- Number of active connections (
- Metrics about async job
- Number of completed/failed jobs (
async_job_success/failure_sum
) - Time distribution of async job executions (
async_job_execution_time_ms_*
) - Number of waiting jobs in job queue (
epool_waiting_job_count
) - Number of runnable jobs in job queue (
epool_runnable_job_count
) - Number of running jobs in job queue (
epool_running_job_count
) - Number of waiting job brokers (
epool_waiting_broker_count
) - Number of working processes for async job in an executor pool (
epool_working_job_runner_*
)
- Number of completed/failed jobs (
- Metrics about web/g2g requests
Custom metrics
- You can generate and collect arbitrary numeric metrics from your gear's code.
- Field names for such metrics are prefixed with
custom_
and can be searched/visualized just like auto-generated metrics. - Use
YourGear.MetricsUploader.submit/2
from anywhere in your code to generate metrics.YourGear.MetricsUploader
process will then buffer and aggregate them before uploading.- First argument is metrics data list (
Antikythera.Metrics.DataList.t
).strategy
for each data must be an atom representation of one of aggregation strategies. Currently available strategies are::average
:sum
:time_distribution
- Generates average, max and 95-percentile values
:gauge
- Takes last value of each time window
- Second argument is the context (
Antikythera.Context.t
) from which the currently used executor pool ID will be extracted.
- First argument is metrics data list (
Metrics search and visualization
- Simply search
metrics-*
indices with filtering onotp_app_name
field using your gear name. - Note that all metrics are aggregated in uploader process per node, then uploaded to Elasticsearch.
There you will aggregate those metrics across nodes (mostly via Kibana query).
- This two-phase aggregation provides the entire view of metrics from applications running in multiple nodes, while reducing volume of network traffic.
- However, total metrics could be inaccurate, even with a coherent strategy is used. e.g. A sum of per-node sums is an accurate total sum, but an average of per-node averages is not an accurate total average.