Tinkex includes a lightweight metrics system for tracking request performance, custom counters, gauges, and histograms. The Tinkex.Metrics server automatically collects HTTP request telemetry and provides helpers for recording custom metrics in experiments and benchmarks.

Overview

The Metrics system is built on GenServer and Telemetry, providing:

  • Automatic HTTP request tracking: counters for success/failure and latency histograms
  • Custom counters: increment-based metrics for tracking events
  • Gauges: point-in-time measurements that can be set directly
  • Histograms: distribution tracking with percentile calculations (p50, p95, p99)
  • Zero-overhead when disabled: metrics can be toggled off via configuration
  • Thread-safe: all updates via GenServer casts/calls

The server starts automatically with the Tinkex application and subscribes to [:tinkex, :http, :request, :stop] telemetry events.

Built-in HTTP metrics

When enabled, Tinkex automatically tracks:

Request counters

  • :tinkex_requests_total — total number of HTTP requests
  • :tinkex_requests_success — requests that returned :ok
  • :tinkex_requests_failure — requests that returned an error

Request latency histogram

  • :tinkex_request_duration_ms — end-to-end request duration in milliseconds

This histogram includes:

  • Count: total number of requests
  • Mean: average latency
  • Min/Max: fastest and slowest requests
  • Percentiles: p50 (median), p95, p99

Custom counters

Use Metrics.increment/2 to count events in your application:

# Increment by 1 (default)
Tinkex.Metrics.increment(:my_custom_counter)

# Increment by a specific amount
Tinkex.Metrics.increment(:tokens_generated, 150)
Tinkex.Metrics.increment(:cache_hits, 1)
Tinkex.Metrics.increment(:errors, 1)

Common use cases:

  • Track cache hits/misses
  • Count successful vs failed generations
  • Track tokens consumed across multiple requests
  • Count specific error types

Gauges

Gauges represent instantaneous values that can go up or down. Use Metrics.set_gauge/2 to record the current state:

# Track queue depth
Tinkex.Metrics.set_gauge(:queue_depth, 42)

# Track active connections
Tinkex.Metrics.set_gauge(:active_connections, 8)

# Track memory usage
memory = :erlang.memory(:total)
Tinkex.Metrics.set_gauge(:memory_bytes, memory)

# Track temperature parameter
Tinkex.Metrics.set_gauge(:current_temperature, 0.7)

Common use cases:

  • Monitor queue depths or buffer sizes
  • Track active connections or worker pools
  • Record configuration values during experiments
  • Monitor resource usage (memory, CPU)

Unlike counters, gauges are always set to a specific value rather than incremented.

Histograms

Histograms track distributions of values over time. Use Metrics.record_histogram/2 to record samples (values should be in milliseconds):

# Record a custom latency measurement
start = System.monotonic_time(:millisecond)
result = do_some_work()
duration_ms = System.monotonic_time(:millisecond) - start
Tinkex.Metrics.record_histogram(:custom_operation_duration, duration_ms)

# Track token generation time
Tinkex.Metrics.record_histogram(:token_generation_ms, 125.5)

# Track decode latency
Tinkex.Metrics.record_histogram(:decode_latency_ms, 3.2)

Histogram features:

  • Automatic bucket assignment based on configured latency buckets
  • Stores up to max_samples individual values for percentile calculation
  • Computes min, max, mean, p50, p95, p99
  • Memory-bounded (older samples dropped when limit reached)
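The internals are not part of the public API, but the behavior described above can be sketched in a few lines. Assume a plain map for state; `HistogramSketch`, its `:overflow` bucket, and the state shape are illustrative only, not Tinkex's actual implementation:

```elixir
defmodule HistogramSketch do
  # Illustrative only: a bounded histogram like the one described above.
  @max_samples 1_000

  # Empty state: one counter per configured bucket boundary, plus a
  # capped list of raw samples used later for percentile calculation.
  def new(buckets) do
    %{buckets: Map.new(buckets, &{&1, 0}), samples: [], max: @max_samples}
  end

  def record(%{buckets: buckets, samples: samples, max: max} = hist, value) do
    # A sample is counted under the first boundary at or above its value.
    bucket =
      buckets
      |> Map.keys()
      |> Enum.sort()
      |> Enum.find(:overflow, &(value <= &1))

    %{hist |
      buckets: Map.update(buckets, bucket, 1, &(&1 + 1)),
      # Keep at most `max` raw samples; older ones fall off the end.
      samples: Enum.take([value | samples], max)}
  end
end
```

For example, recording 5, 50, and 500 against boundaries `[10, 100]` increments the 10 and 100 buckets once each and sends 500 to the overflow bucket.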

Common use cases:

  • Track end-to-end operation latencies
  • Measure token generation speed
  • Monitor decode/encode times
  • Track database query performance

Getting snapshots

Call Metrics.snapshot/0 to retrieve current metrics state:

snapshot = Tinkex.Metrics.snapshot()

# Snapshot structure:
%{
  counters: %{
    tinkex_requests_total: 150,
    tinkex_requests_success: 145,
    tinkex_requests_failure: 5,
    my_custom_counter: 42
  },
  gauges: %{
    queue_depth: 8,
    active_connections: 4
  },
  histograms: %{
    tinkex_request_duration_ms: %{
      count: 150,
      mean: 245.3,
      min: 89.2,
      max: 1205.7,
      p50: 220.1,
      p95: 458.2,
      p99: 892.5
    }
  }
}

Access specific metrics:

snapshot = Tinkex.Metrics.snapshot()

# Check total requests
total = snapshot.counters[:tinkex_requests_total] || 0

# Check success rate
success = snapshot.counters[:tinkex_requests_success] || 0
failure = snapshot.counters[:tinkex_requests_failure] || 0
success_rate = if total > 0, do: success / total * 100, else: 0

# Check p99 latency
latency_hist = snapshot.histograms[:tinkex_request_duration_ms]
p99_latency = latency_hist.p99

Understanding latency percentiles

Percentiles describe the latency below which a given percentage of requests completed:

  • p50 (median): 50% of requests were faster than this value
  • p95: 95% of requests were faster than this value
  • p99: 99% of requests were faster than this value

Example interpretation:

%{
  p50: 220.1,   # Half of all requests completed in under 220ms
  p95: 458.2,   # 95% completed in under 458ms
  p99: 892.5    # 99% completed in under 892ms
}

High p99 values indicate "tail latency" — a small percentage of requests taking much longer than average. This is critical for understanding worst-case user experience.
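The percentile math itself is straightforward: sort the stored samples and index into them by rank. A self-contained sketch using the nearest-rank method (Tinkex's exact rounding or interpolation may differ; `PercentileSketch` is illustrative):

```elixir
defmodule PercentileSketch do
  # Nearest-rank percentile: the value below which roughly p% of samples fall.
  def percentile([], _p), do: nil

  def percentile(samples, p) do
    sorted = Enum.sort(samples)
    # 1-based rank of the p-th percentile, converted to a 0-based index.
    index = ceil(p / 100 * length(sorted)) - 1
    Enum.at(sorted, min(max(index, 0), length(sorted) - 1))
  end
end
```

With samples 1..100 the percentiles land exactly on the rank: p50 is 50, p95 is 95, p99 is 99.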

Configuration options

Configure metrics in config/config.exs:

config :tinkex,
  # Enable or disable metrics collection
  metrics_enabled: true,

  # Histogram bucket boundaries in milliseconds
  # Default: [1, 2, 5, 10, 20, 50, 100, 200, 500, 1_000, 2_000, 5_000]
  metrics_latency_buckets: [10, 50, 100, 250, 500, 1_000, 2_500, 5_000],

  # Maximum individual samples to keep per histogram
  # Default: 1_000
  metrics_histogram_max_samples: 2_000

Configuration guide:

Latency buckets

Buckets define histogram boundaries. Choose values appropriate for your workload:

# For fast operations (sub-second)
metrics_latency_buckets: [1, 5, 10, 25, 50, 100, 250, 500]

# For slow operations (multi-second)
metrics_latency_buckets: [100, 500, 1_000, 2_000, 5_000, 10_000, 30_000]

# For mixed workloads (default)
metrics_latency_buckets: [1, 2, 5, 10, 20, 50, 100, 200, 500, 1_000, 2_000, 5_000]

More buckets = finer granularity but more memory usage.
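To see why bucket choice matters, compare where the same sample lands under the two configurations above. The `bucket_for` helper is illustrative (not Tinkex API); it applies the usual rule that a sample is counted under the first boundary at or above its value:

```elixir
# Illustrative helper: which configured boundary a latency sample falls under.
bucket_for = fn value, buckets ->
  Enum.find(buckets, :overflow, &(value <= &1))
end

fast = [1, 5, 10, 25, 50, 100, 250, 500]
slow = [100, 500, 1_000, 2_000, 5_000, 10_000, 30_000]

bucket_for.(7, fast)       #=> 10   (fine-grained: a 7ms request is distinguishable)
bucket_for.(7, slow)       #=> 100  (coarse: everything under 100ms lumps together)
bucket_for.(60_000, slow)  #=> :overflow (beyond the largest boundary)
```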

Max samples

The max_samples setting controls how many individual values are stored for percentile calculation:

# Lower memory usage, less accurate percentiles
metrics_histogram_max_samples: 500

# Higher accuracy, more memory
metrics_histogram_max_samples: 5_000

When the limit is reached, new samples displace older ones. For production workloads with high volume, consider a lower value (500-1000). For detailed analysis, use higher values (5000-10000).

Disabling metrics

To disable metrics entirely:

config :tinkex, metrics_enabled: false

Or pass at startup:

{:ok, _} = Tinkex.Metrics.start_link(enabled: false)

Integration with experiments

Use metrics to track experiment progress and performance:

defmodule MyExperiment do
  def run_benchmark(num_iterations) do
    # Reset metrics at start
    :ok = Tinkex.Metrics.reset()

    # Track experiment configuration
    Tinkex.Metrics.set_gauge(:experiment_iterations, num_iterations)
    Tinkex.Metrics.set_gauge(:experiment_temperature, 0.7)

    Enum.each(1..num_iterations, fn i ->
      start = System.monotonic_time(:millisecond)

      # Your experiment code
      {:ok, result} = run_single_trial(i)

      # Track custom metrics
      Tinkex.Metrics.increment(:trials_completed)
      if result.success?, do: Tinkex.Metrics.increment(:successful_trials)

      # Track trial duration
      duration = System.monotonic_time(:millisecond) - start
      Tinkex.Metrics.record_histogram(:trial_duration_ms, duration)

      # Track tokens generated
      Tinkex.Metrics.increment(:total_tokens, result.num_tokens)
    end)

    # Flush pending updates
    :ok = Tinkex.Metrics.flush()

    # Get final snapshot
    snapshot = Tinkex.Metrics.snapshot()

    # Compute experiment metrics
    total_trials = snapshot.counters[:trials_completed] || 0
    successful = snapshot.counters[:successful_trials] || 0
    success_rate = if total_trials > 0, do: successful / total_trials * 100, else: 0.0

    trial_stats = snapshot.histograms[:trial_duration_ms]

    IO.puts """
    Experiment complete:
      Trials: #{total_trials}
      Success rate: #{:erlang.float_to_binary(success_rate, decimals: 1)}%
      Trial duration:
        Mean: #{format_ms(trial_stats.mean)}
        p50:  #{format_ms(trial_stats.p50)}
        p95:  #{format_ms(trial_stats.p95)}
        p99:  #{format_ms(trial_stats.p99)}
      HTTP requests:
        Total: #{snapshot.counters[:tinkex_requests_total] || 0}
        Success: #{snapshot.counters[:tinkex_requests_success] || 0}
        Failure: #{snapshot.counters[:tinkex_requests_failure] || 0}
    """
  end

  defp format_ms(nil), do: "n/a"
  defp format_ms(value), do: "#{:erlang.float_to_binary(value, decimals: 2)}ms"
end

Integration with benchmarks

Track comparative performance across different configurations:

defmodule ModelComparison do
  def compare_models(models, prompt, num_runs) do
    results =
      Enum.map(models, fn model ->
        # Reset for each model
        :ok = Tinkex.Metrics.reset()

        Enum.each(1..num_runs, fn _ ->
          {:ok, _response} = sample_with_model(model, prompt)
        end)

        :ok = Tinkex.Metrics.flush()
        snapshot = Tinkex.Metrics.snapshot()

        latency = snapshot.histograms[:tinkex_request_duration_ms]

        {model, %{
          total_requests: snapshot.counters[:tinkex_requests_total] || 0,
          success_rate: calculate_success_rate(snapshot),
          mean_latency: latency.mean,
          p50_latency: latency.p50,
          p99_latency: latency.p99
        }}
      end)

    # Print comparison table
    print_comparison_table(results)
  end

  defp calculate_success_rate(snapshot) do
    total = snapshot.counters[:tinkex_requests_total] || 0
    success = snapshot.counters[:tinkex_requests_success] || 0
    if total > 0, do: success / total * 100, else: 0
  end
end

Utility functions

Reset metrics

Clear all counters, gauges, and histograms:

:ok = Tinkex.Metrics.reset()

Use this between experiments or benchmark runs to start fresh.

Flush pending updates

Block until all pending metric updates are processed:

:ok = Tinkex.Metrics.flush()

This ensures all async casts have been handled before reading a snapshot. Useful for deterministic testing and experiment finalization.
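The guarantee comes from GenServer message ordering: messages between two processes arrive in the order sent, so casts issued before a synchronous call are handled before the call's reply. A minimal self-contained sketch of the pattern (`FlushSketch` is illustrative, not Tinkex's actual module):

```elixir
defmodule FlushSketch do
  use GenServer

  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, 0, name: __MODULE__)

  # Async update: returns immediately; processed later by the server.
  def increment, do: GenServer.cast(__MODULE__, :increment)

  # Synchronous no-op call: replies only after every cast this process
  # queued before it has already been handled.
  def flush, do: GenServer.call(__MODULE__, :flush)

  def value, do: GenServer.call(__MODULE__, :value)

  @impl true
  def init(count), do: {:ok, count}

  @impl true
  def handle_cast(:increment, count), do: {:noreply, count + 1}

  @impl true
  def handle_call(:flush, _from, count), do: {:reply, :ok, count}
  def handle_call(:value, _from, count), do: {:reply, count, count}
end
```

After 100 `increment/0` casts followed by `flush/0`, reading the value is deterministic: all 100 updates have been applied.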

Example: end-to-end workflow

See examples/metrics_live.exs for a complete example:

# Reset metrics
:ok = Tinkex.Metrics.reset()

# Run some requests (metrics collected automatically)
{:ok, service} = Tinkex.ServiceClient.start_link(config: config)
{:ok, sampler} = Tinkex.ServiceClient.create_sampling_client(service, base_model: model)
{:ok, task} = Tinkex.SamplingClient.sample(sampler, prompt, params, num_samples: 5)
{:ok, _response} = Task.await(task, 30_000)

# Ensure all metrics are recorded
:ok = Tinkex.Metrics.flush()

# Get snapshot
snapshot = Tinkex.Metrics.snapshot()

# Print results
IO.puts "\n=== Metrics Snapshot ==="
IO.puts "Counters:"
Enum.each(snapshot.counters, fn {name, value} ->
  IO.puts "  #{name}: #{value}"
end)

IO.puts "\nLatency (ms):"
latency = snapshot.histograms[:tinkex_request_duration_ms]
IO.puts "  count: #{latency.count}"
IO.puts "  mean:  #{:erlang.float_to_binary(latency.mean, decimals: 2)}"
IO.puts "  p50:   #{:erlang.float_to_binary(latency.p50, decimals: 2)}"
IO.puts "  p95:   #{:erlang.float_to_binary(latency.p95, decimals: 2)}"
IO.puts "  p99:   #{:erlang.float_to_binary(latency.p99, decimals: 2)}"

Run the example:

TINKER_API_KEY=your-key mix run examples/metrics_live.exs

Best practices

  1. Reset between experiments: Call Metrics.reset/0 at the start of each independent run
  2. Flush before reading: Call Metrics.flush/0 before taking snapshots to ensure all updates are processed
  3. Choose appropriate buckets: Match latency buckets to your expected request durations
  4. Monitor p99: Don't just look at averages — p99 reveals tail latency issues
  5. Track custom metrics: Use counters and histograms to track domain-specific events
  6. Use gauges for configuration: Record experiment parameters as gauges for reproducibility
  7. Disable in production: If metrics aren't needed, disable to reduce overhead

Related guides

  • Getting started with Tinkex: docs/guides/getting_started.md
  • Troubleshooting common issues: docs/guides/troubleshooting.md
  • Training loop integration: docs/guides/training_loop.md
  • API reference: docs/guides/api_reference.md