TantivyEx.Aggregation (TantivyEx v0.4.1)

Comprehensive aggregation functionality for TantivyEx with an Elasticsearch-compatible API.

This module provides a complete aggregation system supporting:

  • Bucket aggregations (terms, histogram, date_histogram, range)
  • Metric aggregations (avg, min, max, sum, count, stats, percentiles)
  • Nested/sub-aggregations
  • Elasticsearch-compatible JSON request/response format
  • Per-request options for validation, memory limit, and timeout

Features

Bucket Aggregations

  • Terms: Group documents by field values
  • Histogram: Group numeric values into buckets with fixed intervals
  • Date Histogram: Group date values into time-based buckets
  • Range: Group documents into custom value ranges

Metric Aggregations

  • Average: Calculate average value of a numeric field
  • Min/Max: Find minimum/maximum values
  • Sum: Calculate sum of numeric field values
  • Count: Count documents (value count aggregation)
  • Stats: Calculate min, max, sum, count, and average in one aggregation
  • Percentiles: Calculate percentile values (50th, 95th, 99th, etc.)

Advanced Features

  • Nested Aggregations: Add sub-aggregations to bucket aggregations
  • Memory Optimization: Built-in memory limits and performance optimizations
  • Elasticsearch Compatibility: Request/response format matches Elasticsearch
  • Error Handling: Comprehensive validation and error reporting

Usage Examples

alias TantivyEx.Aggregation

# Simple terms aggregation
aggregations = %{
  "categories" => %{
    "terms" => %{
      "field" => "category",
      "size" => 10
    }
  }
}

{:ok, result} = Aggregation.run(searcher, query, aggregations)

# Histogram with sub-aggregation
aggregations = %{
  "price_histogram" => %{
    "histogram" => %{
      "field" => "price",
      "interval" => 10.0
    },
    "aggs" => %{
      "avg_rating" => %{
        "avg" => %{
          "field" => "rating"
        }
      }
    }
  }
}

# Date histogram
aggregations = %{
  "sales_over_time" => %{
    "date_histogram" => %{
      "field" => "timestamp",
      "calendar_interval" => "month"
    }
  }
}

# Range aggregation
aggregations = %{
  "price_ranges" => %{
    "range" => %{
      "field" => "price",
      "ranges" => [
        %{"to" => 50},
        %{"from" => 50, "to" => 100},
        %{"from" => 100}
      ]
    }
  }
}

# Multiple aggregations
aggregations = %{
  "avg_price" => %{
    "avg" => %{"field" => "price"}
  },
  "max_price" => %{
    "max" => %{"field" => "price"}
  },
  "price_stats" => %{
    "stats" => %{"field" => "price"}
  }
}

# Search with aggregations
{:ok, result} = Aggregation.search_with_aggregations(searcher, query, aggregations, 20)

Types

aggregation_options()

@type aggregation_options() :: [
  validate: boolean(),
  memory_limit: pos_integer(),
  timeout: pos_integer()
]

aggregation_request()

@type aggregation_request() :: map()

aggregation_result()

@type aggregation_result() :: map()

Functions

build_request(aggregations)

@spec build_request(map() | keyword()) :: map()

Creates a complete aggregation request with multiple aggregations.

Parameters

  • aggregations: Map, keyword list, or list of {name, definition} tuples of aggregation definitions

Examples

aggs = Aggregation.build_request([
  {"categories", Aggregation.terms("category", size: 20)},
  {"avg_price", Aggregation.metric(:avg, "price")},
  {"price_histogram", Aggregation.histogram("price", 10.0)}
])

# Or with a map
aggs = Aggregation.build_request(%{
  "categories" => Aggregation.terms("category"),
  "stats" => Aggregation.metric(:stats, "price")
})
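Both call forms above describe the same thing: a set of named aggregation definitions. As a rough illustration of how the two input shapes can be brought to one map, here is a minimal sketch. `RequestSketch` and `normalize/1` are hypothetical names, and this is not TantivyEx's implementation; it only assumes the request ends up as a map keyed by aggregation name.

```elixir
defmodule RequestSketch do
  # Hypothetical stand-in for request normalization; not TantivyEx's code.
  # Accepts a map, a keyword list, or a list of {name, definition} tuples
  # and returns a map keyed by the aggregation name as a string.
  def normalize(aggregations) when is_map(aggregations) or is_list(aggregations) do
    Map.new(aggregations, fn {name, definition} -> {to_string(name), definition} end)
  end
end

RequestSketch.normalize([
  {"avg_price", %{"avg" => %{"field" => "price"}}},
  {:max_price, %{"max" => %{"field" => "price"}}}
])
# Returns a map with the keys "avg_price" and "max_price"
```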

date_histogram(field, interval, options \\ [])

@spec date_histogram(String.t(), String.t(), keyword()) :: map()

Creates a date histogram aggregation for grouping date values into time-based buckets.

Parameters

  • field: Date field name to aggregate on
  • interval: Time interval (e.g., "day", "month", "year", "1h", "30m")
  • options: Date histogram aggregation options

Options

  • :min_doc_count - Minimum document count for buckets (default: 1)
  • :keyed - Return buckets as a map instead of an array (default: false)
  • :time_zone - Time zone for bucket calculation
  • :format - Date format for bucket keys

Examples

date_hist = Aggregation.date_histogram("timestamp", "month")
# Returns: %{"date_histogram" => %{"field" => "timestamp", "calendar_interval" => "month"}}

hourly_hist = Aggregation.date_histogram("created_at", "1h", time_zone: "America/New_York")
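To make the idea of calendar-based bucketing concrete, the sketch below groups dates by calendar month in plain Elixir. It only illustrates the concept: `MonthBucketSketch` is a hypothetical module, and the bucket shape it returns (an ISO 8601 date string as "key") is an assumption made for readability, not the response format TantivyEx produces.

```elixir
defmodule MonthBucketSketch do
  # Illustration only: maps each date to the first day of its calendar month
  # and counts how many dates land in each month, ordered chronologically.
  def monthly_counts(dates) do
    dates
    |> Enum.frequencies_by(&Date.beginning_of_month/1)
    |> Enum.sort_by(fn {month_start, _count} -> month_start end, Date)
    |> Enum.map(fn {month_start, count} ->
      %{"key" => Date.to_iso8601(month_start), "doc_count" => count}
    end)
  end
end

MonthBucketSketch.monthly_counts([~D[2024-02-10], ~D[2024-01-03], ~D[2024-01-28]])
# Returns:
# [
#   %{"key" => "2024-01-01", "doc_count" => 2},
#   %{"key" => "2024-02-01", "doc_count" => 1}
# ]
```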

histogram(field, interval, options \\ [])

@spec histogram(String.t(), float(), keyword()) :: map()

Creates a histogram aggregation for grouping numeric values into fixed-interval buckets.

Parameters

  • field: Numeric field name to aggregate on
  • interval: Bucket interval size
  • options: Histogram aggregation options

Options

  • :min_doc_count - Minimum document count for buckets (default: 1)
  • :keyed - Return buckets as a map instead of an array (default: false)

Examples

hist_agg = Aggregation.histogram("price", 10.0, min_doc_count: 2)
# Returns: %{"histogram" => %{"field" => "price", "interval" => 10.0, "min_doc_count" => 2}}
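Fixed-interval bucketing usually means rounding each value down to the nearest multiple of the interval. The following sketch shows that arithmetic in plain Elixir, together with a min_doc_count filter. `HistogramSketch` is a hypothetical module written for illustration; it assumes no offset and is not the code the library runs.

```elixir
defmodule HistogramSketch do
  # Rounds a value down to the start of its bucket:
  # key = floor(value / interval) * interval
  def bucket_key(value, interval) when interval > 0 do
    Float.floor(value / interval) * interval
  end

  # Counts values per bucket and drops buckets below min_doc_count.
  def buckets(values, interval, min_doc_count \\ 1) do
    values
    |> Enum.frequencies_by(&bucket_key(&1, interval))
    |> Enum.filter(fn {_key, count} -> count >= min_doc_count end)
    |> Enum.sort()
    |> Enum.map(fn {key, count} -> %{"key" => key, "doc_count" => count} end)
  end
end

HistogramSketch.buckets([3.0, 12.5, 17.0, 25.0, 29.99], 10.0, 2)
# Returns:
# [
#   %{"key" => 10.0, "doc_count" => 2},
#   %{"key" => 20.0, "doc_count" => 2}
# ]
```

The bucket starting at 0.0 holds a single value, so it is dropped when min_doc_count is 2.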

metric(type, field, options \\ [])

@spec metric(atom(), String.t(), keyword()) :: map()

Creates a metric aggregation for calculating statistics on numeric fields.

Parameters

  • type: Type of metric (:avg, :min, :max, :sum, :count, :stats, :percentiles)
  • field: Field name to calculate metrics on
  • options: Metric-specific options

Metric Types

  • :avg - Average value
  • :min - Minimum value
  • :max - Maximum value
  • :sum - Sum of all values
  • :count - Count of values
  • :stats - All basic statistics (min, max, avg, sum, count)
  • :percentiles - Percentile calculations

Options for :percentiles

  • :percents - List of percentiles to calculate (default: [1, 5, 25, 50, 75, 95, 99])
  • :keyed - Return percentiles as a map instead of an array (default: true)

Examples

avg_agg = Aggregation.metric(:avg, "price")
# Returns: %{"avg" => %{"field" => "price"}}

stats_agg = Aggregation.metric(:stats, "rating")
# Returns: %{"stats" => %{"field" => "rating"}}

percentiles_agg = Aggregation.metric(:percentiles, "response_time", percents: [50, 95, 99])
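For reference, the five values that :stats combines can be computed over a plain list as in the sketch below. `StatsSketch` is a hypothetical module, and the result key names are assumptions chosen to mirror the metric names listed above; the actual response shape may differ.

```elixir
defmodule StatsSketch do
  # Computes count, min, max, sum, and avg for a non-empty list of numbers.
  def stats([_ | _] = values) do
    count = length(values)
    sum = Enum.sum(values)

    %{
      "count" => count,
      "min" => Enum.min(values),
      "max" => Enum.max(values),
      "sum" => sum,
      "avg" => sum / count
    }
  end
end

StatsSketch.stats([10.0, 20.0, 30.0, 40.0])
# Returns:
# %{"count" => 4, "min" => 10.0, "max" => 40.0, "sum" => 100.0, "avg" => 25.0}
```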

range(field, ranges, options \\ [])

@spec range(String.t(), [map()] | [tuple()], keyword()) :: map()

Creates a range aggregation for grouping documents into custom value ranges.

Parameters

  • field: Numeric field name to aggregate on
  • ranges: List of range specifications
  • options: Range aggregation options

Range Specifications

Each range can have:

  • :from - Lower bound (inclusive)
  • :to - Upper bound (exclusive)
  • :key - Custom key name for the bucket

Options

  • :keyed - Return buckets as a map instead of an array (default: false)

Examples

ranges = [
  %{"to" => 50},
  %{"from" => 50, "to" => 100, "key" => "medium"},
  %{"from" => 100}
]
range_agg = Aggregation.range("price", ranges)

# Using tuple shorthand: {from, to} or {from, to, key}; nil leaves a bound open
range_agg = Aggregation.range("price", [
  {nil, 50},
  {50, 100, "medium"},
  {100, nil}
])
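The two examples above appear to describe the same three ranges, once as maps and once as tuples. The sketch below shows one plausible way the tuple form could expand into the map form, plus a membership check that follows the documented bounds (inclusive from, exclusive to). `RangeSketch`, `to_map/1`, and `in_range?/2` are hypothetical names; the library's own conversion is not shown in this page.

```elixir
defmodule RangeSketch do
  # Hypothetical expansion of {from, to} and {from, to, key} into map form.
  # nil entries are dropped, so an open bound has no key in the map.
  def to_map({from, to}), do: to_map({from, to, nil})

  def to_map({from, to, key}) do
    [{"from", from}, {"to", to}, {"key", key}]
    |> Enum.reject(fn {_name, value} -> is_nil(value) end)
    |> Map.new()
  end

  # "from" is inclusive and "to" is exclusive; a missing bound is open-ended.
  def in_range?(value, range) do
    from = Map.get(range, "from")
    to = Map.get(range, "to")
    (is_nil(from) or value >= from) and (is_nil(to) or value < to)
  end
end

RangeSketch.to_map({50, 100, "medium"})
# Returns: %{"from" => 50, "to" => 100, "key" => "medium"}

RangeSketch.in_range?(50, RangeSketch.to_map({nil, 50}))
# Returns: false, because the upper bound is exclusive
```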

run(searcher, query, aggregations, options \\ [])

@spec run(term(), term(), aggregation_request(), aggregation_options()) ::
  {:ok, aggregation_result()} | {:error, String.t()}

Runs aggregations on search results without returning documents.

Parameters

  • searcher: SearcherResource from TantivyEx.Searcher
  • query: QueryResource from TantivyEx.Query
  • aggregations: Map of aggregation definitions
  • options: Aggregation options (optional)

Returns

  • {:ok, aggregation_results} on success
  • {:error, reason} on failure

Examples

aggregations = %{
  "categories" => %{
    "terms" => %{
      "field" => "category",
      "size" => 10
    }
  }
}

{:ok, results} = Aggregation.run(searcher, query, aggregations)

# Results format:
%{
  "categories" => %{
    "doc_count_error_upper_bound" => 0,
    "sum_other_doc_count" => 0,
    "buckets" => [
      %{"key" => "electronics", "doc_count" => 150},
      %{"key" => "books", "doc_count" => 89}
    ]
  }
}
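Given the results format above, a caller will often want a simple term-to-count lookup and a way to pass errors through unchanged. The sketch below does both in plain Elixir. `BucketSketch` and `doc_counts/2` are hypothetical helper names, and the code assumes buckets arrive as a list (the non-keyed form).

```elixir
defmodule BucketSketch do
  # Takes the return value of an aggregation run and an aggregation name,
  # and returns {:ok, %{bucket_key => doc_count}} or the original error.
  def doc_counts({:ok, results}, name) do
    counts =
      results
      |> get_in([name, "buckets"])
      |> List.wrap()
      |> Map.new(fn %{"key" => key, "doc_count" => count} -> {key, count} end)

    {:ok, counts}
  end

  def doc_counts({:error, reason}, _name), do: {:error, reason}
end

# Sample data copied from the results format shown above.
sample =
  {:ok,
   %{
     "categories" => %{
       "doc_count_error_upper_bound" => 0,
       "sum_other_doc_count" => 0,
       "buckets" => [
         %{"key" => "electronics", "doc_count" => 150},
         %{"key" => "books", "doc_count" => 89}
       ]
     }
   }}

BucketSketch.doc_counts(sample, "categories")
# Returns: {:ok, %{"books" => 89, "electronics" => 150}}
```

An aggregation name that is absent from the results yields `{:ok, %{}}` in this sketch rather than an error.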

search_with_aggregations(searcher, query, aggregations, search_limit \\ 10, options \\ [])

@spec search_with_aggregations(
  term(),
  term(),
  aggregation_request(),
  non_neg_integer(),
  aggregation_options()
) :: {:ok, map()} | {:error, String.t()}

Runs a search query with aggregations, returning both hits and aggregation results.

Parameters

  • searcher: SearcherResource from TantivyEx.Searcher
  • query: QueryResource from TantivyEx.Query
  • aggregations: Map of aggregation definitions
  • search_limit: Maximum number of documents to return (default: 10)
  • options: Aggregation options (optional)

Returns

  • {:ok, %{hits: search_results, aggregations: aggregation_results}} on success
  • {:error, reason} on failure

Examples

aggregations = %{
  "avg_price" => %{
    "avg" => %{"field" => "price"}
  }
}

{:ok, result} = Aggregation.search_with_aggregations(searcher, query, aggregations, 20)

# Result format:
%{
  "hits" => %{
    "total" => 150,
    "hits" => [
      %{"score" => 1.5, "doc_id" => 1, "title" => "Product 1", ...},
      ...
    ]
  },
  "aggregations" => %{
    "avg_price" => %{"value" => 29.99}
  }
}
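As a small illustration of consuming a combined result, the sketch below pattern-matches on the string-keyed shape shown in the result format above. `SearchResultSketch` and `summarize/1` are hypothetical names, and the aggregation name "avg_price" is taken from the example request; if the map is keyed differently in practice, the pattern would need adjusting.

```elixir
defmodule SearchResultSketch do
  # Pulls the total hit count, the number of returned documents, and one
  # metric value out of a result map shaped like the example above.
  def summarize(%{"hits" => %{"total" => total, "hits" => hits}, "aggregations" => aggs}) do
    %{
      total: total,
      returned: length(hits),
      avg_price: get_in(aggs, ["avg_price", "value"])
    }
  end
end

SearchResultSketch.summarize(%{
  "hits" => %{
    "total" => 150,
    "hits" => [%{"score" => 1.5, "doc_id" => 1, "title" => "Product 1"}]
  },
  "aggregations" => %{"avg_price" => %{"value" => 29.99}}
})
# Returns: %{total: 150, returned: 1, avg_price: 29.99}
```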

terms(field, options \\ [])

@spec terms(
  String.t(),
  keyword()
) :: map()

Creates a terms aggregation for grouping documents by field values.

Parameters

  • field: Field name to aggregate on
  • options: Terms aggregation options

Options

  • :size - Maximum number of buckets to return (default: 10)
  • :min_doc_count - Minimum document count for buckets (default: 1)
  • :missing - Value to use for documents missing the field
  • :order - Sort order for buckets

Examples

terms_agg = Aggregation.terms("category", size: 20, min_doc_count: 5)
# Returns: %{"terms" => %{"field" => "category", "size" => 20, "min_doc_count" => 5}}
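To show how :size and :min_doc_count interact, the sketch below runs a terms-style count over a plain list of values. `TermsSketch` is a hypothetical module, and ordering buckets by descending count is an assumption (the :order option above suggests it is configurable); the defaults are the ones documented in the options list.

```elixir
defmodule TermsSketch do
  # Counts occurrences of each value, drops rare terms, orders by
  # descending count (ties broken by term), and keeps the top `size`.
  def buckets(values, opts \\ []) do
    size = Keyword.get(opts, :size, 10)
    min_doc_count = Keyword.get(opts, :min_doc_count, 1)

    values
    |> Enum.frequencies()
    |> Enum.filter(fn {_term, count} -> count >= min_doc_count end)
    |> Enum.sort_by(fn {term, count} -> {-count, term} end)
    |> Enum.take(size)
    |> Enum.map(fn {term, count} -> %{"key" => term, "doc_count" => count} end)
  end
end

TermsSketch.buckets(
  ["books", "electronics", "books", "toys", "electronics", "books"],
  size: 2
)
# Returns:
# [
#   %{"key" => "books", "doc_count" => 3},
#   %{"key" => "electronics", "doc_count" => 2}
# ]
```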

with_sub_aggregations(aggregation, sub_aggregations)

@spec with_sub_aggregations(map(), map()) :: map()

Adds sub-aggregations to a bucket aggregation.

Parameters

  • aggregation: Base bucket aggregation
  • sub_aggregations: Map of sub-aggregation definitions

Examples

base_agg = Aggregation.terms("category", size: 10)

sub_aggs = %{
  "avg_price" => Aggregation.metric(:avg, "price"),
  "max_rating" => Aggregation.metric(:max, "rating")
}

full_agg = Aggregation.with_sub_aggregations(base_agg, sub_aggs)
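Judging from the histogram example in the usage section, where "aggs" sits next to the bucket definition, the combined map probably has the shape sketched below. `SubAggSketch.attach/2` is a hypothetical equivalent written with plain maps; it is an assumption about the resulting structure, not the library's code.

```elixir
defmodule SubAggSketch do
  # Places the sub-aggregations under an "aggs" key alongside the bucket
  # definition, matching the hand-written request in the usage examples.
  def attach(aggregation, sub_aggregations)
      when is_map(aggregation) and is_map(sub_aggregations) do
    Map.put(aggregation, "aggs", sub_aggregations)
  end
end

SubAggSketch.attach(
  %{"terms" => %{"field" => "category", "size" => 10}},
  %{"avg_price" => %{"avg" => %{"field" => "price"}}}
)
# Returns:
# %{
#   "terms" => %{"field" => "category", "size" => 10},
#   "aggs" => %{"avg_price" => %{"avg" => %{"field" => "price"}}}
# }
```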