TantivyEx.Aggregation (TantivyEx v0.4.1)
Comprehensive aggregation functionality for TantivyEx with an Elasticsearch-compatible API.
This module provides a complete aggregation system supporting:
- Bucket aggregations (terms, histogram, date_histogram, range)
- Metric aggregations (avg, min, max, sum, count, stats, percentiles)
- Nested/sub-aggregations
- Elasticsearch-compatible JSON request/response format
- Advanced aggregation options and configurations
Features
Bucket Aggregations
- Terms: Group documents by field values
- Histogram: Group numeric values into buckets with fixed intervals
- Date Histogram: Group date values into time-based buckets
- Range: Group documents into custom value ranges
Metric Aggregations
- Average: Calculate average value of a numeric field
- Min/Max: Find minimum/maximum values
- Sum: Calculate sum of numeric field values
- Count: Count documents (value count aggregation)
- Stats: Calculate min, max, sum, count, and average in one aggregation
- Percentiles: Calculate percentile values (50th, 95th, 99th, etc.)
Advanced Features
- Nested Aggregations: Add sub-aggregations to bucket aggregations
- Memory Optimization: Built-in memory limits and performance optimizations
- Elasticsearch Compatibility: Request/response format matches Elasticsearch
- Error Handling: Comprehensive validation and error reporting
Usage Examples
# The examples assume `searcher` and `query` already exist.
alias TantivyEx.Aggregation

# Simple terms aggregation
aggregations = %{
  "categories" => %{
    "terms" => %{
      "field" => "category",
      "size" => 10
    }
  }
}

{:ok, result} = Aggregation.run(searcher, query, aggregations)

# Histogram with sub-aggregation
aggregations = %{
  "price_histogram" => %{
    "histogram" => %{
      "field" => "price",
      "interval" => 10.0
    },
    "aggs" => %{
      "avg_rating" => %{
        "avg" => %{
          "field" => "rating"
        }
      }
    }
  }
}

# Date histogram
aggregations = %{
  "sales_over_time" => %{
    "date_histogram" => %{
      "field" => "timestamp",
      "calendar_interval" => "month"
    }
  }
}

# Range aggregation
aggregations = %{
  "price_ranges" => %{
    "range" => %{
      "field" => "price",
      "ranges" => [
        %{"to" => 50},
        %{"from" => 50, "to" => 100},
        %{"from" => 100}
      ]
    }
  }
}

# Multiple aggregations
aggregations = %{
  "avg_price" => %{
    "avg" => %{"field" => "price"}
  },
  "max_price" => %{
    "max" => %{"field" => "price"}
  },
  "price_stats" => %{
    "stats" => %{"field" => "price"}
  }
}

# Search with aggregations
{:ok, result} = Aggregation.search_with_aggregations(searcher, query, aggregations, 20)
Types
@type aggregation_options() :: [
        validate: boolean(),
        memory_limit: pos_integer(),
        timeout: pos_integer()
      ]
@type aggregation_request() :: map()
@type aggregation_result() :: map()
Functions
build_request(aggregations)
Creates a complete aggregation request with multiple aggregations.
Parameters
- aggregations: Map or keyword list of aggregation definitions
Examples
aggs = Aggregation.build_request([
  {"categories", Aggregation.terms("category", size: 20)},
  {"avg_price", Aggregation.metric(:avg, "price")},
  {"price_histogram", Aggregation.histogram("price", 10.0)}
])

# Or with a map
aggs = Aggregation.build_request(%{
  "categories" => Aggregation.terms("category"),
  "stats" => Aggregation.metric(:stats, "price")
})
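Both input forms above describe the same request. The following is a minimal sketch of how a list of pairs and a map could be normalized to one shape. `RequestSketch.normalize/1` is a hypothetical stand-in written for illustration; it is not the library's implementation of `build_request`, and converting atom names to strings is an assumption.

```elixir
defmodule RequestSketch do
  # Hypothetical stand-in for build_request/1: accepts a map or a list of
  # {name, definition} pairs and returns a map keyed by string names.
  def normalize(aggregations) do
    Map.new(aggregations, fn {name, definition} ->
      {to_string(name), definition}
    end)
  end
end

avg = %{"avg" => %{"field" => "price"}}

# A list of pairs, a keyword list, and a map all normalize to the same map.
from_list = RequestSketch.normalize([{"avg_price", avg}])
from_keywords = RequestSketch.normalize(avg_price: avg)
from_map = RequestSketch.normalize(%{"avg_price" => avg})
```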
date_histogram(field, interval, options \\ [])
Creates a date histogram aggregation for grouping date values into time-based buckets.
Parameters
- field: Date field name to aggregate on
- interval: Time interval (e.g., "day", "month", "year", "1h", "30m")
- options: Date histogram aggregation options
Options
- :min_doc_count - Minimum document count for buckets (default: 1)
- :keyed - Return buckets as a map instead of array (default: false)
- :time_zone - Time zone for bucket calculation
- :format - Date format for bucket keys
Examples
date_hist = Aggregation.date_histogram("timestamp", "month")
# Returns: %{"date_histogram" => %{"field" => "timestamp", "calendar_interval" => "month"}}

hourly_hist = Aggregation.date_histogram("created_at", "1h", time_zone: "America/New_York")
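To make "time-based buckets" concrete, here is a minimal sketch of monthly bucketing in plain Elixir. `DateHistogramSketch` is a hypothetical module that only illustrates the grouping idea; the ISO 8601 bucket key is an assumption and may not match the key format the library returns.

```elixir
defmodule DateHistogramSketch do
  # Groups dates by the first day of their month and counts each group.
  def monthly_buckets(dates) do
    dates
    |> Enum.frequencies_by(&Date.beginning_of_month/1)
    # Sort chronologically using Date.compare/2.
    |> Enum.sort_by(fn {month, _count} -> month end, Date)
    |> Enum.map(fn {month, count} ->
      %{"key" => Date.to_iso8601(month), "doc_count" => count}
    end)
  end
end

DateHistogramSketch.monthly_buckets([~D[2024-01-15], ~D[2024-02-03], ~D[2024-01-31]])
```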
histogram(field, interval, options \\ [])
Creates a histogram aggregation for grouping numeric values into fixed-interval buckets.
Parameters
- field: Numeric field name to aggregate on
- interval: Bucket interval size
- options: Histogram aggregation options
Options
- :min_doc_count - Minimum document count for buckets (default: 1)
- :keyed - Return buckets as a map instead of array (default: false)
Examples
hist_agg = Aggregation.histogram("price", 10.0, min_doc_count: 2)
# Returns: %{"histogram" => %{"field" => "price", "interval" => 10.0, "min_doc_count" => 2}}
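As a rough illustration of fixed-interval bucketing, the sketch below rounds each value down to the nearest multiple of the interval, which is the rule Elasticsearch documents for histogram keys. That TantivyEx computes keys the same way is an assumption, and `HistogramSketch` is a hypothetical module, not part of the library.

```elixir
defmodule HistogramSketch do
  # Lower bound of the bucket that `value` falls into.
  def bucket_key(value, interval) when interval > 0 do
    Float.floor(value / interval) * interval
  end

  # Counts values per bucket and drops buckets below `min_doc_count`.
  def buckets(values, interval, min_doc_count \\ 1) do
    values
    |> Enum.frequencies_by(&bucket_key(&1, interval))
    |> Enum.filter(fn {_key, count} -> count >= min_doc_count end)
    |> Enum.sort()
    |> Enum.map(fn {key, count} -> %{"key" => key, "doc_count" => count} end)
  end
end

# With interval 10.0 and min_doc_count 2, only the 10.0 bucket survives.
HistogramSketch.buckets([3.0, 12.5, 14.0, 19.99, 27.5], 10.0, 2)
```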
metric(type, field, options \\ [])
Creates a metric aggregation for calculating statistics on numeric fields.
Parameters
- type: Type of metric (:avg, :min, :max, :sum, :count, :stats, :percentiles)
- field: Field name to calculate metrics on
- options: Metric-specific options
Metric Types
- :avg - Average value
- :min - Minimum value
- :max - Maximum value
- :sum - Sum of all values
- :count - Count of values
- :stats - All basic statistics (min, max, avg, sum, count)
- :percentiles - Percentile calculations
Options for :percentiles
- :percents - List of percentiles to calculate (default: [1, 5, 25, 50, 75, 95, 99])
- :keyed - Return as map instead of array (default: true)
Examples
avg_agg = Aggregation.metric(:avg, "price")
# Returns: %{"avg" => %{"field" => "price"}}

stats_agg = Aggregation.metric(:stats, "rating")
# Returns: %{"stats" => %{"field" => "rating"}}

percentiles_agg = Aggregation.metric(:percentiles, "response_time", percents: [50, 95, 99])
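For reference, the five quantities that `:stats` bundles together can be computed over an in-memory list as follows. This is only a sketch of what the metric means: `StatsSketch` is hypothetical, and the result key names follow Elasticsearch conventions, which is an assumption about the library's output.

```elixir
defmodule StatsSketch do
  # Computes count, min, max, sum, and avg for a non-empty list of numbers.
  def stats([_ | _] = values) do
    count = length(values)
    sum = Enum.sum(values)

    %{
      "count" => count,
      "min" => Enum.min(values),
      "max" => Enum.max(values),
      "sum" => sum,
      "avg" => sum / count
    }
  end
end

StatsSketch.stats([4.0, 5.0, 3.0])
```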
range(field, ranges, options \\ [])
Creates a range aggregation for grouping documents into custom value ranges.
Parameters
- field: Numeric field name to aggregate on
- ranges: List of range specifications
- options: Range aggregation options
Range Specifications
Each range can have:
- :from - Lower bound (inclusive)
- :to - Upper bound (exclusive)
- :key - Custom key name for the bucket
Options
- :keyed - Return buckets as a map instead of array (default: false)
Examples
ranges = [
  %{"to" => 50},
  %{"from" => 50, "to" => 100, "key" => "medium"},
  %{"from" => 100}
]

range_agg = Aggregation.range("price", ranges)

# Using helper
range_agg = Aggregation.range("price", [
  {nil, 50},
  {50, 100, "medium"},
  {100, nil}
])
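The tuple shorthand and the inclusive/exclusive bounds can be sketched in plain Elixir. `RangeSketch` is a hypothetical module: how the library actually expands `{from, to}` and `{from, to, key}` tuples is an assumption, inferred from the two equivalent examples above.

```elixir
defmodule RangeSketch do
  # Turns {from, to} or {from, to, key} into a map, omitting nil entries.
  def to_spec({from, to}), do: to_spec({from, to, nil})

  def to_spec({from, to, key}) do
    [{"from", from}, {"to", to}, {"key", key}]
    |> Enum.reject(fn {_name, value} -> is_nil(value) end)
    |> Map.new()
  end

  # `from` is inclusive and `to` is exclusive; a missing bound is open.
  def in_range?(value, spec) do
    from = Map.get(spec, "from")
    to = Map.get(spec, "to")
    (is_nil(from) or value >= from) and (is_nil(to) or value < to)
  end
end

specs = Enum.map([{nil, 50}, {50, 100, "medium"}, {100, nil}], &RangeSketch.to_spec/1)

# A price of exactly 50 belongs to the "medium" range, not the first one.
Enum.filter(specs, &RangeSketch.in_range?(50, &1))
```

Because the upper bound is exclusive, adjacent ranges that share a boundary value never both match the same document.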
run(searcher, query, aggregations, options \\ [])
@spec run(term(), term(), aggregation_request(), aggregation_options()) ::
        {:ok, aggregation_result()} | {:error, String.t()}
Runs aggregations on search results without returning documents.
Parameters
- searcher: SearcherResource from TantivyEx.Searcher
- query: QueryResource from TantivyEx.Query
- aggregations: Map of aggregation definitions
- options: Aggregation options (optional)
Returns
- {:ok, aggregation_results} on success
- {:error, reason} on failure
Examples
aggregations = %{
  "categories" => %{
    "terms" => %{
      "field" => "category",
      "size" => 10
    }
  }
}

{:ok, results} = Aggregation.run(searcher, query, aggregations)

# Results format:
%{
  "categories" => %{
    "doc_count_error_upper_bound" => 0,
    "sum_other_doc_count" => 0,
    "buckets" => [
      %{"key" => "electronics", "doc_count" => 150},
      %{"key" => "books", "doc_count" => 89}
    ]
  }
}
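Given a result in the format shown above, the buckets can be pulled out with ordinary map access. The sketch below uses a hand-written result map rather than a live search, and `ResultSketch.bucket_counts/2` is a hypothetical helper, not a library function.

```elixir
defmodule ResultSketch do
  # Returns a %{key => doc_count} map for the named bucket aggregation,
  # or an empty map when the aggregation is missing from the results.
  def bucket_counts(results, name) do
    buckets = get_in(results, [name, "buckets"]) || []
    Map.new(buckets, fn %{"key" => key, "doc_count" => count} -> {key, count} end)
  end
end

results = %{
  "categories" => %{
    "doc_count_error_upper_bound" => 0,
    "sum_other_doc_count" => 0,
    "buckets" => [
      %{"key" => "electronics", "doc_count" => 150},
      %{"key" => "books", "doc_count" => 89}
    ]
  }
}

ResultSketch.bucket_counts(results, "categories")
```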
search_with_aggregations(searcher, query, aggregations, search_limit \\ 10, options \\ [])
@spec search_with_aggregations(
        term(),
        term(),
        aggregation_request(),
        non_neg_integer(),
        aggregation_options()
      ) :: {:ok, map()} | {:error, String.t()}
Runs a search query with aggregations, returning both hits and aggregation results.
Parameters
- searcher: SearcherResource from TantivyEx.Searcher
- query: QueryResource from TantivyEx.Query
- aggregations: Map of aggregation definitions
- search_limit: Maximum number of documents to return (default: 10)
- options: Aggregation options (optional)
Returns
- {:ok, %{hits: search_results, aggregations: aggregation_results}} on success
- {:error, reason} on failure
Examples
aggregations = %{
  "avg_price" => %{
    "avg" => %{"field" => "price"}
  }
}

{:ok, result} = Aggregation.search_with_aggregations(searcher, query, aggregations, 20)

# Result format:
%{
  "hits" => %{
    "total" => 150,
    "hits" => [
      %{"score" => 1.5, "doc_id" => 1, "title" => "Product 1", ...},
      ...
    ]
  },
  "aggregations" => %{
    "avg_price" => %{"value" => 29.99}
  }
}
terms(field, options \\ [])
Creates a terms aggregation for grouping documents by field values.
Parameters
- field: Field name to aggregate on
- options: Terms aggregation options
Options
- :size - Maximum number of buckets to return (default: 10)
- :min_doc_count - Minimum document count for buckets (default: 1)
- :missing - Value to use for documents missing the field
- :order - Sort order for buckets
Examples
terms_agg = Aggregation.terms("category", size: 20, min_doc_count: 5)
# Returns: %{"terms" => %{"field" => "category", "size" => 20, "min_doc_count" => 5}}
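To show how `:size` and `:min_doc_count` interact, here is a small in-memory sketch of terms bucketing. `TermsSketch` is a hypothetical module, and ordering buckets by descending count is an assumption borrowed from Elasticsearch's default, since the default for `:order` is not stated here.

```elixir
defmodule TermsSketch do
  # Counts occurrences of each value, applies min_doc_count, then keeps
  # the `size` most frequent buckets.
  def buckets(values, opts \\ []) do
    size = Keyword.get(opts, :size, 10)
    min_doc_count = Keyword.get(opts, :min_doc_count, 1)

    values
    |> Enum.frequencies()
    |> Enum.filter(fn {_key, count} -> count >= min_doc_count end)
    # Highest count first; ties are broken by key for a stable order.
    |> Enum.sort_by(fn {key, count} -> {-count, key} end)
    |> Enum.take(size)
    |> Enum.map(fn {key, count} -> %{"key" => key, "doc_count" => count} end)
  end
end

categories = ["books", "electronics", "books", "toys", "electronics", "books"]
TermsSketch.buckets(categories, size: 2)
```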
with_sub_aggregations(aggregation, sub_aggregations)
Adds sub-aggregations to a bucket aggregation.
Parameters
- aggregation: Base bucket aggregation
- sub_aggregations: Map of sub-aggregation definitions
Examples
base_agg = Aggregation.terms("category", size: 10)

sub_aggs = %{
  "avg_price" => Aggregation.metric(:avg, "price"),
  "max_rating" => Aggregation.metric(:max, "rating")
}

full_agg = Aggregation.with_sub_aggregations(base_agg, sub_aggs)
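The combined definition presumably mirrors the raw-map form in the usage examples, where sub-aggregations sit under an "aggs" key next to the bucket definition. The sketch below builds that shape by hand; `SubAggSketch.attach/2` is a hypothetical stand-in, and that `with_sub_aggregations` produces exactly this map is an assumption.

```elixir
defmodule SubAggSketch do
  # Attaches sub-aggregations under an "aggs" key beside the bucket definition.
  def attach(aggregation, sub_aggregations) when is_map(sub_aggregations) do
    Map.put(aggregation, "aggs", sub_aggregations)
  end
end

base = %{"terms" => %{"field" => "category", "size" => 10}}

subs = %{
  "avg_price" => %{"avg" => %{"field" => "price"}},
  "max_rating" => %{"max" => %{"field" => "rating"}}
}

SubAggSketch.attach(base, subs)
```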