TantivyEx.MergePolicy (TantivyEx v0.4.1)

Merge policy configuration for TantivyEx indexes.

Merge policies control when and how index segments are merged together. Fewer, larger segments generally make searches faster and reclaim space held by deleted documents, while each merge costs I/O and CPU time.

Available Policies

  • LogMergePolicy - Default policy that merges segments of similar sizes
  • NoMergePolicy - Never merges segments automatically

Examples

# Create a default log merge policy
{:ok, policy} = TantivyEx.MergePolicy.log_merge_policy()

# Create a custom log merge policy
{:ok, policy} = TantivyEx.MergePolicy.log_merge_policy(%{
  min_num_segments: 4,
  max_docs_before_merge: 5_000_000,
  min_layer_size: 5_000,
  level_log_size: 0.8,
  del_docs_ratio_before_merge: 0.3
})

# Create a no-merge policy for testing
{:ok, policy} = TantivyEx.MergePolicy.no_merge_policy()

# Apply the policy to an index writer
:ok = TantivyEx.MergePolicy.set_merge_policy(index_writer, policy)
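
The calls above all return tagged tuples, so they can be chained with `with` to stop at the first failure. The following is a minimal sketch, not part of TantivyEx: the module `MergeSetup` and its function `apply_log_policy/2` are hypothetical names, and it assumes an empty options map is accepted because every key in `log_merge_options()` is optional.

```elixir
defmodule MergeSetup do
  @moduledoc "Hypothetical helper: builds a log merge policy and applies it to a writer."

  alias TantivyEx.MergePolicy

  @doc "Returns {:ok, info} on success, or the first {:error, reason} encountered."
  def apply_log_policy(index_writer, options \\ %{}) do
    # Each clause must match for the chain to continue; any other value
    # (such as {:error, reason}) is returned to the caller unchanged.
    with {:ok, policy} <- MergePolicy.log_merge_policy(options),
         :ok <- MergePolicy.set_merge_policy(index_writer, policy),
         {:ok, info} <- MergePolicy.get_merge_policy_info(index_writer) do
      {:ok, info}
    end
  end
end
```

A caller would pass an existing IndexWriter reference, for example `MergeSetup.apply_log_policy(index_writer, %{min_num_segments: 4})`, and pattern match on the result.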

Summary

Functions

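get_merge_policy_info(index_writer)

Gets information about the current merge policy of an IndexWriter.

get_num_segments(index)

Gets the number of searchable segments in an index.

get_searchable_segment_ids(index)

Gets the list of searchable segment IDs from an index.

log_merge_policy()

Creates a new LogMergePolicy with default settings.

log_merge_policy(options)

Creates a new LogMergePolicy with custom settings.

merge_segments(index_writer, segment_ids)

Manually triggers a merge operation for specific segments.

no_merge_policy()

Creates a NoMergePolicy that never automatically merges segments.

set_merge_policy(index_writer, merge_policy)

Sets the merge policy for an IndexWriter.

wait_merging_threads(index_writer)

Waits for all merging threads to complete.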

Types

log_merge_options()

@type log_merge_options() :: %{
  optional(:min_num_segments) => non_neg_integer(),
  optional(:max_docs_before_merge) => non_neg_integer(),
  optional(:min_layer_size) => non_neg_integer(),
  optional(:level_log_size) => float(),
  optional(:del_docs_ratio_before_merge) => float()
}

merge_policy()

@type merge_policy() :: reference()

Functions

get_merge_policy_info(index_writer)

@spec get_merge_policy_info(reference()) :: {:ok, String.t()} | {:error, term()}

Gets information about the current merge policy of an IndexWriter.

Parameters

  • index_writer - The IndexWriter reference

Returns

  • {:ok, info} - Debug information about the current merge policy
  • {:error, reason} - If getting the info fails

Examples

{:ok, info} = TantivyEx.MergePolicy.get_merge_policy_info(index_writer)
IO.puts(info)

get_num_segments(index)

@spec get_num_segments(reference()) :: {:ok, non_neg_integer()} | {:error, term()}

Gets the number of searchable segments in an index.

Parameters

  • index - The Index reference

Returns

  • {:ok, count} - Number of segments
  • {:error, reason} - If getting the count fails

Examples

{:ok, segment_count} = TantivyEx.MergePolicy.get_num_segments(index)
IO.puts("Index has #{segment_count} segments")

get_searchable_segment_ids(index)

@spec get_searchable_segment_ids(reference()) ::
  {:ok, [String.t()]} | {:error, term()}

Gets the list of searchable segment IDs from an index.

This is useful for understanding the current segment structure and for manual merge operations.

Parameters

  • index - The Index reference

Returns

  • {:ok, segment_ids} - List of segment ID strings
  • {:error, reason} - If getting segment IDs fails

Examples

{:ok, segment_ids} = TantivyEx.MergePolicy.get_searchable_segment_ids(index)
IO.inspect(segment_ids, label: "Segment IDs")

log_merge_policy()

@spec log_merge_policy() :: {:ok, merge_policy()} | {:error, term()}

Creates a new LogMergePolicy with default settings.

LogMergePolicy groups segments into levels based on their size. It merges the segments within a level when that level holds at least :min_num_segments segments, or when a segment's ratio of deleted documents exceeds :del_docs_ratio_before_merge.
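
To make the level grouping concrete, here is a minimal pure-Elixir sketch of how segments might be bucketed by size. It is modeled loosely on the upstream Tantivy LogMergePolicy and should be read as an assumption about the behavior, not as TantivyEx's actual code. The module `LevelSketch` and its functions are hypothetical, each segment is represented only by its document count, and the delete-ratio trigger is left out.

```elixir
defmodule LevelSketch do
  @moduledoc "Illustrative only: approximates log-style level grouping by document count."

  # Defaults copied from the option list documented for log_merge_policy/1.
  @defaults %{
    min_num_segments: 8,
    max_docs_before_merge: 10_000_000,
    min_layer_size: 10_000,
    level_log_size: 0.75
  }

  @doc "Groups document counts into levels, largest level first."
  def levels(doc_counts, opts \\ %{}) do
    cfg = Map.merge(@defaults, opts)

    doc_counts
    # Segments above the cap are not considered for merging.
    |> Enum.filter(&(&1 <= cfg.max_docs_before_merge))
    |> Enum.sort(:desc)
    |> Enum.reduce({[], nil}, fn docs, {levels, top} ->
      # Segments smaller than min_layer_size are treated as that size.
      log_size = :math.log2(max(docs, cfg.min_layer_size))

      if top == nil or log_size < top - cfg.level_log_size do
        # Too small for the current level: open a new one.
        {[[docs] | levels], log_size}
      else
        [current | rest] = levels
        {[[docs | current] | rest], top}
      end
    end)
    |> elem(0)
    |> Enum.map(&Enum.reverse/1)
    |> Enum.reverse()
  end

  @doc "Returns the levels that hold at least min_num_segments segments."
  def merge_candidates(doc_counts, opts \\ %{}) do
    min_segments = Map.fetch!(Map.merge(@defaults, opts), :min_num_segments)
    doc_counts |> levels(opts) |> Enum.filter(&(length(&1) >= min_segments))
  end
end
```

With the defaults, `LevelSketch.levels([100, 200, 50_000, 60_000, 1_000_000])` yields `[[1000000], [60000, 50000], [200, 100]]`: the two smallest segments share a level because both are treated as 10,000 documents. None of these levels reaches eight segments, so `merge_candidates/2` returns an empty list unless `:min_num_segments` is lowered.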

Returns

  • {:ok, policy} - The merge policy reference
  • {:error, reason} - If creation fails

Examples

{:ok, policy} = TantivyEx.MergePolicy.log_merge_policy()

log_merge_policy(options)

@spec log_merge_policy(log_merge_options()) ::
  {:ok, merge_policy()} | {:error, term()}

Creates a new LogMergePolicy with custom settings.

Options

  • :min_num_segments - Minimum number of segments a level must hold before it is merged (default: 8)
  • :max_docs_before_merge - Maximum number of documents a segment may hold and still be considered for merging; larger segments are excluded (default: 10,000,000)
  • :min_layer_size - Minimum segment size, in documents, used for level grouping (default: 10,000)
  • :level_log_size - Size ratio between consecutive levels, expressed as a logarithm (default: 0.75)
  • :del_docs_ratio_before_merge - Ratio of deleted documents in a segment above which a merge is triggered (default: 1.0)

Returns

  • {:ok, policy} - The merge policy reference
  • {:error, reason} - If creation fails or parameters are invalid

Examples

# More aggressive merging
{:ok, policy} = TantivyEx.MergePolicy.log_merge_policy(%{
  min_num_segments: 4,
  del_docs_ratio_before_merge: 0.2
})

# Less aggressive merging for better write performance
{:ok, policy} = TantivyEx.MergePolicy.log_merge_policy(%{
  min_num_segments: 12,
  max_docs_before_merge: 50_000_000
})
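
Because invalid parameters surface as `{:error, reason}`, it can help to check an options map before passing it on. The sketch below is a hypothetical pre-check (the module `PolicyOptions` is not part of TantivyEx) that tests each entry only against the `log_merge_options()` type; any further range checks the library may apply are not modeled.

```elixir
defmodule PolicyOptions do
  @moduledoc "Hypothetical pre-check for log_merge_options(); not part of TantivyEx."

  @integer_keys [:min_num_segments, :max_docs_before_merge, :min_layer_size]
  @float_keys [:level_log_size, :del_docs_ratio_before_merge]

  @doc "Returns :ok, or {:error, {key, value}} for an entry that does not fit the type."
  def validate(options) when is_map(options) do
    # find_value returns the first non-nil result, or :ok when every entry is valid.
    Enum.find_value(options, :ok, fn {key, value} ->
      if valid?(key, value), do: nil, else: {:error, {key, value}}
    end)
  end

  # Integer options must be non-negative, matching non_neg_integer().
  defp valid?(key, value) when key in @integer_keys, do: is_integer(value) and value >= 0
  # Float options must be floats; an integer such as 1 does not match float().
  defp valid?(key, value) when key in @float_keys, do: is_float(value)
  # Any key outside the documented set is rejected.
  defp valid?(_key, _value), do: false
end
```

Note that `%{level_log_size: 1}` is rejected here because `1` is an integer, while `1.0` passes.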

merge_segments(index_writer, segment_ids)

@spec merge_segments(reference(), [String.t()]) :: :ok | {:error, term()}

Manually triggers a merge operation for specific segments.

This allows you to explicitly control which segments get merged, bypassing the merge policy's automatic decisions.

Parameters

  • index_writer - The IndexWriter reference
  • segment_ids - List of segment ID strings to merge

Returns

  • :ok - If the merge was triggered successfully
  • {:error, reason} - If the merge cannot be started

Examples

{:ok, segment_ids} = TantivyEx.MergePolicy.get_searchable_segment_ids(index)
:ok = TantivyEx.MergePolicy.merge_segments(index_writer, segment_ids)
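
When merging is under manual control, the segment count can decide whether a merge is worth triggering. The following sketch combines only functions documented in this module; the module `ManualCompaction`, the function `compact/3`, and the `max_segments` threshold are hypothetical, and whether `merge_segments/2` returns before the merge has finished is not assumed either way.

```elixir
defmodule ManualCompaction do
  @moduledoc "Hypothetical helper: merges all searchable segments once there are too many."

  alias TantivyEx.MergePolicy

  @doc "Returns {:ok, :merged}, {:ok, :skipped}, or the first {:error, reason}."
  def compact(index, index_writer, max_segments \\ 1) do
    with {:ok, count} <- MergePolicy.get_num_segments(index) do
      if count > max_segments do
        merge_all(index, index_writer)
      else
        # Few enough segments already; nothing to do.
        {:ok, :skipped}
      end
    end
  end

  defp merge_all(index, index_writer) do
    # Ask for every searchable segment and hand the full list to the writer.
    with {:ok, ids} <- MergePolicy.get_searchable_segment_ids(index),
         :ok <- MergePolicy.merge_segments(index_writer, ids) do
      {:ok, :merged}
    end
  end
end
```

This pairs naturally with `no_merge_policy/0`, where no automatic merge competes with the manual one.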

no_merge_policy()

@spec no_merge_policy() :: {:ok, merge_policy()} | {:error, term()}

Creates a NoMergePolicy that never automatically merges segments.

This is useful for testing scenarios or when you want complete manual control over segment merging.

Returns

  • {:ok, policy} - The merge policy reference
  • {:error, reason} - If creation fails

Examples

{:ok, policy} = TantivyEx.MergePolicy.no_merge_policy()

set_merge_policy(index_writer, merge_policy)

@spec set_merge_policy(reference(), merge_policy()) :: :ok | {:error, term()}

Sets the merge policy for an IndexWriter.

Parameters

  • index_writer - The IndexWriter reference
  • merge_policy - The merge policy to set

Returns

  • :ok - If the policy was set successfully
  • {:error, reason} - If setting the policy fails

Examples

{:ok, policy} = TantivyEx.MergePolicy.log_merge_policy()
:ok = TantivyEx.MergePolicy.set_merge_policy(index_writer, policy)

wait_merging_threads(index_writer)

@spec wait_merging_threads(reference()) :: :ok | {:error, term()}

Waits for all merging threads to complete.

This is useful when you want to ensure all pending merges are finished before proceeding, such as during testing or before closing an index.

Parameters

  • index_writer - The IndexWriter reference

Returns

  • :ok - If all merging threads completed successfully
  • {:error, reason} - If waiting fails

Examples

:ok = TantivyEx.MergePolicy.wait_merging_threads(index_writer)