Sifter.Query.Parser (Sifter v0.2.0)

View Source

The Parser transforms tokens from Sifter.Query.Lexer into structured Sifter.AST nodes.

This parser implements a recursive descent parser with operator precedence parsing (Pratt parsing) to handle complex boolean expressions, field predicates, and set operations in the Sifter query language.

Features

  • Operator precedence: Handles correct precedence with AND (precedence 20) binding tighter than OR (precedence 10)
  • Boolean logic: Supports AND, OR, and NOT operators with proper associativity
  • Field predicates: Parses field-based comparisons with various operators (:, <, <=, >, >=)
  • Set operations: Handles IN, NOT IN, and ALL operations with list validation
  • Wildcard support: Processes prefix/suffix wildcards (field:prefix*, field:*suffix) for equality operations
  • Grouping: Supports parentheses for explicit precedence control
  • Full-text search: Handles bare terms as full-text search expressions

Parser Architecture

The parser uses a two-phase approach:

  1. Prefix parsing: Handles terms that can appear at the start of expressions (NOT, parentheses, field predicates, bare terms)
  2. Infix parsing: Manages binary operators (AND, OR) with precedence climbing

Precedence Rules

From highest to lowest precedence:

  1. NOT prefix operator (binds to the immediately following term)
  2. AND connector (precedence 20, left-associative)
  3. OR connector (precedence 10, left-associative)

Wildcard Processing

For equality operations (: operator), the parser automatically converts wildcard patterns:

  • field:prefix*%AST.Cmp{op: :starts_with, value: "prefix"}
  • field:*suffix%AST.Cmp{op: :ends_with, value: "suffix"}
  • field:"*literal*"%AST.Cmp{op: :eq, value: "*literal*"} (quoted wildcards are literals)

Wildcards are not allowed in:

  • Relational comparisons (<, <=, >, >=)
  • Set operations (IN, NOT IN, ALL) unless quoted

Error Handling

The parser provides detailed error reporting with the following error types:

  • :unrecognized_token - Invalid token in context (e.g., starting with connector)
  • :unexpected_eof_after_operator - Missing right-hand side after operator
  • :missing_rhs - Missing value after comparison operator
  • :invalid_wildcard_position - Wildcard in invalid position (e.g., middle of term)
  • :wildcard_not_allowed_for_relop - Wildcard used with relational operator
  • :empty_list - Empty parentheses where list expected
  • :trailing_comma_in_list - Trailing comma in list
  • :missing_comma_in_list - Missing comma between list items

Notes

The parser is designed to be used after tokenization by Sifter.Query.Lexer and produces AST nodes that can be consumed by a database query builder or further transformation passes.

Summary

Functions

Parses a list of tokens from Sifter.Query.Lexer into a Sifter.AST tree.

Types

token()

@type token() :: {atom(), binary(), any(), {non_neg_integer(), non_neg_integer()}}

Functions

parse(tokens)

@spec parse([token()]) :: {:ok, Sifter.AST.t()} | {:error, {atom(), token()}}

Parses a list of tokens from Sifter.Query.Lexer into a Sifter.AST tree.

This is the main entry point for the parser. It takes a list of tokens produced by the lexer and transforms them into a structured AST that represents the query's logical structure.

Parameters

Return Values

  • {:ok, ast} - Successfully parsed AST node
  • {:error, {error_type, token}} - Parse error with the problematic token