Unicode.Set.Operation (Unicode Set v1.5.0)


Functions to operate on Unicode sets:

  • Intersection
  • Difference
  • Union
  • Inversion

Summary

Functions

combine(other): Combines all the ranges into a single list.

compact_ranges(ranges): Compacts overlapping and adjacent ranges.

complement(set): Returns the complement (inverse) of a set.

difference(a, b): Removes one list of 2-tuples representing Unicode codepoint ranges from another.

expand(unicode_set): Takes a reduced AST and expands it into a single list of codepoint range tuples.

expand_string_ranges(ranges): Expands string ranges such as {ab}-{cd}.

has_difference_or_intersection?(arg1): Returns a boolean indicating whether the given AST includes the set operations intersection or difference.

intersect(a, b): Returns the intersection of two lists of 2-tuples representing codepoint ranges.

reduce(unicode_set): Reduces all sets, properties and ranges to a list of 2-tuples expressing a range of codepoints.

symmetric_difference(this, that): Returns the symmetric difference of two lists of 2-tuples representing codepoint ranges.

traverse(ranges, fun): Pre-walks the expanded AST from a parsed Unicode Set, invoking a function on each codepoint range in the set.

union(a_list, b_list): Merges two lists of 2-tuples representing ranges of codepoints into a single list of 2-tuple codepoint ranges that includes all codepoints from both lists.

Functions

combine(other)

Combines all the ranges into a single list.

This function is called if and only if the Unicode set is formed by unions alone. If the set operations intersection or difference are present, the ranges must instead be expanded with expand/1.

compact_ranges(ranges)

Compacts overlapping and adjacent ranges.
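A minimal sketch of range compaction, assuming ranges are inclusive {first, last} codepoint tuples with first <= last. The module name CompactSketch is hypothetical and this is not the library's implementation: it sorts the ranges by start and folds each one into its predecessor whenever the two overlap or touch.

```elixir
defmodule CompactSketch do
  # Hypothetical helper, not part of Unicode.Set.
  # Assumption: ranges are inclusive {first, last} tuples with first <= last.
  def compact_ranges(ranges) do
    ranges
    |> Enum.sort()
    |> Enum.reduce([], fn
      # Overlapping or adjacent (starts at most one past the previous end): merge.
      {first, last}, [{prev_first, prev_last} | rest] when first <= prev_last + 1 ->
        [{prev_first, max(last, prev_last)} | rest]

      # Otherwise there is a gap, so start a new range.
      range, acc ->
        [range | acc]
    end)
    |> Enum.reverse()
  end
end

CompactSketch.compact_ranges([{10, 20}, {0, 5}, {6, 8}, {15, 30}])
# => [{0, 8}, {10, 30}]
```

Sorting first means each range only has to be compared with the most recently merged one, so the fold itself is a single linear pass.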

complement(set)

Returns the complement (inverse) of a set.
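As an illustration of what a complement over codepoint ranges involves, here is a sketch that assumes the codepoint space is 0..0x10FFFF and that the input is already sorted and compacted. The ComplementSketch module is hypothetical: the result is the list of gaps between consecutive ranges, plus any gaps before the first range and after the last.

```elixir
defmodule ComplementSketch do
  # Hypothetical helper, not part of Unicode.Set.
  # Assumption: valid codepoints are 0..0x10FFFF.
  @max_codepoint 0x10FFFF

  # Assumption: `ranges` is sorted and compacted, with inclusive
  # {first, last} tuples.
  def complement(ranges) do
    # `next` is the first codepoint not yet accounted for; any gap between
    # `next` and the start of a range belongs to the complement.
    {gaps, next} =
      Enum.reduce(ranges, {[], 0}, fn {first, last}, {acc, next} ->
        acc = if first > next, do: [{next, first - 1} | acc], else: acc
        {acc, last + 1}
      end)

    # Whatever lies after the final range also belongs to the complement.
    gaps = if next <= @max_codepoint, do: [{next, @max_codepoint} | gaps], else: gaps

    Enum.reverse(gaps)
  end
end

ComplementSketch.complement([{0x41, 0x5A}])
# => [{0, 64}, {91, 1114111}]
```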

difference(a, b)

Removes one list of 2-tuples representing Unicode codepoint ranges from another.

Returns the first list of codepoint ranges minus the codepoints in the second list.
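A simple sketch of range difference, assuming inclusive {first, last} tuples. The DifferenceSketch module is hypothetical and favours clarity over speed (it is O(n·m)): every range of the second list is carved out of every range of the first, leaving zero, one or two pieces each time.

```elixir
defmodule DifferenceSketch do
  # Hypothetical helper, not part of Unicode.Set.
  # Assumption: inclusive {first, last} tuples; inputs need not be sorted.
  def difference(a, b) do
    a
    |> Enum.flat_map(fn range ->
      # Carve every range of `b` out of this range, one cut at a time.
      Enum.reduce(b, [range], fn cut, pieces ->
        Enum.flat_map(pieces, &subtract(&1, cut))
      end)
    end)
    |> Enum.sort()
  end

  # No overlap: the range survives intact.
  defp subtract({first, last}, {cut_first, cut_last}) when cut_last < first or cut_first > last do
    [{first, last}]
  end

  # Overlap: keep whatever sticks out to the left and/or right of the cut.
  defp subtract({first, last}, {cut_first, cut_last}) do
    left = if cut_first > first, do: [{first, cut_first - 1}], else: []
    right = if cut_last < last, do: [{cut_last + 1, last}], else: []
    left ++ right
  end
end

DifferenceSketch.difference([{0, 100}], [{10, 20}, {50, 60}])
# => [{0, 9}, {21, 49}, {61, 100}]
```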

expand(unicode_set)

Takes a reduced AST and expands it into a single list of codepoint range tuples.

expand_string_range(arg1)

expand_string_ranges(ranges)

Expands string ranges such as {ab}-{cd}.

has_difference_or_intersection?(arg1)

Returns a boolean indicating whether the given AST includes set operations intersection or difference.

When these operations are present, all ranges, including negated (^) ranges, need to be expanded. If there are no intersections or differences, the ^ ranges can be translated directly into guard clauses or a list of Elixir ranges.
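To illustrate the check this function performs, here is a sketch over a hypothetical AST in which leaves are {:in, ranges} or {:not_in, ranges} and interior nodes are {:union, left, right}, {:intersection, left, right} or {:difference, left, right}. This tree shape and the SetOpSketch module are assumptions for illustration only; the real Unicode.Set AST may differ.

```elixir
defmodule SetOpSketch do
  # Hypothetical AST shape (an assumption, NOT the Unicode.Set AST):
  #   leaves:   {:in, ranges} | {:not_in, ranges}
  #   branches: {:union | :intersection | :difference, left, right}

  # Any intersection or difference forces full expansion.
  def has_difference_or_intersection?({:intersection, _left, _right}), do: true
  def has_difference_or_intersection?({:difference, _left, _right}), do: true

  # A union only needs expansion if one of its operands does.
  def has_difference_or_intersection?({:union, left, right}) do
    has_difference_or_intersection?(left) or has_difference_or_intersection?(right)
  end

  # Leaves contain no set operations.
  def has_difference_or_intersection?({_form, _ranges}), do: false
end

# [a-z] united with [^0-9]: unions only, so the :not_in leaf can be kept as is.
SetOpSketch.has_difference_or_intersection?({:union, {:in, [{?a, ?z}]}, {:not_in, [{?0, ?9}]}})
# => false
```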

intersect(a, b)

Returns the intersection of two lists of 2-tuples representing codepoint ranges.

The result is a single list of codepoint ranges that represents the common codepoints in the two lists.
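A sketch of one common way to intersect range lists, a two-pointer walk, assuming both inputs are sorted, compacted lists of inclusive {first, last} tuples. The IntersectSketch module is hypothetical; at each step it emits the overlap of the two head ranges (if any) and then advances whichever range ends first, because the other may still overlap later ranges.

```elixir
defmodule IntersectSketch do
  # Hypothetical helper, not part of Unicode.Set.
  # Assumption: both inputs are sorted and compacted, inclusive {first, last} tuples.
  def intersect([], _b), do: []
  def intersect(_a, []), do: []

  def intersect([{a_first, a_last} | a_rest] = a, [{b_first, b_last} | b_rest] = b) do
    # The overlap of the two head ranges, if non-empty.
    overlap_first = max(a_first, b_first)
    overlap_last = min(a_last, b_last)
    overlap = if overlap_first <= overlap_last, do: [{overlap_first, overlap_last}], else: []

    # Advance whichever range ends first.
    rest =
      if a_last < b_last do
        intersect(a_rest, b)
      else
        intersect(a, b_rest)
      end

    overlap ++ rest
  end
end

IntersectSketch.intersect([{0, 10}, {20, 30}], [{5, 25}])
# => [{5, 10}, {20, 25}]
```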

reduce(unicode_set)

Reduces all sets, properties and ranges to a list of 2-tuples expressing a range of codepoints.

It can return one of two forms, or a combination of both:

  • [{:in, [tuple_list]}] for an inclusion list
  • [{:not_in, [tuple_list]}] for an exclusion list
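Both forms can be tested directly for membership, which is part of why an exclusion list is worth preserving: there is no need to compute its complement first. The sketch below handles a single {:in, ranges} or {:not_in, ranges} clause with inclusive {first, last} tuples. The ClauseSketch module and member?/2 are hypothetical, and how the library combines several clauses is not modelled here.

```elixir
defmodule ClauseSketch do
  # Hypothetical helper, not part of Unicode.Set.
  # Assumption: ranges are inclusive {first, last} codepoint tuples.

  # An inclusion clause matches codepoints inside some range.
  def member?({:in, ranges}, codepoint), do: in_any?(ranges, codepoint)

  # An exclusion clause matches codepoints inside none of the ranges,
  # without ever materialising the (large) complement.
  def member?({:not_in, ranges}, codepoint), do: not in_any?(ranges, codepoint)

  defp in_any?(ranges, codepoint) do
    Enum.any?(ranges, fn {first, last} -> codepoint >= first and codepoint <= last end)
  end
end

ClauseSketch.member?({:not_in, [{?a, ?z}]}, ?0)
# => true
```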

Attempts are made to preserve :not_in clauses as long as possible, since many consumers, such as regexes and nimble_parsec, can use :not_in style ranges directly.

When only a single character class is present, or several classes combined by union, :not_in can be preserved.

When intersections or differences are required, the ranges must be both reduced and expanded for these set operations to complete.

symmetric_difference(this, that)

Returns the symmetric difference of two lists of 2-tuples representing codepoint ranges.

The result is a single list of codepoint ranges that represents the codepoints that are in either of the two lists but not both.
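One compact way to compute a symmetric difference is a boundary sweep, sketched below under the assumption that each input list is internally non-overlapping (for example, already compacted) and uses inclusive {first, last} tuples. The SymmetricDifferenceSketch module is hypothetical and this technique is not necessarily the library's. Each range contributes the half-open boundaries first and last + 1. Coverage parity flips at every boundary, so after sorting, each consecutive pair of boundaries encloses codepoints covered by exactly one range, that is, present in exactly one of the two lists.

```elixir
defmodule SymmetricDifferenceSketch do
  # Hypothetical helper, not part of Unicode.Set.
  # Assumption: each input list is internally non-overlapping, with
  # inclusive {first, last} tuples.
  def symmetric_difference(a, b) do
    # Inclusive {first, last} becomes half-open [first, last + 1). After
    # sorting all boundaries, each pair [start, stop) has odd coverage,
    # i.e. its codepoints appear in exactly one of the two lists.
    (a ++ b)
    |> Enum.flat_map(fn {first, last} -> [first, last + 1] end)
    |> Enum.sort()
    |> Enum.chunk_every(2)
    |> Enum.flat_map(fn
      [start, stop] when start < stop -> [{start, stop - 1}]
      _empty_pair -> []
    end)
    |> merge_adjacent()
  end

  # Join touching pieces such as {1, 3} and {4, 5} into {1, 5}.
  defp merge_adjacent(ranges) do
    ranges
    |> Enum.reduce([], fn
      {first, last}, [{prev_first, prev_last} | rest] when first == prev_last + 1 ->
        [{prev_first, last} | rest]

      range, acc ->
        [range | acc]
    end)
    |> Enum.reverse()
  end
end

SymmetricDifferenceSketch.symmetric_difference([{0, 10}], [{5, 15}])
# => [{0, 4}, {11, 15}]
```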

traverse(ranges, fun)

Pre-walks the expanded AST from a parsed Unicode Set, invoking a function on each codepoint range in the set.

traverse(range, var, fun)

union(a_list, b_list)

Merges two lists of 2-tuples representing ranges of codepoints. The result is a single list of 2-tuple codepoint ranges that includes all codepoints from both lists.
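When both inputs are already sorted and compacted, their union can be built in linear time with a merge-sort style walk rather than a full re-sort. The sketch below assumes inclusive {first, last} tuples. The UnionSketch module is hypothetical: it repeatedly takes whichever head range starts first and either extends the last emitted range (on overlap or adjacency) or starts a new one.

```elixir
defmodule UnionSketch do
  # Hypothetical helper, not part of Unicode.Set.
  # Assumption: both inputs are sorted and compacted, inclusive {first, last} tuples.
  def union(a_list, b_list), do: merge(a_list, b_list, [])

  # Drain whichever list still has ranges once the other is empty.
  defp merge([], [], acc), do: Enum.reverse(acc)
  defp merge([a | a_rest], [], acc), do: merge(a_rest, [], push(a, acc))
  defp merge([], [b | b_rest], acc), do: merge([], b_rest, push(b, acc))

  # Take the head that starts first (tuples compare element by element).
  defp merge([a | a_rest] = a_list, [b | b_rest] = b_list, acc) do
    if a <= b do
      merge(a_rest, b_list, push(a, acc))
    else
      merge(a_list, b_rest, push(b, acc))
    end
  end

  # Extend the last emitted range if the new one overlaps or touches it.
  defp push({first, last}, [{prev_first, prev_last} | rest]) when first <= prev_last + 1 do
    [{prev_first, max(last, prev_last)} | rest]
  end

  defp push(range, acc), do: [range | acc]
end

UnionSketch.union([{0, 5}, {20, 25}], [{3, 10}, {26, 30}])
# => [{0, 10}, {20, 30}]
```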