dataprep/rules

Types

Why a checked regex constructor refused to build a validator.

Returned by matches_string_checked and matches_fully_string_checked instead of panicking, so the caller controls how a malformed pattern surfaces. Mirrors the information from regexp.CompileError without forcing the caller to depend on gleam/regexp directly.

pub type RegexRuleError {
  InvalidPattern(reason: String, byte_index: Int)
}

Constructors

  • InvalidPattern(reason: String, byte_index: Int)
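
As a rough illustration of consuming this type, the sketch below turns a RegexRuleError into a log-friendly message. The describe function is a hypothetical helper, not part of dataprep; only the InvalidPattern constructor and its two fields come from the definition above.

```gleam
import dataprep/rules
import gleam/int

// Hypothetical helper (not provided by dataprep): formats the compile
// failure for display. RegexRuleError has a single constructor, so a
// plain `let` pattern is exhaustive here.
pub fn describe(err: rules.RegexRuleError) -> String {
  let rules.InvalidPattern(reason: reason, byte_index: byte_index) = err
  "invalid pattern at byte " <> int.to_string(byte_index) <> ": " <> reason
}
```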

Values

pub fn equals(
  expected expected: a,
  error error: e,
) -> fn(a) -> validated.Validated(a, e)

Fails if the value does not equal the expected value.

pub fn length_between(
  minimum min: Int,
  maximum max: Int,
  error error: e,
) -> fn(String) -> validated.Validated(String, e)

Fails if the string length is outside [min, max].

Edge case — min > max: the resulting validator is vacuously unsatisfiable (no string length can be both >= min and <= max), so every input fails with error. This is a programmer error rather than a runtime condition; the library does not raise it because rule constructors are pure and never panic. Construct the rule yourself with a sane range, or guard the bounds at the call site (e.g., case min <= max { True -> ...; False -> ... }) when min/max come from configuration or other dynamic input.
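
One way to guard dynamic bounds at the call site is sketched below. FormError and checked_length_between are hypothetical names chosen for this example; only rules.length_between and its labelled arguments come from the signature above.

```gleam
import dataprep/rules

// Hypothetical error type for this sketch; not part of dataprep.
pub type FormError {
  BadLength
}

// Returns Error(Nil) instead of building a rule that can never pass.
pub fn checked_length_between(min: Int, max: Int) {
  case min <= max {
    True ->
      Ok(rules.length_between(minimum: min, maximum: max, error: BadLength))
    False -> Error(Nil)
  }
}
```

Returning a Result keeps the inverted-range case visible to whoever loads the configuration, rather than letting every later input fail with BadLength.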

pub fn matches(
  pattern re: regexp.Regexp,
  error error: e,
) -> fn(String) -> validated.Validated(String, e)

Fails if the regex does not find a match anywhere in the string. This is regexp.check semantics — a partial / substring match is enough to pass.

Anchoring is the caller’s responsibility. A pattern like [0-9]+ accepts "abc123def" because the digit run matches somewhere in the input. To require the entire string to match, either anchor the pattern explicitly with ^...$, or reach for matches_fully which enforces full-string semantics regardless of whether the pattern is anchored. The validation use case almost always wants matches_fully; matches is exposed for the cases that genuinely need substring search.

Takes a pre-compiled Regexp so a malformed pattern surfaces as a regexp.from_string error at the call site instead of crashing inside the validator.

Example:

    import gleam/regexp
    import dataprep/rules

    let assert Ok(re) = regexp.from_string("^[a-z]+$")
    let check = rules.matches(pattern: re, error: InvalidFormat)

For literal patterns where propagating a compile error to the caller is not useful, see matches_string.

pub fn matches_fully(
  pattern re: regexp.Regexp,
  error error: e,
) -> fn(String) -> validated.Validated(String, e)

Fails if the regex does not match the entire input string. Unlike matches, a partial / substring hit is not enough — the matched substring must equal the whole input.

Caveat (alternation): because this function only has the already-compiled Regexp, it cannot re-anchor the pattern internally. The check inspects the first match returned by regexp.scan, which is leftmost-first. For patterns with top-level alternation, the engine may pick a shorter alternative even though a longer one would also match the full input. For example, a|ab on "ab" selects a, and the validator then reports Invalid even though Python's re.fullmatch would accept it. If your pattern uses |, prefer matches_fully_string / matches_fully_string_checked (which anchor the source pattern with ^(?:...)$ before compiling), or anchor the pattern explicitly yourself before passing it in.

For non-alternating patterns this matches Python’s re.fullmatch semantics: anchoring with ^...$ is therefore not required, and patterns that already include ^ / $ continue to work — the anchors just become redundant.

Example:

    import gleam/regexp
    import dataprep/rules

    let assert Ok(re) = regexp.from_string("[0-9]+")
    let check = rules.matches_fully(pattern: re, error: NotANumber)

    check("123")        // Valid("123")
    check("abc123def")  // Invalid([NotANumber]) -- substring match rejected
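
The alternation caveat can be seen by running the same pattern through both constructors. This is a minimal sketch: PatternError and alternation_demo are hypothetical names, and the outcomes noted in the comments are the ones this documentation describes rather than independently verified output.

```gleam
import dataprep/rules
import gleam/regexp

// Hypothetical error type for this sketch; not part of dataprep.
pub type PatternError {
  NotAllowed
}

pub fn alternation_demo() {
  let assert Ok(re) = regexp.from_string("a|ab")

  // Pre-compiled pattern: the first scan match is "a", which is not the
  // whole input, so the documentation says "ab" is reported as Invalid.
  let precompiled = rules.matches_fully(pattern: re, error: NotAllowed)

  // Source pattern anchored internally as ^(?:a|ab)$, so the
  // documentation says "ab" is accepted.
  let anchored = rules.matches_fully_string(pattern: "a|ab", error: NotAllowed)

  #(precompiled("ab"), anchored("ab"))
}
```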

pub fn matches_fully_string(
  pattern pattern: String,
  error error: e,
) -> fn(String) -> validated.Validated(String, e)

Fails if the regex does not match the entire input string. Compiles the pattern internally with explicit anchors (^(?:pattern)$) so the check matches Python’s re.fullmatch semantics even for top-level alternation — e.g. pattern "a|ab" against input "ab" is accepted because ab is one of the alternatives. An invalid pattern panics at construction time with the underlying compile error.

Use this for the validation use case — "[0-9]+" will reject "abc123def" rather than accepting it on a substring hit, so the API behaves the way readers usually assume.

The predicate compares the regexp.scan match content against the input rather than relying on regexp.check. The latter would diverge between targets for inputs with a trailing newline (Erlang $ matches before a final newline by default; JavaScript $ only matches at the absolute end). Requiring the match content to equal the whole input pins the contract on both runtimes: pattern "foo" rejects "foo\n" on Erlang and JavaScript alike.

Example:

    import dataprep/rules

    let check = rules.matches_fully_string(
      pattern: "[a-z0-9-]+",
      error: InvalidFormat,
    )
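
The trailing-newline contract described above can be exercised with a short sketch. TokenError and newline_demo are hypothetical names for illustration; the expected outcomes in the comments restate the documented behaviour.

```gleam
import dataprep/rules

// Hypothetical error type for this sketch; not part of dataprep.
pub type TokenError {
  BadToken
}

pub fn newline_demo() {
  let check = rules.matches_fully_string(pattern: "foo", error: BadToken)

  // Documented behaviour: "foo" passes, while "foo\n" is rejected on
  // both the Erlang and JavaScript targets.
  #(check("foo"), check("foo\n"))
}
```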

pub fn matches_fully_string_checked(
  pattern pattern: String,
  error error: e,
) -> Result(
  fn(String) -> validated.Validated(String, e),
  RegexRuleError,
)

Like matches_fully_string, but returns the compile failure as a Result instead of panicking. Use this for the validation use case when the pattern comes from configuration or any other dynamic source where a compile failure should be handled rather than crashing.

On success the validator behaves identically to matches_fully_string: it anchors the pattern as ^(?:pattern)$ internally and matches Python re.fullmatch semantics even for top-level alternation. The byte index reported on a compile failure refers to the position inside the caller-supplied pattern (the internal ^(?: prefix is stripped off before reporting).

Example:

    import dataprep/rules

    case rules.matches_fully_string_checked(
      pattern: pattern_from_admin,
      error: BadFormat,
    ) {
      Ok(check) -> ...
      Error(rules.InvalidPattern(reason: r, ..)) -> ...
    }

pub fn matches_string(
  pattern pattern: String,
  error error: e,
) -> fn(String) -> validated.Validated(String, e)

Fails if the string does not match the given regular expression pattern. Compiles the pattern internally; an invalid pattern panics at construction time with the underlying compile error.

Same anchoring footgun as matches: a pattern like "[0-9]+" accepts "abc123def" because the digit run matches somewhere. Anchor explicitly with ^...$ or use matches_fully_string for the validation case.

Use this when the pattern is a literal known at the call site, where a compile failure is a programmer error with no meaningful recovery. For dynamically supplied patterns, use matches together with regexp.from_string, or use matches_string_checked, so the Result is visible.

Example:

    import dataprep/rules

    let check = rules.matches_string(
      pattern: "^[a-z0-9-]+$",
      error: InvalidFormat,
    )

pub fn matches_string_checked(
  pattern pattern: String,
  error error: e,
) -> Result(
  fn(String) -> validated.Validated(String, e),
  RegexRuleError,
)

Like matches_string, but returns the compile failure as a Result instead of panicking. Use this when the pattern is not a hard-coded literal — config files, admin-supplied input, or anywhere a malformed pattern is a recoverable runtime condition rather than a programmer error.

On success the validator behaves identically to matches(pattern: re, error:) over the compiled pattern, i.e. uses regexp.check semantics (substring hit is enough).

Example:

    import dataprep/rules
    import gleam/result

    case rules.matches_string_checked(
      pattern: pattern_from_config,
      error: InvalidFormat,
    ) {
      Ok(check) -> handle_request(check)
      Error(rules.InvalidPattern(reason: r, ..)) -> reject_config(r)
    }

pub fn max_float(
  maximum max: Float,
  error error: e,
) -> fn(Float) -> validated.Validated(Float, e)

Fails if the float exceeds max.

pub fn max_int(
  maximum max: Int,
  error error: e,
) -> fn(Int) -> validated.Validated(Int, e)

Fails if the int exceeds max.

pub fn max_length(
  maximum max: Int,
  error error: e,
) -> fn(String) -> validated.Validated(String, e)

Fails if the string length exceeds max.

pub fn min_float(
  minimum min: Float,
  error error: e,
) -> fn(Float) -> validated.Validated(Float, e)

Fails if the float is less than min.

pub fn min_int(
  minimum min: Int,
  error error: e,
) -> fn(Int) -> validated.Validated(Int, e)

Fails if the int is less than min.

pub fn min_length(
  minimum min: Int,
  error error: e,
) -> fn(String) -> validated.Validated(String, e)

Fails if the string length is less than min.

pub fn non_negative_float(
  error: e,
) -> fn(Float) -> validated.Validated(Float, e)

Fails if the float is negative (less than 0.0).

pub fn non_negative_int(
  error: e,
) -> fn(Int) -> validated.Validated(Int, e)

Fails if the int is negative (less than 0).

pub fn not_blank(
  error: e,
) -> fn(String) -> validated.Validated(String, e)

Fails if the string is empty or contains only whitespace. Unlike not_empty, this rejects " " and "\t\n". The value is NOT trimmed – it is returned unchanged on success.

pub fn not_empty(
  error: e,
) -> fn(String) -> validated.Validated(String, e)

Fails if the string is exactly "". Whitespace-only strings like " " pass this check. To reject whitespace-only input, compose with prep.trim() first:

    raw |> prep.trim() |> rules.not_empty(MyError)
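
To make the difference between not_empty and not_blank concrete, the sketch below feeds the same whitespace-only input to both rules. FieldError and blank_demo are hypothetical names; the outcomes in the comments follow the descriptions of the two functions.

```gleam
import dataprep/rules

// Hypothetical error type for this sketch; not part of dataprep.
pub type FieldError {
  Missing
}

pub fn blank_demo() {
  let not_empty = rules.not_empty(Missing)
  let not_blank = rules.not_blank(Missing)

  // Documented behaviour: not_empty accepts "   " because it is not
  // exactly "", while not_blank rejects it as whitespace-only.
  #(not_empty("   "), not_blank("   "))
}
```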

pub fn one_of(
  allowed allowed: List(a),
  error error: e,
) -> fn(a) -> validated.Validated(a, e)

Fails if the value is not in the allowed list.

Edge case — empty allowed list: a set-membership check against the empty set has no inhabitants, so every input fails with error. This is a programmer error rather than a runtime condition; the library does not raise it because rule constructors are pure and never panic. Either construct the rule with a non-empty allowlist, or guard at the call site (e.g., case allowed { [] -> ...; [_, ..] -> rules.one_of(allowed, e) }) when the allowlist comes from configuration or other dynamic input.
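
A call-site guard for a dynamically loaded allowlist might look like the following. RoleError and checked_one_of are hypothetical names for this sketch; rules.one_of and its labelled arguments are taken from the signature above.

```gleam
import dataprep/rules

// Hypothetical error type for this sketch; not part of dataprep.
pub type RoleError {
  UnknownRole
}

// Refuses to build a rule from an empty allowlist, since such a rule
// would reject every input.
pub fn checked_one_of(allowed: List(String)) {
  case allowed {
    [] -> Error(Nil)
    [_, ..] -> Ok(rules.one_of(allowed: allowed, error: UnknownRole))
  }
}
```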
