dataprep/rules
Types
Why a checked regex constructor refused to build a validator.
Returned by matches_string_checked and
matches_fully_string_checked instead of panicking, so the
caller controls how a malformed pattern surfaces. Mirrors the
information from regexp.CompileError without forcing the
caller to depend on gleam/regexp directly.
pub type RegexRuleError {
InvalidPattern(reason: String, byte_index: Int)
}
Constructors
-
InvalidPattern(reason: String, byte_index: Int)
Values
pub fn equals(
expected expected: a,
error error: e,
) -> fn(a) -> validated.Validated(a, e)
Fails if the value does not equal the expected value.
pub fn length_between(
minimum min: Int,
maximum max: Int,
error error: e,
) -> fn(String) -> validated.Validated(String, e)
Fails if the string length is outside [min, max].
Edge case — min > max: the resulting validator is vacuously
unsatisfiable (no string length can be both >= min and <= max),
so every input fails with error. This is a programmer error
rather than a runtime condition; the library does not raise it
because rule constructors are pure and never panic. Construct the
rule yourself with a sane range, or guard the bounds at the call
site (e.g., case min <= max { True -> ...; False -> ... }) when
min/max come from configuration or other dynamic input.
pub fn matches(
pattern re: regexp.Regexp,
error error: e,
) -> fn(String) -> validated.Validated(String, e)
Fails if the regex does not find a match anywhere in the string.
This is regexp.check semantics — a partial / substring match
is enough to pass.
Anchoring is the caller’s responsibility. A pattern like
[0-9]+ accepts "abc123def" because the digit run matches
somewhere in the input. To require the entire string to match,
either anchor the pattern explicitly with ^...$, or reach for
matches_fully which enforces full-string semantics regardless
of whether the pattern is anchored. The validation use case
almost always wants matches_fully; matches is exposed for
the cases that genuinely need substring search.
Takes a pre-compiled Regexp so a malformed pattern surfaces as
a regexp.from_string error at the call site instead of
crashing inside the validator.
Example: import gleam/regexp import dataprep/rules
let assert Ok(re) = regexp.from_string(“^[a-z]+$”) let check = rules.matches(pattern: re, error: InvalidFormat)
For literal patterns where propagating a compile error to the
caller is not useful, see matches_string.
pub fn matches_fully(
pattern re: regexp.Regexp,
error error: e,
) -> fn(String) -> validated.Validated(String, e)
Fails if the regex does not match the entire input string. Unlike
matches, a partial / substring hit is not enough — the
matched substring must equal the whole input.
Caveat — alternation: because this function only has the
already-compiled Regexp, it cannot re-anchor the pattern
internally. The check is implemented by inspecting the first
match returned by regexp.scan, which is leftmost-first. For
patterns that use top-level alternation, the engine may pick a
shorter alternative even though a longer one would also match
the full input — e.g. a|ab on "ab" selects a and the
validator then reports Invalid even though re.fullmatch
would accept it. If your pattern uses |, prefer
matches_fully_string / matches_fully_string_checked (which
anchor the source pattern with ^(?:...)$ before compiling), or
anchor the pattern explicitly yourself before passing it in.
For non-alternating patterns this matches Python’s re.fullmatch
semantics: anchoring with ^...$ is therefore not required, and
patterns that already include ^ / $ continue to work — the
anchors just become redundant.
Example: import gleam/regexp import dataprep/rules
let assert Ok(re) = regexp.from_string(“[0-9]+”) let check = rules.matches_fully(pattern: re, error: NotANumber)
check(“123”) // Valid(“123”) check(“abc123def”) // Invalid([NotANumber]) – substring match rejected
pub fn matches_fully_string(
pattern pattern: String,
error error: e,
) -> fn(String) -> validated.Validated(String, e)
Fails if the regex does not match the entire input string.
Compiles the pattern internally with explicit anchors
(^(?:pattern)$) so the check matches Python’s re.fullmatch
semantics even for top-level alternation — e.g. pattern "a|ab"
against input "ab" is accepted because ab is one of the
alternatives. An invalid pattern panics at construction time
with the underlying compile error.
Use this for the validation use case — "[0-9]+" will reject
"abc123def" rather than accepting it on a substring hit, so
the API behaves the way readers usually assume.
The predicate compares regexp.scan match content against the
input rather than relying on regexp.check. The latter would
diverge between targets for inputs with a trailing newline
(Erlang $ matches before a final newline by default;
JavaScript $ only matches at the absolute end). Comparing
match content against the input length pins the contract on
both runtimes — e.g. pattern "foo" rejects "foo\n" on
Erlang and JavaScript alike.
Example: import dataprep/rules
let check = rules.matches_fully_string( pattern: “[a-z0-9-]+”, error: InvalidFormat, )
pub fn matches_fully_string_checked(
pattern pattern: String,
error error: e,
) -> Result(
fn(String) -> validated.Validated(String, e),
RegexRuleError,
)
Like matches_fully_string, but returns the compile failure as
a Result instead of panicking. Use this for the validation use
case when the pattern comes from configuration or any other
dynamic source where a compile failure should be handled rather
than crashing.
On success the validator behaves identically to
matches_fully_string: it anchors the pattern as
^(?:pattern)$ internally and matches Python re.fullmatch
semantics even for top-level alternation. The byte index
reported on a compile failure refers to the position inside the
caller-supplied pattern (the internal ^(?: prefix is stripped
off before reporting).
Example: import dataprep/rules
case rules.matches_fully_string_checked( pattern: pattern_from_admin, error: BadFormat, ) { Ok(check) -> … Error(rules.InvalidPattern(reason: r, ..)) -> … }
pub fn matches_string(
pattern pattern: String,
error error: e,
) -> fn(String) -> validated.Validated(String, e)
Fails if the string does not match the given regular expression pattern. Compiles the pattern internally; an invalid pattern panics at construction time with the underlying compile error.
Same anchoring footgun as matches: a pattern like
"[0-9]+" accepts "abc123def" because the digit run matches
somewhere. Anchor explicitly with ^...$ or use
matches_fully_string for the validation case.
Use this when the pattern is a literal known at the call site
and a compile failure would be a programmer error there is no
meaningful recovery from. For dynamically-supplied patterns,
use matches together with regexp.from_string so the
Result is visible.
Example: import dataprep/rules
let check = rules.matches_string( pattern: “^[a-z0-9-]+$”, error: InvalidFormat, )
pub fn matches_string_checked(
pattern pattern: String,
error error: e,
) -> Result(
fn(String) -> validated.Validated(String, e),
RegexRuleError,
)
Like matches_string, but returns the compile failure as a
Result instead of panicking. Use this when the pattern is not
a hard-coded literal — config files, admin-supplied input, or
anywhere a malformed pattern is a recoverable runtime condition
rather than a programmer error.
On success the validator behaves identically to
matches(pattern: re, error:) over the compiled pattern, i.e.
uses regexp.check semantics (substring hit is enough).
Example: import dataprep/rules import gleam/result
case rules.matches_string_checked( pattern: pattern_from_config, error: InvalidFormat, ) { Ok(check) -> handle_request(check) Error(rules.InvalidPattern(reason: r, ..)) -> reject_config(r) }
pub fn max_float(
maximum max: Float,
error error: e,
) -> fn(Float) -> validated.Validated(Float, e)
Fails if the float exceeds max.
pub fn max_int(
maximum max: Int,
error error: e,
) -> fn(Int) -> validated.Validated(Int, e)
Fails if the int exceeds max.
pub fn max_length(
maximum max: Int,
error error: e,
) -> fn(String) -> validated.Validated(String, e)
Fails if the string length exceeds max.
pub fn min_float(
minimum min: Float,
error error: e,
) -> fn(Float) -> validated.Validated(Float, e)
Fails if the float is less than min.
pub fn min_int(
minimum min: Int,
error error: e,
) -> fn(Int) -> validated.Validated(Int, e)
Fails if the int is less than min.
pub fn min_length(
minimum min: Int,
error error: e,
) -> fn(String) -> validated.Validated(String, e)
Fails if the string length is less than min.
pub fn non_negative_float(
error: e,
) -> fn(Float) -> validated.Validated(Float, e)
Fails if the float is negative (less than 0.0).
pub fn non_negative_int(
error: e,
) -> fn(Int) -> validated.Validated(Int, e)
Fails if the int is negative (less than 0).
pub fn not_blank(
error: e,
) -> fn(String) -> validated.Validated(String, e)
Fails if the string is empty or contains only whitespace.
Unlike not_empty, this rejects " " and "\t\n".
The value is NOT trimmed – it is returned unchanged on success.
pub fn not_empty(
error: e,
) -> fn(String) -> validated.Validated(String, e)
Fails if the string is exactly “”. Whitespace-only strings like “ “ pass this check. To reject whitespace-only input, compose with prep.trim() first:
raw |> prep.trim() |> rules.not_empty(MyError)
pub fn one_of(
allowed allowed: List(a),
error error: e,
) -> fn(a) -> validated.Validated(a, e)
Fails if the value is not in the allowed list.
Edge case — empty allowed list: a set-membership check against
the empty set has no inhabitants, so every input fails with
error. This is a programmer error rather than a runtime
condition; the library does not raise it because rule constructors
are pure and never panic. Either construct the rule with a
non-empty allowlist, or guard at the call site (e.g.,
case allowed { [] -> ...; [_, ..] -> rules.one_of(allowed, e) })
when the allowlist comes from configuration or other dynamic
input.