dataprep

Composable, type-driven preprocessing and validation combinator library for Gleam.

dataprep is a combinator toolkit, not a rule catalog.

Requirements

Install

gleam add dataprep

Quick start

import dataprep/prep
import dataprep/validated.{type Validated}
import dataprep/rules

pub type User {
  User(name: String, age: Int)
}

pub type Err {
  NameEmpty
  AgeTooYoung
}

pub fn validate_user(name: String, age: Int) -> Validated(User, Err) {
  let clean = prep.trim() |> prep.then(prep.lowercase())
  let check_name = rules.not_empty(NameEmpty)
  let check_age = rules.min_int(0, AgeTooYoung)

  validated.map2(
    User,
    name |> clean |> check_name,
    check_age(age),
  )
}

// validate_user("  Alice ", 25)   -> Valid(User("alice", 25))
// validate_user("", -1)           -> Invalid([NameEmpty, AgeTooYoung])
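The second call reports both errors because map2 is applicative. It looks at both field results independently and, when both failed, concatenates their error lists instead of stopping at the first failure. The sketch below shows that rule with a hypothetical stand-in type, Checked, that carries a plain List of errors. It is not dataprep's Validated (the module table says dataprep uses a non-empty error list). It only mirrors the accumulation behaviour.

```gleam
import gleam/list

// Hypothetical stand-in for dataprep's Validated, for illustration only.
pub type Checked(a, e) {
  Good(a)
  Bad(List(e))
}

// Applicative map2: combine two independent results. When both failed,
// keep both error lists instead of stopping at the first failure.
pub fn map2(
  f: fn(a, b) -> c,
  x: Checked(a, e),
  y: Checked(b, e),
) -> Checked(c, e) {
  case x, y {
    Good(value_x), Good(value_y) -> Good(f(value_x, value_y))
    Bad(errors_x), Bad(errors_y) -> Bad(list.append(errors_x, errors_y))
    Bad(errors_x), Good(_) -> Bad(errors_x)
    Good(_), Bad(errors_y) -> Bad(errors_y)
  }
}

pub type User {
  User(name: String, age: Int)
}

pub type Err {
  NameEmpty
  AgeTooYoung
}

pub fn main() {
  // Both fields failed, so both errors survive, in argument order.
  let assert Bad([NameEmpty, AgeTooYoung]) =
    map2(User, Bad([NameEmpty]), Bad([AgeTooYoung]))
  // Both fields passed, so the constructor is applied.
  let assert Good(User("alice", 25)) = map2(User, Good("alice"), Good(25))
}
```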

Examples

Field validation with structured error context

Attach field names to errors so callers can identify which field failed.

import dataprep/prep
import dataprep/rules
import dataprep/validated.{type Validated}
import dataprep/validator

pub type FormError {
  Field(name: String, detail: FieldDetail)
}

pub type FieldDetail {
  Empty
  TooShort(min: Int)
  TooLong(max: Int)
}

pub fn validate_username(raw: String) -> Validated(String, FormError) {
  let clean = prep.trim() |> prep.then(prep.lowercase())
  let check =
    rules.not_empty(Empty)
    |> validator.guard(
      rules.min_length(3, TooShort(3))
      |> validator.both(rules.max_length(20, TooLong(20))),
    )
    |> validator.label("username", Field)

  raw |> clean |> check
}

// validate_username("  Al  ")
//   -> Invalid([Field("username", TooShort(3))])
// validate_username("  Alice  ")
//   -> Valid("alice")
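validator.label("username", Field) receives a field name and a two-argument constructor whose shape is fn(String, detail) -> error. Each error the check produces is wrapped with that name. Here is a minimal sketch of the wrapping step over a plain List of details. label_errors is a hypothetical helper and assumes nothing about how dataprep implements label internally.

```gleam
import gleam/list

pub type FieldDetail {
  Empty
  TooShort(min: Int)
}

pub type FormError {
  Field(name: String, detail: FieldDetail)
}

// Hypothetical helper: wrap every error detail with a field name using a
// two-argument constructor such as Field, the role validator.label plays.
pub fn label_errors(
  details: List(d),
  name: String,
  wrap: fn(String, d) -> e,
) -> List(e) {
  list.map(details, fn(detail) { wrap(name, detail) })
}

pub fn main() {
  let assert [Field("username", TooShort(3))] =
    label_errors([TooShort(3)], "username", Field)
}
```

Because the wrapper is an ordinary function, any constructor or function with that shape works, so error types stay entirely under the caller's control.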

Parse then validate

Use validated.and_then to bridge type-changing parsing with same-type validation. Parsing short-circuits; validation accumulates.

import dataprep/parse
import dataprep/rules
import dataprep/validated.{type Validated}
import dataprep/validator

pub type AgeError {
  NotAnInteger(raw: String)
  TooYoung(min: Int)
  TooOld(max: Int)
}

pub fn validate_age(raw: String) -> Validated(Int, AgeError) {
  let check_range =
    rules.min_int(0, TooYoung(0))
    |> validator.both(rules.max_int(150, TooOld(150)))

  parse.int(raw, NotAnInteger)
  |> validated.and_then(check_range)
}

// validate_age("abc") -> Invalid([NotAnInteger("abc")])
// validate_age("200") -> Invalid([TooOld(150)])
// validate_age("25")  -> Valid(25)
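The split between short-circuiting parsing and accumulating validation can be reproduced with the standard library alone. This hypothetical sketch uses int.parse from gleam/int and returns a plain Result with a List of errors instead of dataprep's Validated. A parse failure returns immediately because there is no Int to check. Once parsing succeeds, both range checks always run.

```gleam
import gleam/int
import gleam/list

pub type AgeError {
  NotAnInteger(raw: String)
  TooYoung(min: Int)
  TooOld(max: Int)
}

// Both range checks always run; whatever they report is collected together.
fn range_errors(n: Int) -> List(AgeError) {
  let low = case n < 0 {
    True -> [TooYoung(0)]
    False -> []
  }
  let high = case n > 150 {
    True -> [TooOld(150)]
    False -> []
  }
  list.append(low, high)
}

// Parsing short-circuits: on failure the range checks are never reached.
pub fn validate_age(raw: String) -> Result(Int, List(AgeError)) {
  case int.parse(raw) {
    Error(Nil) -> Error([NotAnInteger(raw)])
    Ok(n) ->
      case range_errors(n) {
        [] -> Ok(n)
        errors -> Error(errors)
      }
  }
}
```

With this sketch, validate_age("abc") returns Error([NotAnInteger("abc")]), validate_age("200") returns Error([TooOld(150)]), and validate_age("25") returns Ok(25).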

Nested error labeling with map3

Combine multiple fields into a domain type. All errors from all fields are accumulated with their field names.

import dataprep/prep
import dataprep/rules
import dataprep/validated.{type Validated}
import dataprep/validator

pub type SignupForm {
  SignupForm(name: String, email: String, age: Int)
}

pub type SignupError {
  Field(name: String, detail: Detail)
}

pub type Detail {
  Empty
  TooShort(min: Int)
  OutOfRange(min: Int, max: Int)
}

fn validate_name(raw: String) -> Validated(String, SignupError) {
  let clean = prep.trim() |> prep.then(prep.lowercase())
  let check =
    rules.not_empty(Empty)
    |> validator.guard(rules.min_length(2, TooShort(2)))
    |> validator.label("name", Field)
  raw |> clean |> check
}

fn validate_email(raw: String) -> Validated(String, SignupError) {
  let clean = prep.trim() |> prep.then(prep.lowercase())
  let check =
    rules.not_empty(Empty)
    |> validator.label("email", Field)
  raw |> clean |> check
}

fn validate_age(age: Int) -> Validated(Int, SignupError) {
  let check =
    rules.min_int(0, OutOfRange(0, 150))
    |> validator.both(rules.max_int(150, OutOfRange(0, 150)))
    |> validator.label("age", Field)
  check(age)
}

pub fn validate_signup(
  name: String,
  email: String,
  age: Int,
) -> Validated(SignupForm, SignupError) {
  validated.map3(
    SignupForm,
    validate_name(name),
    validate_email(email),
    validate_age(age),
  )
}

// validate_signup("", "", 200)
//   -> Invalid([
//        Field("name", Empty),
//        Field("email", Empty),
//        Field("age", OutOfRange(0, 150)),
//      ])

Pattern matching with rules.matches / matches_string

matches and matches_string use regexp.check semantics: they pass if the pattern matches anywhere in the input. A pattern like [0-9]+ therefore accepts "abc123def", because the digit run matches a substring. When the whole string must have a given shape (an email, a slug, a number), use the siblings matches_fully and matches_fully_string, which compare the matched span against the entire input.

Use matches when the regex is dynamic (built from user input or config): the Result from regexp.from_string stays visible to the caller. Use matches_string when the pattern is a literal at the call site: the helper compiles it internally and panics on a malformed literal, which is a programmer error with no useful recovery.

import dataprep/rules
import dataprep/validated.{type Validated}
import gleam/regexp
import gleam/result

pub type TagError {
  BadFormat
}

// Literal pattern with full-match semantics. The convenience helper
// compiles the pattern itself when the validator is built (here, on every
// call to validate_tag), so no `let assert Ok(_)` boilerplate is needed at
// the call site. Full matching also means a substring hit on a partial
// pattern (like `[a-z0-9-]+`) does NOT silently slip through.
pub fn validate_tag(raw: String) -> Validated(String, TagError) {
  let check =
    rules.matches_fully_string(pattern: "[a-z0-9-]+", error: BadFormat)
  check(raw)
}

// Dynamic pattern — the caller controls the compile error.
pub fn validate_with(
  raw: String,
  pattern: String,
) -> Result(Validated(String, TagError), regexp.CompileError) {
  use re <- result.map(regexp.from_string(pattern))
  rules.matches(pattern: re, error: BadFormat)(raw)
}

// validate_tag("ok-1") -> Valid("ok-1")
// validate_tag("BAD!") -> Invalid([BadFormat])
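The substring versus whole-input distinction can be observed directly with gleam_regexp, the package the example above already imports. matches_anywhere and matches_whole are hypothetical helpers. The span comparison is one way to get full-match semantics and is not necessarily how dataprep implements matches_fully.

```gleam
import gleam/list
import gleam/regexp

// Substring semantics: passes if the pattern matches anywhere in the input.
pub fn matches_anywhere(re: regexp.Regexp, input: String) -> Bool {
  regexp.check(re, input)
}

// Whole-input semantics via span comparison: passes if some match found by
// regexp.scan is the entire input. A match is a substring of the input, so
// equal content means it spans the whole string.
pub fn matches_whole(re: regexp.Regexp, input: String) -> Bool {
  regexp.scan(re, input)
  |> list.any(fn(m: regexp.Match) { m.content == input })
}

pub fn main() {
  let assert Ok(digits) = regexp.from_string("[0-9]+")
  let assert True = matches_anywhere(digits, "abc123def")
  let assert False = matches_whole(digits, "abc123def")
  let assert True = matches_whole(digits, "123")
}
```

This span comparison inherits the regex engine's leftmost-first alternation. With the pattern a|ab, the input "ab" yields the match "a", so matches_whole returns False. Listing the longer alternative first (ab|a) avoids that.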

More examples are available in the doc/recipes/ directory of the repository.

Modules

| Module | Responsibility |
| --- | --- |
| dataprep/prep | Infallible transformations: trim, lowercase, uppercase, collapse_space, replace, default. Compose with then or sequence. |
| dataprep/validator | Checks without transformation: check, predicate, both, all, alt, guard, map_error, label, each, optional. |
| dataprep/validated | Applicative error accumulation: map, map_error, and_then, from_result, from_result_map, to_result, map2..map5, sequence, traverse, traverse_indexed. |
| dataprep/non_empty_list | At-least-one guarantee for error lists: single, cons, append, concat, map, flat_map, to_list, from_list. |
| dataprep/rules | Built-in rules: not_empty, not_blank, matches, matches_string, matches_fully, matches_fully_string, min_length, max_length, length_between, min_int, max_int, min_float, max_float, non_negative_int, non_negative_float, one_of, equals. |
| dataprep/parse | Parse helpers: int, float. Bridge String to typed Validated with custom error mapping. |
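The non_empty_list guarantee comes from the shape of the type: a required first element plus an ordinary, possibly empty tail, so an error list built on it can never be empty. This is a hypothetical sketch of that idea. NonEmpty and its functions reuse some names from the table above, but their definitions and signatures here are assumptions, not dataprep's API.

```gleam
import gleam/list

// Hypothetical non-empty list: a required head plus a possibly empty tail.
pub type NonEmpty(a) {
  NonEmpty(first: a, rest: List(a))
}

pub fn single(x: a) -> NonEmpty(a) {
  NonEmpty(x, [])
}

// Appending two non-empty lists is always non-empty.
pub fn append(xs: NonEmpty(a), ys: NonEmpty(a)) -> NonEmpty(a) {
  NonEmpty(xs.first, list.append(xs.rest, [ys.first, ..ys.rest]))
}

pub fn to_list(xs: NonEmpty(a)) -> List(a) {
  [xs.first, ..xs.rest]
}

// Converting from a plain list can fail, because [] has no head.
pub fn from_list(xs: List(a)) -> Result(NonEmpty(a), Nil) {
  case xs {
    [] -> Error(Nil)
    [first, ..rest] -> Ok(NonEmpty(first, rest))
  }
}
```

The fallible from_list is where the guarantee is enforced: every other operation starts from at least one element and cannot remove it.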

Composition overview

| Phase | Combinator | Errors | When to use |
| --- | --- | --- | --- |
| Prep | prep.then | (none) | Chain infallible transforms |
| Validate | validator.both / all | Accumulate all | Independent checks on the same value |
| Validate | validator.alt | Accumulate on full failure | Accept alternative forms |
| Validate | validator.guard | Short-circuit | Skip dependent checks if a prerequisite fails |
| Combine | validated.map2..map5 | Accumulate all | Build domain types from independent fields |
| Bridge | validated.and_then | Short-circuit | Parse then validate (type changes) |
| Bridge | parse.int / parse.float | Short-circuit | String to typed Validated in one step |
| Bridge | raw \|> prep \|> validator | (prep has none) | Apply an infallible transform before validation |
| Collection | validated.sequence / traverse | Accumulate all | Validate a list of values |
| Collection | validator.each | Accumulate all | Apply a validator to every list element |
| Collection | validator.optional | (none if None) | Skip validation for absent values |
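The two error behaviours for checks on the same value, Accumulate all (validator.both) and Short-circuit (validator.guard), can be modelled with plain functions. In this hypothetical sketch a check is a fn(a) -> List(e) that returns [] on success. The Check alias, both, guard, and the three example checks are illustrations, not dataprep's definitions.

```gleam
import gleam/list
import gleam/string

// Hypothetical check: the errors found in a value; [] means it passed.
pub type Check(a, e) =
  fn(a) -> List(e)

// Accumulate all: run both checks on the same value, keep every error.
pub fn both(first: Check(a, e), second: Check(a, e)) -> Check(a, e) {
  fn(value) { list.append(first(value), second(value)) }
}

// Short-circuit: run `next` only if `prerequisite` reported no errors.
pub fn guard(prerequisite: Check(a, e), next: Check(a, e)) -> Check(a, e) {
  fn(value) {
    case prerequisite(value) {
      [] -> next(value)
      errors -> errors
    }
  }
}

pub type NameError {
  Empty
  TooShort(min: Int)
  HasSpace
}

fn not_empty(s: String) -> List(NameError) {
  case s {
    "" -> [Empty]
    _ -> []
  }
}

fn min_length_3(s: String) -> List(NameError) {
  case string.length(s) < 3 {
    True -> [TooShort(3)]
    False -> []
  }
}

fn no_spaces(s: String) -> List(NameError) {
  case string.contains(s, " ") {
    True -> [HasSpace]
    False -> []
  }
}

pub fn main() {
  let check = not_empty |> guard(both(min_length_3, no_spaces))
  // "" fails the prerequisite, so the other two checks never run.
  let assert [Empty] = check("")
  // " a" passes not_empty; both remaining checks run and both fail.
  let assert [TooShort(3), HasSpace] = check(" a")
  let assert [] = check("alice")
}
```

The pipeline reads like the username example earlier: guard keeps an empty input from also reporting TooShort, while both reports every independent problem at once.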

Development

This project uses mise to manage the Gleam and Erlang versions, and the just command runner for tasks.

mise install    # install Gleam and Erlang
just ci         # format check, typecheck, build, test
just test       # gleam test
just format     # gleam format
just check      # all checks without deps download

Contributing

Contributions are welcome. See CONTRIBUTING.md for details.

License

MIT
