onigleam

A Gleam library for converting Oniguruma regex patterns into patterns compatible with gleam_regexp. It is intended for working with TextMate grammars in Gleam, since TextMate grammars use Oniguruma's regex syntax for their syntax highlighting rules.

Attribution: This library is a Gleam port of oniguruma-to-es and oniguruma-parser by Steven Levithan.

This port was developed with LLM assistance (Claude).

gleam add onigleam@1

Quick Start

import onigleam
import onigleam/options
import gleam/dict
import gleam/regexp

// Convert a TextMate-style pattern with named capture groups
let assert Ok(result) = onigleam.convert(
  "(?<keyword>fn|let|pub)\\s+(?<name>[a-z_]\\w*)"
)

// Named groups become numbered, with a mapping preserved
result.pattern
// "(fn|let|pub)\\s+([a-z_]\\w*)"

dict.get(result.capture_names, "keyword")  // Ok(1)
dict.get(result.capture_names, "name")     // Ok(2)

// Compile and use directly
let assert Ok(re) = onigleam.to_regexp(
  "(?<num>\\d+)",
  options.default_options(),
)
let assert [match] = regexp.scan(re, "value: 42")
match.content  // "42"

Usage

Named Capture Groups

Oniguruma’s named capture groups (?<name>...) are converted to standard numbered groups, since gleam_regexp doesn’t expose named groups. The name-to-number mapping is returned so you can still reference captures by name:

import onigleam
import gleam/dict

let assert Ok(result) = onigleam.convert(
  "(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})"
)

result.pattern
// "(\\d{4})-(\\d{2})-(\\d{2})"

dict.get(result.capture_names, "year")   // Ok(1)
dict.get(result.capture_names, "month")  // Ok(2)
dict.get(result.capture_names, "day")    // Ok(3)
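
Because gleam_regexp reports captures by position, the mapping can be combined with a match's submatches to look a capture up by name. The following is a minimal sketch rather than part of onigleam: capture_by_name is a hypothetical helper, and the code assumes that the regexp_options field of the result can be passed directly to regexp.compile.

```gleam
import gleam/dict.{type Dict}
import gleam/list
import gleam/option.{Some}
import gleam/regexp
import gleam/result
import onigleam

/// Hypothetical helper (not part of onigleam): fetch a capture by its
/// original Oniguruma name using the name -> group number mapping.
pub fn capture_by_name(
  match: regexp.Match,
  names: Dict(String, Int),
  name: String,
) -> Result(String, Nil) {
  use group <- result.try(dict.get(names, name))
  // Group numbers are 1-based; submatches holds group 1 at the head.
  case list.drop(match.submatches, group - 1) {
    [Some(text), ..] -> Ok(text)
    // Unknown position, or the group did not take part in the match.
    _ -> Error(Nil)
  }
}

pub fn main() {
  let assert Ok(converted) =
    onigleam.convert("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})")
  // Assumption: regexp_options is a gleam/regexp Options value.
  let assert Ok(re) =
    regexp.compile(converted.pattern, converted.regexp_options)
  let assert [match] = regexp.scan(re, "2024-05-17")

  capture_by_name(match, converted.capture_names, "month")
}
```

Given the mapping shown above (month is group 2), the helper should return the second submatch of the match. It returns Error(Nil) both for unknown names and for groups that did not participate in the match.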

Unicode and Hex Escapes

Oniguruma’s various escape formats are converted to their literal characters:

import onigleam

// Hex escapes
let assert Ok(r1) = onigleam.convert("\\x41\\x42\\x43")
r1.pattern  // "ABC"

// Unicode escapes
let assert Ok(r2) = onigleam.convert("caf\\u00e9")
r2.pattern  // "café"

TextMate Grammar Patterns

TextMate grammars sometimes reference capture groups that don’t exist in the current pattern (orphan backreferences). Use convert_textmate to handle these gracefully:

import onigleam

// A TextMate `end` rule often refers back to a group captured by its `begin` rule.
// This pattern references \1 but has no capture group of its own.
// Normal conversion would fail, but convert_textmate allows it
let assert Ok(result) = onigleam.convert_textmate(
  "\\1"  // Close with whatever quote the begin rule captured
)
// Returns Ok with a warning about the orphan backref
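
Since orphan backreferences are reported as warnings rather than errors, it can be worth surfacing them while loading a grammar. A minimal sketch, assuming convert_textmate returns the same ConversionResult record (with its warnings field) as convert; the exact warning text is not shown because it depends on the library:

```gleam
import gleam/io
import gleam/list
import onigleam

pub fn main() {
  // "\\1" has no capture group of its own, so the backreference is an orphan.
  let assert Ok(result) = onigleam.convert_textmate("\\1")

  // Assumption: warnings is the List(String) field of ConversionResult.
  list.each(result.warnings, fn(warning) {
    io.println("warning: " <> warning)
  })
}
```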

Flags and Options

import onigleam
import onigleam/options

// Case-insensitive matching
let assert Ok(result) = onigleam.convert_with_flags(
  "(?<tag>html|body|div)",
  "i"
)
result.regexp_options.case_insensitive  // True

// Full control with options builder
let opts = options.default_options()
  |> options.with_flags("i")
  |> options.allow_orphan_backrefs

let assert Ok(result) = onigleam.to_regexp_details(
  "(?<open><\\w+>).*?(?<close></\\w+>)",
  opts,
)

API Reference

Main Functions

| Function | Description |
| --- | --- |
| convert(pattern) | Convert with default options |
| convert_with_flags(pattern, flags) | Convert with Oniguruma flags |
| convert_textmate(pattern) | Convert with TextMate-friendly options |
| to_regexp(pattern, options) | Convert and compile to Regexp |
| to_regexp_details(pattern, options) | Convert with full result details |
| format_error(error) | Format error as human-readable string |

ConversionResult

pub type ConversionResult {
  ConversionResult(
    pattern: String,              // Generated pattern string
    regexp_options: Options,      // Options for gleam_regexp
    capture_names: Dict(String, Int),  // Name -> group number mapping
    warnings: List(String),       // Any warnings generated
  )
}

Supported Features

| Feature | Status | Notes |
| --- | --- | --- |
| Literals, escapes | Supported | Direct mapping |
| Character classes [abc] | Supported | Including ranges, negation |
| Quantifiers *, +, ?, {n,m} | Supported | Greedy and lazy |
| Capturing groups (...) | Supported | Named groups converted to numbered |
| Non-capturing groups (?:...) | Supported | Direct mapping |
| Lookahead (?=...), (?!...) | Supported | Both positive and negative |
| Lookbehind (?<=...), (?<!...) | Supported | Both positive and negative |
| Anchors ^, $, \A, \z | Supported | Direct mapping |
| Word boundaries \b, \B | Supported | Platform differences may apply |
| Character shorthands \d, \w, \s | Supported | Direct mapping |
| Alternation a\|b | Supported | Direct mapping |
| Unicode escapes \uHHHH | Supported | Converted to literal |
| Hex escapes \xHH | Supported | Converted to literal |

Unsupported Features (Will Error)

| Feature | Why |
| --- | --- |
| Atomic groups (?>...) | Cannot emulate in gleam_regexp |
| Possessive quantifiers *+, ++ | Cannot emulate in gleam_regexp |
| Recursion \g<0> | Not supported by underlying engines |
| Subroutines \g<name> | Not supported by underlying engines |
| Search start \G | Requires stateful regex |
| Absence functions (?~...) | Cannot emulate |

Partial Support / Workarounds

| Feature | Handling |
| --- | --- |
| Named captures | Converted to numbered; mapping returned |
| dotAll mode | The . metacharacter is replaced with [\s\S] when enabled |
| Flag modifiers (?i:...) | Flags applied during transformation |
| \K directive | Warning issued; full match returned |

Platform Compatibility

This library generates patterns compatible with both Gleam compilation targets, Erlang and JavaScript.

Run tests on both targets:

gleam test --target erlang
gleam test --target javascript

Error Handling

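import gleam/io
import onigleam

pub fn main() {
  case onigleam.convert("(?>atomic)") {
    // Conversion succeeded: use the generated pattern
    Ok(result) -> io.println(result.pattern)
    // Conversion failed: print a human-readable message,
    // e.g. "Atomic groups are not supported. ..."
    Error(err) -> io.println(onigleam.format_error(err))
  }
}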
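
A grammar usually contains many patterns, and some of them may use unsupported features. One possible approach, sketched here with made-up sample patterns, is to convert them all and separate the successes from the failures with result.partition from gleam_stdlib. The sketch assumes format_error returns a String, as the API table describes.

```gleam
import gleam/int
import gleam/io
import gleam/list
import gleam/result
import onigleam

pub fn main() {
  // Sample patterns for illustration only; the last two use features
  // listed as unsupported (atomic group, possessive quantifier).
  let patterns = ["(?<kw>fn|let)", "(?>atomic)", "\\d++"]

  let #(converted, failures) =
    patterns
    |> list.map(fn(pattern) {
      onigleam.convert(pattern)
      // Keep the source pattern next to its formatted error message.
      |> result.map_error(fn(err) {
        #(pattern, onigleam.format_error(err))
      })
    })
    |> result.partition

  io.println(int.to_string(list.length(converted)) <> " pattern(s) converted")
  list.each(failures, fn(failure) {
    let #(pattern, message) = failure
    io.println("skipped " <> pattern <> ": " <> message)
  })
}
```

Note that result.partition does not preserve the input order, so pair each pattern with its outcome (as done for the failures here) if ordering matters.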

Development

gleam test
gleam test --target javascript  # Test on JavaScript target
gleam test --target erlang  # Test on Erlang target

Further documentation can be found at https://hexdocs.pm/onigleam.

License

MIT License. See LICENSE for details.
