onigleam
A Gleam library for converting Oniguruma regex patterns to patterns compatible with gleam_regexp. It is intended for working with TextMate grammars in Gleam, since TextMate grammars use Oniguruma’s regex syntax for their syntax highlighting rules.
Attribution: This library is a Gleam port of oniguruma-to-es and oniguruma-parser by Steven Levithan.
This port was developed with LLM assistance (Claude).
gleam add onigleam@1
Quick Start
import onigleam
import onigleam/options
import gleam/dict
import gleam/regexp
// Convert a TextMate-style pattern with named capture groups
let assert Ok(result) = onigleam.convert(
"(?<keyword>fn|let|pub)\\s+(?<name>[a-z_]\\w*)"
)
// Named groups become numbered, with a mapping preserved
result.pattern
// "(fn|let|pub)\\s+([a-z_]\\w*)"
dict.get(result.capture_names, "keyword") // Ok(1)
dict.get(result.capture_names, "name") // Ok(2)
// Compile and use directly
let assert Ok(re) = onigleam.to_regexp(
"(?<num>\\d+)",
options.default_options(),
)
let assert [match] = regexp.scan(re, "value: 42")
match.content // "42"
Usage
Named Capture Groups
Oniguruma’s named capture groups (?<name>...) are converted to standard numbered groups, since gleam_regexp doesn’t expose named groups. The name-to-number mapping is returned so you can still reference captures by name:
import onigleam
import gleam/dict
let assert Ok(result) = onigleam.convert(
"(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})"
)
result.pattern
// "(\\d{4})-(\\d{2})-(\\d{2})"
dict.get(result.capture_names, "year") // Ok(1)
dict.get(result.capture_names, "month") // Ok(2)
dict.get(result.capture_names, "day") // Ok(3)
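To show how the returned mapping can be used after matching, here is a minimal sketch of a lookup helper. `named_submatch` is a hypothetical name and is not part of onigleam. It assumes that group number N corresponds to position N - 1 in the `submatches` list of a gleam_regexp `Match`, and it treats a group that did not participate in the match as an error.

```gleam
import gleam/dict.{type Dict}
import gleam/list
import gleam/option.{Some}
import gleam/regexp.{type Match}
import gleam/result

/// Hypothetical helper (not provided by onigleam): fetch the text captured
/// by a named group, using the name -> group number mapping from a conversion.
pub fn named_submatch(
  match: Match,
  capture_names: Dict(String, Int),
  name: String,
) -> Result(String, Nil) {
  // Resolve the name to its 1-based group number.
  use group <- result.try(dict.get(capture_names, name))
  // Group N is assumed to sit at index N - 1 of the submatches list.
  case list.drop(match.submatches, group - 1) {
    [Some(text), ..] -> Ok(text)
    // The group is missing or did not participate in the match.
    _ -> Error(Nil)
  }
}
```

Given a match produced by the compiled date pattern, `named_submatch(match, result.capture_names, "month")` would return the text of group 2 under these assumptions.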
Unicode and Hex Escapes
Oniguruma’s various escape formats are converted to their literal characters:
import onigleam
// Hex escapes
let assert Ok(r1) = onigleam.convert("\\x41\\x42\\x43")
r1.pattern // "ABC"
// Unicode escapes
let assert Ok(r2) = onigleam.convert("caf\\u00e9")
r2.pattern // "café"
TextMate Grammar Patterns
TextMate grammars sometimes reference capture groups that don’t exist in the current pattern (orphan backreferences). Use convert_textmate to handle these gracefully:
import onigleam
// This pattern references \1 but contains no capture group of its own
// (in a TextMate grammar the group typically lives in another pattern).
// Normal conversion would fail, but convert_textmate allows it
let assert Ok(result) = onigleam.convert_textmate(
  "[^'\"]*\\1" // Rest of a quoted string; \1 is an orphan backreference
)
// Returns Ok with a warning about the orphan backref
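The warning can be inspected through the `warnings` field of the result. The sketch below assumes that `convert_textmate` returns the same `ConversionResult` record as `convert`. The exact warning text is not specified here, so the example only prints whatever is returned.

```gleam
import gleam/io
import gleam/list
import onigleam

pub fn main() {
  // \1 has no matching capture group inside this pattern (orphan backref).
  let assert Ok(result) = onigleam.convert_textmate("\\s*\\1")

  // `warnings` is a List(String); print each entry on its own line.
  list.each(result.warnings, io.println)
}
```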
Flags and Options
import onigleam
import onigleam/options
// Case-insensitive matching
let assert Ok(result) = onigleam.convert_with_flags(
"(?<tag>html|body|div)",
"i"
)
result.regexp_options.case_insensitive // True
// Full control with options builder
let opts = options.default_options()
|> options.with_flags("i")
|> options.allow_orphan_backrefs
let assert Ok(result) = onigleam.to_regexp_details(
"(?<open><\\w+>).*?(?<close></\\w+>)",
opts,
)
API Reference
Main Functions
| Function | Description |
|---|---|
| convert(pattern) | Convert with default options |
| convert_with_flags(pattern, flags) | Convert with Oniguruma flags |
| convert_textmate(pattern) | Convert with TextMate-friendly options |
| to_regexp(pattern, options) | Convert and compile to Regexp |
| to_regexp_details(pattern, options) | Convert with full result details |
| format_error(error) | Format error as human-readable string |
ConversionResult
pub type ConversionResult {
ConversionResult(
pattern: String, // Generated pattern string
regexp_options: Options, // Options for gleam_regexp
capture_names: Dict(String, Int), // Name -> group number mapping
warnings: List(String), // Any warnings generated
)
}
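As a sketch of how these fields fit together, the generated pattern can be compiled by hand with gleam_regexp. This assumes that `regexp_options` is the `Options` type from `gleam/regexp`, as the field comment suggests. `matches_tag` is a name made up for this example.

```gleam
import gleam/regexp
import onigleam

/// Hypothetical example function: converts a pattern with the "i" flag,
/// compiles it manually, and checks an input string against it.
pub fn matches_tag(input: String) -> Bool {
  let assert Ok(result) =
    onigleam.convert_with_flags("(?<tag>html|body|div)", "i")

  // Compile the generated pattern with the options the conversion produced.
  let assert Ok(re) =
    regexp.compile(result.pattern, with: result.regexp_options)

  regexp.check(re, input)
}
```

When only a compiled `Regexp` is needed, `to_regexp` covers the conversion and compilation in one call.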
Supported Features
| Feature | Status | Notes |
|---|---|---|
| Literals, escapes | Supported | Direct mapping |
| Character classes [abc] | Supported | Including ranges, negation |
| Quantifiers *, +, ?, {n,m} | Supported | Greedy and lazy |
| Capturing groups (...) | Supported | Named groups converted to numbered |
| Non-capturing groups (?:...) | Supported | Direct mapping |
| Lookahead (?=...), (?!...) | Supported | Both positive and negative |
| Lookbehind (?<=...), (?<!...) | Supported | Both positive and negative |
| Anchors ^, $, \A, \z | Supported | Direct mapping |
| Word boundaries \b, \B | Supported | Platform differences may apply |
| Character shorthands \d, \w, \s | Supported | Direct mapping |
| Alternation a\|b | Supported | Direct mapping |
| Unicode escapes \uHHHH | Supported | Converted to literal |
| Hex escapes \xHH | Supported | Converted to literal |
Unsupported Features (Will Error)
| Feature | Why |
|---|---|
| Atomic groups (?>...) | Cannot emulate in gleam_regexp |
| Possessive quantifiers *+, ++ | Cannot emulate in gleam_regexp |
| Recursion \g<0> | Not supported by underlying engines |
| Subroutines \g<name> | Not supported by underlying engines |
| Search start \G | Requires stateful regex |
| Absence functions (?~...) | Cannot emulate |
Partial Support / Workarounds
| Feature | Handling |
|---|---|
| Named captures | Converted to numbered; mapping returned |
| dotAll mode | . replaced with [\s\S] when enabled |
| Flag modifiers (?i:...) | Flags applied during transformation |
| \K directive | Warning issued; full match returned |
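The dotAll rewrite can be illustrated with gleam_regexp alone. The sketch below does not call onigleam. It only compares a plain `.` with the `[\s\S]` replacement on input containing a newline, and `dot_all_demo` is a name made up for this example.

```gleam
import gleam/regexp

/// Hypothetical demo: returns #(plain_dot_matches, rewritten_matches).
pub fn dot_all_demo() -> #(Bool, Bool) {
  let assert Ok(plain_dot) = regexp.from_string("a.b")
  let assert Ok(any_char) = regexp.from_string("a[\\s\\S]b")

  // A plain `.` does not match a newline by default.
  let plain = regexp.check(plain_dot, "a\nb")
  // `[\s\S]` matches any character, including a newline.
  let rewritten = regexp.check(any_char, "a\nb")

  #(plain, rewritten)
}
```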
Platform Compatibility
This library generates patterns compatible with both:
- Erlang’s re module (PCRE)
- JavaScript’s RegExp
Run tests on both targets:
gleam test --target erlang
gleam test --target javascript
Error Handling
import gleam/io
import onigleam

pub fn main() {
  case onigleam.convert("(?>atomic)") {
    Ok(result) -> io.println(result.pattern)
    Error(err) -> {
      let message = onigleam.format_error(err)
      // "Atomic groups are not supported. ..."
      io.println(message)
    }
  }
}
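When converting many patterns at once, for example every rule in a grammar, it can be convenient to separate successes from failures instead of stopping at the first error. The following is a minimal sketch using `result.partition` from the Gleam standard library. The pattern list is illustrative only.

```gleam
import gleam/int
import gleam/io
import gleam/list
import gleam/result
import onigleam

pub fn main() {
  // Illustrative input; the second pattern uses an unsupported atomic group.
  let patterns = ["(?<num>\\d+)", "(?>atomic)", "\\b(fn|let)\\b"]

  // Convert every pattern, then split the results into Ok and Error values.
  let #(converted, failed) =
    patterns
    |> list.map(onigleam.convert)
    |> result.partition

  io.println(int.to_string(list.length(converted)) <> " pattern(s) converted")

  // Report each failure in human-readable form.
  list.each(failed, fn(err) { io.println(onigleam.format_error(err)) })
}
```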
Development
gleam test
gleam test --target javascript # Test on JavaScript target
gleam test --target erlang # Test on Erlang target
Further documentation can be found at https://hexdocs.pm/onigleam.
License
MIT License. See LICENSE for details.