string_editor

A Gleam library for string manipulation and extraction. Extract substrings before, after, or between specific patterns.

Installation

gleam add string_editor

Usage

import string_editor

pub fn main() -> Nil {
  // Extract text before a pattern
  let assert Ok("hello") = string_editor.before("hello world", on: " ")
  
  // Extract text after a pattern  
  let assert Ok("world") = string_editor.after("hello world", on: " ")
  
  // Extract text between two patterns
  let assert Ok("content") = string_editor.between("<div>content</div>", from: "<div>", to: "</div>")
  
  // Count occurrences of a pattern
  let count = string_editor.count("hello hello world", of: "hello") // 2
  
  // Extract at specific index
  let assert Ok("a.b") = string_editor.before_at("a.b.c.d", on: ".", at: 1)
  
  // Extract all occurrences
  let all_before = string_editor.before_all("a.b.c.d", on: ".") // ["a", "a.b", "a.b.c"]
}

API Reference

`before(string: String, on pattern: String) -> Result(String, Nil)`

Returns the part of a string before the first occurrence of a given substring.

Examples:

string_editor.before("hello world", on: " ")
// Ok("hello")

string_editor.before("no-match", on: "!")  
// Error(Nil)

`after(string: String, on pattern: String) -> Result(String, Nil)`

Returns the part of a string after the first occurrence of a given substring.

Examples:

string_editor.after("hello world", on: " ")
// Ok("world")

string_editor.after("no-match", on: "!")
// Error(Nil)

`between(string: String, from start: String, to end: String) -> Result(String, Nil)`

Returns the part of a string between two given substrings. Finds the first occurrence of start and then the first occurrence of end after start.

Examples:

string_editor.between("<a>link</a>", from: "<a>", to: "</a>")
// Ok("link")

string_editor.between("<h1>title</h1>", from: "<h1>", to: "</h2>")
// Error(Nil)

`count(string: String, of pattern: String) -> Int`

Counts the number of occurrences of a substring in a string.

Examples:

string_editor.count("hello hello world", of: "hello")
// 2

string_editor.count("gleam is fun", of: "rust")
// 0

string_editor.count("aaaa", of: "aa")
// 2 (non-overlapping matches)

`before_at(string: String, on pattern: String, at index: Int) -> Result(String, Nil)`

Returns the part of a string before the nth occurrence of a given substring (0-indexed).

Examples:

string_editor.before_at("a.b.c.d", on: ".", at: 1)
// Ok("a.b")

string_editor.before_at("hello world", on: " ", at: 5)
// Error(Nil)

`after_at(string: String, on pattern: String, at index: Int) -> Result(String, Nil)`

Returns the part of a string after the nth occurrence of a given substring (0-indexed).

Examples:

string_editor.after_at("a.b.c.d", on: ".", at: 1)
// Ok("c.d")

string_editor.after_at("hello world", on: " ", at: 5)
// Error(Nil)

`between_at(string: String, from start: String, to end: String, at index: Int) -> Result(String, Nil)`

Returns the part of a string between the nth occurrence of start and the first occurrence of end after that (0-indexed for start pattern).

Examples:

string_editor.between_at("<a>1</a><a>2</a>", from: "<a>", to: "</a>", at: 1)
// Ok("2")

string_editor.between_at("<h1>title</h1>", from: "<h1>", to: "</h2>", at: 0)
// Error(Nil)

`before_all(string: String, on pattern: String) -> List(String)`

Returns all parts of a string before each occurrence of a given substring.

Examples:

string_editor.before_all("a.b.c.d", on: ".")
// ["a", "a.b", "a.b.c"]

string_editor.before_all("hello world", on: "!")
// []

`after_all(string: String, on pattern: String) -> List(String)`

Returns all parts of a string after each occurrence of a given substring.

Examples:

string_editor.after_all("a.b.c.d", on: ".")
// ["b.c.d", "c.d", "d"]

string_editor.after_all("hello world", on: "!")
// []

`between_all(string: String, from start: String, to end: String) -> List(String)`

Returns all parts of a string between each occurrence of start and the next occurrence of end.

Examples:

string_editor.between_all("<a>1</a><b>2</b><a>3</a>", from: "<a>", to: "</a>")
// ["1", "3"]

string_editor.between_all("no matches here", from: "<div>", to: "</div>")
// []

Common Use Cases

HTML/XML Parsing

// Extract content from HTML tags
string_editor.between("<title>My Page</title>", from: "<title>", to: "</title>")
// Ok("My Page")

// Extract all link texts from HTML
string_editor.between_all("<a>Home</a> <a>About</a> <a>Contact</a>", from: "<a>", to: "</a>")
// ["Home", "About", "Contact"]

// Count div tags in HTML
string_editor.count("<div>content</div><div>more</div>", of: "<div>")
// 2

File Path Manipulation

// Get filename from path (text after the third "/", i.e. 0-indexed occurrence 2)
string_editor.after_at("/home/user/document.txt", on: "/", at: 2)
// Ok("document.txt")

// Get file extension
string_editor.after("document.txt", on: ".")
// Ok("txt")

// Get the path suffix after each "/" separator
string_editor.after_all("/home/user/projects/myapp", on: "/")
// ["home/user/projects/myapp", "user/projects/myapp", "projects/myapp", "myapp"]

// Count directory levels
string_editor.count("/home/user/projects/myapp", of: "/")
// 4

URL Parsing

// Extract domain from URL
string_editor.between("https://example.com/path", from: "://", to: "/")
// Ok("example.com")

Configuration Parsing

// Extract values from key=value pairs
string_editor.after("DATABASE_URL=postgres://localhost", on: "=")
// Ok("postgres://localhost")

// Get the text after each "=" (each result runs to the end of the string)
string_editor.after_all("PORT=3000\nDB_HOST=localhost\nDB_PORT=5432", on: "=")
// ["3000\nDB_HOST=localhost\nDB_PORT=5432", "localhost\nDB_PORT=5432", "5432"]

// Count configuration entries
string_editor.count("key1=value1,key2=value2,key3=value3", of: "=")
// 3
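
Because after_all returns everything after each "=" up to the end of the input, it does not yield one value per line. When one value per entry is wanted, one possible approach is to split the input into lines first and call after on each line. The sketch below assumes newline-separated entries; `config_values` is a hypothetical helper name, not part of the library.

```gleam
import gleam/list
import gleam/string
import string_editor

// Hypothetical helper (not part of string_editor): collects the value of each
// "key=value" line. Lines without "=" are skipped because `after` returns
// Error(Nil) for them and filter_map drops errors.
pub fn config_values(config: String) -> List(String) {
  config
  |> string.split(on: "\n")
  |> list.filter_map(fn(line) { string_editor.after(line, on: "=") })
}
```

With the input from the example above, this returns one value per line: "3000", "localhost", and "5432".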

Log Processing

// Extract the text before each " INFO:" marker
string_editor.before_all("2023-01-01 INFO: message\n2023-01-02 ERROR: problem", on: " INFO:")
// ["2023-01-01"]

// Count error occurrences
string_editor.count("INFO: ok\nERROR: fail\nINFO: ok\nERROR: fail", of: "ERROR:")
// 2

Error Handling

Functions have different return types based on their purpose:

Result Functions

Functions that return Result(String, Nil) return Error(Nil) when:

The pattern is not found in the string (before, after, between)
The pattern doesn’t occur enough times (before_at, after_at, between_at)
For between and between_at, when the start pattern is not found, or when no end pattern follows it
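
A caller typically pattern matches on the returned Result. The following is a minimal sketch of that; `domain_or_default` and the "unknown" fallback are hypothetical choices made for illustration.

```gleam
import string_editor

// Hypothetical helper: returns the domain of a URL, or a fallback string when
// either "://" or the following "/" is missing.
pub fn domain_or_default(url: String) -> String {
  case string_editor.between(url, from: "://", to: "/") {
    Ok(domain) -> domain
    Error(Nil) -> "unknown"
  }
}
```

For "https://example.com/path" this yields "example.com"; for a string without "://" it yields the fallback.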

Count Function

count() always returns an Int (never fails), returning 0 when no matches are found.

List Functions

*_all functions always return a List(String) (never fail), returning an empty list [] when no matches are found.

Performance Analysis

Here’s an analysis of the performance characteristics of each function:

`before()` and `after()` Functions

Time Complexity: O(n) where n is the length of the input string

Uses string.split_once() which performs a single pass through the string
Stops at the first occurrence of the pattern
Minimal string allocations for the result

Space Complexity: O(k) where k is the length of the result substring

Returns only the required portion of the string
Minimal intermediate allocations
Memory usage primarily scales with output size

Performance Characteristics:

Best case: Pattern found early in string - O(p) where p is position of pattern
Worst case: Pattern not found - O(n) full string scan
Memory usage: Utilizes Gleam’s standard string operations

`between()` Function

Time Complexity: O(n) where n is the length of the input string

Makes two sequential calls to the underlying split operations
First finds the start pattern, then searches the remainder for the end pattern
Still linear overall as each character is examined at most twice

Space Complexity: O(k) where k is the length of the extracted content

Creates one intermediate string (the portion after the start pattern)
Final result is a substring of that intermediate string
Memory usage remains proportional to output, not total input

Performance Characteristics:

Best case: Both patterns found early - O(p₁ + p₂) where p₁, p₂ are pattern positions
Worst case: End pattern not found - O(n), because the entire remainder after the start pattern is scanned
Implementation: Built on top of the after() and before() functions
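
The two-step search described above can be sketched with two calls to string.split_once from the standard library. This is an illustration under that assumption, not the library's actual source; `between_sketch` is a hypothetical name.

```gleam
import gleam/string

// Illustrative sketch only: find the first `start`, then the first `end`
// in whatever follows it.
pub fn between_sketch(
  input: String,
  from start: String,
  to end: String,
) -> Result(String, Nil) {
  case string.split_once(input, on: start) {
    // Start pattern missing: nothing to extract.
    Error(Nil) -> Error(Nil)
    Ok(#(_, rest)) ->
      case string.split_once(rest, on: end) {
        // End pattern does not occur after the start pattern.
        Error(Nil) -> Error(Nil)
        Ok(#(inner, _)) -> Ok(inner)
      }
  }
}
```

Each character is visited at most once per search, which matches the linear bound stated above.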

`count()` Function

Time Complexity: O(n) where n is the length of the input string

Uses string.split() which performs a single pass through the string
Counts splits by getting list length and subtracting 1
Handles edge cases (empty patterns) in constant time

Space Complexity: O(m) where m is the number of splits

Creates a list of string parts during splitting
Memory scales with both the number of pattern occurrences and the size of the split parts
No regex compilation overhead for simple pattern matching

Performance Characteristics:

Best case: Pattern not found - O(n) scan with minimal memory
Worst case: Many small patterns - O(n) time but higher memory for split results
Counting approach: Gets list length rather than iterating through results
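
The split-and-subtract approach described above might look like the sketch below. `count_sketch` is a hypothetical name, and returning 0 for an empty pattern is an assumption made here; the library's handling of that edge case may differ.

```gleam
import gleam/list
import gleam/string

// Illustrative sketch only: n occurrences of a pattern split a string into
// n + 1 parts, so the count is the number of parts minus one.
pub fn count_sketch(input: String, of pattern: String) -> Int {
  case pattern {
    // Assumption: an empty pattern counts as zero occurrences.
    "" -> 0
    _ -> list.length(string.split(input, on: pattern)) - 1
  }
}
```

Because splitting consumes each match, overlapping occurrences are not counted twice: "aaaa" split on "aa" gives three empty parts, hence a count of 2.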

Indexed Functions (`*_at`)

Time Complexity: O(n) where n is the length of the input string

All use string.split() for initial parsing - single pass through string
List operations (take, drop, join) are O(m) where m is number of splits
Overall complexity remains O(n) as splits are bounded by string length

Space Complexity: O(m) where m is the number of parts after splitting

Creates list of all split parts, even if only using subset
Result size is O(k) where k is length of extracted content
Uses more memory than basic functions when there are many pattern matches

Performance Characteristics:

Best case: Low index with early patterns - O(n) time, minimal extra memory
Worst case: High index with many splits - O(n) time, O(m) space for all parts
Index validation: Bounds checking happens before processing
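
As a rough sketch of the split, take, and join steps listed above, before_at could be written as follows. `before_at_sketch` is a hypothetical name, the bounds check is placed after the split for simplicity, and a non-empty pattern is assumed; none of this is taken from the library's source.

```gleam
import gleam/list
import gleam/string

// Illustrative sketch only: split on the pattern, keep the first index + 1
// parts, and join them back together with the pattern.
pub fn before_at_sketch(
  input: String,
  on pattern: String,
  at index: Int,
) -> Result(String, Nil) {
  let parts = string.split(input, on: pattern)
  // n parts means the pattern occurred n - 1 times.
  let occurrences = list.length(parts) - 1
  case index >= 0 && index < occurrences {
    True ->
      parts
      |> list.take(index + 1)
      |> string.join(with: pattern)
      |> Ok
    False -> Error(Nil)
  }
}
```

The full list of parts is built even when only the first few are needed, which is the extra memory cost noted above.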

Multi-Instance Functions (`*_all`)

Time Complexity: O(n + m²) where n is string length, m is number of splits

Initial split operation: O(n)
For each result position (m-1 results), rebuilds string from parts: O(m)
Overall: O(n + m²) where m is typically much smaller than n

Space Complexity: O(m × k) where m is matches, k is average result length

Stores all results in a list
Each result requires reconstructing string from parts
Memory scales with both number of matches and their sizes

Performance Characteristics:

Best case: Few patterns, short results - approaches O(n)
Worst case: Many patterns creating large results - O(n + m²) time, O(m × k) space
Batch processing: Single split operation shared across all results
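
For comparison, the list of suffixes can also be collected without rebuilding strings from split parts, by calling string.split_once repeatedly on the remainder. This is a different technique from the split-and-rebuild approach described above, shown only as a sketch. The function names are hypothetical, and returning an empty list for an empty pattern is an assumption.

```gleam
import gleam/string

// Illustrative alternative, not the library's implementation.
pub fn after_all_sketch(input: String, on pattern: String) -> List(String) {
  case pattern {
    // Assumption: an empty pattern yields no results.
    "" -> []
    _ -> collect_suffixes(input, pattern)
  }
}

// Each step drops everything up to and including the next occurrence and
// records what remains.
fn collect_suffixes(input: String, pattern: String) -> List(String) {
  case string.split_once(input, on: pattern) {
    Ok(#(_, rest)) -> [rest, ..collect_suffixes(rest, pattern)]
    Error(Nil) -> []
  }
}
```

This recursion is not tail recursive, so very large numbers of matches would use a correspondingly deep call stack.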

`between_all()` Function

Time Complexity: O(n + m² + r) where n is input length, m is start matches, r is total results

Leverages after_all() for start pattern extraction: O(n + m²)
Filters each result through before(): O(r) where r ≤ m
Combined complexity: O(n + m² + r)

Space Complexity: O(m × k + r × j) where k is average after_all result size, j is final result size

Intermediate storage for all after_all results
Final filtered results list
Memory peaks during intermediate step, then reduces after filtering

Performance Characteristics:

Best case: Few start patterns, most have matching end patterns - O(n + m²)
Worst case: Many start patterns, few matching end patterns - O(n + m²) time, with higher intermediate memory usage
Filtering approach: Built-in filtering reduces final memory footprint
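
The composition described above (after_all for the start pattern, then before on each result) can be expressed with the public API alone. The sketch below is an assumption about how the pieces fit together, not the library's source; `between_all_sketch` is a hypothetical name.

```gleam
import gleam/list
import string_editor

// Illustrative sketch only: take the text after each start pattern, then keep
// the part before the next end pattern. Suffixes with no end pattern are
// dropped because `before` returns Error(Nil) for them.
pub fn between_all_sketch(
  input: String,
  from start: String,
  to end: String,
) -> List(String) {
  string_editor.after_all(input, on: start)
  |> list.filter_map(fn(rest) { string_editor.before(rest, on: end) })
}
```

The intermediate list from after_all holds one suffix per start match, which is where the peak memory use mentioned above comes from.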

Real-World Performance Implications

Suitable for simple use cases involving:

Log parsing: Extract basic timestamps, error codes, or specific fields from log entries (count for error frequency, *_all for batch extraction)
Configuration files: Parse simple key-value pairs or extract section content (after_all for all values, count for validation)
HTML/XML processing: Extract content from known, simple tag structures (between_all for multiple tags, *_at for specific positions)
URL manipulation: Extract basic domains, paths, or query parameters (count for segment counting, before_at/after_at for path navigation)
CSV/TSV processing: Navigate simple columnar data (*_at for specific columns, count for field validation)
Template processing: Extract and count basic placeholders (between_all for all variables, count for validation)

Scaling characteristics:

Large files: Basic functions (before, after, between) scale linearly
Multiple extractions: *_all functions have O(m²) component but m is typically small
Memory constrained environments: Use basic functions when possible; *_all functions require more memory
Batch processing: *_all functions more efficient than repeated individual calls

Function Selection Guidelines:

Single extraction: Use before, after, between for best performance
Specific position: Use *_at functions when you know the index
Multiple results: Use *_all functions for batch extraction
Counting only: Use count - most memory efficient for frequency analysis
Large strings with many patterns: Consider memory usage of *_all functions

Comparison with alternatives:

vs. Regular expressions: May be faster for simple pattern matching due to no regex compilation step
vs. Manual string iteration: Comparable performance with built-in error handling and cleaner syntax
vs. Split-based approaches: Basic functions may be more efficient (stop at first match); *_all functions use full split but avoid repeated parsing
vs. Multiple individual calls: *_all functions may be more efficient than repeated calls for batch extractions

Optimization Tips

Pattern placement: Consider placing the most unique part of your pattern first (may improve performance in some cases)
Function selection:
- Use count instead of length(before_all(...)) for counting
- Use *_at when you know the specific index needed
- Use *_all for batch operations instead of multiple individual calls
For between() operations: More unique start patterns improve performance
Memory considerations:
- Basic functions have lower memory overhead
- *_all functions create intermediate lists - consider this for large datasets
- count uses less memory when you only need frequency information
Pattern considerations: Longer, more specific patterns reduce false matches (for example, "</a>" rather than "<")

Development

gleam test  # Run the tests
gleam format # Format the code

Releasing a New Version

When releasing a new version of string_editor, follow these steps:

1. Update Version Numbers

Update the version field in gleam.toml (e.g., from "1.0.1" to "1.0.2").

2. Update CHANGELOG.md

Add a new section to the top of CHANGELOG.md following this format:

## [x.y.z] - YYYY-MM-DD

### Added
- New features

### Changed
- Changes to existing functionality

### Fixed
- Bug fixes

### Removed
- Removed features (if any)

3. Pre-release Checks

Run these commands to ensure everything is working correctly:

gleam format      # Format code consistently
gleam check       # Type check all modules
gleam test        # Run all tests
gleam docs build  # Verify documentation generates correctly

4. Commit and Tag

# Stage your changes
git add gleam.toml CHANGELOG.md

# Commit with a descriptive message
git commit -m "Update to vX.Y.Z: Brief description of changes"

# Create an annotated tag
git tag -a vX.Y.Z -m "Release vX.Y.Z

- Brief summary of major changes
- Another change if needed"

# Push commits and tag to remote
git push origin main vX.Y.Z

5. Publish to Hex

Once all checks pass and the tag is pushed:

gleam publish

This will publish the new version to hex.pm.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request!

Documentation

Further documentation can be found at https://hexdocs.pm/string_editor.
