babble

A Markov chain text generator for Gleam.

gleam add babble

Usage

import gleam/io
import babble

pub fn main() {
  let model =
    babble.new(order: 2, tokenization: babble.Words)
    |> babble.train("the cat sat on the mat.")
    |> babble.train("the dog sat on the log.")

  let assert Ok(sentence) = babble.generate(model, babble.weighted, max_tokens: 200)
  io.println(sentence) // e.g. "the dog sat on the mat." (babble.weighted varies per call)
}

train is incremental, so you can keep one model and feed it text as it arrives. generate returns Error(EmptyModel) until the model has learned something.
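A minimal sketch of the incremental pattern, using only the calls shown above. The helper name train_all and the chunks list are hypothetical stand-ins for text arriving over time, and the error is matched with Error(_) so the sketch does not depend on how EmptyModel is exported.

```gleam
import babble
import gleam/io
import gleam/list

// Hypothetical helper: fold every chunk into one model, starting empty.
pub fn train_all(chunks: List(String)) {
  let empty = babble.new(order: 2, tokenization: babble.Words)
  list.fold(chunks, empty, babble.train)
}

pub fn main() {
  // Nothing learned yet, so generate is expected to return an error.
  case babble.generate(train_all([]), babble.weighted, max_tokens: 200) {
    Ok(sentence) -> io.println(sentence)
    Error(_) -> io.println("model is empty, train it first")
  }

  // After training, the same call should produce a sentence.
  let model =
    train_all(["the cat sat on the mat.", "the dog sat on the log."])
  case babble.generate(model, babble.weighted, max_tokens: 200) {
    Ok(sentence) -> io.println(sentence)
    Error(_) -> io.println("still nothing to say")
  }
}
```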

Configuration

new takes two settings, fixed at construction:

order: how many previous tokens to condition on. Higher orders produce more coherent text but reproduce longer runs of the source verbatim; lower orders are more random. 2 is a reasonable default.
tokenization: Words or Characters. With Characters, order counts characters.

The length cap is a generate argument (max_tokens:), not a model setting.
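To make order concrete, here is a rough sketch of the kind of table an order-2 word model learns: for each pair of consecutive words, how often each next word followed. This is an illustration only, not babble's implementation. successor_counts is a hypothetical helper, and it splits on single spaces, so punctuation stays attached to words, which may differ from how babble tokenizes.

```gleam
import gleam/dict.{type Dict}
import gleam/list
import gleam/option
import gleam/string

// Hypothetical helper, not part of babble.
// Maps each pair of consecutive words to the counts of the word that
// followed that pair in the text.
pub fn successor_counts(
  text: String,
) -> Dict(#(String, String), Dict(String, Int)) {
  text
  |> string.split(" ")
  // Every run of three words: two of context plus the word that follows.
  |> list.window(3)
  |> list.fold(dict.new(), fn(table, triple) {
    case triple {
      [a, b, next] ->
        dict.upsert(table, #(a, b), fn(existing) {
          let counts = option.unwrap(existing, dict.new())
          // Increment the count for `next`, starting from zero.
          dict.upsert(counts, next, fn(n) { option.unwrap(n, 0) + 1 })
        })
      _ -> table
    }
  })
}
```

With order 1 the key would be a single word, so more contexts collide and the output wanders further from the source. With order 3 most contexts have only one successor, so the output tends to replay the training text.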

Sampling

generate takes a sampler: the function that chooses the next token from the weighted candidates at each step. Two are built in:

babble.weighted: picks at random, weighted by training frequency. Varies each call.
babble.most_likely: always picks the most frequent successor. Deterministic.
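As an illustration of what "weighted by training frequency" means, the sketch below picks an item with probability proportional to its count. It is not babble's code. pick_at and pick_weighted are hypothetical names, and the sketch is generic over the item type so it stands alone.

```gleam
import gleam/int
import gleam/list

// Hypothetical helper: walk the candidates, subtracting each count from
// `roll` until the roll falls inside one candidate's share.
pub fn pick_at(candidates: List(#(a, Int)), roll: Int) -> Result(a, Nil) {
  case candidates {
    [] -> Error(Nil)
    [#(item, count), ..rest] ->
      case roll < count {
        True -> Ok(item)
        False -> pick_at(rest, roll - count)
      }
  }
}

// Hypothetical helper: roll a number below the total count, then look up
// which candidate owns that slot.
pub fn pick_weighted(candidates: List(#(a, Int))) -> Result(a, Nil) {
  let total =
    list.fold(candidates, 0, fn(sum, candidate) {
      let #(_, count) = candidate
      sum + count
    })
  pick_at(candidates, int.random(total))
}
```

With candidates [#("mat.", 3), #("log.", 1)], rolls 0 to 2 select "mat." and roll 3 selects "log.", so "mat." comes up about three times in four. A most-likely strategy would instead take the candidate with the largest count every time.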

A sampler is fn(List(#(Step, Int))) -> Step, where Step is Continue(word) or Stop and the Int is how many times that step followed the current context in training. Write your own for temperature, top-k, blocklists, and so on:

import gleam/int
import gleam/list

fn uniform(candidates: List(#(babble.Step, Int))) -> babble.Step {
  case list.drop(candidates, int.random(list.length(candidates))) {
    [#(step, _), ..] -> step
    [] -> babble.Stop
  }
}
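Because a sampler is an ordinary function, samplers can also wrap one another. The sketch below is one possible blocklist: it hides banned words from an inner sampler. The name without is hypothetical, and the code assumes Continue carries the token as a String, which the description above suggests but does not state.

```gleam
import babble
import gleam/list

// Hypothetical helper: returns a sampler that removes banned words from
// the candidates before delegating to `inner`.
// Assumption: babble.Continue holds a String.
pub fn without(
  banned: List(String),
  inner: fn(List(#(babble.Step, Int))) -> babble.Step,
) -> fn(List(#(babble.Step, Int))) -> babble.Step {
  fn(candidates: List(#(babble.Step, Int))) {
    let allowed =
      list.filter(candidates, fn(candidate) {
        case candidate {
          #(babble.Continue(word), _) -> !list.contains(banned, word)
          #(babble.Stop, _) -> True
        }
      })
    case allowed {
      // Everything was banned: end the sentence rather than pass an
      // empty list to the inner sampler.
      [] -> babble.Stop
      _ -> inner(allowed)
    }
  }
}
```

It would be passed wherever a built-in sampler goes, for example babble.generate(model, without(["pizza"], babble.weighted), max_tokens: 200).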

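Samplers are stateless: each call sees only the current candidates, so a sampler cannot carry a seeded random generator from one step to the next. For reproducible output, use most_likely rather than trying to seed randomness yourself.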

Generation

babble.generate(model, babble.weighted, max_tokens: 200) // one sentence
babble.generate_paragraph(model, 3, babble.weighted, max_tokens: 200) // three sentences
babble.generate_starting_with(model, "pizza", babble.weighted, max_tokens: 200) // from a prefix

A sentence ends at ., !, or ? (learned during training) or when it hits max_tokens.

Development

gleam test
gleam format