babble
A Markov chain text generator for Gleam.
gleam add babble
Usage
import gleam/io
import babble

pub fn main() {
  let model =
    babble.new(order: 2, tokenization: babble.Words)
    |> babble.train("the cat sat on the mat.")
    |> babble.train("the dog sat on the log.")

  let assert Ok(sentence) =
    babble.generate(model, babble.weighted, max_tokens: 200)
  // e.g. "the dog sat on the mat." (weighted output varies between runs)
  io.println(sentence)
}
train is incremental, so you can keep one model and feed it text as it arrives.
generate returns Error(EmptyModel) until the model has learned something.
Configuration
new takes two settings, fixed at construction:
order: how many previous tokens to condition on. Higher values are more coherent but repeat the source more; lower values are more random. 2 is a reasonable default.
tokenization: Words or Characters. With Characters, order counts characters.
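To make order concrete, here is a minimal sketch of the kind of table an order-n chain learns: every run of order tokens maps to the tokens that followed it in the training text. It is not babble's internal representation. successors, words and characters are hypothetical helpers, and the two tokenizers are simplified stand-ins (split on single spaces, one token per grapheme) that ignore sentence endings.

```gleam
import gleam/dict.{type Dict}
import gleam/list
import gleam/option.{None, Some}
import gleam/string

/// Simplified stand-in for Words tokenization: split on single spaces.
pub fn words(text: String) -> List(String) {
  string.split(text, on: " ")
}

/// Simplified stand-in for Characters tokenization: one token per grapheme.
pub fn characters(text: String) -> List(String) {
  string.to_graphemes(text)
}

/// Hypothetical helper: map every run of `order` consecutive tokens to the
/// tokens that followed it, most recent occurrence first.
pub fn successors(
  tokens: List(String),
  order: Int,
) -> Dict(List(String), List(String)) {
  tokens
  // Each window of order + 1 tokens is a context plus the token after it.
  |> list.window(by: order + 1)
  |> list.fold(dict.new(), fn(table, window) {
    let context = list.take(window, order)
    // The window has exactly order + 1 tokens, so one token remains.
    let assert [next, ..] = list.drop(window, order)
    dict.upsert(table, context, fn(existing) {
      case existing {
        Some(seen) -> [next, ..seen]
        None -> [next]
      }
    })
  })
}
```

With words("the cat sat on the mat") and order 2, the context ["on", "the"] maps to ["mat"]; with order 1, ["the"] maps to ["mat", "cat"], since both words followed "the". Shorter contexts are shared by more successors, which is why lower orders wander more.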
The length cap is a generate argument (max_tokens:), not a model setting.
Sampling
generate takes a sampler: the function that chooses the next token from the
weighted candidates at each step. Two are built in:
babble.weighted: picks at random, weighted by training frequency. Varies each call.
babble.most_likely: always picks the most frequent successor. Deterministic.
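The two built-in samplers can be pictured as follows. This is a sketch of the behavior described above, not babble's source: Step is a local stand-in mirroring the README's description of babble.Step (Continue(word) or Stop), and breaking ties in favor of the first candidate is an assumption rather than documented behavior.

```gleam
import gleam/int
import gleam/list

/// Local stand-in for babble.Step (hypothetical; not babble's own type).
pub type Step {
  Continue(String)
  Stop
}

/// Pick the candidate with the highest training count; ties go to the first.
pub fn most_likely(candidates: List(#(Step, Int))) -> Step {
  case candidates {
    [] -> Stop
    [first, ..rest] -> {
      let #(best, _) =
        list.fold(rest, first, fn(best, candidate) {
          let #(_, best_count) = best
          let #(_, count) = candidate
          case count > best_count {
            True -> candidate
            False -> best
          }
        })
      best
    }
  }
}

/// Pick each candidate with probability count / total.
pub fn weighted(candidates: List(#(Step, Int))) -> Step {
  let total =
    list.fold(candidates, 0, fn(sum, candidate) {
      let #(_, count) = candidate
      sum + count
    })
  pick(candidates, int.random(total))
}

// Walk the cumulative counts until the random draw falls inside one of them.
fn pick(candidates: List(#(Step, Int)), draw: Int) -> Step {
  case candidates {
    [] -> Stop
    [#(step, count), ..rest] ->
      case draw < count {
        True -> step
        False -> pick(rest, draw - count)
      }
  }
}
```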
A sampler is fn(List(#(Step, Int))) -> Step, where Step is Continue(word) or
Stop and the Int is the training count. Write your own for temperature, top-k,
blocklists, and so on:
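import babble
import gleam/int
import gleam/list

// Ignore the training counts and pick any candidate with equal probability.
fn uniform(candidates: List(#(babble.Step, Int))) -> babble.Step {
  case list.drop(candidates, int.random(list.length(candidates))) {
    [#(step, _), ..] -> step
    // Only reached when there are no candidates at all.
    [] -> babble.Stop
  }
}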
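As a sketch of the top-k and blocklist cases mentioned above, here are two hypothetical sampler builders. top_k and without are not part of babble, and Step is again a local stand-in so the block compiles on its own; to pass them to babble.generate, write them against babble.Step and its Continue and Stop constructors instead.

```gleam
import gleam/int
import gleam/list

/// Local stand-in for babble.Step (hypothetical; not babble's own type).
pub type Step {
  Continue(String)
  Stop
}

/// Build a sampler that keeps only the k most frequent candidates and then
/// picks among them in proportion to their training counts.
pub fn top_k(k: Int) -> fn(List(#(Step, Int))) -> Step {
  fn(candidates: List(#(Step, Int))) {
    candidates
    |> list.sort(fn(a, b) {
      let #(_, count_a) = a
      let #(_, count_b) = b
      // Descending by training count.
      int.compare(count_b, count_a)
    })
    |> list.take(k)
    |> weighted_pick
  }
}

/// Build a sampler that never emits a blocked word, then defers to `sampler`.
/// If every word candidate is blocked and there is no Stop, end the sentence.
pub fn without(
  sampler: fn(List(#(Step, Int))) -> Step,
  blocked: List(String),
) -> fn(List(#(Step, Int))) -> Step {
  fn(candidates: List(#(Step, Int))) {
    let allowed =
      list.filter(candidates, fn(candidate) {
        case candidate {
          #(Continue(word), _) -> !list.contains(blocked, word)
          #(Stop, _) -> True
        }
      })
    case allowed {
      [] -> Stop
      _ -> sampler(allowed)
    }
  }
}

// Pick each candidate with probability count / total.
fn weighted_pick(candidates: List(#(Step, Int))) -> Step {
  let total =
    list.fold(candidates, 0, fn(sum, candidate) {
      let #(_, count) = candidate
      sum + count
    })
  walk(candidates, int.random(total))
}

fn walk(candidates: List(#(Step, Int)), draw: Int) -> Step {
  case candidates {
    [] -> Stop
    [#(step, count), ..rest] ->
      case draw < count {
        True -> step
        False -> walk(rest, draw - count)
      }
  }
}
```

Because both builders return ordinary fn(List(#(Step, Int))) -> Step values, they compose: without(top_k(3), ["mat"]) removes "mat" first and then samples from the three most frequent remaining candidates. Falling back to Stop ends the sentence early rather than emitting a blocked word.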
Samplers are stateless: a sampler receives only the candidate list, so there is
no seed to thread through it. For reproducible output, use most_likely rather
than trying to seed randomness yourself.
Generation
babble.generate(model, babble.weighted, max_tokens: 200) // one sentence
babble.generate_paragraph(model, 3, babble.weighted, max_tokens: 200) // three sentences
babble.generate_starting_with(model, "pizza", babble.weighted, max_tokens: 200) // from a prefix
A sentence ends at ., !, or ? (learned during training) or when it hits
max_tokens.
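The stopping rule can be pictured as a loop that samples until it draws Stop or has produced max_tokens tokens. In this sketch a learned sentence ending is simply a Stop candidate offered after a token such as "sat."; generate_sentence, toy_next and first_step are hypothetical, Step is a local stand-in for babble.Step, and joining tokens with spaces assumes Words tokenization.

```gleam
import gleam/list
import gleam/string

/// Local stand-in for babble.Step (hypothetical; not babble's own type).
pub type Step {
  Continue(String)
  Stop
}

/// Hypothetical sentence loop: ask `next` for the candidates that follow the
/// tokens generated so far, let `sampler` choose one, and finish on Stop or
/// once `max_tokens` tokens have been produced.
pub fn generate_sentence(
  next: fn(List(String)) -> List(#(Step, Int)),
  sampler: fn(List(#(Step, Int))) -> Step,
  max_tokens: Int,
) -> String {
  // Joining with spaces assumes Words tokenization.
  loop(next, sampler, max_tokens, [])
  |> string.join(with: " ")
}

// `acc` holds the tokens generated so far, most recent first.
fn loop(
  next: fn(List(String)) -> List(#(Step, Int)),
  sampler: fn(List(#(Step, Int))) -> Step,
  remaining: Int,
  acc: List(String),
) -> List(String) {
  case remaining <= 0 {
    True -> list.reverse(acc)
    False ->
      case sampler(next(list.reverse(acc))) {
        Stop -> list.reverse(acc)
        Continue(token) -> loop(next, sampler, remaining - 1, [token, ..acc])
      }
  }
}

/// Hypothetical hand-written model. A real order-n model would look only at
/// the last n tokens; this one matches the whole history for brevity.
pub fn toy_next(so_far: List(String)) -> List(#(Step, Int)) {
  case so_far {
    [] -> [#(Continue("the"), 1)]
    ["the"] -> [#(Continue("cat"), 2), #(Continue("dog"), 1)]
    [_, _] -> [#(Continue("sat."), 1)]
    // After three tokens only Stop is offered, as if "sat." ended a sentence.
    _ -> [#(Stop, 1)]
  }
}

/// Deterministic sampler for the example: take the first candidate.
pub fn first_step(candidates: List(#(Step, Int))) -> Step {
  case candidates {
    [#(step, _), ..] -> step
    [] -> Stop
  }
}
```

With toy_next and first_step, a cap of 200 yields "the cat sat." because the model offers only Stop after three tokens, while a cap of 2 cuts the sentence off at "the cat".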
Development
gleam test
gleam format