A Markov chain text generator for Gleam.
gleam add babble

import gleam/io
import babble
pub fn main() {
  let model =
    babble.new(order: 2, tokenization: babble.Words)
    |> babble.train("the cat sat on the mat.")
    |> babble.train("the dog sat on the log.")

  let assert Ok(sentence) = babble.generate(model, babble.weighted, max_tokens: 200)
  io.println(sentence) // e.g. the dog sat on the mat.
}

train is incremental, so you can keep one model and feed it text as it arrives.
generate returns Error(EmptyModel) until the model has learned something.
new takes two settings, fixed at construction:
- order: how many previous tokens to condition on. Higher is more coherent but repeats the source more; lower is more random. 2 is a reasonable default.
- tokenization: Words or Characters. With Characters, order counts characters.
The length cap is a generate argument (max_tokens:), not a model setting.
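
To make the effect of order concrete, here is a minimal sketch of how a text can be cut into (context, next token) pairs for a given order. It is an illustration only, not babble's internals: the `transitions` function is a hypothetical helper, and it tokenizes naively by splitting on single spaces.

```gleam
import gleam/list
import gleam/string

/// Pair each run of `order` tokens with the token that follows it.
/// Hypothetical helper for illustration; not part of babble.
pub fn transitions(
  text: String,
  order: Int,
) -> List(#(List(String), String)) {
  text
  |> string.split(on: " ")
  // Each window holds `order` context tokens plus the token after them.
  |> list.window(by: order + 1)
  |> list.filter_map(fn(window) {
    case list.split(window, at: order) {
      #(context, [next]) -> Ok(#(context, next))
      _ -> Error(Nil)
    }
  })
}
```

With order 2, "the cat sat on the mat." yields four pairs, starting with the context ["the", "cat"] followed by "sat". A higher order makes each context longer and rarer, which is why the output tracks the source more closely.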
generate takes a sampler: the function that chooses the next token from the
weighted candidates at each step. Two are built in:
- babble.weighted: picks at random, weighted by training frequency. Varies each call.
- babble.most_likely: always picks the most frequent successor. Deterministic.
A sampler is fn(List(#(Step, Int))) -> Step, where Step is Continue(word) or
Stop and the Int is the training count. Write your own for temperature, top-k,
blocklists, and so on:
import gleam/int
import gleam/list
fn uniform(candidates: List(#(babble.Step, Int))) -> babble.Step {
  // Skip a random number of candidates, then take the first one left.
  case list.drop(candidates, int.random(list.length(candidates))) {
    [#(step, _), ..] -> step
    [] -> babble.Stop
  }
}

Samplers are stateless, so use most_likely for reproducible output rather than
seeding randomness yourself.
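
As a sketch of the blocklist idea, a sampler can also be built by wrapping another one. The block below is self-contained, so it declares its own `Step` type as a stand-in for babble.Step (assuming the word is a String); `most_frequent` and `without` are hypothetical names, not babble functions.

```gleam
import gleam/list

// Stand-in for babble.Step so the sketch compiles on its own (assumption).
pub type Step {
  Continue(String)
  Stop
}

pub type Sampler =
  fn(List(#(Step, Int))) -> Step

/// Deterministic sampler: the candidate with the highest training count.
/// Returns Stop when there are no candidates.
pub fn most_frequent(candidates: List(#(Step, Int))) -> Step {
  let #(step, _) =
    list.fold(candidates, #(Stop, 0), fn(best, candidate) {
      let #(_, best_count) = best
      let #(_, count) = candidate
      case count > best_count {
        True -> candidate
        False -> best
      }
    })
  step
}

/// Wrap a sampler so that it never sees the blocked words.
pub fn without(blocked: List(String), inner: Sampler) -> Sampler {
  fn(candidates: List(#(Step, Int))) {
    candidates
    |> list.filter(fn(candidate) {
      case candidate {
        #(Continue(word), _) -> !list.contains(blocked, word)
        #(Stop, _) -> True
      }
    })
    |> inner
  }
}
```

The wrapper keeps no state between calls, so it stays within the stateless contract. With babble's own types, the same shape would presumably be passed to generate as something like without(["pizza"], babble.most_likely).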
babble.generate(model, babble.weighted, max_tokens: 200) // one sentence
babble.generate_paragraph(model, 3, babble.weighted, max_tokens: 200) // three sentences
babble.generate_starting_with(model, "pizza", babble.weighted, max_tokens: 200) // from a prefix

A sentence ends at ., !, or ? (learned during training) or when it hits
max_tokens.
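
The two stopping conditions can be pictured with a small standalone sketch. This is a simplification under stated assumptions: babble learns sentence ends during training, whereas the hypothetical `take_sentence` helper below simply checks each token's last character.

```gleam
import gleam/string

/// True when a token closes a sentence.
pub fn ends_sentence(token: String) -> Bool {
  string.ends_with(token, ".")
  || string.ends_with(token, "!")
  || string.ends_with(token, "?")
}

/// Keep tokens up to and including the first sentence-ending token,
/// stopping early once `max_tokens` tokens have been taken.
pub fn take_sentence(tokens: List(String), max_tokens: Int) -> List(String) {
  case tokens, max_tokens > 0 {
    [token, ..rest], True ->
      case ends_sentence(token) {
        True -> [token]
        False -> [token, ..take_sentence(rest, max_tokens - 1)]
      }
    _, _ -> []
  }
}
```

Here a low cap can cut a sentence off before it reaches its terminator, which is the trade-off to keep in mind when choosing max_tokens.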
gleam test
gleam format