babble

A Markov chain text generator for Gleam.

gleam add babble

Usage

import gleam/io
import babble

pub fn main() {
  let model =
    babble.new(order: 2, tokenization: babble.Words)
    |> babble.train("the cat sat on the mat.")
    |> babble.train("the dog sat on the log.")

  let assert Ok(sentence) =
    babble.generate(model, babble.weighted, max_tokens: 200)
  io.println(sentence) // e.g. "the dog sat on the mat." (varies between runs)
}

train is incremental, so you can keep one model and feed it text as it arrives. generate returns Error(EmptyModel) until the model has learned something.
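To show why incremental training works, here is a minimal sketch of the idea. It is not babble's internals. It assumes a fixed order of 2 and whitespace-separated words, and every name in it (Counts, empty, train, successors) is hypothetical. Training only adds to successor counts, so feeding texts one at a time gives the same counts as feeding them together. An empty table has nothing to look up, which mirrors why an untrained model has nothing to generate from.

```gleam
import gleam/dict.{type Dict}
import gleam/list
import gleam/option.{type Option, None, Some}
import gleam/string

/// Hypothetical: for each two-word context, how often each next word followed it.
pub type Counts =
  Dict(List(String), Dict(String, Int))

pub fn empty() -> Counts {
  dict.new()
}

/// Fold one more text into existing counts. Nothing learned earlier is
/// discarded, so training can happen piece by piece.
pub fn train(counts: Counts, text: String) -> Counts {
  let words = string.split(text, " ") |> list.filter(fn(w) { w != "" })
  // Every run of three words is one (two-word context, next word) observation.
  list.window(words, by: 3)
  |> list.fold(counts, fn(acc, window) {
    case window {
      [a, b, next] -> dict.upsert(acc, [a, b], bump(_, next))
      _ -> acc
    }
  })
}

/// Increment `word`'s count in a context's successor table.
fn bump(
  successors: Option(Dict(String, Int)),
  word: String,
) -> Dict(String, Int) {
  let table = option.unwrap(successors, dict.new())
  dict.upsert(table, word, fn(count: Option(Int)) {
    case count {
      Some(n) -> n + 1
      None -> 1
    }
  })
}

/// The successor table for a context, or Error(Nil) if it was never seen.
pub fn successors(
  counts: Counts,
  context: List(String),
) -> Result(Dict(String, Int), Nil) {
  dict.get(counts, context)
}
```

After training on both example sentences from the Usage section, the context ["sat", "on"] has been followed by "the" twice, while ["on", "the"] has seen "mat." and "log." once each.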

Configuration

new takes two settings, fixed at construction:

order: how many previous tokens to condition on. Higher orders give more coherent output but copy longer runs of the source text; lower orders are more random. 2 is a reasonable default.
tokenization: Words or Characters. With Characters, order counts characters instead of words.

The length cap is a generate argument (max_tokens:), not a model setting.
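The two settings interact: tokenization decides what a token is, and order decides how many of them form a context. The sketch below illustrates this with a hypothetical local Tokenization type standing in for babble.Words / babble.Characters; tokenize and contexts are illustrative names, not babble functions.

```gleam
import gleam/list
import gleam/string

/// Hypothetical local stand-in for babble's Words / Characters setting.
pub type Tokenization {
  Words
  Characters
}

/// Split text into tokens: whitespace-separated words, or single characters.
pub fn tokenize(text: String, mode: Tokenization) -> List(String) {
  case mode {
    Words -> string.split(text, " ") |> list.filter(fn(t) { t != "" })
    Characters -> string.to_graphemes(text)
  }
}

/// Pair each run of `order` tokens with the token that follows it.
pub fn contexts(
  tokens: List(String),
  order: Int,
) -> List(#(List(String), String)) {
  list.window(tokens, by: order + 1)
  |> list.filter_map(fn(window) {
    // The last token in the window is the successor; the rest is the context.
    case list.reverse(window) {
      [next, ..context] -> Ok(#(list.reverse(context), next))
      [] -> Error(Nil)
    }
  })
}
```

With Words and order 2, "the cat sat on" yields the contexts ["the", "cat"] → "sat" and ["cat", "sat"] → "on". With Characters, the same order covers only two letters, so a given order spans much less text.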

Sampling

generate takes a sampler: the function that chooses the next token from the weighted candidates at each step. Two are built in:

babble.weighted: picks at random, weighted by training frequency. Varies each call.
babble.most_likely: always picks the most frequent successor. Deterministic.
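One common way to pick "at random, weighted by training frequency" is to roll a number below the total count and walk the candidates until the roll falls inside one candidate's share. This is a minimal sketch of that technique, not necessarily how babble.weighted is implemented. It declares a local Step type mirroring the shape described in this README, and pick is a hypothetical helper split out so the walk can be tested deterministically.

```gleam
import gleam/int
import gleam/list

/// Hypothetical local stand-in for babble.Step.
pub type Step {
  Continue(String)
  Stop
}

/// Return the candidate whose share of the total weight contains `roll`.
/// `roll` is expected to be between 0 and (total weight - 1).
pub fn pick(candidates: List(#(Step, Int)), roll: Int) -> Step {
  case candidates {
    [] -> Stop
    [#(step, weight), ..rest] ->
      case roll < weight {
        True -> step
        False -> pick(rest, roll - weight)
      }
  }
}

/// Choose each candidate with probability count / total count.
pub fn weighted(candidates: List(#(Step, Int))) -> Step {
  let total =
    list.fold(candidates, 0, fn(sum: Int, c: #(Step, Int)) { sum + c.1 })
  case total > 0 {
    True -> pick(candidates, int.random(total))
    // No candidates (or only zero counts): end the sentence.
    False -> Stop
  }
}
```

With counts 2 and 3, rolls 0 and 1 select the first candidate and rolls 2 to 4 select the second, giving probabilities of 2/5 and 3/5.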

A sampler is fn(List(#(Step, Int))) -> Step, where Step is Continue(word) or Stop and the Int is the training count. Write your own for temperature, top-k, blocklists, and so on:

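import babble
import gleam/int
import gleam/list

// Ignores the training counts: every candidate is equally likely.
fn uniform(candidates: List(#(babble.Step, Int))) -> babble.Step {
  // Skip a random number of candidates, then take the next one.
  case list.drop(candidates, int.random(list.length(candidates))) {
    [#(step, _), ..] -> step
    // Only reached when there are no candidates at all.
    [] -> babble.Stop
  }
}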
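Because a sampler is just a function, blocklists and top-k can be written as wrappers that filter the candidate list and then hand it to another sampler. The sketch below assumes that shape; without, top_k and the Sampler alias are hypothetical names, and a local Step type mirrors babble.Step so the block compiles on its own.

```gleam
import gleam/int
import gleam/list

/// Hypothetical local stand-in for babble.Step.
pub type Step {
  Continue(String)
  Stop
}

/// Same shape as the sampler type described above.
pub type Sampler =
  fn(List(#(Step, Int))) -> Step

/// Drop any candidate word in `blocked`, then let `sampler` choose.
pub fn without(blocked: List(String), sampler: Sampler) -> Sampler {
  fn(candidates: List(#(Step, Int))) {
    candidates
    |> list.filter(fn(c: #(Step, Int)) {
      case c.0 {
        Continue(word) -> !list.contains(blocked, word)
        Stop -> True
      }
    })
    |> sampler
  }
}

/// Keep only the `k` most frequent candidates, then let `sampler` choose.
pub fn top_k(k: Int, sampler: Sampler) -> Sampler {
  fn(candidates: List(#(Step, Int))) {
    candidates
    |> list.sort(fn(a: #(Step, Int), b: #(Step, Int)) { int.compare(b.1, a.1) })
    |> list.take(k)
    |> sampler
  }
}
```

In a real project you would drop the local Step and use babble.Step, so that a call like top_k(5, babble.weighted) could be passed to generate. The inner sampler must cope with an empty list, since a blocklist can filter out every candidate.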

Samplers are stateless, so use most_likely for reproducible output rather than seeding randomness yourself.
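A deterministic sampler is a pure function of its candidates, which is what makes its output reproducible. Below is a minimal sketch of a most_likely-style sampler using a hypothetical local Step type. This README does not say how babble breaks ties; the sketch arbitrarily keeps the earlier candidate.

```gleam
import gleam/list

/// Hypothetical local stand-in for babble.Step.
pub type Step {
  Continue(String)
  Stop
}

/// Return the candidate with the highest count. The same input always gives
/// the same output; on a tie, the earlier candidate wins.
pub fn most_likely(candidates: List(#(Step, Int))) -> Step {
  case candidates {
    [] -> Stop
    [first, ..rest] -> {
      let best =
        list.fold(rest, first, fn(best: #(Step, Int), c: #(Step, Int)) {
          // Strictly greater, so ties keep the earlier candidate.
          case c.1 > best.1 {
            True -> c
            False -> best
          }
        })
      best.0
    }
  }
}
```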

Generation

babble.generate(model, babble.weighted, max_tokens: 200) // one sentence
babble.generate_paragraph(model, 3, babble.weighted, max_tokens: 200) // three sentences
babble.generate_starting_with(model, "pizza", babble.weighted, max_tokens: 200) // from a prefix

A sentence ends at ., !, or ? (where these occur is learned during training), or when it reaches max_tokens tokens, whichever comes first.
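The stopping rule can be sketched as a loop that asks for one token at a time and stops on Stop, on a token ending in ., ! or ?, or once the cap is reached. This is an illustration under assumptions, not babble's generate: the successor function next is a hypothetical stand-in for "look up the model and run the sampler", and Step is declared locally.

```gleam
import gleam/list
import gleam/string

/// Hypothetical local stand-in for babble.Step.
pub type Step {
  Continue(String)
  Stop
}

/// Build one sentence. `next` sees the tokens so far, most recent first.
pub fn sentence(next: fn(List(String)) -> Step, max_tokens: Int) -> String {
  build(next, max_tokens, [])
  |> list.reverse
  |> string.join(" ")
}

fn build(
  next: fn(List(String)) -> Step,
  remaining: Int,
  acc: List(String),
) -> List(String) {
  case remaining <= 0 {
    // Length cap reached.
    True -> acc
    False ->
      case next(acc) {
        Stop -> acc
        Continue(token) -> {
          let acc = [token, ..acc]
          case ends_sentence(token) {
            True -> acc
            False -> build(next, remaining - 1, acc)
          }
        }
      }
  }
}

fn ends_sentence(token: String) -> Bool {
  string.ends_with(token, ".")
  || string.ends_with(token, "!")
  || string.ends_with(token, "?")
}
```

A script that emits "the", "cat", "sat." produces "the cat sat." with a generous cap, but only "the cat" when max_tokens is 2.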

Development

gleam test
gleam format
