A Linux/Wayland-first voice dictation daemon written in pure Go.
dicta is two things:
- Type-mode — press Pause, talk, and the daemon types the transcribed text into whatever window has focus, committing each utterance on VAD silence. Press Pause again to stop.
- Clip-mode — press Scroll Lock, talk, and a small editable panel appears with the cleaned transcript. Press Enter to copy the buffer to the clipboard, Shift+Enter to insert a newline, Esc to cancel.
There is no PTT, no wakeword, no always-on listening. Capture starts when you press a key and stops when the session ends.
Pre-1.0. The full v1 build (phases 1–13 of the design) is functional; this is the docs phase. Use it, but expect rough edges and please file issues.
Speech-to-text is one of the few accessibility tools where Linux still has gaps. Existing options either depend on commercial cloud APIs, require Python toolchains and GPU model files, or assume X11. dicta is a single static Go binary that:
- Runs anywhere Wayland and PipeWire run.
- Talks to any Wyoming-protocol ASR server (faster-whisper et al.) by default — no model download in v1.
- Optionally talks to a local
whisper-server(subprocess-managed), or any OpenAI-compatible transcription endpoint. - Optionally cleans transcripts with any OpenAI-compatible LLM (llama.cpp's server, vLLM, OpenAI itself).
┌──────────────┐ ┌──────────────┐ ┌────────────────┐ ┌──────────────┐
│ Pause / │ → │ dictad │ → │ asrclient │ → │ Wyoming / │
│ Scroll Lock │ │ │ │ (Go module) │ │ whispercpp/ │
│ (compositor) │ │ audio + VAD │ ← │ │ ← │ OpenAI │
└──────────────┘ │ state mach. │ └────────────────┘ └──────────────┘
│ control sock│
│ │ ┌──────────────┐
│ │ → │ ydotool │ (type-mode)
│ │ └──────────────┘
│ │ ┌──────────────┐ ┌──────────────┐
│ │ ↔ │ dicta-preview│ → │ wl-copy │ (clip-mode)
│ │ │ (Gio UI) │ └──────────────┘
└──────────────┘ └──────────────┘
dictad is the daemon (long-lived). dicta is a thin CLI that talks
to the daemon over a Unix socket. dicta-preview is the clip-mode
panel, spawned on demand. ydotoold and the ASR backend are external.
# Ubuntu / Debian
sudo ./scripts/install-deps-ubuntu.sh
# Fedora
sudo ./scripts/install-deps-fedora.sh
# Arch
sudo ./scripts/install-deps-arch.shThese install: Go 1.25+, the Gio system libraries (Wayland, xkbcommon, GLES, EGL, libvulkan, libXcursor) for the preview panel, ydotool, and wl-clipboard.
task build:allProduces bin/dictad, bin/dicta, and bin/dicta-preview.
task install:userInstalls to ~/.local/bin and drops the systemd user unit into
~/.config/systemd/user/.
The default backend is Wyoming. You can run any Wyoming-compatible
service — most users want
wyoming-faster-whisper.
A common setup is its Docker image listening on tcp://localhost:10300.
Other backends:
--asr-backend whispercpp— dicta supervises a localwhisper-serversubprocess. Requires you to installwhisper.cpp/whisper-serverand a model.--asr-backend openai— point at any OpenAI-compatible/v1/audio/transcriptionsendpoint. Requires an API key.
See CONFIGURATION.md for every flag.
systemctl --user edit dictad.service[Service]
ExecStart=
ExecStart=%h/.local/bin/dictad \
--asr-backend wyoming \
--asr-wyoming-addr tcp://localhost:10300 \
--preview-binary %h/.local/bin/dicta-previewsystemctl --user enable --now dictad.service
journalctl --user -u dictad.service -f| Key | What it does | Command |
|---|---|---|
| Pause | Toggle type-mode session | dicta toggle_talk --mode type |
| Scroll Lock | Toggle clip-mode panel | dicta toggle_talk --mode clip |
For GNOME, bind these via gsettings (the Settings GUI tries to nudge
you toward chord shortcuts; bypassing it lets you use unmodified
single keys). For Sway/Hyprland/KDE, bind in the compositor config.
Type-mode drives ydotool, which talks to a long-running ydotoold
user daemon. Out of the box, ydotoold leaks accept'd client sockets
and wedges in roughly a week of normal use — typing silently stops
working (audio still captures, transcripts still land in the audit log
if enabled). Tracked upstream; the workaround is two example unit
files plus a daily restart timer.
See packaging/systemd/README.md.
A one-time systemctl --user restart ydotoold.service unsticks an
already-wedged daemon; the timer prevents recurrence.
Off by default. To enable in clip-mode (the preview panel will display cleaned text the user can still edit before pressing Enter):
ExecStart=%h/.local/bin/dictad \
... \
--cleanup-enabled \
--cleanup-endpoint http://my-llama-server.lan:8080/v1 \
--cleanup-model qwen3-7b-instructThe mechanical system prompt is a code constant (cannot be templated by user input). Cleanup is only invoked in clip-mode; type-mode always sends the raw transcript to ydotool.
Off by default. JSONL transcripts (and optionally WAV captures) under
$XDG_DATA_HOME/dicta/YYYY-MM-DD/:
ExecStart=%h/.local/bin/dictad \
... \
--audit-enabled \
--audit-keep-audio \
--audit-retention-days 7Both --audit-enabled and --audit-keep-audio are required to capture
audio. Both default off because both are sensitive by definition.
v1 ships exactly two compositor bindings (D17 in the design doc): Pause for type-mode, Scroll Lock for clip-mode. There is no global commit or cancel hotkey — clip-mode commits via panel-local Enter and type-mode commits per-utterance via VAD silence. PTT (push-to-talk) and wakeword are out of scope for v1 and are tracked in §14 of the design doc.
- dicta-design.md — the design spec (v0.2). Read this before opening a non-trivial PR.
- CONFIGURATION.md — every flag.
- SECURITY.md — security model and the code paths that enforce it.
- packaging/systemd/README.md — systemd unit install and override patterns.
# Daemon + CLI (pure Go, static)
CGO_ENABLED=0 go build -o bin/dictad ./cmd/dictad
CGO_ENABLED=0 go build -o bin/dicta ./cmd/dicta
# Preview panel (CGo, Wayland)
go build -tags nox11 -o bin/dicta-preview ./cmd/dicta-previewThe daemon and CLI MUST build with CGO_ENABLED=0 (D13). The
MemoryDenyWriteExecute=true flag in the systemd unit relies on this.
task test # unit tests
task test:race # with race detector + goleak
task vet # go vet
task check # all of the aboveinternal/control ships a fuzz target for the wire-protocol parser:
go test -fuzz=FuzzCommandUnmarshal -fuzztime=1m ./internal/controlThe design doc's §13 lists the open decision points; everything else is locked. If you want to change a locked decision, file an issue explaining why before writing code — these were deliberate.
Bugs, typos, packaging contributions: PRs welcome.
Apache-2.0 — see LICENSE.