Add Pipecat Evals framework documentation by aconchillo · Pull Request #889 · pipecat-ai/docs

aconchillo · 2026-06-11T02:53:20Z

Summary

Documents the new built-in Pipecat Evals framework with a top-level Evals group in the Pipecat tab:

Overview (pipecat/evals/overview): why evals matter, the eval transport + harness + judge architecture, text vs audio modes, and requirements.
Quickstart (pipecat/evals/quickstart): start an existing agent with -t eval, write a first scenario, and run pipecat eval run.
Writing Scenarios (pipecat/evals/scenarios): the full YAML reference, covering turns, expectations (eval, text_contains, within_ms, function calls), interruptions with send_after, audio mode (user: / judge: blocks), !include, reset:, and vision turns.
Eval Suites (pipecat/evals/suites): manifests, pipecat eval suite, run output layout, and CI usage.
Using the Library (pipecat/evals/library): the pipecat.evals Python API (EvalScenario, EvalSession, EvalResult, EvalManifest, EvalSuite), building scenarios in code, and injecting a custom judge/speech/transcriber.
Agent Self-Improvement (pipecat/evals/agent-self-improvement): closing the loop with AI coding assistants (edit, run evals, read results, iterate).

Also:

Adds a CLI reference page for pipecat eval run / pipecat eval suite (api-reference/cli/eval).
Updates the Fundamentals evaluations overview to feature the built-in framework (replacing the old TranscriptionFrame local-testing workaround) while keeping the third-party platform content.
Removes the pipecat tail CLI reference page (the command is going away): nav entry dropped, the old /cli/tail redirect repointed to the CLI overview, and the CLI overview now references pipecat eval instead.

All content verified against the framework source in pipecat-ai/pipecat (src/pipecat/evals/, runner integration, and the release-evals scenarios). mint broken-links passes.

Document the new built-in evals framework with a top-level Evals group in the Pipecat tab:

- Overview: why evals matter, eval transport + harness + judge architecture, text vs audio modes, requirements
- Quickstart: run an existing agent with -t eval and a first scenario
- Writing Scenarios: full YAML reference (turns, expectations, judge, audio mode, includes, interruptions, vision)
- Eval Suites: manifests, pipecat eval suite, run output, CI
- Using the Library: EvalScenario/EvalSession/EvalSuite Python API
- Agent Self-Improvement: closing the loop with AI coding assistants

Also add a CLI reference page for pipecat eval run/suite, and update the Fundamentals evaluations overview to feature the built-in framework.

mintlify · 2026-06-11T02:53:34Z

Preview deployment for your docs. Learn more about Mintlify Previews.

Project	Status	Preview	Updated (UTC)
Pipecat	🟢 Ready	View Preview	Jun 11, 2026, 2:54 AM

markbackman

LGTM! Just a few small changes. Nice docs 👏

markbackman · 2026-06-11T04:04:47Z

+    A scenario is a YAML file describing a scripted conversation and the behavior you expect. Save this as `scenarios/capital_question.yaml`:
+
+    <Tabs>
+      <Tab title="Ollama judge (default)">


When using the default, is Ollama with gemma2:9b the default judge? Do you not need to specify it?

Yes, Ollama with gemma2:9b is the default. I'll add a comment.

Move the third-party platform pages (Bluejay, Cekura, Coval) under a Third-party Platforms subgroup in the Evals tab group, absorb the old evaluations overview's production-evaluation content into the Evals overview, and add redirects for the old URLs.

aconchillo requested a review from markbackman June 11, 2026 02:53

mintlify Bot deployed to staging June 11, 2026 02:54 View deployment

markbackman approved these changes Jun 11, 2026

markbackman reviewed Jun 11, 2026

Comment thread docs.json Outdated

Clarify that within_ms deadlines consume the turn's shared budget

480f0f0

mintlify Bot deployed to staging June 11, 2026 04:45 View deployment

Add a factory: subsection with a code example for custom eval services

6f891d5

mintlify Bot deployed to staging June 11, 2026 04:52 View deployment

Remove fixtures: documentation (field is being removed from the framework)

bf036b1

mintlify Bot deployed to staging June 11, 2026 04:54 View deployment

Use uv commands for installs and runs; order eval after init in CLI nav

d089208

mintlify Bot deployed to staging June 11, 2026 15:50 View deployment

Remove the pipecat tail CLI reference page

b2e9f61

Drop the tail page and its nav entry, repoint the old /cli/tail redirect to the CLI overview, and replace the overview's tail references with the eval command (which was missing from that page).

mintlify Bot deployed to staging June 11, 2026 16:01 View deployment

Point Agent Self-Improvement next steps forward to production evaluation

78ef52c

mintlify Bot deployed to staging June 11, 2026 16:42 View deployment

Note the Ollama gemma2:9b judge default in the quickstart scenario step

57b273c

mintlify Bot deployed to staging June 11, 2026 16:48 View deployment

Make text mode explicit in Writing Scenarios and document the modality default

d63c217

mintlify Bot deployed to staging June 11, 2026 16:59 View deployment

Clarify the text/audio modes section: block vs per-turn user, defaults

1881baa

mintlify Bot deployed to staging June 11, 2026 17:05 View deployment

aconchillo added 2 commits June 11, 2026 10:06

Move the judge default into a note in the text mode section

c5beb50

Move built-in speech and transcription services into notes

b34a45b

mintlify Bot deployed to staging June 11, 2026 17:07 View deployment

mintlify Bot deployed to staging June 11, 2026 17:08 View deployment

aconchillo added 2 commits June 11, 2026 10:09

Move the local/HTTP service constraint note into the factory section

2c460c1

Move vision turns out of top-level fields; image: is a per-turn field

1178ad1

mintlify Bot deployed to staging June 11, 2026 17:10 View deployment

mintlify Bot deployed to staging June 11, 2026 17:11 View deployment

Restructure modes section block-first: user input and judging, each with text and audio

f6a2a4d

mintlify Bot deployed to staging June 11, 2026 17:17 View deployment

Show the text-mode equivalents as YAML blocks

0ac397c

mintlify Bot deployed to staging June 11, 2026 17:18 View deployment

Moonshine, not whisper, is the default transcriber

0b5342a

mintlify Bot deployed to staging June 11, 2026 17:20 View deployment

mintlify Bot deployed to staging June 11, 2026 21:27 View deployment

aconchillo wants to merge 18 commits into main from aleix/evals-docs

2 participants
