Add Pipecat Evals framework documentation #889

Open. aconchillo wants to merge 18 commits into main from aleix/evals-docs

aconchillo commented Jun 11, 2026

Summary

Documents the new built-in Pipecat Evals framework with a top-level Evals group in the Pipecat tab:

  • Overview (pipecat/evals/overview): why evals matter, the eval transport + harness + judge architecture, text vs audio modes, and requirements.
  • Quickstart (pipecat/evals/quickstart): start an existing agent with -t eval, write a first scenario, and run pipecat eval run.
  • Writing Scenarios (pipecat/evals/scenarios): the full YAML reference: turns, expectations (eval, text_contains, within_ms, function calls), interruptions with send_after, audio mode (user: / judge: blocks), !include, reset:, and vision turns.
  • Eval Suites (pipecat/evals/suites): manifests, pipecat eval suite, run output layout, and CI usage.
  • Using the Library (pipecat/evals/library): the pipecat.evals Python API (EvalScenario, EvalSession, EvalResult, EvalManifest, EvalSuite), building scenarios in code, and injecting a custom judge/speech/transcriber.
  • Agent Self-Improvement (pipecat/evals/agent-self-improvement): closing the loop with AI coding assistants (edit, run evals, read results, iterate).

Also:

  • Adds a CLI reference page for pipecat eval run / pipecat eval suite (api-reference/cli/eval).
  • Updates the Fundamentals evaluations overview to feature the built-in framework (replacing the old TranscriptionFrame local-testing workaround) while keeping the third-party platform content.
  • Removes the pipecat tail CLI reference page (the command is going away): nav entry dropped, the old /cli/tail redirect repointed to the CLI overview, and the CLI overview now references pipecat eval instead.

All content verified against the framework source in pipecat-ai/pipecat (src/pipecat/evals/, runner integration, and the release-evals scenarios). mint broken-links passes.
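
To make the `text_contains` and `within_ms` expectations listed under Writing Scenarios more concrete, here is a minimal sketch of what such checks amount to. It is not the `pipecat.evals` API: the `Expectation` class and `check` function are hypothetical names invented for illustration, and case-insensitive matching is an assumption. The LLM-judged `eval` expectation is left out because it needs a judge model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Expectation:
    """Hypothetical stand-in for one scenario expectation (not a pipecat.evals type)."""

    text_contains: Optional[str] = None  # substring the reply must include
    within_ms: Optional[int] = None  # maximum allowed response latency in ms


def check(exp: Expectation, reply: str, latency_ms: float) -> list[str]:
    """Return failure messages for one turn; an empty list means it passed."""
    failures = []
    # Assumption: the substring match ignores case.
    if exp.text_contains is not None and exp.text_contains.lower() not in reply.lower():
        failures.append(f"reply does not contain {exp.text_contains!r}")
    # Latency check: fail only when the reply arrived after the limit.
    if exp.within_ms is not None and latency_ms > exp.within_ms:
        failures.append(f"reply took {latency_ms:.0f} ms, limit is {exp.within_ms} ms")
    return failures


exp = Expectation(text_contains="Paris", within_ms=2000)
print(check(exp, "The capital of France is Paris.", latency_ms=850.0))
print(check(exp, "I am not sure.", latency_ms=2400.0))
```

Returning a list of failures rather than a single boolean keeps every unmet expectation visible for a turn, which is the kind of detail a results reader (human or coding assistant) would want when iterating.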

Document the new built-in evals framework with a top-level Evals group
in the Pipecat tab:

- Overview: why evals matter, eval transport + harness + judge
  architecture, text vs audio modes, requirements
- Quickstart: run an existing agent with -t eval and a first scenario
- Writing Scenarios: full YAML reference (turns, expectations, judge,
  audio mode, includes, interruptions, vision)
- Eval Suites: manifests, pipecat eval suite, run output, CI
- Using the Library: EvalScenario/EvalSession/EvalSuite Python API
- Agent Self-Improvement: closing the loop with AI coding assistants

Also add a CLI reference page for pipecat eval run/suite, and update
the Fundamentals evaluations overview to feature the built-in framework.

mintlify Bot commented Jun 11, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

| Project | Status | Preview | Updated (UTC) |
| --- | --- | --- | --- |
| Pipecat | 🟢 Ready | View Preview | Jun 11, 2026, 2:54 AM |

markbackman left a comment

LGTM! Just a few small changes. Nice docs 👏

Comment thread pipecat/evals/overview.mdx Outdated
Comment thread pipecat/evals/overview.mdx Outdated
Comment thread pipecat/evals/quickstart.mdx Outdated
Comment thread pipecat/evals/quickstart.mdx Outdated
A scenario is a YAML file describing a scripted conversation and the behavior you expect. Save this as `scenarios/capital_question.yaml`:

<Tabs>
<Tab title="Ollama judge (default)">

Is Ollama with gemma2:9b the default judge? Do you not need to specify it?

aconchillo commented Jun 11, 2026

Yes, Ollama with gemma2:9b is the default. I'll add a comment.

Added this:

[image]

Comment thread pipecat/evals/scenarios.mdx Outdated
Comment thread pipecat/evals/agent-self-improvement.mdx
Comment thread docs.json Outdated
Drop the tail page and its nav entry, repoint the old /cli/tail redirect
to the CLI overview, and replace the overview's tail references with the
eval command (which was missing from that page).

Move the third-party platform pages (Bluejay, Cekura, Coval) under a
Third-party Platforms subgroup in the Evals tab group, absorb the old
evaluations overview's production-evaluation content into the Evals
overview, and add redirects for the old URLs.