89 changes: 30 additions & 59 deletions .agents/skills/deepgram-python-audio-intelligence/SKILL.md
@@ -7,16 +7,11 @@ description: Use when writing or reviewing Python code in this repo that calls D

Analytics overlays applied to `/v1/listen` transcription: summarize, topics, intents, sentiment, language detection, diarization, redaction, entities. Same endpoint / same client methods as STT — enable features via params.

## When to use this product

- You have **audio** (file, URL, or live stream) and want analytics alongside the transcript.
- REST is the primary path — most analytics are REST-only.

**Use a different skill when:**
- You want a pure transcript with no analytics → `deepgram-python-speech-to-text`.
- Your input is already transcribed text → `deepgram-python-text-intelligence` (`/v1/read`).
- You need conversational turn-taking → `deepgram-python-conversational-stt`.
- You need a full interactive agent → `deepgram-python-voice-agent`.
- Pure transcript with no analytics → `deepgram-python-speech-to-text`.
- Input is already transcribed text → `deepgram-python-text-intelligence` (`/v1/read`).
- Conversational turn-taking → `deepgram-python-conversational-stt`.
- Full interactive agent → `deepgram-python-voice-agent`.

## Feature availability: REST vs WSS

@@ -55,13 +50,13 @@ response = client.listen.v1.media.transcribe_url(
model="nova-3",
smart_format=True,
punctuate=True,
diarize=True, # speaker separation
summarize="v2", # "v2" for the current model; True also accepted on /v1/listen
diarize=True,
summarize="v2",
topics=True,
intents=True,
sentiment=True,
detect_language=True,
redact=["pci", "pii"], # or Sequence[str]
redact=["pci", "pii"],
language="en-US",
)
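After a request like the one above, it is worth probing which analytics actually came back before relying on them. A minimal sketch — the attribute names follow the documented response layout, but the exact SDK response type is an assumption, so the helper accepts either attribute-style objects or plain dicts:

```python
def analytics_present(results) -> dict:
    """Report which analytics blocks a transcription response contains.

    `results` may be a response object (attribute access) or a plain dict;
    the concrete SDK type is an assumption here.
    """
    names = ("summary", "topics", "intents", "sentiment")
    if isinstance(results, dict):
        return {n: results.get(n) is not None for n in names}
    return {n: getattr(results, n, None) is not None for n in names}
```

Call it as `analytics_present(response.results)` and re-run via REST for any requested feature that came back empty, since most analytics are REST-only.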

@@ -98,44 +93,23 @@ response = client.listen.v1.media.transcribe_file(

## Quick start — diarization with word-level timings

Enable speaker separation and word-level timestamps in a single request, then iterate the per-word objects to build a speaker-labelled transcript with timing.

```python
response = client.listen.v1.media.transcribe_url(
url="https://dpgr.am/spacewalk.wav",
model="nova-3",
diarize=True, # tag each word with a speaker id
smart_format=True, # punctuated_word for cleaner output
diarize=True,
smart_format=True,
punctuate=True,
)

words = response.results.channels[0].alternatives[0].words or []

# Per-word: speaker, timestamps, confidence
for w in words:
speaker = getattr(w, "speaker", None)
text = w.punctuated_word or w.word
print(f"[speaker {speaker}] {text} ({w.start:.2f}s–{w.end:.2f}s, conf={w.confidence:.2f})")

# Group consecutive words by speaker into utterances
from itertools import groupby
for speaker, group in groupby(words, key=lambda w: getattr(w, "speaker", None)):
text = " ".join((w.punctuated_word or w.word) for w in group)
print(f"Speaker {speaker}: {text}")
```

Per-word fields available on each entry:

| Field | Type | Description |
|---|---|---|
| `word` | `str` | Lowercase token |
| `punctuated_word` | `str \| None` | Token with smart-formatted casing/punctuation (when `smart_format=True`) |
| `start`, `end` | `float` | Audio timestamps in seconds |
| `confidence` | `float` | 0.0–1.0 confidence |
| `speaker` | `int \| None` | Speaker id (when `diarize=True`); `None` if diarization disabled |
| `speaker_confidence` | `float \| None` | Speaker-id confidence |

For a higher-level breakdown, set `utterances=True` to get pre-grouped speaker turns at `response.results.utterances`. Set `paragraphs=True` for a `paragraphs` view organised by speaker turn boundaries.
Each word object has: `word`, `punctuated_word`, `start`/`end` (float seconds), `confidence`, `speaker` (int, when `diarize=True`), `speaker_confidence`. For pre-grouped speaker turns use `utterances=True` (`response.results.utterances`) or `paragraphs=True`.
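If you need the grouping client-side instead — for example to post-process the per-word list yourself — a small helper can reproduce the turn grouping. This is a sketch over plain dicts mirroring the per-word fields listed above; adapt the access pattern to the SDK's actual word objects:

```python
from itertools import groupby

def speaker_turns(words):
    """Collapse consecutive same-speaker words into (speaker, text) turns."""
    turns = []
    for speaker, group in groupby(words, key=lambda w: w.get("speaker")):
        # Prefer the smart-formatted token when present, fall back to the raw word.
        text = " ".join(w.get("punctuated_word") or w["word"] for w in group)
        turns.append((speaker, text))
    return turns
```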

## Quick start — WSS subset (diarize / redact / entities only)

@@ -151,27 +125,32 @@ with client.listen.v1.connect(model="nova-3", diarize=True, redact=["pii"]) as c
conn.send_finalize()
```

## Validation & recovery

After transcription, verify analytics fields are populated:

```python
r = response.results
if r.summary is None and summarize_was_requested:
# Feature silently ignored -- likely passed on WSS (REST-only).
# Recovery: re-run via REST instead of WSS.
response = client.listen.v1.media.transcribe_url(url=..., summarize="v2", ...)
```

For `redact`, confirm redacted markers appear in the transcript (e.g., search for `[REDACTED]`). A missing marker means encoding mismatch or unsupported redact value.
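A hedged sketch of that check — `[REDACTED]` is the common marker, but exact marker text can vary by redact mode, so treat this as best-effort validation rather than a guarantee:

```python
import re

def redaction_applied(transcript: str) -> bool:
    """True if the transcript contains a bracketed redaction marker."""
    return re.search(r"\[[A-Z_]+\]", transcript) is not None
```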

## Key parameters

`summarize`, `topics`, `intents`, `sentiment`, `detect_language`, `diarize`, `redact`, `custom_topic`, `custom_topic_mode`, `custom_intent`, `custom_intent_mode`, `detect_entities`, plus all the standard STT params (`model`, `language`, `encoding`, `sample_rate`, ...).

`redact` is typed as `Optional[str]` in the current generated SDK (`src/deepgram/listen/v1/media/client.py`). Pass a single redaction mode such as `"pci"`, `"pii"`, `"numbers"`, or `"phi"`. Multi-mode redaction at the transport level is supported by sending `redact` as a repeated query parameter — check `src/deepgram/types/listen_v1redact.py` for the current type and fall back to raw query-param construction (or multiple calls) if you need several modes. The earlier `Union[str, Sequence[str]]` override is no longer carried in `.fernignore`.
`redact` is typed as `Optional[str]` in the generated SDK. Pass a single mode (`"pci"`, `"pii"`, `"numbers"`, `"phi"`). For multi-mode, use repeated query params or multiple calls -- see `src/deepgram/types/listen_v1redact.py`.
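If you do need several modes in one request, the transport-level shape is `redact` repeated as a query parameter. A sketch of constructing that query string — how to hand raw params to your SDK call is version-dependent, so this stops at the string:

```python
from urllib.parse import urlencode

def listen_query(modes, **params):
    """Build a /v1/listen query string with redact repeated once per mode."""
    pairs = list(params.items()) + [("redact", m) for m in modes]
    return urlencode(pairs)
```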

## API reference (layered)

1. **In-repo reference**: `reference.md` — "Listen V1 Media" (REST params include all analytics flags), "Listen V1 Connect" (WSS-supported subset).
2. **OpenAPI (REST)**: https://developers.deepgram.com/openapi.yaml
3. **AsyncAPI (WSS)**: https://developers.deepgram.com/asyncapi.yaml
4. **Context7**: library ID `/llmstxt/developers_deepgram_llms_txt`.
5. **Product docs**:
- https://developers.deepgram.com/docs/stt-intelligence-feature-overview
- https://developers.deepgram.com/docs/summarization
- https://developers.deepgram.com/docs/topic-detection
- https://developers.deepgram.com/docs/intent-recognition
- https://developers.deepgram.com/docs/sentiment-analysis
- https://developers.deepgram.com/docs/language-detection
- https://developers.deepgram.com/docs/redaction
- https://developers.deepgram.com/docs/diarization
1. **In-repo reference**: `reference.md` -- "Listen V1 Media" (REST), "Listen V1 Connect" (WSS subset).
2. **OpenAPI / AsyncAPI**: https://developers.deepgram.com/openapi.yaml, https://developers.deepgram.com/asyncapi.yaml
3. **Context7**: library ID `/llmstxt/developers_deepgram_llms_txt`.
4. **Product docs**: https://developers.deepgram.com/docs/stt-intelligence-feature-overview (overview); per-feature pages at `/docs/summarization`, `/docs/topic-detection`, `/docs/intent-recognition`, `/docs/sentiment-analysis`, `/docs/language-detection`, `/docs/redaction`, `/docs/diarization`.

## Gotchas

@@ -195,12 +174,4 @@
- `deepgram-python-conversational-stt` — Flux for turn-taking
- `deepgram-python-voice-agent` — interactive assistants

## Central product skills

For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills:

```bash
npx skills add deepgram/skills
```

This SDK ships language-idiomatic code skills; `deepgram/skills` ships cross-language product knowledge (see `api`, `docs`, `recipes`, `examples`, `starters`, `setup-mcp`).
For cross-language Deepgram product knowledge, install the central skills: `npx skills add deepgram/skills`.
42 changes: 24 additions & 18 deletions .agents/skills/deepgram-python-conversational-stt/SKILL.md
@@ -7,16 +7,10 @@ description: Use when writing or reviewing Python code in this repo that calls D

Turn-aware streaming STT at `/v2/listen` — optimized for conversational audio (end-of-turn detection, eager EOT, barge-in scenarios).

## When to use this product

- You're building a **conversational UI** and need explicit turn boundaries.
- You want **Flux models** (optimized for human-to-human or human-to-agent conversation).
- You want lower latency turn signals than v1 utterance_end.

**Use a different skill when:**
- You want general-purpose transcription (captions, batch, non-conversational) → `deepgram-python-speech-to-text`.
- You want a full interactive agent (STT + LLM + TTS) → `deepgram-python-voice-agent`.
- You want analytics (summarize/sentiment) → `deepgram-python-audio-intelligence`.
- General-purpose transcription (captions, batch, non-conversational) → `deepgram-python-speech-to-text`.
- Full interactive agent (STT + LLM + TTS) → `deepgram-python-voice-agent`.
- Analytics (summarize/sentiment) → `deepgram-python-audio-intelligence`.

## Authentication

@@ -74,6 +68,26 @@ with client.listen.v2.connect(
conn.start_listening()
```

## Error recovery

On `ListenV2FatalError`, the connection is terminal -- open a new one. For transient disconnects (`EventType.CLOSE` without a prior fatal), reconnect with exponential backoff:

```python
import time

def run_with_reconnect(max_retries=5):
for attempt in range(max_retries):
try:
with client.listen.v2.connect(model="flux-general-en", encoding="linear16", sample_rate="16000") as conn:
# ... register handlers, send audio ...
conn.start_listening()
break # clean exit
except Exception as e:
wait = min(2 ** attempt, 30)
print(f"Disconnected ({e}), retrying in {wait}s...")
time.sleep(wait)
```

## Key parameters

| Param | Notes |
@@ -143,12 +157,4 @@ async with client.listen.v2.connect(model="flux-general-en", ...) as conn:
- `deepgram-python-speech-to-text` — v1 general-purpose STT (REST + WSS)
- `deepgram-python-voice-agent` — full interactive assistant

## Central product skills

For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills:

```bash
npx skills add deepgram/skills
```

This SDK ships language-idiomatic code skills; `deepgram/skills` ships cross-language product knowledge (see `api`, `docs`, `recipes`, `examples`, `starters`, `setup-mcp`).
For cross-language Deepgram product knowledge, install the central skills: `npx skills add deepgram/skills`.
68 changes: 26 additions & 42 deletions .agents/skills/deepgram-python-management-api/SKILL.md
@@ -7,18 +7,9 @@ description: Use when writing or reviewing Python code in this repo that calls D

Administrative REST endpoints at `api.deepgram.com/v1/projects`, `/v1/models`, and reusable agent configuration storage. Project-scoped resources live under `client.manage.v1.projects.*` (keys, members, members.invites, usage, billing, models, requests). Global models at `client.manage.v1.models`. Think-model discovery at `client.agent.v1.settings.think.models`. Reusable agent configs at `client.voice_agent.configurations.*`.

## When to use this product

- **Discover / pin models**: `client.manage.v1.models.list()` returns the active STT/TTS set.
- **Project admin**: list/get/update/delete/leave projects.
- **API key lifecycle**: list/create/delete project keys.
- **Member + invite management**: add/remove members, manage roles, send/revoke invites.
- **Usage + billing**: query request volume, balances.
- **Reusable Voice Agent configs**: persist the **`agent` block** of a Settings message on the server, reference by `agent_id`. The stored blob is the `agent` object only (listen / think / speak providers + prompt), not the full `AgentV1Settings`.

**Use a different skill when:**
- You want to actually talk to an agent → `deepgram-python-voice-agent`.
- You want to transcribe or synthesize → STT/TTS skills.
- Running an agent interactively → `deepgram-python-voice-agent`.
- Transcribing or synthesizing → STT/TTS skills.
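For the model-pinning case, here is a sketch of picking a model UUID out of a `models.list()` payload. The `{"stt": [...], "tts": [...]}` layout with per-model `name`/`uuid` fields is an assumption — verify against your SDK's response type:

```python
def find_model_uuid(listing: dict, name: str):
    """Return the uuid of the first STT/TTS model matching `name`, else None."""
    for kind in ("stt", "tts"):
        for model in listing.get(kind, []):
            if model.get("name") == name:
                return model.get("uuid")
    return None
```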

## Authentication

@@ -85,33 +76,24 @@ See `examples/51-55` for each sub-module.

## Quick start — Voice Agent configurations

**Important:** The stored config is the `agent` block only (listen/think/speak providers + prompt) as a JSON string, NOT the full `AgentV1Settings`. Top-level fields like `audio` go in the live Settings message at connect time. The returned `agent_id` replaces the inline `agent` object in future Settings messages. Configs are immutable -- create a new one to change behavior; only metadata is mutable.

```python
# List reusable configs
import json
configs = client.voice_agent.configurations.list(project_id=pid)

# Create: `config` is a JSON string of the `agent` BLOCK ONLY — not the full
# Settings message. Do NOT include top-level Settings fields like `audio`;
# those are sent at connect-time in the live Settings message. The stored
# `agent_id` later replaces the inline `agent` object in a Settings message.
import json
config_json = json.dumps({
"listen": {"provider": {"type": "deepgram", "model": "nova-3"}},
"think": {"provider": {"type": "open_ai", "model": "gpt-4o-mini"}, "prompt": "..."},
"speak": {"provider": {"type": "deepgram", "model": "aura-2-asteria-en"}},
})
created = client.voice_agent.configurations.create(
project_id=pid,
config=config_json,
metadata={"label": "support-en"},
project_id=pid, config=config_json, metadata={"label": "support-en"},
)
print(created.agent_id)

# Update metadata (immutable config body — create a new one to change behavior)
client.voice_agent.configurations.update(project_id=pid, agent_id=created.agent_id, metadata={"label": "v2"})

# Get / delete
one = client.voice_agent.configurations.get(project_id=pid, agent_id=created.agent_id)
# client.voice_agent.configurations.delete(project_id=pid, agent_id=...)
```

Think-provider model discovery (which LLMs Agent supports):
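As a sketch, the resulting listing can be flattened into provider/model pairs for validation or prompting. The `models` key and per-entry fields here are assumptions about the response shape — check the return type of `client.agent.v1.settings.think.models.list()` in your SDK version:

```python
def think_model_ids(payload: dict):
    """Flatten a think-models listing into 'provider/model' strings."""
    return [
        f"{m.get('provider')}/{m.get('model')}"
        for m in payload.get("models", [])
    ]
```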
@@ -140,18 +122,28 @@ projects = await client.manage.v1.projects.list()
- https://developers.deepgram.com/reference/voice-agent/agent-configurations/create-agent-configuration
- https://developers.deepgram.com/reference/voice-agent/think-models

## Destructive operation guard

Delete operations (projects, keys, agent configs) are **irreversible**. Always verify the resource before deleting:

```python
# Confirm before deleting a key
key = client.manage.v1.projects.keys.list(project_id=pid)
target = next((k for k in key.api_keys if k.api_key_id == kid), None)
assert target is not None, f"Key {kid} not found"
print(f"Deleting key: {target.comment}")
client.manage.v1.projects.keys.delete(project_id=pid, key_id=kid)
```

## Gotchas

1. **`Token` auth, not `Bearer`.**
2. **Project-scoped resources are nested under `.projects.*`.** There is no top-level `client.manage.v1.keys` / `.members` / `.invites` / `.usage` / `.billing`. Use `client.manage.v1.projects.keys`, `...projects.members`, `...projects.members.invites`, `...projects.usage`, `...projects.billing.balances`, and `...projects.requests` for request logs. The only top-level `client.manage.v1.*` namespaces are `projects` and `models`.
3. **Think-model discovery is on the Agent client**, not Manage: `client.agent.v1.settings.think.models.list()`. There is no `client.manage.v1.agent.*`.
4. **Agent config body is a JSON STRING on create**, not a nested object. Pass `config=json.dumps(...)`.
5. **Agent config is the `agent` block only**, not the full Settings message. Do not include top-level fields like `audio` — those go in the live Settings message at connect time.
6. **Agent configs are immutable** — you cannot edit the config body. Create a new one to change behavior. Only metadata is mutable.
7. **Use `include_outdated=True`** on `models.list()` when pinning older models.
8. **Delete is irreversible.** Wire tests typically comment out destructive calls.
9. **Project-scoped vs global models**: `client.manage.v1.models.list()` returns all; `client.manage.v1.projects.models.list(project_id=...)` returns what the project can access.
10. **Returned agent configs are uninterpolated** — raw stored JSON string. Parse before use.
2. **Project-scoped resources are nested under `.projects.*`.** No top-level `client.manage.v1.keys` etc. Use `client.manage.v1.projects.keys`, `...projects.members`, `...projects.members.invites`, `...projects.usage`, `...projects.billing.balances`, `...projects.requests`.
3. **Think-model discovery is on the Agent client**, not Manage: `client.agent.v1.settings.think.models.list()`.
4. **Agent config body is a JSON STRING on create**: pass `config=json.dumps(...)`. See the Voice Agent configurations section above for full details.
5. **Use `include_outdated=True`** on `models.list()` when pinning older models.
6. **Project-scoped vs global models**: `client.manage.v1.models.list()` returns all; `client.manage.v1.projects.models.list(project_id=...)` returns what the project can access.
7. **Returned agent configs are uninterpolated** -- raw stored JSON string. Parse before use.
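The last gotcha in practice — a small parser for the raw stored config string that validates the three provider blocks from the create example above rather than assuming defaults:

```python
import json

def load_agent_config(raw: str) -> dict:
    """Parse a stored agent-config JSON string and check required blocks."""
    cfg = json.loads(raw)
    missing = [k for k in ("listen", "think", "speak") if k not in cfg]
    if missing:
        raise ValueError(f"agent config missing blocks: {missing}")
    return cfg
```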

## Example files in this repo

@@ -170,12 +162,4 @@

- `deepgram-python-voice-agent` — run an agent (use a config created here)

## Central product skills

For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills:

```bash
npx skills add deepgram/skills
```

This SDK ships language-idiomatic code skills; `deepgram/skills` ships cross-language product knowledge (see `api`, `docs`, `recipes`, `examples`, `starters`, `setup-mcp`).
For cross-language Deepgram product knowledge, install the central skills: `npx skills add deepgram/skills`.