fix(claude-sdk): emit token-by-token streaming to the client #720

Open
peterus wants to merge 2 commits into siteboon:main from peterus:fix/sdk-streaming-partial-messages

@peterus peterus commented Apr 28, 2026

Summary

Assistant responses arrive as a single block instead of streaming token by token. Three coordinated changes are needed to make streaming actually flow through the existing stream_delta / stream_end pipeline:

  1. server/claude-sdk.js — set sdkOptions.includePartialMessages = true so the Claude Agent SDK emits SDKPartialAssistantMessage events during a turn. Without this flag the SDK only emits the final consolidated assistant message.
  2. server/modules/providers/list/claude/claude-sessions.provider.ts — unwrap the { type: 'stream_event', event: { ... } } envelope into the existing stream_delta / stream_end normalized kinds. The previous flat-event branch (raw.type === 'content_block_delta') never matched the real SDK shape and was effectively dead code.
  3. server/claude-sdk.js — once text was streamed for a turn, strip text parts from the consolidated SDKAssistantMessage so the client does not render both the streamed buffer (finalized by stream_end) and a duplicate text message. tool_use and thinking blocks pass through unchanged so tool grouping and thinking rendering still work.

Splitting these into separate PRs would leave intermediate states broken (flag-only would emit unhandled stream events; provider-only would have no stream events to unwrap; either without the dedup would double-render the response).
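
As a rough illustration of change 2, the envelope unwrapping could look like the sketch below. The function name `normalizeStreamEvent` and the `{ kind, content }` output fields are assumptions made for this example, not the provider's actual code; only the `stream_event` envelope, the `content_block_delta` / `content_block_stop` event types, and the `stream_delta` / `stream_end` kinds come from the description above.

```javascript
// Hypothetical stand-in for the provider's stream_event branch.
// Input shape follows the PR text: { type: 'stream_event', event: { ... } }.
function normalizeStreamEvent(raw) {
  if (raw?.type !== 'stream_event') return null;
  const event = raw.event;
  if (event?.type === 'content_block_delta') {
    const delta = event.delta;
    // Only text deltas become stream_delta; other delta types are skipped here.
    if (delta?.type === 'text_delta' && typeof delta.text === 'string') {
      return { kind: 'stream_delta', content: delta.text };
    }
    return null;
  }
  if (event?.type === 'content_block_stop') {
    return { kind: 'stream_end' };
  }
  return null;
}

const wrapped = {
  type: 'stream_event',
  event: { type: 'content_block_delta', delta: { type: 'text_delta', text: 'Hel' } },
};
// The flat shape the old branch tested for; it never arrives unwrapped.
const flat = { type: 'content_block_delta', delta: { type: 'text_delta', text: 'Hel' } };

console.log(normalizeStreamEvent(wrapped)); // { kind: 'stream_delta', content: 'Hel' }
console.log(normalizeStreamEvent(flat));    // null
```

The second call shows why the earlier flat-event check was dead code: without the envelope, nothing matches.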

Test plan

  • Send a long prompt — text appears token by token in the UI rather than as a single block at the end
  • Final rendered message contains the full response (no truncation, no duplication)
  • Tool use still appears (e.g. Read, Bash) and is grouped under its parent assistant message
  • Thinking blocks still render when the model emits them
  • Token-budget event still arrives at end-of-turn (message.type === 'result' path unchanged)
  • Cursor / Gemini provider streaming behavior unchanged (only Claude provider touched)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Enabled partial/token streaming for WebSocket/SSE writer connections for faster response delivery.
  • Bug Fixes

    • Prevented duplicate text from appearing when streamed content and consolidated messages arrive.
    • Improved handling of live-stream events so incremental text and stream end signals normalize correctly.

Three coordinated changes so the SDK's partial output actually reaches the UI:

1. server/claude-sdk.js: set sdkOptions.includePartialMessages = true so the
   Claude Agent SDK emits SDKPartialAssistantMessage events (stream_event)
   alongside the consolidated assistant messages.

2. claude-sessions.provider.ts: unwrap stream_event envelopes (event.type ===
   'content_block_delta'/'content_block_stop') into the existing stream_delta /
   stream_end normalized kinds. The previous flat-event branch never matched the
   real SDK shape.

3. server/claude-sdk.js: drop text parts from the consolidated assistant
   message once they were streamed for the same turn — otherwise the client
   would render both the streamed buffer (finalized by stream_end) and a
   duplicate text message. Tool-use and thinking blocks pass through unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 28, 2026 16:01
Contributor

coderabbitai Bot commented Apr 28, 2026

📝 Walkthrough

Partial-message streaming is enabled only for WebSocket/SSE "writer" consumers, via sdkOptions.includePartialMessages = true. server/claude-sdk.js tracks whether text_delta fragments were streamed during a turn and, if so, removes the duplicate text parts from the consolidated assistant message when it arrives. The provider now recognizes stream_event wrappers and emits stream_delta/stream_end normalized messages.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Claude SDK Streaming Configuration (server/claude-sdk.js) | Adds conditional partial-message streaming for writer consumers (includePartialMessages = true). Tracks text_delta streaming per assistant turn and filters out consolidated text content when deltas were streamed to avoid duplicate rendering. |
| Session Provider Event Normalization (server/modules/providers/list/claude/claude-sessions.provider.ts) | Extends normalizeMessage to accept Anthropic live SDK payloads wrapped in stream_event. Emits stream_delta when content_block_delta.text_delta.text is present and stream_end on content_block_stop; ignores invalid/missing wrapper shapes. |
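
A minimal sketch of the conditional enabling described above. How the real code recognizes a streaming writer is not shown in this PR, so `isStreamingWriter`, `buildSdkOptions`, and the writer's `kind` field are all hypothetical; only the `includePartialMessages` flag is taken from the change itself.

```javascript
// Hypothetical helper: decides whether a consumer renders output incrementally.
// The `kind` field and its values are assumptions for this sketch.
function isStreamingWriter(writer) {
  return writer?.kind === 'websocket' || writer?.kind === 'sse';
}

function buildSdkOptions(baseOptions, writer) {
  const sdkOptions = { ...baseOptions };
  if (isStreamingWriter(writer)) {
    // Flag named in the PR; asks the SDK to emit partial assistant messages.
    sdkOptions.includePartialMessages = true;
  }
  return sdkOptions;
}

console.log(buildSdkOptions({ model: 'example-model' }, { kind: 'websocket' }));
console.log(buildSdkOptions({ model: 'example-model' }, { kind: 'collector' }));
```

Leaving the flag unset for other callers means anything that reads the consolidated assistant text keeps receiving it unchanged.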

Sequence Diagram(s)

sequenceDiagram
    participant Client as Writer (WS/SSE)
    participant Server as App Server
    participant SDK as Claude SDK
    participant Provider as Anthropic Normalizer

    Client->>Server: open writer session / subscribe
    Server->>SDK: enable includePartialMessages for writer
    SDK->>Provider: receive stream_event(content_block_delta with text_delta)
    Provider-->>Server: emit stream_delta (text fragment)
    Server-->>Client: send streamed text fragments
    SDK->>Provider: receive stream_event(content_block_stop)
    Provider-->>Server: emit stream_end
    SDK->>Server: send consolidated assistant message
    Server->>Server: detect textWasStreamed -> strip 'text' parts from consolidated payload
    Server-->>Client: send consolidated message without duplicate text
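
To make the last two steps of the diagram concrete, here is a small client-side simulation. It is a sketch under stated assumptions: `renderTurn` and the `{ kind: 'assistant', parts }` shape are invented for illustration, while the `stream_delta` / `stream_end` kinds are the ones named in this PR.

```javascript
// Hypothetical client-side reducer over normalized messages.
// 'assistant' with a `parts` array is an assumed shape for the consolidated message.
function renderTurn(messages) {
  const rendered = [];
  let buffer = '';
  for (const msg of messages) {
    if (msg.kind === 'stream_delta') {
      buffer += msg.content; // grow the live text buffer
    } else if (msg.kind === 'stream_end') {
      if (buffer) rendered.push({ type: 'text', text: buffer }); // finalize buffer
      buffer = '';
    } else if (msg.kind === 'assistant') {
      rendered.push(...msg.parts); // consolidated parts render as-is
    }
  }
  return rendered;
}

const streamed = [
  { kind: 'stream_delta', content: 'Hello, ' },
  { kind: 'stream_delta', content: 'world' },
  { kind: 'stream_end' },
];
const toolUse = { type: 'tool_use', name: 'Read' };
const fullText = { type: 'text', text: 'Hello, world' };

// Consolidated message with text stripped: one text item plus the tool call.
const deduped = renderTurn([...streamed, { kind: 'assistant', parts: [toolUse] }]);
// Consolidated message left intact: the same text is rendered twice.
const doubled = renderTurn([...streamed, { kind: 'assistant', parts: [fullText, toolUse] }]);

console.log(deduped.filter((p) => p.type === 'text').length); // 1
console.log(doubled.filter((p) => p.type === 'text').length); // 2
```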

Suggested reviewers

  • viper151

Poem

🐰 A rabbit's note
Streams of tokens hop and play,
Tiny deltas lead the way,
When the whole arrives, I skip the repeat,
No double carrots, clean and neat! 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix(claude-sdk): emit token-by-token streaming to the client' directly and accurately describes the main objective of the PR, enabling token-by-token streaming of assistant responses through the existing stream pipeline. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



Copilot AI left a comment

Pull request overview

Enables true token-by-token streaming for the Claude provider by turning on partial message emission in the Claude Agent SDK and wiring the SDK’s streaming event shape into the existing stream_delta / stream_end pipeline while preventing duplicate final text rendering.

Changes:

  • Enable includePartialMessages in the Claude Agent SDK options to emit partial assistant output events.
  • Normalize SDK stream_event envelopes into existing stream_delta / stream_end messages in the Claude sessions provider.
  • Deduplicate streamed text by removing text parts from the consolidated assistant message after streaming has occurred.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| server/claude-sdk.js | Enables partial message streaming and strips consolidated assistant text to prevent duplicate rendering. |
| server/modules/providers/list/claude/claude-sessions.provider.ts | Unwraps SDK stream_event envelopes into normalized streaming events. |

Comment thread server/claude-sdk.js Outdated
Comment thread server/claude-sdk.js
Comment thread server/modules/providers/list/claude/claude-sessions.provider.ts Outdated
…owing

- Only enable includePartialMessages and the consolidated-text stripping for
  streaming writers (WebSocket, SSE). ResponseCollector and the git
  commit-message generator call queryClaudeSDK without streaming and rely on
  the consolidated assistant text payload.
- Use readObjectRecord for stream_event and event.delta narrowing, matching
  the rest of the provider's defensive parsing.

Addresses Copilot review feedback on siteboon#720.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
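
The defensive narrowing mentioned in the second bullet might look roughly like this. The PR does not show `readObjectRecord`'s signature, so `readRecord` below is a hypothetical stand-in that returns the value only when it is a plain non-array object, and `readTextDelta` is an illustrative name.

```javascript
// Hypothetical stand-in for the provider's readObjectRecord helper; the real
// signature is not shown in this PR.
function readRecord(value) {
  return value !== null && typeof value === 'object' && !Array.isArray(value)
    ? value
    : null;
}

// Pull the text fragment out of a stream_event envelope, or return null if any
// layer is missing or has the wrong shape.
function readTextDelta(raw) {
  const envelope = readRecord(raw);
  if (envelope?.type !== 'stream_event') return null;
  const event = readRecord(envelope.event);
  if (event?.type !== 'content_block_delta') return null;
  const delta = readRecord(event.delta);
  if (delta?.type !== 'text_delta') return null;
  return typeof delta.text === 'string' ? delta.text : null;
}

const ok = { type: 'stream_event', event: { type: 'content_block_delta', delta: { type: 'text_delta', text: 'hi' } } };
const badEvent = { type: 'stream_event', event: ['content_block_delta'] };
const badText = { type: 'stream_event', event: { type: 'content_block_delta', delta: { type: 'text_delta', text: 42 } } };

console.log(readTextDelta(ok));       // 'hi'
console.log(readTextDelta(badEvent)); // null
console.log(readTextDelta(badText));  // null
```

Narrowing each layer separately keeps a malformed payload from throwing inside the normalizer; it is simply skipped.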
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@server/claude-sdk.js`:
- Around line 687-699: The duplicate-suppression currently only runs when
message.message?.content is an array, so textWasStreamed can remain true for
non-array assistant consolidations; update the consolidation logic around
writerStreams/textWasStreamed to always clear streamed text and reset
textWasStreamed regardless of payload shape: inspect message.type ===
'assistant' and message.message?.content, if it's an array filter out parts with
part.type === 'text' (as done now), and if it's a single content object handle
the single-object case (remove or replace the text content) before assigning
transformedMessage; in all cases ensure textWasStreamed = false after the
assistant consolidation step so duplicate suppression is reset (refer to
writerStreams, textWasStreamed, message.message?.content, transformedMessage).
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: dc9798d1-bda9-4d3d-bc39-2d38484ebab4

📥 Commits

Reviewing files that changed from the base of the PR and between d6b3bfb and a557506.

📒 Files selected for processing (2)
  • server/claude-sdk.js
  • server/modules/providers/list/claude/claude-sessions.provider.ts

Comment thread server/claude-sdk.js
Comment on lines +687 to +699
      if (
        writerStreams &&
        textWasStreamed &&
        message.type === 'assistant' &&
        Array.isArray(message.message?.content)
      ) {
        const filtered = message.message.content.filter((part) => part.type !== 'text');
        transformedMessage = {
          ...transformedMessage,
          message: { ...message.message, content: filtered },
        };
        textWasStreamed = false;
      }
Contributor

⚠️ Potential issue | 🟡 Minor

Make duplicate-suppression reset independent of array-only assistant payloads.

If streamed text is seen but the consolidated assistant payload is not an array, this branch won’t clear duplicate text and won’t reset textWasStreamed. Handle both payload shapes and always reset after the assistant consolidation step.

💡 Suggested hardening patch
-      if (
-        writerStreams &&
-        textWasStreamed &&
-        message.type === 'assistant' &&
-        Array.isArray(message.message?.content)
-      ) {
-        const filtered = message.message.content.filter((part) => part.type !== 'text');
-        transformedMessage = {
-          ...transformedMessage,
-          message: { ...message.message, content: filtered },
-        };
-        textWasStreamed = false;
-      }
+      if (
+        writerStreams &&
+        textWasStreamed &&
+        message.type === 'assistant' &&
+        message.message
+      ) {
+        const content = message.message.content;
+        if (Array.isArray(content)) {
+          const filtered = content.filter((part) => part.type !== 'text');
+          transformedMessage = {
+            ...transformedMessage,
+            message: { ...message.message, content: filtered },
+          };
+        } else if (typeof content === 'string') {
+          transformedMessage = {
+            ...transformedMessage,
+            message: { ...message.message, content: [] },
+          };
+        }
+        textWasStreamed = false;
+      }
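
For readers who want to exercise the suggested behavior outside the diff, the patch can be approximated as a pure function. This is a sketch, not the code in server/claude-sdk.js: `stripStreamedText` and the `state` object are hypothetical, and the field names mirror the variables in the patch.

```javascript
// Sketch of the hardened dedup step as a pure function. The function name and
// the `state` object are hypothetical; the field names follow the patch.
function stripStreamedText(message, state) {
  if (!state.writerStreams || !state.textWasStreamed) return message;
  if (message.type !== 'assistant' || !message.message) return message;

  const content = message.message.content;
  let result = message;
  if (Array.isArray(content)) {
    // Keep tool_use / thinking parts, drop text already shown via streaming.
    const filtered = content.filter((part) => part.type !== 'text');
    result = { ...message, message: { ...message.message, content: filtered } };
  } else if (typeof content === 'string') {
    // A bare string is entirely text, so nothing is left to render.
    result = { ...message, message: { ...message.message, content: [] } };
  }
  state.textWasStreamed = false; // reset for the next turn in every case
  return result;
}

const state = { writerStreams: true, textWasStreamed: true };
const consolidated = {
  type: 'assistant',
  message: { content: [{ type: 'text', text: 'Hi' }, { type: 'tool_use', name: 'Bash' }] },
};

console.log(stripStreamedText(consolidated, state).message.content);
// [ { type: 'tool_use', name: 'Bash' } ]
console.log(state.textWasStreamed); // false
```

The reset sits outside both branches, which is the point of the hardening: the flag cannot stay set after an assistant message of an unexpected shape.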
2 participants