Voice, AI Skills, and Plan Mode for the IDE assistants by bigbansal · Pull Request #11 · getcodesetu/codesetu

bigbansal · 2026-05-28T16:44:52Z

Summary

Adds three chat-surface features to both CodeSetu IDE assistants (apps/vscode, apps/jetbrains) at feature parity, behind opt-in toggles and settings:

Plan Mode — chat toggle that asks the model for a numbered plan + clarifying questions instead of code edits. "Approve & Run" sends APPROVED — proceed with implementation and exits Plan Mode. Single source of truth in skills/plan-mode/SKILL.md, runtime constants in TS and Kotlin.
AI Skills runtime — deterministic router (pinned + slash + keyword, capped at 1 auto-routed) with a slash-command palette in the composer. Ships 4 new built-ins (/explain, /refactor, /test, /indic) alongside the new plan-mode skill. Workspace .codesetu/skills/*.md continue to load always-on — no regression. codesetu.skills.autoRoute lets users disable keyword auto-routing.
Voice (STT + TTS, 5 backends) — new @codesetu/core/speech package with browser, local, sarvam, openai-compatible, huggingface. Mic button + TTS toggle in the shared chat template. Browser/local run entirely in the webview; server backends post audio over the host bridge. CSP tightened to media-src 'self' blob: + an allowlisted connect-src derived from configured speech endpoints. New CodeSetu: Setup Speech Provider command (VSCode) and a separate codesetu.speech.apiKey OS-secret slot on both IDEs (Sarvam Saaras/Bulbul keys differ from chat keys).

Three commits, each independently shippable:

737df99 feat(plan-mode)
1c94175 feat(skills)
94e5e91 feat(voice)

Outstanding before voice is fully usable in JetBrains

JCEF mic permission spike — getUserMedia is blocked by default in JBCef. The UI surfaces a clear error when this happens. The fix is to add --enable-media-stream to JBCefApp.getInstance() args and a CefPermissionHandler that auto-approves audio. Browser-side speechSynthesis (TTS read-aloud) works in JCEF without any extra flags.

VSCode voice has no such spike — the VSCode webview's getUserMedia works as soon as the user grants permission.

What's intentionally NOT in this PR

The api-client products (apps/vscode-apiclient, apps/jetbrains-apiclient, packages/api-client-core) are untouched.
Inline completions and the right-click editor actions (Explain/Refactor/etc.) are untouched. Voice/skills/plan-mode are chat-surface features only.
Tool-call dry-run plan mode (Claude-Code-style "approve every tool call") is designed for but not built: the prompt structure and the skills-as-pinned-fragments pattern are in place so it can layer on once a tool-execution loop exists.

Test gate

All green on the branch tip:

@codesetu/core: build · lint · 54 tests across 6 files (9 new for the skills router, 14 new for speech)
apps/vscode: esbuild bundle (937.7 KB) · ESLint clean
apps/jetbrains: compileKotlin + JUnit suite pass (no new tests, settings/prompt changes covered by existing model/payload tests)

Test plan

🤖 Generated with Claude Code

Plan Mode is a new chat-surface mode that asks the assistant to produce a numbered plan and clarifying questions before any implementation. The mode is pinned via a single skill (`skills/plan-mode/SKILL.md`) injected into the system prompt as a `pinnedSkills` entry, designed so the Phase 2 skills runtime can layer on top using the same injection point.

Both IDEs get the same UX: a "Plan Mode" toggle in the composer menu, a "Plan" pill on the composer when active, and an "Approve & Run" button that appears after a plan-mode assistant turn. The button sends "APPROVED — proceed with implementation" and exits Plan Mode for the rest of the session.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
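The pinning and approval flow described above could be sketched roughly as follows. This is a minimal illustration, not the PR's code: `ChatState`, `TurnPlan`, `isApproval`, and `planTurn` are hypothetical names, and the rule that a message counts as approval when it starts with the word APPROVED is an assumption.

```typescript
// Sketch only: every type and function name here is hypothetical.
interface ChatState {
  planModeOn: boolean;
}

interface TurnPlan {
  pinnedSkills: string[]; // skill ids injected into the system prompt this turn
  nextState: ChatState;   // chat state after the turn is sent
}

// Assumption: a message is an approval when it starts with the word "APPROVED"
// (case-insensitive). The real matching rule may differ.
function isApproval(message: string): boolean {
  return /^approved\b/i.test(message.trim());
}

function planTurn(state: ChatState, message: string): TurnPlan {
  if (!state.planModeOn) {
    // Plan Mode is off: nothing is pinned on its behalf.
    return { pinnedSkills: [], nextState: state };
  }
  if (isApproval(message)) {
    // Approval: stop pinning so the model implements, and leave Plan Mode.
    return { pinnedSkills: [], nextState: { planModeOn: false } };
  }
  // Ordinary plan-mode turn: keep the plan-mode skill pinned.
  return { pinnedSkills: ["plan-mode"], nextState: state };
}
```

Keeping the decision in one pure function would make it straightforward to mirror in Kotlin and to unit-test without a webview.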

Adds a deterministic skills router (pinned + slash + keyword, capped at 1 auto-routed) shared by both IDE assistants, plus a slash-command palette in the chat composer that opens on `/`. Ships four built-in skills alongside the existing plan-mode skill:

/explain  -> explain-code
/refactor -> refactor
/test     -> write-tests
/indic    -> indic-comments (Hindi/Tamil/Bengali/...)

The router lives in `@codesetu/core` and is mirrored in Kotlin for the JetBrains plugin. Workspace `.codesetu/skills/*.md` continue to load always-on as today, with no regression. Auto-routing can be disabled via the new `codesetu.skills.autoRoute` setting (VSCode) or the matching checkbox in JetBrains settings.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
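A deterministic router of the shape described (pinned, then slash, then at most one keyword match) might look something like the sketch below. The `Skill` and `RouteInput` types, the `routeSkills` function, and every keyword list are illustrative assumptions; only the slash-to-skill mapping comes from the commit message. Skipping keyword routing when an explicit slash command is present is also an assumption.

```typescript
// Sketch only: types, function name, and keyword lists are hypothetical.
interface Skill {
  id: string;
  slash?: string;     // e.g. "/explain"
  keywords: string[]; // lowercase substrings that trigger auto-routing
}

interface RouteInput {
  message: string;
  pinned: string[];   // skill ids that are always included (e.g. "plan-mode")
  autoRoute: boolean; // mirrors the codesetu.skills.autoRoute setting
}

// Slash mapping is from the PR text; the keywords are made up for illustration.
const BUILT_INS: Skill[] = [
  { id: "explain-code", slash: "/explain", keywords: ["explain"] },
  { id: "refactor", slash: "/refactor", keywords: ["refactor", "clean up"] },
  { id: "write-tests", slash: "/test", keywords: ["unit test", "write tests"] },
  { id: "indic-comments", slash: "/indic", keywords: ["hindi", "tamil", "bengali"] },
];

function routeSkills(skills: Skill[], input: RouteInput): string[] {
  const selected: string[] = [...input.pinned];
  const add = (id: string): void => {
    if (!selected.includes(id)) selected.push(id);
  };

  const text = input.message.trim();
  const slashMatch = /^\/([a-z][a-z0-9-]*)/i.exec(text);
  if (slashMatch) {
    // An explicit slash command selects its skill and skips keyword routing.
    const command = "/" + (slashMatch[1] ?? "").toLowerCase();
    const hit = skills.find((s) => s.slash === command);
    if (hit) add(hit.id);
    return selected;
  }

  if (input.autoRoute) {
    // Cap at one auto-routed skill: the first keyword match in list order wins.
    const lower = text.toLowerCase();
    const auto = skills.find(
      (s) => !selected.includes(s.id) && s.keywords.some((k) => lower.includes(k)),
    );
    if (auto) add(auto.id);
  }
  return selected;
}
```

Because the result depends only on the input and the order of the skill list, the same message always routes to the same skills, which is what makes the router easy to mirror across TypeScript and Kotlin.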

…ains UI

Adds a SpeechProvider package (`@codesetu/core/speech`) with five backends:

browser            Browser SpeechRecognition + speechSynthesis (no key)
local              Same as browser; refuses any server fallback (air-gapped)
sarvam             Sarvam Saaras (STT) + Bulbul (TTS)
openai-compatible  /v1/audio/transcriptions + /v1/audio/speech
huggingface        Hugging Face Inference Router (Whisper-large-v3)

The chat webview gets a mic button (idle / listening / transcribing states with a red pulse) and a TTS toggle in the composer toolbar. Browser/local paths run entirely in the webview; server paths post audio bytes to the host over the existing message bridge, which calls the SpeechProvider and pushes the transcription or synthesized audio back to the webview. CSP gains a tight `media-src 'self' blob:` and a `connect-src` allowlist derived from the configured speech endpoints.

VSCode ships a new `CodeSetu: Setup Speech Provider` command, a separate `codesetu.speech.apiKey` secret slot, and `codesetu.speech.*` settings (sttProvider, ttsProvider, language, ttsEnabled, sttBaseUrl/sttModel, ttsBaseUrl/ttsModel). JetBrains mirrors the UI, settings, secret store (`CodeSetuSpeechApiKeyStore`), and host bridge (`CodeSetuSpeechClient` over JDK HttpClient).

The mic button shows a clear error if JCEF blocks `getUserMedia`; server STT on JetBrains still depends on a pending JCEF `--enable-media-stream` spike before mic input works end-to-end. TTS via browser speechSynthesis works in JCEF without additional flags.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
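As a rough illustration of deriving the `connect-src` allowlist from configured speech endpoints, the sketch below reduces each endpoint URL to its origin and emits the two directives. The function name `buildSpeechCsp`, the inclusion of `'self'` in `connect-src`, the restriction to http/https, and the example hosts are all assumptions rather than details taken from the PR.

```typescript
// Sketch only: buildSpeechCsp is a hypothetical helper, not the PR's API.
function buildSpeechCsp(endpoints: string[]): string {
  const origins = new Set<string>();
  for (const raw of endpoints) {
    try {
      const url = new URL(raw);
      // Assumption: only http(s) endpoints are allowlisted; others are ignored.
      if (url.protocol === "https:" || url.protocol === "http:") {
        origins.add(url.origin); // scheme + host + port, no path
      }
    } catch {
      // Unparseable endpoint strings are skipped rather than widening the policy.
    }
  }
  // Sorting keeps the generated policy stable across runs.
  const connect = ["'self'", ...Array.from(origins).sort()].join(" ");
  return `media-src 'self' blob:; connect-src ${connect}`;
}

// Example with placeholder hosts (not real CodeSetu defaults):
const exampleCsp = buildSpeechCsp([
  "https://stt.example.com/v1/audio/transcriptions",
  "https://stt.example.com/v1/audio/speech",
]);
// exampleCsp === "media-src 'self' blob:; connect-src 'self' https://stt.example.com"
```

Allowlisting origins rather than full URLs keeps the policy short and means a path change on the provider side does not require regenerating the CSP.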

JetBrains plugin now registers a JBCefAppRequiredArgumentsProvider that adds the two CEF flags required for the chat webview's mic button to work:

--enable-features=WebRTC,MediaStream,AudioServiceOutOfProcess
--use-fake-ui-for-media-stream

Without the first, getUserMedia throws NotSupportedError before the user is ever prompted. The second auto-approves the in-page permission request (the OS still gates physical mic access). The trade-off applies to every JCEF webview in the IDE process and is documented in apps/jetbrains/README.md under "Voice in JetBrains".

Also updates the top-level README, both app READMEs, and CHANGELOG so reviewers and testers know what to look for during the smoke test.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Round of cleanups requested after PR review:

* Strip TTS end-to-end: toggle, settings (ttsProvider, ttsEnabled, ttsBaseUrl, ttsModel), host bridges, Sarvam/OpenAI/HF synthesize, the webview speakViaBrowser/speakViaServer/maybeSpeakAssistant paths, and the related tests. Voice is now STT-only.
* Drop the "local" speech provider: it was misleading (routed to the same WebSpeech path as "browser", not actually on-device).
* Sarvam STT default model bumped to saarika:v2; response parsing kept loose so a Sarvam-side rename doesn't break us silently.
* JetBrains default STT provider switched to sarvam (browser SpeechRecognition does not work in JCEF: no Google cloud-speech keys in the embedded Chromium build).
* Mic UX: pointerdown >250ms = push-to-talk (release stops), short press = tap-to-toggle, spacebar in an empty/focused composer also push-to-talks, Esc stops an active capture. Same in both webviews.
* Wire isPlanModeApproval into both responders: typing APPROVED / RUN (or clicking Approve & Run, which sends the canonical phrase) drops plan-mode pinning for that turn so the model implements instead of re-planning. Kills the previously-dead helper.
* Slash menu and composer (+) menu are now mutually exclusive.
* Editor actions (Explain Selection etc.) inherit the user's current Plan Mode pick via a uiState message from the webview to the host.
* JetBrains: persist the chat Plan Mode toggle across panel reloads via CodeSetuSettingsState.chatPlanModeOn, templated into chat.html on render.
* JetBrains: new Tools → CodeSetu → Setup Speech Provider action that mirrors the VSCode wizard.
* Bump both apps to 0.3.0 and add JetBrains change-notes entry.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
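The mic gesture rules in the list above (a hold longer than 250 ms is push-to-talk, a short press toggles, Esc stops) can be modelled as a small state machine. The sketch below is one possible reading, not the webview's actual code: `MicController` and its method names are hypothetical, and starting capture on pointer-down (so a push-to-talk hold loses no audio) is an assumption.

```typescript
// Sketch only: MicController is a hypothetical model of the gesture rules.
const PUSH_TO_TALK_THRESHOLD_MS = 250; // threshold taken from the commit message

class MicController {
  listening = false;
  private pressStartedAt: number | null = null;
  private wasListeningBeforePress = false;

  pointerDown(nowMs: number): void {
    this.pressStartedAt = nowMs;
    this.wasListeningBeforePress = this.listening;
    // Assumption: capture starts immediately on press.
    this.listening = true;
  }

  pointerUp(nowMs: number): void {
    if (this.pressStartedAt === null) return; // release without a tracked press
    const heldMs = nowMs - this.pressStartedAt;
    this.pressStartedAt = null;
    if (heldMs > PUSH_TO_TALK_THRESHOLD_MS) {
      // Long hold: push-to-talk, so releasing stops the capture.
      this.listening = false;
    } else if (this.wasListeningBeforePress) {
      // Short press while already listening: tap toggles the capture off.
      this.listening = false;
    }
    // Short press while idle: tap toggles on, so capture keeps running.
  }

  escape(): void {
    // Esc stops any active capture and abandons an in-flight press.
    this.listening = false;
    this.pressStartedAt = null;
  }
}
```

In a real webview the `nowMs` values would come from the pointer events' timestamps; passing them in explicitly keeps the logic testable without a DOM.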

bigbansal and others added 3 commits May 28, 2026 21:19

bigbansal changed the base branch from codex-ide-feature-foundation to main May 28, 2026 16:46

bigbansal and others added 2 commits May 28, 2026 22:36

bigbansal wants to merge 5 commits into main from voice-skills-planmode
