Voice, AI Skills, and Plan Mode for the IDE assistants #11
Open
bigbansal wants to merge 5 commits into
Plan Mode is a new chat-surface mode that asks the assistant to produce a numbered plan and clarifying questions before any implementation. The mode is pinned via a single skill (`skills/plan-mode/SKILL.md`) injected into the system prompt as a `pinnedSkills` entry, designed so the Phase 2 skills runtime can layer on top using the same injection point.

Both IDEs get the same UX: a "Plan Mode" toggle in the composer menu, a "Plan" pill on the composer when active, and an "Approve & Run" button that appears after a plan-mode assistant turn. The button sends "APPROVED — proceed with implementation" and exits Plan Mode for the rest of the session.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
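As an illustration of the injection point described in this commit, here is a minimal TypeScript sketch of how a pinned skill might be appended to the system prompt. The PR does not show the real types, so `Skill`, `PromptInput`, `buildSystemPrompt`, and `pinnedSkillsFor` are hypothetical names, and the section-header format is an assumption.

```typescript
// Hypothetical shapes; the real types in @codesetu/core are not shown in the PR.
interface Skill {
  id: string;   // e.g. "plan-mode"
  body: string; // contents of the skill's SKILL.md
}

interface PromptInput {
  basePrompt: string;
  pinnedSkills: Skill[];
}

// Append each pinned skill to the base system prompt under a labelled header.
// The "## Skill: <id>" header format is an assumption made for this sketch.
function buildSystemPrompt(input: PromptInput): string {
  const sections = input.pinnedSkills.map(
    (skill) => `## Skill: ${skill.id}\n${skill.body.trim()}`
  );
  return [input.basePrompt.trim(), ...sections].join("\n\n");
}

// With Plan Mode on, the plan-mode skill is pinned; with it off, nothing is pinned.
function pinnedSkillsFor(planModeOn: boolean, planSkill: Skill): Skill[] {
  return planModeOn ? [planSkill] : [];
}
```

Keeping the pinned list as plain data means a later skills runtime could add more entries to the same array without changing how the prompt is assembled.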
Adds a deterministic skills router (pinned + slash + keyword, capped at 1 auto-routed) shared by both IDE assistants, plus a slash-command palette in the chat composer that opens on `/`. Ships four built-in skills alongside the existing plan-mode skill:

    /explain  -> explain-code
    /refactor -> refactor
    /test     -> write-tests
    /indic    -> indic-comments (Hindi/Tamil/Bengali/...)

The router lives in `@codesetu/core` and is mirrored in Kotlin for the JetBrains plugin. Workspace `.codesetu/skills/*.md` continue to load always-on as today, so there is no regression. Auto-routing can be disabled via the new `codesetu.skills.autoRoute` setting (VSCode) or the matching checkbox in JetBrains settings.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
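The routing order described in this commit (pinned, then slash, then keyword, with at most one auto-routed skill) can be sketched as follows. This is not the code from `@codesetu/core`: `SkillDef`, `RouteOptions`, `routeSkills`, and the keyword lists are hypothetical, and skipping keyword routing when a slash command is present is an assumption of this sketch.

```typescript
interface SkillDef {
  id: string;
  slash?: string;      // e.g. "/explain"
  keywords?: string[]; // lowercase trigger words for auto-routing (assumed)
}

interface RouteOptions {
  pinned: string[];   // skill ids pinned by the UI, e.g. ["plan-mode"]
  autoRoute: boolean; // mirrors the codesetu.skills.autoRoute setting
}

const MAX_AUTO_ROUTED = 1;

// Slash mappings come from the commit message; the keywords are invented for illustration.
const BUILT_IN: SkillDef[] = [
  { id: "explain-code", slash: "/explain", keywords: ["explain"] },
  { id: "refactor", slash: "/refactor", keywords: ["refactor"] },
  { id: "write-tests", slash: "/test", keywords: ["test"] },
  { id: "indic-comments", slash: "/indic", keywords: ["hindi", "tamil", "bengali"] },
];

function routeSkills(message: string, skills: SkillDef[], opts: RouteOptions): string[] {
  const selected: string[] = [...opts.pinned];
  const add = (id: string): void => {
    if (!selected.includes(id)) selected.push(id);
  };

  // Slash: the message is, or starts with, a known command.
  const trimmed = message.trim();
  const slashHit = skills.find(
    (s) => s.slash !== undefined && (trimmed === s.slash || trimmed.startsWith(s.slash + " "))
  );
  if (slashHit) add(slashHit.id);

  // Keyword: only when auto-routing is on and no slash command matched.
  // Skills are checked in declaration order, so the result is deterministic.
  if (opts.autoRoute && !slashHit) {
    const text = message.toLowerCase();
    const hits = skills.filter((s) => (s.keywords ?? []).some((k) => text.includes(k)));
    for (const hit of hits.slice(0, MAX_AUTO_ROUTED)) add(hit.id);
  }
  return selected;
}
```

Substring matching is the simplest possible keyword rule and will over-match (for example "test" inside "latest"); a real router would likely use word boundaries or a stricter pattern.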
…ains UI

Adds a SpeechProvider package (`@codesetu/core/speech`) with five backends:

    browser            Browser SpeechRecognition + speechSynthesis (no key)
    local              Same as browser; refuses any server fallback (air-gapped)
    sarvam             Sarvam Saaras (STT) + Bulbul (TTS)
    openai-compatible  /v1/audio/transcriptions + /v1/audio/speech
    huggingface        Hugging Face Inference Router (Whisper-large-v3)

The chat webview gets a mic button (idle / listening / transcribing states with a red pulse) and a TTS toggle in the composer toolbar. Browser/local paths run entirely in the webview; server paths post audio bytes to the host over the existing message bridge, which calls the SpeechProvider and pushes the transcription or synthesized audio back to the webview. CSP gains a tight `media-src 'self' blob:` and a `connect-src` allowlist derived from the configured speech endpoints.

VSCode ships a new `CodeSetu: Setup Speech Provider` command, a separate `codesetu.speech.apiKey` secret slot, and `codesetu.speech.*` settings (sttProvider, ttsProvider, language, ttsEnabled, sttBaseUrl/sttModel, ttsBaseUrl/ttsModel).

JetBrains mirrors the UI, settings, secret store (`CodeSetuSpeechApiKeyStore`), and host bridge (`CodeSetuSpeechClient` over JDK HttpClient). The mic button shows a clear error if JCEF blocks `getUserMedia`. Server STT on JetBrains still depends on a pending JCEF `--enable-media-stream` spike before mic input works end-to-end. TTS via browser speechSynthesis works in JCEF without additional flags.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
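To make the CSP change concrete, here is a rough TypeScript sketch of deriving the `connect-src` allowlist from the configured speech base URLs. The function names `speechOrigins` and `buildSpeechCsp` are hypothetical, the example hosts are placeholders, and falling back to `'none'` when no endpoint is configured is an assumption rather than something the commit states.

```typescript
// Collect the unique origins of the configured speech endpoints.
// Empty or malformed values are skipped so they can never widen the policy.
function speechOrigins(baseUrls: Array<string | undefined>): string[] {
  const origins = new Set<string>();
  for (const raw of baseUrls) {
    if (!raw) continue;
    try {
      origins.add(new URL(raw).origin);
    } catch {
      // Not a valid absolute URL: ignore it.
    }
  }
  return Array.from(origins).sort();
}

// Build the two speech-related CSP directives named in the commit message.
// Assumption: with no configured endpoint, connect-src falls back to 'none'.
function buildSpeechCsp(baseUrls: Array<string | undefined>): string {
  const origins = speechOrigins(baseUrls);
  const connect = origins.length > 0 ? origins.join(" ") : "'none'";
  return [`media-src 'self' blob:`, `connect-src ${connect}`].join("; ");
}

// Example with placeholder endpoints (not real provider URLs).
console.log(
  buildSpeechCsp(["https://stt.example.com/v1/audio/transcriptions", "http://localhost:8080/v1"])
);
```

Allowlisting origins rather than full URLs keeps the directive short while still restricting the webview to the hosts the user configured.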
JetBrains plugin now registers a JBCefAppRequiredArgumentsProvider that adds the two CEF flags required for the chat webview's mic button to work:

    --enable-features=WebRTC,MediaStream,AudioServiceOutOfProcess
    --use-fake-ui-for-media-stream

Without the first, getUserMedia throws NotSupportedError before the user is ever prompted. The second auto-approves the in-page permission request (the OS still gates physical mic access). The trade-off applies to every JCEF webview in the IDE process and is documented in apps/jetbrains/README.md under "Voice in JetBrains".

Also updates the top-level README, both app READMEs, and CHANGELOG so reviewers and testers know what to look for during the smoke test.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Round of cleanups requested after PR review:

* Strip TTS end-to-end: toggle, settings (ttsProvider, ttsEnabled, ttsBaseUrl, ttsModel), host bridges, Sarvam/OpenAI/HF synthesize, the webview speakViaBrowser/speakViaServer/maybeSpeakAssistant paths, and the related tests. Voice is now STT-only.
* Drop the "local" speech provider. It was misleading (routed to the same WebSpeech path as "browser", not actually on-device).
* Sarvam STT default model bumped to saarika:v2; response parsing kept loose so a Sarvam-side rename doesn't break us silently.
* JetBrains default STT provider switched to sarvam (browser SpeechRecognition does not work in JCEF because the embedded Chromium build has no Google cloud-speech keys).
* Mic UX: pointerdown >250ms = push-to-talk (release stops), short press = tap-to-toggle, spacebar in an empty/focused composer also push-to-talks, Esc stops an active capture. Same in both webviews.
* Wire isPlanModeApproval into both responders: typing APPROVED / RUN (or clicking Approve & Run, which sends the canonical phrase) drops plan-mode pinning for that turn so the model implements instead of re-planning. Kills the previously-dead helper.
* Slash menu and composer (+) menu are now mutually exclusive.
* Editor actions (Explain Selection etc.) inherit the user's current Plan Mode pick via a uiState message from the webview to the host.
* JetBrains: persist the chat Plan Mode toggle across panel reloads via CodeSetuSettingsState.chatPlanModeOn, templated into chat.html on render.
* JetBrains: new Tools → CodeSetu → Setup Speech Provider action that mirrors the VSCode wizard.
* Bump both apps to 0.3.0 and add JetBrains change-notes entry.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
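The approval wiring in this commit can be pictured with a short TypeScript sketch. Only the helper name `isPlanModeApproval` and the canonical phrase come from the PR; the matching rules below (case-insensitive, trailing punctuation ignored, whole-message match) and the `pinnedForTurn` helper are assumptions made for illustration.

```typescript
// The phrase sent by the "Approve & Run" button, as quoted in the PR.
const CANONICAL_APPROVAL = "APPROVED — proceed with implementation";

// Assumed matching rule: the whole message must be the canonical phrase,
// or the single word APPROVED or RUN (any case, optional trailing "." or "!").
function isPlanModeApproval(message: string): boolean {
  const text = message.trim();
  if (text === CANONICAL_APPROVAL) return true;
  const normalized = text.toUpperCase().replace(/[.!]+$/, "");
  return normalized === "APPROVED" || normalized === "RUN";
}

// Hypothetical helper: drop plan-mode pinning for an approval turn so the
// model implements instead of producing another plan.
function pinnedForTurn(planModeOn: boolean, message: string): string[] {
  return planModeOn && !isPlanModeApproval(message) ? ["plan-mode"] : [];
}
```

Matching the whole message rather than a substring avoids treating a request such as "run the tests first" as an approval.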
Summary
Adds three chat-surface features to both CodeSetu IDE assistants (`apps/vscode`, `apps/jetbrains`) at feature parity, behind opt-in toggles and settings:

* Plan Mode: an "Approve & Run" button sends `APPROVED — proceed with implementation` and exits Plan Mode. Single source of truth in `skills/plan-mode/SKILL.md`, runtime constants in TS and Kotlin.
* AI Skills: four built-in skills (`/explain`, `/refactor`, `/test`, `/indic`) alongside the new `plan-mode` skill. Workspace `.codesetu/skills/*.md` continue to load always-on, so there is no regression. `codesetu.skills.autoRoute` lets users disable keyword auto-routing.
* Voice: `@codesetu/core/speech` package with `browser`, `local`, `sarvam`, `openai-compatible`, `huggingface`. Mic button + TTS toggle in the shared chat template. Browser/local run entirely in the webview; server backends post audio over the host bridge. CSP tightened to `media-src 'self' blob:` + an allowlisted `connect-src` derived from configured speech endpoints. New `CodeSetu: Setup Speech Provider` command (VSCode) and a separate `codesetu.speech.apiKey` OS-secret slot on both IDEs (Sarvam Saaras/Bulbul keys differ from chat keys).

Three commits, each independently shippable:

* 737df99 feat(plan-mode)
* 1c94175 feat(skills)
* 94e5e91 feat(voice)

Outstanding before voice is fully usable in JetBrains
* JCEF mic permission spike: `getUserMedia` is blocked by default in JBCef. The UI surfaces a clear error when this happens. The fix is to add `--enable-media-stream` to `JBCefApp.getInstance()` args and a `CefPermissionHandler` that auto-approves audio. Browser-side `speechSynthesis` (TTS read-aloud) works in JCEF without any extra flags.

VSCode voice has no such spike: the VSCode webview's `getUserMedia` works as soon as the user grants permission.

What's intentionally NOT in this PR
* … (`apps/vscode-apiclient`, `apps/jetbrains-apiclient`, `packages/api-client-core`) are untouched.

Test gate
All green on the branch tip:

* `@codesetu/core`: build · lint · 54 tests across 6 files (9 new for the skills router, 14 new for speech)
* `apps/vscode`: esbuild bundle (937.7 KB) · ESLint clean
* `apps/jetbrains`: `compileKotlin` + JUnit suite pass (no new tests, settings/prompt changes covered by existing model/payload tests)

Test plan
* Enable Plan Mode from the composer `+` menu, send a small request. Confirm the assistant produces a numbered checklist with no code edits. Click "Approve & Run" → confirm the next turn implements and Plan Mode is off.
* Type `/` in the composer. Confirm the palette opens with 5 entries (plan/explain/refactor/test/indic). Arrow keys + Enter inserts the command + space. Send `/explain` with code selected; verify the response is structured per the explain-code skill (check Output channel for the system prompt if needed).
* Set `codesetu.skills.autoRoute=false` and confirm it no longer auto-activates.
* … `speechSynthesis`.
* Run `CodeSetu: Setup Speech Provider`, pick `sarvam` (or `openai-compatible` with a Whisper endpoint), supply a key. Repeat the mic + TTS flow; verify the host log shows `Speech.transcribe via <provider>`.
* Run `./gradlew :runIde`, repeat the Plan Mode flow above.
* … `/explain` + auto-route checks.
* … `speechSynthesis` reads responses aloud in JCEF.
* … `--enable-media-stream`.

🤖 Generated with Claude Code