Add GGUF audio and microphone transcription support #88

Closed
Godzilla675 wants to merge 4 commits into Siddhesh2377:re-write from Godzilla675:Fix-whisper-initial-download-issue

Conversation

@Godzilla675

@Godzilla675 Godzilla675 commented Mar 10, 2026

Contributor

Summary

  • keep the GGUF filtering and quant parsing fixes from the original PR and add the trailing-descriptor regression test
  • add projector sidecar download, pairing, load, unload, and delete handling for multimodal/audio GGUF models
  • surface both file-based and in-app microphone transcription in chat through the rebuilt dual-ABI gguf_lib AAR
  • add staged microphone UX: tap to start, tap to stop, review/edit the prompt, then send through the existing GGUF audio path

Dependencies

Validation

  • ./gradlew --no-daemon --no-configuration-cache --max-workers=1 -Dorg.gradle.jvmargs='-Xmx2g -XX:MaxMetaspaceSize=512m -Dfile.encoding=UTF-8' -Pksp.incremental=false :app:testDebugUnitTest --tests com.dark.tool_neuron.repo.ModelStoreRepositoryTest
  • ./gradlew --no-daemon --no-configuration-cache --max-workers=1 -Dorg.gradle.jvmargs='-Xmx2g -XX:MaxMetaspaceSize=512m -Dfile.encoding=UTF-8' -Pksp.incremental=false :app:assembleDebug
  • Closes #57 (Download of Whisper-EN-Small fails)

Copilot AI review requested due to automatic review settings March 10, 2026 19:33

Copilot AI left a comment

Pull request overview

This PR refactors GGUF filename handling in ModelStoreRepository to improve model file filtering, quantization parsing, and model ID generation, and adds unit tests to validate the new helper logic.

Changes:

  • Added centralized GGUF helpers (isSupportedGgufFile, stripGgufSuffix, extractQuantType) and updated GGUF listing logic to use them.
  • Bumped model store cache version to invalidate stale cached listings after filtering/parsing changes.
  • Added ModelStoreRepositoryTest to cover GGUF extension handling, projection artifact filtering, suffix stripping, and quant parsing.
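
As a rough illustration of what helpers like these might look like, here is a minimal Kotlin sketch. Only the three function names come from the review summary above. The marker list, the quant token pattern, and the "last matching token wins" rule are assumptions made for illustration, not the actual code in ModelStoreRepository.kt.

```kotlin
// Hypothetical sketch; the real helpers in ModelStoreRepository.kt may differ.

// Assumed marker for projection artifacts that should not be listed as models.
private val SIDE_ARTIFACT_MARKERS = listOf("mmproj")

// Assumed shape of a quant token, e.g. Q4_K_M, Q8_0, IQ2_XXS, F16, BF16.
private val QUANT_TOKEN = Regex("""(I?Q\d+(_[A-Z0-9]+)*|BF16|F16|F32)""")

// Removes a trailing ".gguf" in any letter case; other names are returned unchanged.
fun stripGgufSuffix(fileName: String): String =
    if (fileName.endsWith(".gguf", ignoreCase = true)) fileName.dropLast(5) else fileName

// True for .gguf files that do not look like projection artifacts.
fun isSupportedGgufFile(fileName: String): Boolean {
    val lower = fileName.lowercase()
    return lower.endsWith(".gguf") && SIDE_ARTIFACT_MARKERS.none { it in lower }
}

// Returns the last dash- or dot-separated token that looks like a quant type,
// so a trailing descriptor after the quant (e.g. "-imat") does not hide it.
fun extractQuantType(fileName: String): String? =
    stripGgufSuffix(fileName)
        .split('-', '.')
        .map { it.uppercase() }
        .lastOrNull { QUANT_TOKEN.matches(it) }
```

Under these assumptions, `extractQuantType("model-Q8_0-imat.gguf")` yields `Q8_0`, and a file whose name contains `mmproj` is filtered out of the listing.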

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| app/src/main/java/com/dark/tool_neuron/repo/ModelStoreRepository.kt | Introduces helper methods for GGUF filtering/quant parsing, updates model listing logic, bumps cache version. |
| app/src/test/java/com/dark/tool_neuron/repo/ModelStoreRepositoryTest.kt | Adds unit tests for the new GGUF helper behaviors and edge cases. |

Comment thread app/src/main/java/com/dark/tool_neuron/repo/ModelStoreRepository.kt Outdated
@Siddhesh2377

Owner

Hey @Godzilla675, we don't support audio GGUF models yet.

@Godzilla675

Contributor Author

Hmm, OK, I'll look for another way to fix the issue.

@Godzilla675

Contributor Author

@Siddhesh2377 Can I implement GGUF audio model support?

@Godzilla675 Godzilla675 marked this pull request as draft March 10, 2026 20:17
@Siddhesh2377

Owner

Yes @Godzilla675.
First, add it in the custom llama.cpp repo.
Then call it in gguf_lib inside the ai systems repo.
Then call it in tool neuron.
Please follow this pattern.

@Godzilla675

Contributor Author

Ok

@Godzilla675

Contributor Author

@Siddhesh2377 While working on the audio GGUF support, I noticed that the app has no microphone transcription support. Shall I add it while I'm at it?

@Godzilla675 Godzilla675 changed the title from "Fix whisper initial download issue" to "Add GGUF audio model support and fix Whisper download flow" Mar 12, 2026
@Godzilla675 Godzilla675 marked this pull request as ready for review March 12, 2026 20:04

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 55262cc964

Comment thread app/src/main/java/com/dark/tool_neuron/service/ModelDownloadService.kt Outdated
@Siddhesh2377

Owner

Hey @Godzilla675, yes, you can add it for sure!

- add RECORD_AUDIO permission and a MediaRecorder-based chat recorder
- keep file import as a fallback while staging recorded clips before send
- route microphone audio through the existing GGUF transcription path

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@Godzilla675 Godzilla675 changed the title from "Add GGUF audio model support and fix Whisper download flow" to "Add GGUF audio and microphone transcription support" Mar 13, 2026
@Godzilla675

Godzilla675 commented Mar 13, 2026

Contributor Author

@Siddhesh2377 I've finished the implementation, if you want to review it.

@Godzilla675

Contributor Author

@Siddhesh2377 Should I fix the merge conflicts now, or wait until you finish the changes you are currently making?

@Siddhesh2377

Owner

Hey @Godzilla675,
I would say resolve the conflicts, and can you send me a working APK file?
Also, if you have Discord, please join the group or DM me, as it is an easy platform for communication.

@Godzilla675

Contributor Author

@Siddhesh2377 OK, where is the Discord though? Can you send me the link?

@Siddhesh2377

Owner

Yes, make a working APK release on your fork and send me the link on Discord:
https://discord.gg/V9vm9cwnw

@Godzilla675

Contributor Author

ok.

Godzilla675 and others added 3 commits March 14, 2026 23:55
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Restore GGUF projector/audio integration after the re-write merge

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Recognize mmjproj as a projector marker alongside mmproj, vision-adapter, projector
- Score mmjproj candidates in sidecar auto-download selection
- Broaden user-facing projector readiness message to mmproj/mmjproj
- Add unit tests for mmjproj filtering and case-insensitive detection
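
The marker handling described in this commit message could be sketched roughly as follows. The four marker strings are taken from the message above. The function names, the `.gguf` requirement, and the token-overlap scoring are hypothetical stand-ins, since the conversation does not show how the real sidecar selection scores candidates.

```kotlin
// Marker strings come from the commit message; everything else is assumed.
val PROJECTOR_MARKERS = listOf("mmproj", "mmjproj", "vision-adapter", "projector")

// Case-insensitive check for a projector sidecar file.
fun isProjectorFile(fileName: String): Boolean {
    val lower = fileName.lowercase()
    return lower.endsWith(".gguf") && PROJECTOR_MARKERS.any { it in lower }
}

// Splits a file name into lowercase name tokens, ignoring the .gguf suffix.
fun nameTokens(fileName: String): Set<String> =
    fileName.lowercase()
        .removeSuffix(".gguf")
        .split('-', '_', '.')
        .filter { it.isNotEmpty() }
        .toSet()

// Hypothetical scoring: prefer the projector candidate that shares the most
// name tokens with the model file. Returns null when no candidate qualifies.
fun pickProjector(modelFile: String, candidates: List<String>): String? {
    val modelTokens = nameTokens(modelFile)
    return candidates
        .filter { isProjectorFile(it) }
        .maxByOrNull { candidate -> nameTokens(candidate).count { it in modelTokens } }
}
```

With this scoring, a model named `gemma-3-4b-it-Q4_K_M.gguf` would be paired with `mmproj-gemma-3-4b-it-f16.gguf` rather than an unrelated `mmjproj-other-model-f16.gguf`, and non-GGUF files are never considered.
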
@Godzilla675

Contributor Author

@Siddhesh2377 I added mmproj support. You can download the APK here: https://github.com/Godzilla675/ToolNeuron/releases/tag/toolneuron-fix-whisper-test-apk

@Siddhesh2377

Owner

Hey @Godzilla675 — thanks a lot for putting this together. Closing this one because the v3 rewrite (#105) just landed and it reorganises the engine layer end-to-end, so this branch can't be merged cleanly anymore. The new GGUF engine in v3 already includes whisper / mic transcription support through sherpa-onnx, so the feature itself isn't lost.

Really appreciate the contribution — if you want to port any specific tweak onto the new base, happy to look at a fresh PR against master.

3 participants