Upgrade vendored llama.cpp to b9585 #20
Merged
…and clippy cleanup
Pull request overview
This PR updates the vendored llama.cpp integration and adapts the Rust bindings/tests to match the new upstream APIs. It introduces a per-LlamaModel cached chat parser handle (to avoid repeated template analysis), updates chat-template application to support an enable_thinking flag, and replaces a multi-argument multimodal evaluation call with a parameter struct. It also centralizes several Clippy suppressions into per-crate Cargo.toml lint config.
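For readers unfamiliar with the caching pattern, here is a minimal sketch of a per-model lazily initialized parser handle. It is not the code from this PR: `Model`, `ChatParser`, `marker_count`, and `analysis_count` are hypothetical stand-ins, and the "analysis" is a placeholder substring count. It only illustrates how `std::sync::OnceLock` can guarantee that template analysis runs once per model.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for a parser handle derived from a chat template.
pub struct ChatParser {
    pub marker_count: usize,
}

/// Hypothetical stand-in for `LlamaModel`; only the caching fields are modeled.
pub struct Model {
    chat_template: String,
    parser: OnceLock<ChatParser>,
    analyses: AtomicUsize,
}

impl Model {
    pub fn new(chat_template: &str) -> Self {
        Self {
            chat_template: chat_template.to_owned(),
            parser: OnceLock::new(),
            analyses: AtomicUsize::new(0),
        }
    }

    /// Returns the cached parser, analysing the template on first use only.
    pub fn chat_parser(&self) -> &ChatParser {
        self.parser.get_or_init(|| {
            // Placeholder for the expensive template analysis step.
            self.analyses.fetch_add(1, Ordering::Relaxed);
            ChatParser {
                marker_count: self.chat_template.matches("{{").count(),
            }
        })
    }

    /// Number of times the analysis closure actually ran.
    pub fn analysis_count(&self) -> usize {
        self.analyses.load(Ordering::Relaxed)
    }
}
```

Calling `chat_parser()` repeatedly on the same `Model` returns the same handle, so later parse calls skip the analysis. A real FFI-backed handle would also need a `Drop` impl to free the native parser, which this sketch omits.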
Changes:
- Upgrade the vendored llama.cpp interface and adjust C++ wrappers (chat parsing, chat template application, MTMD bitmap init, and build defines).
- Add a per-model cached chat parser in LlamaModel and update parsing/apply APIs (including an enable_thinking switch).
- Refactor multimodal chunk evaluation parameters into EvalMultimodalChunksParams and update tests accordingly; move Clippy lint exceptions to crate-level config.
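As an illustration of the parameter-struct refactor, the sketch below bundles what would otherwise be several positional arguments into one `Copy` struct. The fields of the real EvalMultimodalChunksParams are not shown on this page, so `ChunkEvalParams`, its fields, and `advance_position` are all hypothetical names chosen for the example.

```rust
/// Hypothetical parameter bundle; every field name here is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ChunkEvalParams {
    pub n_past: i32,
    pub seq_id: i32,
    pub n_batch: i32,
    pub logits_last: bool,
}

/// Hypothetical evaluator: walks the per-chunk token counts and returns the
/// position after the last chunk, rejecting chunks larger than the batch size.
pub fn advance_position(
    chunk_token_counts: &[i32],
    params: ChunkEvalParams,
) -> Result<i32, String> {
    let mut n_past = params.n_past;
    for &count in chunk_token_counts {
        if count > params.n_batch {
            return Err(format!(
                "chunk of {count} tokens exceeds n_batch {}",
                params.n_batch
            ));
        }
        n_past += count;
    }
    Ok(n_past)
}
```

Named fields make call sites self-describing and let new options be added without touching every caller's argument order, which is the usual motivation for this kind of change.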
Reviewed changes
Copilot reviewed 35 out of 35 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| llama-cpp-test-harness/tests/harness_self_test.rs | Removes per-file Clippy expect now handled via crate lint config. |
| llama-cpp-test-harness/Cargo.toml | Allows unnecessary_wraps with rationale for harness trial function signatures. |
| llama-cpp-bindings/src/tool_call_format/paired_quote_args.rs | Removes test-module Clippy expect; relies on crate lint config. |
| llama-cpp-bindings/src/send_logs_to_log.rs | Makes “deny panic/unwrap/…” conditional on not(test); removes test-level suppression. |
| llama-cpp-bindings/src/sampled_token_classifier.rs | Changes multimodal eval API to accept EvalMultimodalChunksParams. |
| llama-cpp-bindings/src/mtmd/mtmd_input_chunk.rs | Updates MTMD image batch-size mismatch construction to avoid lossy casts. |
| llama-cpp-bindings/src/mtmd/mtmd_bitmap.rs | Adapts to upstream MTMD bitmap wrapper return type and new init flag; updates audio test fixture generation. |
| llama-cpp-bindings/src/mtmd/image_chunk_batch_size_mismatch.rs | Changes public mismatch field types (u32→usize/i32). |
| llama-cpp-bindings/src/model.rs | Adds cached chat parser handle + FFI status mapping; adds enable_thinking arg to apply_chat_template. |
| llama-cpp-bindings/src/llama_batch.rs | Removes Clippy suppression by using pointer casting helpers for llama_batch_get_one. |
| llama-cpp-bindings/src/lib.rs | Exposes new eval_multimodal_chunks_params module/type. |
| llama-cpp-bindings/src/eval_multimodal_chunks_params.rs | New params struct for multimodal evaluation calls. |
| llama-cpp-bindings/src/context/params.rs | Makes LlamaContextParams Copy and removes a Clippy suppression. |
| llama-cpp-bindings/src/context.rs | Removes per-fn Clippy suppression now that params are Copy and used by value. |
| llama-cpp-bindings/Cargo.toml | Allows literal_string_with_formatting_args crate-wide for tool-call fixtures. |
| llama-cpp-bindings-tests/tests/vocabulary_and_metadata.rs | Removes file-level Clippy expect; renames variables for clarity. |
| llama-cpp-bindings-tests/tests/sampling_and_constrained_decoding.rs | Updates throughput computation and apply_chat_template call signature. |
| llama-cpp-bindings-tests/tests/reasoning_markers_and_tool_calls.rs | Refactors a large test into helpers; updates apply_chat_template calls. |
| llama-cpp-bindings-tests/tests/multimodal_vision.rs | Updates to EvalMultimodalChunksParams-based multimodal eval API. |
| llama-cpp-bindings-tests/tests/multimodal_image_and_audio.rs | Adds helper to load fixture bitmaps; updates multimodal eval and expected output assertion. |
| llama-cpp-bindings-tests/tests/multimodal_audio.rs | Updates apply_chat_template and multimodal eval calls to new signatures. |
| llama-cpp-bindings-tests/tests/model_loading_errors.rs | Removes file-level Clippy expect. |
| llama-cpp-bindings-tests/tests/kv_cache_and_session.rs | Removes file-level Clippy expect. |
| llama-cpp-bindings-tests/tests/embedding_and_encoder.rs | Updates throughput calculation to avoid precision-loss lint. |
| llama-cpp-bindings-tests/tests/chat_template_and_message_parsing.rs | Updates apply_chat_template calls to include enable_thinking. |
| llama-cpp-bindings-tests/tests/backend_initialization.rs | Removes file-level Clippy expect. |
| llama-cpp-bindings-tests/src/build_user_prompt_with_media_marker.rs | Updates apply_chat_template call to include enable_thinking. |
| llama-cpp-bindings-tests/Cargo.toml | Reorders lint entries and allows unnecessary_wraps for harness-style trial fns. |
| llama-cpp-bindings-sys/wrapper_mtmd.cpp | Adapts to upstream mtmd_helper_bitmap_init_from_file(..., false) wrapper return type. |
| llama-cpp-bindings-sys/wrapper_chat_parse.h | Introduces chat parser handle API (create/free) and updates parse entrypoint to accept the parser handle. |
| llama-cpp-bindings-sys/wrapper_chat_parse.cpp | Implements chat parser caching primitives and PEG-native parsing flow. |
| llama-cpp-bindings-sys/wrapper_chat_apply.h | Extends chat-template application FFI with enable_thinking. |
| llama-cpp-bindings-sys/wrapper_chat_apply.cpp | Sets inputs.enable_thinking when applying chat templates. |
| llama-cpp-bindings-build/src/cmake_config.rs | Disables upstream app build; replaces runtime feature assertion with compile_error!. |
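Several rows above mention avoiding lossy casts and the precision-loss lint. The PR's actual approach is not visible here; the following is one possible sketch using only standard-library conversions (`f64::from`, `i32::try_from`) instead of `as`. The function names `tokens_per_second` and `batch_size_as_i32` are hypothetical.

```rust
use std::num::TryFromIntError;
use std::time::Duration;

/// Throughput in tokens per second. `f64::from(u32)` is lossless, so no
/// `as` cast is needed; returns `None` when no time has elapsed.
pub fn tokens_per_second(tokens: u32, elapsed: Duration) -> Option<f64> {
    let secs = elapsed.as_secs_f64();
    if secs > 0.0 {
        Some(f64::from(tokens) / secs)
    } else {
        None
    }
}

/// Checked narrowing from `usize` to `i32`; out-of-range values become an
/// error instead of silently wrapping as they would with `as`.
pub fn batch_size_as_i32(n: usize) -> Result<i32, TryFromIntError> {
    i32::try_from(n)
}
```

Checked conversions surface out-of-range values at the call site, so the lint can stay enabled without a suppression.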
…ser creation leak
Upgrades the vendored llama.cpp submodule to the official b9585 tag, adds a per-model chat-parser cache, and removes the in-source clippy allow/expect suppressions, moving the remaining exceptions into per-crate Cargo.toml lint config.
🤖 Generated with Claude Code