feat(query): add multi_match query support by poyrazK · Pull Request #110 · poyrazK/cloudSearch

poyrazK · 2026-05-22T11:39:05Z

Summary

Add MultiMatchType enum (BestFields | MostFields | Phrase | PhrasePrefix) and MultiMatchQuery struct to cloudsearch-common
Add SearchQuery::MultiMatch(MultiMatchQuery) variant
Implement score_multi_match_query with best_fields (max + tie_breaker × sum), most_fields (sum), and phrase (consecutive token match) scoring
Add HTTP API parsing for multi_match clause with fields as BTreeMap<String, f32>
Add 2 test cases: multi_match_best_fields_returns_max_score and multi_match_most_fields_sums_scores

Test plan

cargo test --workspace — all pass (402 tests)
cargo clippy --workspace -- -D warnings -D clippy::pedantic — clean
New coverage tests pass

JSON API usage

{
  "multi_match": {
    "query": "brown fox",
    "fields": {"title": 2.0, "content": 1.0},
    "type": "best_fields",
    "tie_breaker": 0.3
  }
}

Summary by CodeRabbit

Release Notes

New Features
- Added support for multi-match queries, enabling simultaneous searches across multiple fields with configurable per-field weights.
- Multi-match queries support different matching strategies (BestFields, MostFields, phrase, phrase_prefix) and optional tie-breaker configuration for results scoring.
Tests
- Added comprehensive test coverage for multi-match query scoring behavior across different field combinations and match types.

Add MultiMatchQuery with BestFields, MostFields, Phrase, and PhrasePrefix types. Searches a query string across multiple fields with optional field weights. Tie-breaker formula implemented for best_fields type. - Add MultiMatchType enum and MultiMatchQuery struct to cloudsearch-common - Add SearchQuery::MultiMatch variant - Implement score_multi_match_query with best_fields/most_fields/phrase types - Add HTTP API parsing for multi_match clause with fields as object - Add test coverage for best_fields and most_fields scoring behavior Closes: #84

coderabbitai · 2026-05-22T11:40:02Z

Warning

Review limit reached

@poyrazK, we couldn't start this review because you've used your available PR reviews for now.

Your plan currently allows 1 review/hour. Refill in 39 minutes and 20 seconds.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more review capacity refills, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than trial, open-source, and free plans. In all cases, review capacity refills continuously over time.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e23d5fa8-26cc-4dac-baee-f056797bfe31

📥 Commits

Reviewing files that changed from the base of the PR and between 2dd2e48 and c7fda27.

📒 Files selected for processing (3)

rust/crates/cloudsearch-api/src/lib.rs
rust/crates/cloudsearch-index/src/lib.rs
rust/crates/cloudsearch-index/tests/coverage.rs

📝 Walkthrough

Walkthrough

This PR extends the search engine to support Elasticsearch-style multi_match queries. It adds type definitions for multi-match configuration, JSON parsing logic in the API layer, validation and term extraction in the index layer, and scoring algorithms that compute weighted BM25 scores across multiple fields with support for BestFields and MostFields modes.

Changes

Multi-Match Query Implementation

Layer / File(s)	Summary
Schema types and trait updates `rust/crates/cloudsearch-common/src/lib.rs`	Introduces `MultiMatchType` enum (BestFields/MostFields/Phrase/PhrasePrefix) and `MultiMatchQuery` struct with per-field weights and configuration. Updates derives to remove `Eq` from SearchRequest, SearchQuery, BoolQuery, and MultiSearchRequest.
API parsing for multi-match clauses `rust/crates/cloudsearch-api/src/lib.rs`	Extends search query parser to recognize `multi_match` JSON clauses, validate required `query` and `fields` properties, convert per-field weights to BTreeMap, apply optional `type` and `tie_breaker` defaults, and construct SearchQuery::MultiMatch.
Request validation and query-term extraction `rust/crates/cloudsearch-index/src/lib.rs`	Extends validate_query to check that all multi-match target fields are text fields; extends extract_query_terms and get_query_terms to tokenize the multi-match query string for scoring and highlighting.
Multi-match BM25 scoring `rust/crates/cloudsearch-index/src/lib.rs`	Adds score_multi_match_query dispatcher and helpers to compute per-field BM25 match scores with optional phrase verification, apply multi-match type strategies (BestFields selects maximum, MostFields sums weighted scores), and route phrase modes through phrase matching.
Multi-match coverage tests `rust/crates/cloudsearch-index/tests/coverage.rs`	Adds two test cases verifying BestFields mode returns the maximum single-field score and MostFields mode produces higher scores than BestFields when matching across multiple fields.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

poyrazK/cloudSearch#95: Adds the foundational BM25-based scoring pipeline that this PR extends with multi-field score aggregation and phrase matching support.

Poem

🐰 A bunny bounces through the fields,
Each weighted query now yields,
Best or most fields, phrase or match,
Elasticsearch queries hatch!
Multi-search dreams come true at last, 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely describes the main change: adding multi_match query support, which aligns with the primary objective and all major file changes in the changeset.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feature/multi-match-query

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 7

🧹 Nitpick comments (1)

rust/crates/cloudsearch-index/tests/coverage.rs (1)
669-709: ⚡ Quick win

Make these tests assert the scoring contract more directly.

Both cases are permissive enough that a cross-field leakage bug still passes: the first test never proves best_fields equals the best single-field score, and the second only proves most_fields > best_fields. Please compare against equivalent single-field queries and add a negative case like {"title":"foo","content":"bar"} with query "foo bar" so each field is forced to contribute only its own tokens. A type: phrase case that rejects non-consecutive tokens would also be valuable here.

Also applies to: 712-769
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rust/crates/cloudsearch-index/tests/coverage.rs` around lines 669 - 709,
Update multi_match_best_fields_returns_max_score to assert scoring against
equivalent single-field queries: after the MultiMatch search (MultiMatchQuery
with MultiMatchType::BestFields and tie_breaker 0.0), run two single-field
SearchRequest queries (one targeting only "title" and one only "content") and
capture their hit scores, then assert that the multi_match score equals the max
of those single-field scores (use the same doc id and ensure score.is_some()).
Also add a negative/phrase case: index an additional document with both tokens
in the same field and/or run a MultiMatch with type=Phrase to assert that
non-consecutive token matches do not inflate BestFields (i.e., phrase query
should reject non-consecutive tokens and the score should differ), and ensure to
reference functions/types from the test
(multi_match_best_fields_returns_max_score, SearchRequest,
SearchQuery::MultiMatch, MultiMatchQuery, MultiMatchType::BestFields) when
locating where to add these assertions.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@rust/crates/cloudsearch-api/src/lib.rs`:
- Around line 1008-1057: The parse_multi_match_query function is failing rustfmt
checks; run rustfmt (or `cargo fmt --all`) and reformat this file so CI
passes—ensure the block around parse_multi_match_query (including variable
declarations like parsed_fields, the for loop over fields, the multi_match_type
match, and the tie_breaker mapping) is reformatted to follow rustfmt rules and
then commit the formatted file.
- Around line 1029-1032: The code currently silently defaults non-numeric field
weights to 1.0 (using weight_val.as_f64().map_or(1.0,...)), which should instead
fail fast; locate the loop iterating over fields (symbols: fields, weight_val,
parsed_fields) and change the logic to validate weight_val is numeric and
convertible to f32, returning an error (or propagating one) when
weight_val.as_f64() is None or conversion is invalid, rather than inserting a
default 1.0 into parsed_fields.
- Around line 1035-1043: The parsing currently silently maps unknown
multi_match.type strings to MultiMatchType::BestFields; change the
object.get("type") parsing (the multi_match_type assignment) to return an error
instead of defaulting — i.e., make the match arm produce a
Result<MultiMatchType, Error> (or Option with validation) and if the string is
not one of "most_fields", "phrase", "phrase_prefix", or "best_fields" return a
descriptive parsing error (propagate via ? or convert to the function's Result),
update the surrounding function signature to return Result if necessary, and
ensure callers of the function handle the error; reference the multi_match_type
variable and the MultiMatchType enum when implementing this validation so
unknown types are rejected rather than defaulted.

In `@rust/crates/cloudsearch-common/src/lib.rs`:
- Line 308: The MultiMatchQuery struct currently uses the field multi_match_type
which serializes to "multi_match_type" but the API expects the key "type";
update the struct to rename that field for Serde JSON to "type" (e.g. add
#[serde(rename = "type")] to the multi_match_type field) so MultiMatchQuery
payloads match the single-search JSON shape; ensure the field name remains
multi_match_type and that the MultiMatchType type is unchanged so both serialize
and deserialize correctly for MultiSearchRequest and single-search payloads.

In `@rust/crates/cloudsearch-index/src/lib.rs`:
- Around line 2072-2115: score_match_query_for_field currently counts term
frequency from postings across all fields (via positions_readers), inflating TF
for fields that don't actually contain the term; fix by scoping TF to the
current field: when computing tf for a token, first try to get the term
frequency from the positions reader that corresponds to the given field (or, if
positions_readers are not field-scoped, skip using them and derive tf solely
from field_tokens by counting occurrences in field_tokens), then fall back to
the existing field_tokens.count() only if the field-scoped postings are absent;
update the tf accumulation logic in score_match_query_for_field (refer to
variables positions_readers, field_tokens, tf and the call to
bm25_ctx.bm25_term_score) so TF reflects occurrences in this specific field
before computing term_score.
- Around line 2049-2066: The MultiMatchType::Phrase branch currently falls back
to best_fields when score_phrase_for_fields returns None; change it so that if
score_phrase_for_fields(document, &query_tokens, &query.fields, doc_id,
positions_readers, bm25_ctx) returns None the branch returns None immediately
(do not compute or return field_scores, max, or tie_breaker-based sums). In
short, in the MultiMatchType::Phrase arm, only return Some(ps) when
score_phrase_for_fields yields Some, otherwise return None to ensure phrase-only
matching.
- Around line 2134-2136: The loop over (field, weight) uses
document.source.get(field)?.as_str()? which returns None and aborts phrase
scoring when any field is missing or non-string; change this to skip such fields
by checking get(field) and as_str() with if let (or match) so only fields that
yield a &str are tokenized and scored (i.e., replace the `?` chaining with a
safe conditional that continues the loop on missing/non-string values), keeping
the rest of the logic that calls tokenize(field_str) and accumulates scores for
valid fields.

---

Nitpick comments:
In `@rust/crates/cloudsearch-index/tests/coverage.rs`:
- Around line 669-709: Update multi_match_best_fields_returns_max_score to
assert scoring against equivalent single-field queries: after the MultiMatch
search (MultiMatchQuery with MultiMatchType::BestFields and tie_breaker 0.0),
run two single-field SearchRequest queries (one targeting only "title" and one
only "content") and capture their hit scores, then assert that the multi_match
score equals the max of those single-field scores (use the same doc id and
ensure score.is_some()). Also add a negative/phrase case: index an additional
document with both tokens in the same field and/or run a MultiMatch with
type=Phrase to assert that non-consecutive token matches do not inflate
BestFields (i.e., phrase query should reject non-consecutive tokens and the
score should differ), and ensure to reference functions/types from the test
(multi_match_best_fields_returns_max_score, SearchRequest,
SearchQuery::MultiMatch, MultiMatchQuery, MultiMatchType::BestFields) when
locating where to add these assertions.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0b922f19-9361-4eaa-8f4b-e163f189ddea

📥 Commits

Reviewing files that changed from the base of the PR and between 1c31a0f and 2dd2e48.

📒 Files selected for processing (4)

rust/crates/cloudsearch-api/src/lib.rs
rust/crates/cloudsearch-common/src/lib.rs
rust/crates/cloudsearch-index/src/lib.rs
rust/crates/cloudsearch-index/tests/coverage.rs

coderabbitai · 2026-05-22T11:44:17Z

+    let multi_match_type = object
+        .get("type")
+        .and_then(Value::as_str)
+        .map_or(MultiMatchType::BestFields, |s| match s {
+            "most_fields" => MultiMatchType::MostFields,
+            "phrase" => MultiMatchType::Phrase,
+            "phrase_prefix" => MultiMatchType::PhrasePrefix,
+            _ => MultiMatchType::BestFields,
+        });


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject unknown multi_match.type instead of silently falling back

Line 1038 currently maps any unknown type to BestFields, which hides user mistakes and changes query semantics without an error.

Proposed fix

- let multi_match_type = object - .get("type") - .and_then(Value::as_str) - .map_or(MultiMatchType::BestFields, |s| match s { - "most_fields" => MultiMatchType::MostFields, - "phrase" => MultiMatchType::Phrase, - "phrase_prefix" => MultiMatchType::PhrasePrefix, - _ => MultiMatchType::BestFields, - }); + let multi_match_type = match object.get("type").and_then(Value::as_str) { + None => MultiMatchType::BestFields, + Some("best_fields") => MultiMatchType::BestFields, + Some("most_fields") => MultiMatchType::MostFields, + Some("phrase") => MultiMatchType::Phrase, + Some("phrase_prefix") => MultiMatchType::PhrasePrefix, + Some(other) => { + return Err(ApiError(CloudSearchError::InvalidSearchRequest(format!( + "unsupported multi_match type '{other}'" + )))); + } + };

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

let multi_match_type = object

.get("type")

.and_then(Value::as_str)

.map_or(MultiMatchType::BestFields, |s| match s {

"most_fields" => MultiMatchType::MostFields,

"phrase" => MultiMatchType::Phrase,

"phrase_prefix" => MultiMatchType::PhrasePrefix,

_ => MultiMatchType::BestFields,

});

let multi_match_type = match object.get("type").and_then(Value::as_str) {

None => MultiMatchType::BestFields,

Some("best_fields") => MultiMatchType::BestFields,

Some("most_fields") => MultiMatchType::MostFields,

Some("phrase") => MultiMatchType::Phrase,

Some("phrase_prefix") => MultiMatchType::PhrasePrefix,

Some(other) => {

return Err(ApiError(CloudSearchError::InvalidSearchRequest(format!(

"unsupported multi_match type '{other}'"

))));

}

};

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rust/crates/cloudsearch-api/src/lib.rs` around lines 1035 - 1043, The parsing currently silently maps unknown multi_match.type strings to MultiMatchType::BestFields; change the object.get("type") parsing (the multi_match_type assignment) to return an error instead of defaulting — i.e., make the match arm produce a Result<MultiMatchType, Error> (or Option with validation) and if the string is not one of "most_fields", "phrase", "phrase_prefix", or "best_fields" return a descriptive parsing error (propagate via ? or convert to the function's Result), update the surrounding function signature to return Result if necessary, and ensure callers of the function handle the error; reference the multi_match_type variable and the MultiMatchType enum when implementing this validation so unknown types are rejected rather than defaulted.

coderabbitai · 2026-05-22T11:44:17Z

+    #[serde(default)]
+    pub fields: BTreeMap<String, f32>,
+    #[serde(default)]
+    pub multi_match_type: MultiMatchType,


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

MultiMatchQuery uses a different wire key than the HTTP parser expects

Line 308 serializes/deserializes as multi_match_type, but the API parser accepts "type" for multi_match. This makes typed MultiSearchRequest payloads inconsistent with single-search JSON shape.

Proposed fix

pub struct MultiMatchQuery { pub query: String, #[serde(default)] pub fields: BTreeMap<String, f32>, - #[serde(default)] + #[serde(default, rename = "type", alias = "multi_match_type")] pub multi_match_type: MultiMatchType, #[serde(default)] pub tie_breaker: f32, }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

pub multi_match_type: MultiMatchType,

pub struct MultiMatchQuery {

pub query: String,

#[serde(default)]

pub fields: BTreeMap<String, f32>,

#[serde(default, rename = "type", alias = "multi_match_type")]

pub multi_match_type: MultiMatchType,

#[serde(default)]

pub tie_breaker: f32,

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rust/crates/cloudsearch-common/src/lib.rs` at line 308, The MultiMatchQuery struct currently uses the field multi_match_type which serializes to "multi_match_type" but the API expects the key "type"; update the struct to rename that field for Serde JSON to "type" (e.g. add #[serde(rename = "type")] to the multi_match_type field) so MultiMatchQuery payloads match the single-search JSON shape; ensure the field name remains multi_match_type and that the MultiMatchType type is unchanged so both serialize and deserialize correctly for MultiSearchRequest and single-search payloads.

…king, add strict phrase semantics - Add score_phrase_prefix_for_fields for phrase_prefix type with prefix matching on last token while requiring consecutive tokens for the rest - Fix score_phrase_for_fields: positions are byte offsets, not token indices, so consecutive check must use prev_pos + len(prev_token) algorithm - Add multi_match_phrase_requires_consecutive_tokens test - Add multi_match_phrase_prefix_with_prefix_last_token test - Add multi_match_tie_breaker_formula test - Add empty fields validation in validate_query - Strict phrase semantics: no fallback to best_fields when phrase doesn't match - Add #[allow(clippy::too_many_lines)] to score_phrase_prefix_for_fields

Extract tokens_are_consecutive and score_phrase_in_field helpers to reduce nesting and eliminate spaghetti control flow in phrase matching. Also fix clippy issues: explicit_iter_loop, question_mark, and add #[allow(clippy::cast_possible_truncation)] where needed.

…r_fields

poyrazK

It's okay to merge

coderabbitai Bot reviewed May 22, 2026

View reviewed changes

poyrazK force-pushed the feature/multi-match-query branch from 948d402 to cb2552b Compare May 24, 2026 12:22

poyrazK force-pushed the feature/multi-match-query branch from cb2552b to bbbdf1e Compare May 24, 2026 12:29

poyrazK added 3 commits May 24, 2026 15:37

refactor(index): extract helper functions from score_phrase_prefix_fo…

368e57e

…r_fields

style(index): apply cargo fmt to refactored code

c7fda27

poyrazK commented May 24, 2026

View reviewed changes

poyrazK merged commit 9f7775b into main May 24, 2026
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(query): add multi_match query support#110

feat(query): add multi_match query support#110
poyrazK merged 5 commits into
mainfrom
feature/multi-match-query

poyrazK commented May 22, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 22, 2026 •

edited

Loading

Review limit reached

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot May 22, 2026

Uh oh!

coderabbitai Bot May 22, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

poyrazK left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

-    let multi_match_type = object
-        .get("type")
-        .and_then(Value::as_str)
-        .map_or(MultiMatchType::BestFields, |s| match s {
-            "most_fields" => MultiMatchType::MostFields,
-            "phrase" => MultiMatchType::Phrase,
-            "phrase_prefix" => MultiMatchType::PhrasePrefix,
-            _ => MultiMatchType::BestFields,
-        });
+    let multi_match_type = match object.get("type").and_then(Value::as_str) {
+        None => MultiMatchType::BestFields,
+        Some("best_fields") => MultiMatchType::BestFields,
+        Some("most_fields") => MultiMatchType::MostFields,
+        Some("phrase") => MultiMatchType::Phrase,
+        Some("phrase_prefix") => MultiMatchType::PhrasePrefix,
+        Some(other) => {
+            return Err(ApiError(CloudSearchError::InvalidSearchRequest(format!(
+                "unsupported multi_match type '{other}'"
+            ))));
+        }
+    };

Conversation

poyrazK commented May 22, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

JSON API usage

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot May 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 22, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

poyrazK left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

poyrazK commented May 22, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 22, 2026 •

edited

Loading