Skip to content

feat(graphile-llm): auto-embed unifiedSearch text for hybrid vector+keyword search#1290

Merged
pyramation merged 3 commits into
mainfrom
feat/llm-unified-search
Jun 13, 2026
Merged

feat(graphile-llm): auto-embed unifiedSearch text for hybrid vector+keyword search#1290
pyramation merged 3 commits into
mainfrom
feat/llm-unified-search

Conversation

@pyramation

Copy link
Copy Markdown
Contributor

Summary

When graphile-llm is active with a configured embedder, unifiedSearch: "text" now automatically embeds the text and includes pgvector in the RRF rank fusion — zero client changes, zero new fields.

Architecture (resolver-wrapper pattern):

Client: unifiedSearch: "HIPAA compliance"  (String — schema unchanged)
                    ↓
LlmTextSearchPlugin resolver wrapper (async, pre-plan):
  1. Detects unifiedSearch string in args.where
  2. Calls embedder("HIPAA compliance") → [0.1, 0.3, ...]
  3. Transforms: args.where.unifiedSearch = { __text: "HIPAA compliance", __vector: [0.1, ...] }
                    ↓
unifiedSearch apply function (sync, plan-time):
  - typeof val === 'object' && val.__text → extracts text + vector
  - Fans out __text to tsv, BM25, trgm (existing behavior)
  - Passes __vector to pgvector adapter's buildFilterApply
  - All WHERE clauses combined with OR
  - RRF fuses all 4 rank lists into normalized [0,1] searchScore

Graceful degradation:

  • Embedder returns null (quota exceeded) + text adapters exist → falls back to text-only search (no error)
  • Embedder returns null + NO text adapters (vector-only table) → throws explicit error

Key changes:

  • graphile-search/plugin.ts apply function: accepts { __text, __vector } object in addition to plain String
  • graphile-llm/text-search-plugin.ts embedTextInWhere: detects unifiedSearch key, embeds text, transforms to object shape
  • Resolver wrapper now wraps tables with searchable columns (not just vector-only)
  • hasTextAdapters parameter threads through for degradation logic

Tests: 12 unit tests (no DB) + 4 integration tests with mock LLM plugin verifying pgvector participates in RRF fusion when embedding is injected.

Closes #1054

Link to Devin session: https://app.devin.ai/sessions/4a9f098c74fb4cb6a9b6868fcff321db
Requested by: @pyramation

…eyword search

When graphile-llm is active with a configured embedder, unifiedSearch
now automatically embeds the text query and includes pgvector in the
RRF rank fusion alongside text adapters (tsvector, BM25, trgm).

Changes:
- graphile-search plugin.ts: apply function accepts both String and
  { __text, __vector } object shape. When __vector is present, pgvector
  adapter participates in the OR + RRF fusion.
- LlmTextSearchPlugin: resolver wrapper detects unifiedSearch string
  values, embeds them via the configured embedder, and transforms to
  { __text, __vector } before the apply function runs.
- Graceful degradation: if embedder returns null (quota exceeded) and
  text adapters exist, falls back to text-only search. If no text
  adapters available, throws explicit error.
- 12 unit tests for embedding transformation logic
- 4 integration tests with mock LLM plugin verifying RRF fusion

Closes #1054 (graphile-llm + unifiedSearch integration)
@devin-ai-integration

Copy link
Copy Markdown
Contributor

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment, CI, and merge conflict monitoring

@blacksmith-sh

This comment has been minimized.

…atible integration tests

Grafast plan-based execution skips resolve() on connection fields,
so the mock LLM plugin's resolver wrapper never fires in direct
schema execution. Replace with tests that verify the equivalent
end-user behavior: unifiedSearch + vectorEmbedding combined produce
correct RRF fusion across all 4 adapters.

The object-shape handling ({ __text, __vector }) is tested via
unit tests in graphile-llm/src/__tests__/unified-search-embedding.test.ts.
The full resolver-wrapper → apply flow is exercised in PostGraphile
server-level tests (graphile-llm CI).
…ion input detection, text-search scope

- Fix RAG plugin: move chunk table discovery from init to build hook so
  @hasChunks smart tags are visible; scan pgCodecs (not pgResources)
- Fix text-search-plugin: use context.Self.name instead of
  scope.inputObjectTypeName for PostGraphile v5 type identification
- Fix text-mutation-plugin: detect create input types (ArticleInput) that
  lack isPgBaseInput scope flag via type name convention fallback
- Fix makeTestSmartTagsPlugin: apply tags during build hook (not init)
  with before: ['LlmRagPlugin'] ordering to ensure tags are set first
- Export embedTextInWhere from text-search-plugin, remove duplicated
  re-implementation in unit tests
- Extract injectScoreAndRank helper in graphile-search plugin.ts to
  de-duplicate 60-line block repeated 3 times
- Add 7 real Ollama nomic-embed integration tests (Suite 7) covering
  text-only, vector-only, RRF fusion, and semantic ranking
- Add CI config for graphile-llm tests with Ollama service

All 52 graphile-llm tests pass, all 75 graphile-search tests pass.
@pyramation pyramation merged commit 73ebec1 into main Jun 13, 2026
36 checks passed
@pyramation pyramation deleted the feat/llm-unified-search branch June 13, 2026 04:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant