[dependency] Bump flink to 1.20.5 and 2.1.3 #842
Merged
A Python vector store's query path runs numpy (e.g. chroma's embedding normalization). numpy releases and re-acquires the GIL during the conversion, which deadlocks on an async pemja worker thread since pemja keeps a single PyThreadState.

Split the async RAG query so only the numpy normalization runs on the operator thread: embed and query stay async, normalize is sync.

- PythonVectorStore: resolve the embedding model in open(); add embedQuery / normalizeEmbedding / queryNormalized hooks; forward pre-computed vectors.
- BaseVectorStore: expose getEmbeddingModel to subclasses.
- ContextRetrievalAction: async embed -> sync normalize -> async query.
- ChromaVectorStore: add _normalize_embeddings; query accepts pre-normalized.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Linked issue: #844 #692
Purpose of change
Commit 1 — [fix] Keep numpy off the async pool for cross-language RAG queries
Problem
A cross-language RAG query against a Python vector store (VectorStoreCrossLanguageTest) could hang intermittently in CI while passing locally. A Python store's query path runs numpy; ChromaDB, for example, normalizes the query embedding via np.array. numpy releases and re-acquires the GIL during that copy, and pemja keeps a single PyThreadState, so on the async executor's worker thread the re-acquire can stall. The stall surfaces as a hang on CI runners with few cores; machines with spare cores avoid the window. See #844.
Fix
Split the async RAG path so that only the numpy step stays on the mailbox thread and the rest remains async: async embed -> sync normalize -> async query.
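The three-stage split can be illustrated with a small, self-contained sketch. This is not the project's code: `embed_query`, `normalize_embedding`, `query_normalized` and `retrieve` are hypothetical stand-ins that only loosely mirror the hook names in the commit message, a plain `ThreadPoolExecutor` stands in for the async executor, and the normalization is pure Python rather than numpy. It only shows which thread each stage would run on under those assumptions.

```python
import math
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins; none of these functions come from the project.

def embed_query(text):
    """Pretend embedding model: returns a raw vector and the thread it ran on."""
    raw = [float(len(word)) for word in text.split()]
    return raw, threading.current_thread().name

def normalize_embedding(vector):
    """Stand-in for the numpy normalization step (L2 norm, pure Python)."""
    norm = math.sqrt(sum(x * x for x in vector))
    normalized = [x / norm for x in vector] if norm else list(vector)
    return normalized, threading.current_thread().name

def query_normalized(vector, top_k):
    """Pretend store lookup that accepts an already-normalized vector."""
    return {"top_k": top_k, "dims": len(vector)}, threading.current_thread().name

def retrieve(pool, text, top_k=3):
    # Stage 1 (async): the embedding call is handed to a pool worker.
    raw, embed_thread = pool.submit(embed_query, text).result()
    # Stage 2 (sync): normalization runs on the calling thread, never on the pool.
    vector, normalize_thread = normalize_embedding(raw)
    # Stage 3 (async): the store query goes back to a pool worker.
    result, query_thread = pool.submit(query_normalized, vector, top_k).result()
    return vector, result, (embed_thread, normalize_thread, query_thread)

with ThreadPoolExecutor(max_workers=1, thread_name_prefix="async-worker") as pool:
    vector, result, threads = retrieve(pool, "flink agents rag")
```

For brevity the sketch blocks on `.result()`; a real operator would presumably yield between stages instead. The point is only that the middle stage never touches a pool worker, so the GIL release and re-acquire happens on the calling thread.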
Commit 2 — [dependency] Bump flink to 1.20.5 and 2.1.3
Bumps pinned Flink versions: flink.1.20.version 1.20.4 → 1.20.5, flink.2.1.version 2.1.2 → 2.1.3, and aligns flink.2.0.version (2.0.2) and the e2e-integration matrix accordingly across pom.xml, dist/pom.xml, and the integration e2e pom.
Tests
it & e2e
API
no
Documentation
- doc-needed
- doc-not-needed
- doc-included