WIP - tackling some flakes and failures #2736
Draft
nbbeeken wants to merge 10 commits into
Windows CI sometimes kills mongod before it emits its port announcement, leaving `MongoRunnerSetup.start()` with the unhelpful "Server log output did not include port or socket" error. Retry up to 3 times (2s, 4s back-off) so transient AV-scan or filesystem-lock failures self-heal. On each failed attempt, print any log files mongod managed to write so persistent failures are diagnosable without reading Evergreen task logs manually. Addresses Groups 1 and 9 from docs/foilage-test-tickets.md.
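
The retry schedule described above (3 attempts, 2 s then 4 s between them) could look roughly like the sketch below. The names `backoffDelaysMs`, `retryWithBackoff`, and `onFailure` are hypothetical and not taken from `MongoRunnerSetup`; the `onFailure` callback stands in for the step that prints whatever log files mongod managed to write.

```typescript
// Illustrative sketch only: helper names are assumptions, not the PR's code.

/** Delays between attempts: baseMs, 2*baseMs, ... (one fewer than attempts). */
function backoffDelaysMs(maxAttempts: number, baseMs: number): number[] {
  const delays: number[] = [];
  for (let i = 0; i < maxAttempts - 1; i++) {
    delays.push(baseMs * 2 ** i);
  }
  return delays;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

/**
 * Calls `start` until it succeeds or `maxAttempts` is reached.
 * `onFailure` runs after every failed attempt (e.g. to dump server logs),
 * including the last one, before the final error is rethrown.
 */
async function retryWithBackoff<T>(
  start: () => Promise<T>,
  onFailure: (attempt: number, err: unknown) => void | Promise<void>,
  maxAttempts = 3,
  baseMs = 2000,
  wait: (ms: number) => Promise<void> = sleep
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await start();
    } catch (err) {
      await onFailure(attempt, err);
      if (attempt >= maxAttempts) throw err;
      // 2000 ms after attempt 1, 4000 ms after attempt 2 with the defaults.
      await wait(baseMs * 2 ** (attempt - 1));
    }
  }
}
```

Passing `wait` as a parameter is a design choice for this sketch so that tests can substitute a no-op instead of sleeping for real.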
NAN 2.24.0 references v8::AccessControl in nan.h, which was removed in Node.js 26 (V8 13.x). NAN 2.27.0 (released 2026-05-12) removes those references. cpu-features already allows ^2.19.0 so only the lockfile needs updating. Fixes the smoke-tests windows/node@latest job that has been failing on main since 2026-06-03.
…I tests on 8.3+

node-gyp 11.x (bundled with Node.js 26) injects LLVM ThinLTO linker flags (opt:lldltojobs=2) that MSVC link.exe rejects with LNK1117. This breaks all native addon compilation (interruptor, cpu-features, etc.) on Windows with Node.js 26. Linux/macOS use Clang/GCC where the ThinLTO flags are valid, so only the windows/latest matrix combination is excluded.

Also skip the Queryable Encryption prefix/suffix/substring tests on server >= 8.3 where the server removed the 'Preview' suffix from QE query types (prefixPreview → prefix, etc.). The driver hasn't shipped support for the GA API names yet. See MONGOSH-3336–MONGOSH-3341.
- cli-repl CTRL-C loop1/loop2 (MONGOSH-3381, MONGOSH-3378, MONGOSH-3382): server no longer reliably kills $where ops via killOp in 8.3+
- shell-api maxTimeMS (MONGOSH-3379, MONGOSH-3383): $where+maxTimeMS hangs 60s instead of throwing MaxTimeMSExpired in 8.3+
- shard bucketsNs (MONGOSH-3328, MONGOSH-3329, MONGOSH-3330): server returns user-visible name instead of system.buckets name in 8.3+; use oneOf() to accept both forms
- java-shell GraalVM tests (MONGOSH-3307): skip on >= 8.3 where they fail
- e2e glibc/deviceId (MONGOSH-3334): guard assertions when native addon returns N/A/'unknown' on ubuntu2004 builders regardless of Node.js version
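
For the bucketsNs item, accepting both namespace forms can be sketched as below. The helpers `acceptedBucketsNs` and `isAcceptedBucketsNs` are hypothetical, and the exact namespace layout (`db.system.buckets.coll` versus `db.coll`) is an assumption based on the description above, not copied from the test.

```typescript
// Illustrative sketch only: helper names and namespace layout are assumptions.

/** Both namespace forms a server might report for a time-series collection. */
function acceptedBucketsNs(db: string, coll: string): string[] {
  return [
    `${db}.system.buckets.${coll}`, // older servers: internal buckets name
    `${db}.${coll}`, // 8.3+: user-visible name
  ];
}

/** True if `actual` matches either accepted form. */
function isAcceptedBucketsNs(actual: string, db: string, coll: string): boolean {
  return acceptedBucketsNs(db, coll).some((ns) => ns === actual);
}
```

In a chai-based test the same list could be passed to `expect(actual).to.be.oneOf(...)`, which keeps a single assertion valid across server versions.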
… hooks

The original workaround for nodejs/node#61895 searched for the REPL's newListener hook by checking listener.toString() for 'ERR_INVALID_REPL_INPUT'. That string is not stable across Node.js patch versions; on some Node.js 24 builds on ubuntu2004 the hook isn't found and stays alive, keeping the REPL context reachable and preventing GC. Snapshot the process newListener set before REPL creation and remove any listeners added by the REPL using identity comparison instead.
…Listener hooks"

This reverts commit 6b3f428.
Two fixes to make the GC regression test reliable across Node.js versions:

1. Use identity comparison (snapshot before vs after REPL creation) to remove the newListener handler added by the Node.js REPL for nodejs/node#61895. The previous toString() check for 'ERR_INVALID_REPL_INPUT' is not stable across Node.js patch versions; on Node.js 24.16.0 ubuntu2004 the function's source text is not present in toString(), so the handler was never removed.
2. After the REPL emits 'exit' in mongosh-repl.close(), null out repl.context so that async cleanup callbacks (e.g. the history file write) that keep the REPLServer alive don't transitively prevent the vm.Context from being garbage collected.
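
The snapshot-and-compare approach from the first fix can be sketched on a plain `EventEmitter`. The helper names `snapshotListeners` and `removeListenersAddedSince` are hypothetical; in the PR the emitter would be `process` and the listeners would be added by REPL creation rather than by hand.

```typescript
import { EventEmitter } from 'node:events';

// Illustrative sketch only: helper names are assumptions, not the PR's code.

/** Records the listeners currently registered for `event`. */
function snapshotListeners(emitter: EventEmitter, event: string): Set<Function> {
  return new Set<Function>(emitter.listeners(event));
}

/**
 * Removes every listener for `event` that was not in the `before` snapshot.
 * Comparison is by function identity, so it does not depend on what
 * listener.toString() happens to contain.
 */
function removeListenersAddedSince(
  emitter: EventEmitter,
  event: string,
  before: Set<Function>
): number {
  let removed = 0;
  for (const listener of emitter.listeners(event)) {
    if (!before.has(listener)) {
      emitter.removeListener(event, listener as (...args: any[]) => void);
      removed++;
    }
  }
  return removed;
}
```

A caller would take the snapshot immediately before creating the REPL and run the removal immediately after, so that only listeners registered in that window are affected.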
1. waitForPrompt (test-shell.ts): split on \r as well as \n so that spinner output written with carriage-returns (e.g. from the CSFLE library during FLE collection setup) does not prevent prompt detection. Fixes FLE 'allows automatic range encryption' flake on macos_15_amd64_gui (expected prompt timeout with '| | | |' in output).
2. e2e-analytics before_all (e2e-analytics.spec.ts): increase the executeLine timeout for rs.initiate() from 10 s to 30 s. After initiating a 4-node replica set mongosh updates its prompt once it detects the new topology; on slow CI this detection can exceed 10 s. Fixes the before_all hook flake on e2e_tests_darwin_arm64_m805.
3. kerberos.sh: retry docker compose up --build up to 3 times with 5 s / 10 s back-off. Transient network errors downloading Debian packages during Docker image builds (e.g. connection reset fetching krb5-user from deb.debian.org) no longer fail the entire suite.
4. evergreen.yml.in / .evergreen.yml: wrap the VSCode docker run in retry-with-backoff.sh (ATTEMPTS=3). VSCode Insiders occasionally exits with code 255 due to extension-host cleanup races; a retry picks up where the first attempt left off.
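
The effect of the waitForPrompt change in item 1 can be illustrated with a minimal sketch. The functions `lastOutputLine` and `hasPromptOnLastLine` and the prompt pattern `/^\S*> $/` are assumptions made for this example; the real matcher in test-shell.ts may differ.

```typescript
// Illustrative sketch only: names and the prompt pattern are assumptions.

/** Returns the text after the last line break, treating \r, \n and \r\n alike. */
function lastOutputLine(output: string): string {
  const lines = output.split(/\r\n|\r|\n/);
  return lines[lines.length - 1] ?? '';
}

/** True if the last line of `output` consists only of a prompt such as "test> ". */
function hasPromptOnLastLine(
  output: string,
  promptPattern: RegExp = /^\S*> $/
): boolean {
  return promptPattern.test(lastOutputLine(output));
}

// Spinner frames end with \r rather than \n, so a split on \n alone would
// leave "| | | |\rtest> " as one line and the anchored pattern would not match.
const sample = 'Setting up\n| | | |\rtest> ';
console.log(hasPromptOnLastLine(sample)); // true
```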
The script was tracked as 100644 (non-executable) so Evergreen's bash rejected it with exit 126 when invoked directly.
No description provided.