Skip to content

Bhamehta/code cleanup#2

Open
bmehta001 wants to merge 18 commits intomainfrom
bhamehta/code-cleanup
Open

Bhamehta/code cleanup#2
bmehta001 wants to merge 18 commits intomainfrom
bhamehta/code-cleanup

Conversation

@bmehta001
Copy link
Copy Markdown
Owner

@bmehta001 bmehta001 commented Mar 18, 2026

Fixes runtime thread-safety, shutdown-safety, and response-lifetime issues that affect normal SDK operation. Split out from microsoft#1415 per reviewer request so runtime behavior changes stay separate from CI/build/test fixes.

Fix HTTP cancellation and callback tracking

HttpClient_Apple.mm

  • Scope cancellation to the current request’s m_dataTask instead of canceling every task on the shared static NSURLSession.
  • Fix the CancelAllRequests() wait loop so m_requests.empty() is checked while holding m_requestsMtx.

HttpClientManager.cpp

  • Fix the cancelAllRequests() wait loop so m_httpCallbacks.empty() is checked while holding m_httpCallbacksMtx.

Fix HTTP response lifetime on abort/network-failure paths

HttpResponseDecoder.cpp

  • Preserve ctx->httpResponse through requestAborted(ctx) and temporaryNetworkFailure(ctx) so downstream storage/statistics handlers can still read status and headers.
  • Avoid leaking aborted/network-failure responses by keeping ownership with EventsUploadContext, whose clear() path deletes the response.

HttpResponseDecoderTests.cpp

  • Add regression coverage that aborted and network-failure decode routes still receive the response object with result/status intact.

Fix WorkerThread shutdown safety

WorkerThread.cpp

  • Add an explicit shutdown gate so late Queue() calls are rejected once teardown starts.
  • Enqueue the shutdown sentinel only once under the queue lock.
  • Clean up pending queued/timer tasks only after a successful join(), and avoid cleanup after detach() because the worker may still be running.
  • Log join/detach failures instead of silently swallowing all exceptions.

Make TransmissionPolicyManager scheduling consistently mutex-guarded

TransmissionPolicyManager.cpp / .hpp

  • Guard m_isUploadScheduled, m_runningLatency, and m_scheduledUploadTime consistently with m_scheduledUploadMutex.
  • Avoid holding m_scheduledUploadMutex across potentially blocking cancellation during stop/shutdown.
  • Keep force/zero-delay no-wait cancellation under the scheduler mutex so a competing delayed schedule cannot suppress an immediate upload.
  • Use an explicit std::chrono::milliseconds value in the bandwidth-controller reschedule path.

TransmissionPolicyManagerTests.cpp

  • Add regression coverage for the force/zero-delay scheduling race.

Fix Logger static-destruction-order crash

Logger.cpp

  • Remove destructor logging from Logger::~Logger() because it can run after logging infrastructure has already been destroyed, causing crashes during static teardown.

Validation

  • HttpResponseDecoderTests.* and TransmissionPolicyManagerTests.ForceScheduleRetainsImmediateUploadWhenCancelBlocks passed.
  • Full host UnitTests passed: 485/485

@bmehta001 bmehta001 self-assigned this Mar 18, 2026
@bmehta001 bmehta001 force-pushed the bhamehta/code-cleanup branch 20 times, most recently from 9a5a166 to 2feeb94 Compare March 19, 2026 05:40
@bmehta001 bmehta001 force-pushed the bhamehta/code-cleanup branch 4 times, most recently from e303041 to 9a91839 Compare April 1, 2026 16:55
hanselip and others added 5 commits April 8, 2026 17:40
…microsoft#1417)

* Fix JNI stale local ref in OfflineStorage_Room record iteration loops

GetAndReserveRecords, GetRecords, and ReleaseRecords each use a
pushLocalFrame/popLocalFrame scope per record. The jclass obtained from
GetObjectClass on the first iteration was stored as a plain local reference;
after popLocalFrame returns, that reference is freed by ART. On subsequent
iterations the C++ pointer is non-null, so the if(!record_class) guard is
skipped, but the jclass* now points into a freed local frame. ART's JNI
reference validity checker detects this and calls JniAbort, producing SIGABRT.

Fix: promote the jclass to a global reference via NewGlobalRef immediately
after GetObjectClass so it survives across popLocalFrame calls, and delete
it with DeleteGlobalRef after the loop.

Affected methods:
- GetAndReserveRecords: record_class
- ReleaseRecords: bt_class
- GetRecords: record_class

The bug was introduced in commit 873b562 (Jan 2021) when per-record
pushLocalFrame/popLocalFrame scoping was added. See GitHub issue microsoft#1227.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(OfflineStorage_Room): use RAII guard for JNI global refs

The initial fix (previous commit) correctly promoted jclass local refs to
global refs to survive pushLocalFrame/popLocalFrame, but placed
DeleteGlobalRef only after the loop in normal flow. Three paths leaked:

1. ThrowLogic throws std::logic_error — NOT caught by catch(runtime_error)
2. ThrowRuntime throws std::runtime_error — catch block never called Delete
3. consumer() throwing in GetAndReserveRecords also bypassed the delete

Fix: replace post-loop DeleteGlobalRef with a RAII guard struct whose
destructor calls DeleteGlobalRef unconditionally. Applied to all three
methods: GetAndReserveRecords, ReleaseRecords, and GetRecords.

Also:
- Initialize all jfieldID variables to nullptr (prevents use of
  indeterminate values if ThrowLogic fires mid field-ID setup)
- Check NewGlobalRef return value; throw std::runtime_error on null
  (caught by existing catch block)

Regression test improvements:
- GetRecords section: add order-independent set-based blob[0] check
  to verify field IDs cached on iteration 0 are valid on iterations 1+
- ReleaseRecords section: cycle GetAndReserveRecords + ReleaseRecords
  (incrementRetryCount=true) GetMaximumRetryCount()+1 times so the
  Room impl actually drops records and returns a non-empty byTenant
  array, exercising the bt_class loop. Without this, bt_class path
  was never entered.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor: consolidate GlobalRefGuard; fix build & test portability

OfflineStorage_Room.cpp:
- Move GlobalRefGuard struct into the anonymous namespace at the top of
  the file, replacing three identical inline local struct definitions.
  Eliminates duplication and makes a single authoritative definition.
- Change guard member from jclass& ref to jclass* ref_ptr for cleaner
  pointer semantics (more idiomatic RAII ownership pattern).

Tests:
- GetAndReserveRecords blob check: switch from index-based to set-based
  comparison since Memory storage returns records in LIFO order, not
  insertion order.
- GetRecords check: guard with if(!Memory) because Memory::GetRecords
  delegates to GetAndReserveRecords internally and returns nothing when
  records are already reserved.

Build fixes (macOS Apple Silicon):
- lib/CMakeLists.txt: use find_library to set IMPORTED_LOCATION on the
  sqlite3 IMPORTED GLOBAL target; without this cmake fails to generate
  on Apple Silicon where homebrew lives under /opt/homebrew.
- tests/unittests/CMakeLists.txt: add /opt/homebrew/opt/sqlite/lib path
  to the sqlite3 library lookup chain.
- lib/modules: update submodule to fix -Werror,-Wreorder-ctor build
  failure in Sanitizer.cpp (initializer list order mismatch).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Hansel Ip <hanselip@microsoft.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…oad (microsoft#1422)

* disable built-in SDK stats events by default to reduce OneCollector load

* Fix BasicFuncTests
@bmehta001 bmehta001 force-pushed the bhamehta/code-cleanup branch from 9a91839 to 5a0701e Compare April 28, 2026 19:01
* Modernize CI workflows and fix iOS simulator selection

- Update Actions to latest versions (checkout@v4, setup-java@v5, etc.)
- Add concurrency groups scoped to PR events only (push builds on main
  always run to completion)
- Add 30-minute timeout to iOS CI to prevent deadlock hangs
- Fix iOS simulator UUID resolution for unambiguous destination
- Fix Android SDK path detection in CodeQL workflow
- Document python3 requirement for iOS tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix compiler flags and scope -Wno-reorder

- Split GCC/Clang warning flags (Clang-only flags were breaking GCC)
- Add -fno-finite-math-only to restore IEEE 754 compliance needed by
  sqlite3 when -ffast-math is enabled
- Remove global -Wno-reorder, scope to Sanitizer.cpp only (submodule
  declares members in wrong order)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Improve iOS test script and documentation

- Resolve simulator UUID for unambiguous xcodebuild destination
- Add python3 dependency documentation
- Update iOS getting-started docs

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix flaky functional tests for iOS simulator CI

- Use unique phase2 DB paths instead of sleep to avoid SQLite vnode
  collisions from deferred sqlite3_close_v2 cleanup
- Clean SQLite journal files (-wal, -shm, -journal) in CleanStorage
- Fix cross-test config contamination (reset storage settings)
- Use sleep(10) instead of yield() in waitForEvents to reduce CPU
  contention on single-core simulator runners
- Increase timeouts for iOS simulator environment
- Fix multi-batch upload flakiness with SetTransmitProfile(RealTime)
- Use 127.0.0.1 instead of localhost for local test server

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@bmehta001 bmehta001 force-pushed the bhamehta/code-cleanup branch 6 times, most recently from dc4da4f to 83ac9bd Compare April 29, 2026 21:31
bmehta001 and others added 2 commits April 29, 2026 16:39
* Remove private in6 header includes

Remove the dead netinet6/in6.h imports from ODWReachability and its Objective-C unit test so Apple builds no longer fail on the private-header error from main.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Remove dead reachability imports

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Restore explicit reachability socket headers

Add back the canonical sys/socket.h and netinet/in.h includes in ODWReachability.m and ODWReachabilityTests.mm while keeping the other dead imports removed. This avoids relying on transitive Apple headers in the reachability PR review.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Modernize CMakeLists.txt: flatten nesting, deduplicate, use
  consistent quoting and variable patterns
- Remove stale header references from vcxproj and vcxitems files
- Simplify test CMakeLists.txt files
- Fix CMake conventions in packaging scripts

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@bmehta001 bmehta001 force-pushed the bhamehta/code-cleanup branch from 65fc991 to 0fe40ec Compare April 29, 2026 21:42
bmehta001 and others added 9 commits April 30, 2026 06:57
Keep the before.targets toolset selection deterministic on newer
Visual Studio hosts, but only set the MIP props fallback when a
consumer has not already chosen a toolset. While addressing the
MSBuild review comments, also point the optional module test
conditions and source paths at the real lib/modules locations and
use CPACK_PACKAGE_FILE_NAME for the RPM status message.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Stop the bootstrap script from pulling LLVM through Chocolatey on every run so existing LLVM installations and non-clang setups are not modified unexpectedly.

Files changed:
- tools/setup-buildtools.cmd

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ReadTheDocs needs an explicit build image and system packages for the Doxygen-backed Sphinx site. The Sphinx config also referenced a theme that was not installed, so use the existing Furo dependency and pin Sphinx below the next incompatible major version.

Files changed:
- .readthedocs.yaml
- docs/public/conf.py
- docs/public/requirements.txt

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants