chore: improve reliability and backup handling#108
Conversation
📝 WalkthroughWalkthroughThis PR adds bounded HTTP and file handling in libcore and Android callers, introduces versioned backup serialization with WebDAV log redaction, adds notification permission checks and subscription scheduling changes, and updates workflow cache keys plus APK build automation. ChangesLibcore hardening and Android callers
Backup format v2 and WebDAV logging
Notification guards and UI fixes
CI and workflow cache updates
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches📝 Generate docstrings
Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
libcore/http.go (1)
133-144: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winBound the SOCKS handshake and close failed sockets.
defaultHTTPDialTimeoutonly covers the TCP dial to the local SOCKS port.socks.ClientHandshake5can still stall on that connection unless a deadline is set, and failed handshakes should closesocksConnbefore falling back.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@libcore/http.go` around lines 133 - 144, The SOCKS proxy path in the http transport setup still leaves `socks.ClientHandshake5` unbounded and can leak a failed `socksConn` before retrying. In `libcore/http.go`, update the `c.h1h2Transport.DialContext` logic to set a deadline or use the existing `ctx` timeout around the handshake, and ensure any handshake error closes `socksConn` before the fallback/continue path. Keep the fix localized to the `DialContext` closure and preserve the existing `tryH3Direct` behavior.
🧹 Nitpick comments (2)
buildScript/lib/naive.sh (1)
35-43: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win
exit 1for an unsupported ABI won't propagate from the command substitution.
expected_sha256_for_abiis invoked as"$(expected_sha256_for_abi "$abi")"(Line 61), so its*) ... exit 1(Line 41) only terminates the subshell. The parent continues with an empty expected value;verify_sha256then reports a misleading "checksum mismatch ... expected , got " instead of "unsupported ABI". The build still halts, but the diagnostic is wrong.Resolve the expected checksum into a variable first so the failure surfaces directly:
♻️ Suggested fix
echo ">> fetching libnaive.so for $abi ($apk)" curl -fL --retry 3 --retry-delay 2 --max-time 300 "$BASE/$apk" -o "$apk_path" - verify_sha256 "$apk_path" "$(expected_sha256_for_abi "$abi")" + local expected + expected="$(expected_sha256_for_abi "$abi")" || exit 1 + verify_sha256 "$apk_path" "$expected"Also applies to: 55-66
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@buildScript/lib/naive.sh` around lines 35 - 43, The unsupported-ABI failure in expected_sha256_for_abi is getting swallowed by the command substitution used at the verify_sha256 call site, so the script reports a checksum mismatch instead of the real error. Update the flow around expected_sha256_for_abi and verify_sha256 in naive.sh to resolve the checksum into a variable before calling verify_sha256, and explicitly detect the unsupported ABI case so the script exits with the correct diagnostic rather than continuing with an empty expected value.app/src/main/java/io/nekohasekai/sagernet/ui/BackupFragment.kt (1)
690-724: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAdd a regression test for both import versions.
This branch is now the compatibility boundary, but the new coverage only exercises
BackupFormatV2helpers andbackupFileName()(app/src/test/java/io/nekohasekai/sagernet/ui/BackupFormatV2Test.kt, Lines 22-193;app/src/test/java/io/nekohasekai/sagernet/ui/BackupFileNameTest.kt, Lines 10-26). A smallfinishImport()test for one v1 payload and one v2 payload would catch dispatch regressions before the destructive reset path runs.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/main/java/io/nekohasekai/sagernet/ui/BackupFragment.kt` around lines 690 - 724, Add regression coverage in the import flow to verify both backup formats are dispatched correctly before the destructive reset path. Extend the tests around BackupFragment.finishImport() so one case imports a v1 payload and another imports a BackupFormatV2.VERSION payload, asserting the correct decode path is used for profiles/groups/rules/settings. This should complement the existing BackupFormatV2Test and BackupFileNameTest coverage and protect the version switch logic in finishImport.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.depot/workflows/build-apk.yml:
- Around line 42-64: The cache fingerprint logic for libcore and sidecars only
hashes file contents, so renames/moves or same-content swaps can reuse stale
artifacts. Update the status-generation steps in build-apk.yml so the
fingerprints also incorporate each file’s path alongside its content, then keep
the existing cache keys in libcore-cache and the sidecar cache aligned with the
new status files. Use the existing libcore status and sidecars status steps as
the place to make this change.
In `@app/src/main/java/io/nekohasekai/sagernet/bg/SubscriptionUpdater.kt`:
- Around line 113-117: In SubscriptionUpdater.computeSubscriptions(), switch the
due-check window to Long arithmetic instead of Int. Use a Long value for the
current epoch seconds and ensure the auto-update delay is converted and
multiplied as Long before comparing against subscription.lastUpdated, so the
check cannot overflow or flip for large timestamps or delays. Keep the logic in
sync with computeSubscriptionWorkSchedule(), which already uses Long.
In `@app/src/main/java/io/nekohasekai/sagernet/ui/BackupFragment.kt`:
- Around line 374-381: In the WebDAV restore flow inside BackupFragment’s
PROPFIND handling, only proceed to parse the multistatus body when the response
is exactly 207. Tighten the success check around the existing response
validation and body parsing so that any other 2xx responses (such as 200 or 204)
are treated as unexpected server responses and fail before reaching the
responseBody parsing path. Keep the fix localized to the PROPFIND branch and
preserve the existing logging/error handling in this restore logic.
In `@libcore/http.go`:
- Around line 91-96: The HTTP clients used by GetContentLimited and
WriteToLimited only set connect and header timeouts, so body reads can still
hang. Update the h1h2Client setup in the http client initialization to use a
full client deadline, and apply the same per-request deadline behavior to the
HTTP/3 path where the separate http.Client is created. Make the change in the
shared client setup and the request-building code so both transport paths
enforce a body-read timeout.
- Around line 436-466: The WriteToLimited flow is making downloaded files
world-writable by calling Chmod on the temp file before rename, which bypasses
the default private permissions from os.CreateTemp. Remove the explicit 0666
chmod in httpResponse.WriteToLimited and keep the temp file permissions as
created, so the final os.Rename preserves private access unless broader
permissions are intentionally needed.
In `@libcore/io.go`:
- Around line 38-42: The temp file permission change in the file-writing path is
too permissive and should be removed or narrowed. In the code around the
temporary output handling in io.go, drop the unconditional o.Chmod(0666) before
rename so the os.CreateTemp default private mode is preserved, and make the same
change in the analogous temp-file flow in http.go. If a custom mode is required,
apply a caller-supplied, restrictive mode after the final rename instead of
making the temp file world-writable.
---
Outside diff comments:
In `@libcore/http.go`:
- Around line 133-144: The SOCKS proxy path in the http transport setup still
leaves `socks.ClientHandshake5` unbounded and can leak a failed `socksConn`
before retrying. In `libcore/http.go`, update the `c.h1h2Transport.DialContext`
logic to set a deadline or use the existing `ctx` timeout around the handshake,
and ensure any handshake error closes `socksConn` before the fallback/continue
path. Keep the fix localized to the `DialContext` closure and preserve the
existing `tryH3Direct` behavior.
---
Nitpick comments:
In `@app/src/main/java/io/nekohasekai/sagernet/ui/BackupFragment.kt`:
- Around line 690-724: Add regression coverage in the import flow to verify both
backup formats are dispatched correctly before the destructive reset path.
Extend the tests around BackupFragment.finishImport() so one case imports a v1
payload and another imports a BackupFormatV2.VERSION payload, asserting the
correct decode path is used for profiles/groups/rules/settings. This should
complement the existing BackupFormatV2Test and BackupFileNameTest coverage and
protect the version switch logic in finishImport.
In `@buildScript/lib/naive.sh`:
- Around line 35-43: The unsupported-ABI failure in expected_sha256_for_abi is
getting swallowed by the command substitution used at the verify_sha256 call
site, so the script reports a checksum mismatch instead of the real error.
Update the flow around expected_sha256_for_abi and verify_sha256 in naive.sh to
resolve the checksum into a variable before calling verify_sha256, and
explicitly detect the unsupported ABI case so the script exits with the correct
diagnostic rather than continuing with an empty expected value.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ef980a58-1f67-4a6d-bbb3-be2bb4d956ef
📒 Files selected for processing (22)
.depot/workflows/build-apk.yml.github/workflows/ci.ymlapp/lint-baseline.xmlapp/src/main/java/io/nekohasekai/sagernet/bg/ServiceNotification.ktapp/src/main/java/io/nekohasekai/sagernet/bg/SubscriptionUpdater.ktapp/src/main/java/io/nekohasekai/sagernet/group/RawUpdater.ktapp/src/main/java/io/nekohasekai/sagernet/ui/AboutFragment.ktapp/src/main/java/io/nekohasekai/sagernet/ui/AssetsActivity.ktapp/src/main/java/io/nekohasekai/sagernet/ui/BackupFormatV2.ktapp/src/main/java/io/nekohasekai/sagernet/ui/BackupFragment.ktapp/src/main/java/io/nekohasekai/sagernet/ui/ConfigurationFragment.ktapp/src/main/java/io/nekohasekai/sagernet/ui/WebDAVSecurity.ktapp/src/main/java/io/nekohasekai/sagernet/ui/profile/ProfileSettingsActivity.ktapp/src/main/java/io/nekohasekai/sagernet/widget/GroupPreference.ktapp/src/test/java/io/nekohasekai/sagernet/bg/SubscriptionUpdaterScheduleTest.ktapp/src/test/java/io/nekohasekai/sagernet/ui/BackupFileNameTest.ktapp/src/test/java/io/nekohasekai/sagernet/ui/BackupFormatV2Test.ktapp/src/test/java/io/nekohasekai/sagernet/ui/WebDAVSecurityTest.ktbuildScript/lib/assets.shbuildScript/lib/naive.shlibcore/http.golibcore/io.go
💤 Files with no reviewable changes (1)
- app/lint-baseline.xml
| name: CI | ||
| on: | ||
| push: | ||
| tags-ignore: | ||
| - 'v*' | ||
| branches: | ||
| - '*' | ||
| pull_request: | ||
| workflow_dispatch: | ||
| env: | ||
| MIERU_VERSION: v3.34.0 | ||
| HYSTERIA_VERSION: v2.9.2 |
There was a problem hiding this comment.
Unit tests no longer run automatically
ci.yml previously triggered on every push and pull_request; it now only fires via workflow_dispatch. The new Depot workflow (build-apk.yml) runs on every branch push but only assembles the APK — it contains no ./gradlew test step. The four new test files added by this PR (SubscriptionUpdaterScheduleTest, BackupFileNameTest, BackupFormatV2Test, WebDAVSecurityTest) will never run in CI unless the workflow is triggered manually. A future commit that breaks these tests will pass CI silently.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| func (h *httpResponse) GetContentLimited(limit int64) ([]byte, error) { | ||
| h.contentMu.Lock() | ||
| defer h.contentMu.Unlock() | ||
| if h.contentRead { | ||
| if h.contentError != nil { | ||
| return nil, h.contentError | ||
| } | ||
| if int64(len(h.content)) > limit { | ||
| return nil, fmt.Errorf("HTTP response body exceeds %d bytes", limit) | ||
| } | ||
| return h.content, nil | ||
| } | ||
| defer h.Body.Close() | ||
| h.contentRead = true | ||
| h.contentLimit = limit | ||
| h.content, h.contentError = readAllLimited(h.Body, limit) | ||
| return h.content, h.contentError | ||
| } |
There was a problem hiding this comment.
Cached re-read with a smaller limit silently rejects valid content
When GetContentLimited(bigLimit) has already run and cached n bytes, a subsequent call with GetContentLimited(smallLimit) where n > smallLimit returns fmt.Errorf("HTTP response body exceeds %d bytes", smallLimit) — even though n was within the original limit and the body is gone. The check uses the caller's limit rather than the limit enforced during the read. The contentLimit field stores the original limit but is never consulted in the re-read path, making it dead data.
The unused "fmt" import in libcore/io.go failed gomobile compilation, which broke all four Depot CI jobs (build/test/lint/instrumented depend on libcore-from-source). Also addresses CodeRabbit/Greptile findings on PR #108: - http.go: bound the full request (incl. body reads) with a client timeout on the h1h2 client and both doH3Direct clients (ECH + H3). - http.go/io.go: drop world-writable Chmod(0666) on temp files; os.CreateTemp already creates them 0600 and they are atomically renamed (over-permissive on per-app-UID Android). - http.go: remove dead contentLimit field. - SubscriptionUpdater: use Long arithmetic in the due-check to avoid Int overflow, matching computeSubscriptionWorkSchedule. - BackupFragment: require WebDAV PROPFIND 207 before parsing the multistatus body (a 200/204 proxy/login page no longer falls through). - AssetsActivity: drop dead post-unxz size check (Libcore.unxz already enforces the same 256 MB cap and fails before rename). - .depot/workflows: make libcore/sidecars cache fingerprints content- and path-aware via 'git ls-files -s | sha256sum' (symlink-safe; the old find|cat masked content moves and a dangling buildScript/nkmr symlink), normalize the libcore-status command identically across all four workflows (shared cache key), and pass KEYSTORE_B64 via env:.
Pushed d6978c5 — root-cause fix + review follow-upsRoot cause of the failing checks: Review findings addressed:
Note on the new test classes: they run on push/PR via the unit-test workflow ( Local gates: |
Summary
Local verification
./gradlew :app:compileOssDebugKotlin --no-configuration-cache./gradlew :app:testOssDebugUnitTest --tests "io.nekohasekai.sagernet.bg.SubscriptionUpdaterScheduleTest" --tests "io.nekohasekai.sagernet.ui.*FileName*" --tests "io.nekohasekai.sagernet.ui.WebDAVSecurityTest" --tests "io.nekohasekai.sagernet.ui.BackupFormatV2Test" --no-configuration-cache./gradlew spotlessKotlinCheck --no-configuration-cachebash -n buildScript/lib/naive.sh buildScript/lib/assets.shGreptile Summary
This PR hardens several reliability and correctness paths: HTTP/file I/O gets request-level timeouts and per-call byte limits; subscription scheduling is extracted into a pure, testable function and gains null safety; the profile database update is wrapped in a single transaction with an integrity check; and backup export migrates to an explicit JSON v2 schema that removes the Parcel dependency while retaining v1 import compatibility.
NewHttpClientnow setsDialContext,TLSHandshakeTimeout,ResponseHeaderTimeout, and an overallClient.Timeout;WriteToLimited/ReadAllLimitedcap body reads;WriteToandUnxzare rewritten with atomic temp-file rename and cleanup guards.BackupFormatV2replaces opaque Parcel blobs with typed JSON fields and URL-safe Base64 for the Kryo bytes, paired with round-trip unit tests; v1 files continue to import via the legacy Parcel path.GroupPreferencemoves the DB query off the main thread;ProfileSettingsActivitycaches group data and fixes a null-dereference on the menu inflate path;ConfigurationFragmentfixes a use-before-assign onoldProfileand wraps thereloadAccesslock in try/finally.Confidence Score: 5/5
Safe to merge — all changed paths have tightly scoped fixes backed by new unit tests, and no regressions were found in the reviewed logic.
The DB transaction in RawUpdater is correct: the count check fires inside the transaction so a rollback leaves the group consistent, and lastUpdated is only written on success. The Go temp-file/rename pattern in WriteToLimited and Unxz is correct: defers run LIFO so the close+rename defer executes before the remove defer, meaning a rename failure still triggers cleanup. The subscription scheduler correctly uses coerceAtLeast(0) for initial delay and coerceAtLeast(15) for interval, removing the old unclamped-negative-delay bug. BackupFormatV2 round-trips are validated by the new test suite.
buildScript/lib/assets.sh — the GEOIP_SHA256/GEOSITE_SHA256 defaults lack the co-update comment that naive.sh has, making future version bumps slightly error-prone.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%% flowchart TD A[SubscriptionUpdater.registerSubscriptions] --> B[computeSubscriptionWorkSchedule\nminInterval + initialDelay] B --> C[PeriodicWorkRequest\ninterval = min delay >= 15 min] C --> D[UpdateTask.doWork] D --> E{for each subscription\nlastUpdated + delay <= now?} E -- skip --> F[log: not updating] E -- update --> G[notifyProgress\ncheck permission] G --> H[GroupUpdater.executeUpdate] H --> I[RawUpdater.doUpdate] I --> J[HTTP GET subscription\ngetContentStringLimited 10 MB] J --> K[parseRaw proxies list] K --> L[runInTransaction\ninsert / update / delete\ncountByGroup == proxies.size?] L -- mismatch throw --> M[transaction rollback] L -- ok --> N[lastUpdated = now\nupdateGroup] N --> O[executeUpdate true] M --> P[executeUpdate false] O --> Q{attempted and failed?} P --> Q Q -- yes --> R[Result.retry] Q -- no --> S[Result.success]%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%% flowchart TD A[SubscriptionUpdater.registerSubscriptions] --> B[computeSubscriptionWorkSchedule\nminInterval + initialDelay] B --> C[PeriodicWorkRequest\ninterval = min delay >= 15 min] C --> D[UpdateTask.doWork] D --> E{for each subscription\nlastUpdated + delay <= now?} E -- skip --> F[log: not updating] E -- update --> G[notifyProgress\ncheck permission] G --> H[GroupUpdater.executeUpdate] H --> I[RawUpdater.doUpdate] I --> J[HTTP GET subscription\ngetContentStringLimited 10 MB] J --> K[parseRaw proxies list] K --> L[runInTransaction\ninsert / update / delete\ncountByGroup == proxies.size?] L -- mismatch throw --> M[transaction rollback] L -- ok --> N[lastUpdated = now\nupdateGroup] N --> O[executeUpdate true] M --> P[executeUpdate false] O --> Q{attempted and failed?} P --> Q Q -- yes --> R[Result.retry] Q -- no --> S[Result.success]Comments Outside Diff (1)
app/src/main/java/io/nekohasekai/sagernet/ui/AssetsActivity.kt, line 564-576 (link)Libcore.unxzalready enforces the same limitLibcore.unxz(Go’sUnxz) usescopyLimited(o, r, defaultUnxzFileLimit)wheredefaultUnxzFileLimit = 256 * 1024 * 1024.MAX_RULE_ASSET_BYTESin Kotlin is also256L * 1024 * 1024. If the decompressed stream exceeds the limit, Go returns an error andLibcore.unxzthrows beforeunpackedFileis ever written. Theif (unpackedFile.length() > MAX_RULE_ASSET_BYTES)check is therefore dead code.Reviews (2): Last reviewed commit: "fix(libcore): drop unused fmt import (bu..." | Re-trigger Greptile