Fix cargo warnings#619
Conversation
reqwest 0.11.x pins rustls 0.21.x, which only allows the vulnerable rustls-webpki 0.101.7 (flagged by Dependabot, GHSA for rustls-webpki). Upgrading to reqwest 0.12 pulls in rustls 0.23 and rustls-webpki 0.103.13, removing the vulnerable version from the tree. The rustls-tls-native-roots feature is kept (required for Android builds). No source changes were needed; the reqwest API surface used by aw-client-rust and aw-sync is unchanged between 0.11 and 0.12. Closes ActivityWatch#592
…Watch#493) Add an opt-in 'compression-zstd' feature that transparently compresses the events.data column with zstd. ActivityWatch events are tiny and their redundancy is across rows (repeated app names, JSON keys, window titles), so a dictionary is trained on a sample of the database's own events, stored once, and used to compress every row. On a real ~545k-event database this shrinks the stored event JSON by ~47% (file after VACUUM: 77.4 MB -> 52.3 MB) with negligible read overhead (~0.5 us/event). Details: - New compression module: dictionary-compressed blobs are marked with a 0xCC prefix; rows that would not shrink (or before a dictionary exists) are stored as raw JSON, so a row is never larger than plain JSON. A single reusable compressor/decompressor is kept per connection. - Event data column migrated from TEXT to BLOB (db v6) so binary compresses without any encoding overhead. - The dictionary is trained once at startup when the database has enough events (and on upgrade), then existing rows are recompressed in a single transaction; new events are compressed on insert. - Feature is off by default; a database with compressed rows requires a build with the feature enabled to read it. Also enable foreign_keys and add ON DELETE CASCADE to events.bucketrow so deleting a bucket removes its events; the v6 migration drops pre-existing orphan events whose bucket no longer exists. Tested with and without the feature, including a full migrate+train+backfill roundtrip on a real database.
|
depends on #618, can just rebase if necessary. minimal dependency really |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #619 +/- ##
==========================================
+ Coverage 70.81% 77.15% +6.33%
==========================================
Files 51 63 +12
Lines 2916 4986 +2070
==========================================
+ Hits 2065 3847 +1782
- Misses 851 1139 +288 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
Greptile SummaryDespite the "Fix cargo warnings" title, this PR bundles three distinct changes: upgrading reqwest from 0.11 → 0.12 (and its transitive dependency chain), silencing dead-code warnings with
Confidence Score: 3/5The reqwest upgrade and warning-suppression changes are safe to merge; the compression feature should not be shipped until the fallback panic in the event-read path is fixed. Both get_event and get_events_inner call .unwrap() on serde_json::from_str after substituting raw binary bytes when decompression fails. A compressed database opened without the feature panics the DatastoreWorker thread, making the datastore permanently unavailable. aw-datastore/src/datastore.rs — the get_event and get_events_inner decompression fallback paths Important Files Changed
Reviews (1): Last reviewed commit: "fix: resolve cargo build warnings" | Re-trigger Greptile |
| let data: serde_json::map::Map<String, Value> = | ||
| serde_json::from_str(&data_str).unwrap(); | ||
| serde_json::from_str(&decompressed_data_str).unwrap(); |
There was a problem hiding this comment.
Panic on decompressed-but-non-JSON data
When decompress returns an Err (e.g., a compressed row opened by a build without compression-zstd, or a missing/corrupt dictionary), the code falls back to String::from_utf8_lossy(&data_bytes). For a blob starting with 0xCC followed by zstd frame bytes, that string is not valid JSON. The subsequent .unwrap() panics the DatastoreWorker thread, dropping the channel and rendering the datastore permanently unavailable. The same issue exists at line ~1043 in get_events_inner. The fallback should propagate the decompress error instead of substituting non-JSON bytes.
| let decompressed_data_str = match compression.decompress(&data_bytes) { | ||
| Ok(decompressed) => decompressed, | ||
| Err(e) => { | ||
| warn!("Failed to decompress event data: {}. Attempting to parse as uncompressed.", e); | ||
| String::from_utf8_lossy(&data_bytes).to_string() | ||
| } | ||
| }; | ||
|
|
||
| let data: serde_json::map::Map<String, Value> = | ||
| serde_json::from_str(&data_str).unwrap(); | ||
| serde_json::from_str(&decompressed_data_str).unwrap(); |
There was a problem hiding this comment.
Same panic on decompressed-but-non-JSON data (get_events path)
This is the same issue as in get_event above: the Err fallback via String::from_utf8_lossy produces non-JSON bytes when the row is compressed, and the .unwrap() on serde_json::from_str panics the worker thread. Propagating the error from the closure via rusqlite::Error::InvalidColumnType (or similar) would surface it as a query error rather than a crash.
| if 5 + frame.len() < json.len() { | ||
| let mut blob = Vec::with_capacity(5 + frame.len()); | ||
| blob.push(COMPRESSION_MARKER); | ||
| blob.extend_from_slice(&(json.len() as u32).to_le_bytes()); |
There was a problem hiding this comment.
json.len() is cast to u32 without bounds checking. If a JSON string exceeds 4 GiB the stored orig_len silently truncates, producing an incorrect header that causes a failed to decompress event data error on the next read. Using a checked cast is safer.
| blob.extend_from_slice(&(json.len() as u32).to_le_bytes()); | |
| let orig_len = u32::try_from(json.len()) | |
| .expect("event JSON exceeds 4 GiB limit"); | |
| blob.extend_from_slice(&orig_len.to_le_bytes()); |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
No description provided.