Otel optimization#1678
Conversation
…tpaths - Introduce `StackBuf` in `compute_series_hash` to format non-string attribute values (int/float/bool) directly to a 32-byte stack array instead of calling `.to_string()`. This removes transient heap allocations in the ingestion path. The 32-byte limit is derived from the max string representation of OTel i64/f64 values. - Fix `Map::with_capacity` in `flatten_number_data_points` to accurately account for exemplar fields (+4 per exemplar).
WalkthroughThree internal OTEL processing files are updated to improve performance and clarity. ChangesOTEL Log, Metric, and Trace Processing Optimizations
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/otel/metrics.rs (1)
38-84:⚠️ Potential issue | 🟠 Major | ⚡ Quick winExclude histogram/summary sample fields from
__series_hashidentity.At Line 132, filtering depends on
OTEL_METRICS_KNOWN_FIELDS, butmin,max, anddata_point_quantile_valuesare currently not in the known-field list. Since these are sample-level fields (not labels), they should not affect the physical-series hash.Suggested fix
-pub const OTEL_METRICS_KNOWN_FIELD_LIST: [&str; 37] = [ +pub const OTEL_METRICS_KNOWN_FIELD_LIST: [&str; 40] = [ @@ "data_point_sum", + "min", + "max", @@ "data_point_quantile_values_quantile", "data_point_quantile_values_value", + "data_point_quantile_values",Also applies to: 130-133, 337-338, 450-472
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/otel/metrics.rs` around lines 38 - 84, The OTEL_METRICS_KNOWN_FIELD_LIST constant is missing sample-level fields that should not contribute to the physical-series hash identity. Add the three missing fields "min", "max", and "data_point_quantile_values" to the OTEL_METRICS_KNOWN_FIELD_LIST array in src/otel/metrics.rs. These sample-level fields (as opposed to label fields) must be included in the known-fields list so that filtering logic correctly excludes them from the __series_hash computation. This change will ensure that the hash only incorporates metric name and label key/value pairs, not per-sample attributes.
🧹 Nitpick comments (2)
src/otel/logs.rs (1)
151-160: ⚡ Quick winTighten the fast-path to objects only.
Lines 151-160 still send
[...]strings throughserde_json::from_str, but the next guard drops every non-object parse. That means array-shaped or bracketed payloads still pay the parse cost with no flattened output, which undercuts the hot-path optimization here.♻️ Proposed tweak
- let looks_like_json = trimmed.len() >= 2 - && matches!( - (trimmed.as_bytes().first(), trimmed.as_bytes().last()), - (Some(b'{'), Some(b'}')) | (Some(b'['), Some(b']')) - ); + let looks_like_json = trimmed.len() >= 2 + && matches!( + (trimmed.as_bytes().first(), trimmed.as_bytes().last()), + (Some(b'{'), Some(b'}')) + );🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/otel/logs.rs` around lines 151 - 160, The looks_like_json check in the matches! macro accepts both object patterns (curly braces) and array patterns (square brackets), but the subsequent guard at parsed.is_object() will reject arrays anyway, causing unnecessary parse overhead for array-shaped payloads. Remove the array pattern case (Some(b'['), Some(b']')) from the matches! macro so that only true object-shaped payloads proceed to the serde_json parsing step, tightening the fast-path optimization to only handle the cases that will actually produce flattened output.src/otel/metrics.rs (1)
110-117: ⚡ Quick winFail fast on
StackBufoverflow instead of silently truncating hash input.At Line 113-Line 116, over-capacity writes are truncated but still treated as success. If the numeric formatting invariant ever shifts, distinct values can hash identically.
Suggested hardening
impl std::fmt::Write for StackBuf { fn write_str(&mut self, s: &str) -> std::fmt::Result { let bytes = s.as_bytes(); - let end = std::cmp::min(self.len + bytes.len(), 32); - self.data[self.len..end].copy_from_slice(&bytes[..end - self.len]); - self.len = end; - Ok(()) + let end = self.len.checked_add(bytes.len()).ok_or(std::fmt::Error)?; + if end > self.data.len() { + return Err(std::fmt::Error); + } + self.data[self.len..end].copy_from_slice(bytes); + self.len = end; + Ok(()) } } @@ Value::Number(n) => { // Zero- Alloc number formatting let mut buf = StackBuf::new(); - let _ = std::fmt::Write::write_fmt(&mut buf, format_args!("{}", n)); - hasher.write(buf.as_bytes()); + if std::fmt::Write::write_fmt(&mut buf, format_args!("{}", n)).is_ok() { + hasher.write(buf.as_bytes()); + } else { + hasher.write(n.to_string().as_bytes()); + } }Also applies to: 158-163
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/otel/metrics.rs` around lines 110 - 117, The write_str method in the StackBuf implementation silently truncates input when it exceeds the buffer capacity instead of failing, which could cause hash collisions if formatting invariants change. Modify the write_str method to check if the incoming bytes would exceed the 32-byte capacity before writing, and return Err(std::fmt::Error) if they would, rather than silently truncating the data. This check should be applied at both locations where StackBuf's write_str is implemented (the primary location around line 110-117 and the sibling location around line 158-163).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/otel/logs.rs`:
- Around line 284-296: The schema_url is being inserted into resource_log_json
after flatten_scope_log has already cloned and extended vec_resource_logs_json
with those clones. Since the clones are created before schema_url is added, the
flattened logs miss this field. Move the schema_url insertion (the code that
calls resource_log_json.insert with "schema_url") to occur before the
flatten_scope_log function call, so that when flatten_scope_log clones
resource_log_json, the clones will already contain the schema_url field.
---
Outside diff comments:
In `@src/otel/metrics.rs`:
- Around line 38-84: The OTEL_METRICS_KNOWN_FIELD_LIST constant is missing
sample-level fields that should not contribute to the physical-series hash
identity. Add the three missing fields "min", "max", and
"data_point_quantile_values" to the OTEL_METRICS_KNOWN_FIELD_LIST array in
src/otel/metrics.rs. These sample-level fields (as opposed to label fields) must
be included in the known-fields list so that filtering logic correctly excludes
them from the __series_hash computation. This change will ensure that the hash
only incorporates metric name and label key/value pairs, not per-sample
attributes.
---
Nitpick comments:
In `@src/otel/logs.rs`:
- Around line 151-160: The looks_like_json check in the matches! macro accepts
both object patterns (curly braces) and array patterns (square brackets), but
the subsequent guard at parsed.is_object() will reject arrays anyway, causing
unnecessary parse overhead for array-shaped payloads. Remove the array pattern
case (Some(b'['), Some(b']')) from the matches! macro so that only true
object-shaped payloads proceed to the serde_json parsing step, tightening the
fast-path optimization to only handle the cases that will actually produce
flattened output.
In `@src/otel/metrics.rs`:
- Around line 110-117: The write_str method in the StackBuf implementation
silently truncates input when it exceeds the buffer capacity instead of failing,
which could cause hash collisions if formatting invariants change. Modify the
write_str method to check if the incoming bytes would exceed the 32-byte
capacity before writing, and return Err(std::fmt::Error) if they would, rather
than silently truncating the data. This check should be applied at both
locations where StackBuf's write_str is implemented (the primary location around
line 110-117 and the sibling location around line 158-163).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: f91053d2-4e27-4c03-9fe0-cad19ade614c
📒 Files selected for processing (2)
src/otel/logs.rssrc/otel/metrics.rs
Avoid expensive serde_json::from_str on non-JSON string attributes by checking first/last bytes before attempting parse. This eliminates the lexer initialization, character scanning, and error unwind path for ~99% of string attributes. Also reduce clone operations by passing resource map as reference to flatten_scope_log instead of cloning per log record. Benchmarks (593k log records): - flatten_log_record: 28.73µs → 22.55µs (21.5% faster) - flatten_scope_log: 4.47ms → 3.83ms (14.3% faster) - Allocations: unchanged (same data, fewer operations)
ed632b1 to
9655f2f
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/otel/logs.rs (1)
224-246:⚠️ Potential issue | 🟠 MajorAdd field-collision protection for reserved schema_url fields.
At Line 101 (resource attributes) and Line 59 (scope attributes),
insert_attributes()allows any attribute key to be inserted without filtering. Becauseschema_url(Line 281) andscope_log_schema_url(Line 239) are set before the scope-to-log merge at Line 73, a log record attribute with keyschema_urlorscope_log_schema_urlwill overwrite the canonical resource/scope metadata via.extend().This mirrors the intentional protection already in place for
p_log_category(Lines 62–65), which inserts after attributes to prevent override. Apply the same pattern: either filter reserved keys ininsert_attributes()or re-insert canonical values after the log-record merge. Additionally, add a regression test that verifies a log attribute namedschema_urldoes not corrupt the resource-level schema URL.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/otel/logs.rs` around lines 224 - 246, Protect reserved schema metadata fields from being overwritten by user attributes. The `insert_attributes()` function currently allows any attribute key without filtering, and since `schema_url` and `scope_log_schema_url` are inserted before the log-record merge via `.extend()`, a log attribute with these keys will corrupt the canonical metadata. Apply the same protection pattern used for `p_log_category` by either filtering out reserved keys (`schema_url` and `scope_log_schema_url`) in the `insert_attributes()` function itself, or by re-inserting these canonical values immediately after the `.extend()` call to ensure they cannot be overwritten. Additionally, add a regression test that verifies a log attribute named `schema_url` does not overwrite the resource-level schema URL, ensuring the fix remains effective.
🧹 Nitpick comments (1)
src/otel/logs.rs (1)
151-160: ⚡ Quick winAvoid array-shaped speculative parsing in the object-only flattening path.
Line 154 accepts
[...], but Line 159 immediately requiresparsed.is_object(). That still pays parse cost for array-looking strings on the hot path.[Suggestion: keep the guard object-only to skip unnecessary
serde_json::from_strcalls.]♻️ Proposed diff
- let looks_like_json = trimmed.len() >= 2 + let looks_like_json_object = trimmed.len() >= 2 && matches!( (trimmed.as_bytes().first(), trimmed.as_bytes().last()), - (Some(b'{'), Some(b'}')) | (Some(b'['), Some(b']')) + (Some(b'{'), Some(b'}')) ); // Skip speculative JSON parsing unless the body looks like structured JSON - if looks_like_json + if looks_like_json_object && let Ok(parsed) = serde_json::from_str::<Value>(s) && parsed.is_object() && let Ok(flattened_values) = generic_flattening(&parsed)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/otel/logs.rs` around lines 151 - 160, The JSON detection logic in the `looks_like_json` check on line 154 accepts both object and array patterns, but the subsequent condition on line 159 with `parsed.is_object()` filters to objects only, meaning array-shaped strings incur unnecessary JSON parsing cost. To fix this, modify the pattern matching in the `looks_like_json` check to only accept the object pattern (curly braces) and remove the array pattern (square brackets) from the `matches!` comparison. This ensures that strings that look like arrays skip the speculative JSON parsing entirely, avoiding wasted parse operations on the hot path.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/otel/logs.rs`:
- Around line 224-246: Protect reserved schema metadata fields from being
overwritten by user attributes. The `insert_attributes()` function currently
allows any attribute key without filtering, and since `schema_url` and
`scope_log_schema_url` are inserted before the log-record merge via `.extend()`,
a log attribute with these keys will corrupt the canonical metadata. Apply the
same protection pattern used for `p_log_category` by either filtering out
reserved keys (`schema_url` and `scope_log_schema_url`) in the
`insert_attributes()` function itself, or by re-inserting these canonical values
immediately after the `.extend()` call to ensure they cannot be overwritten.
Additionally, add a regression test that verifies a log attribute named
`schema_url` does not overwrite the resource-level schema URL, ensuring the fix
remains effective.
---
Nitpick comments:
In `@src/otel/logs.rs`:
- Around line 151-160: The JSON detection logic in the `looks_like_json` check
on line 154 accepts both object and array patterns, but the subsequent condition
on line 159 with `parsed.is_object()` filters to objects only, meaning
array-shaped strings incur unnecessary JSON parsing cost. To fix this, modify
the pattern matching in the `looks_like_json` check to only accept the object
pattern (curly braces) and remove the array pattern (square brackets) from the
`matches!` comparison. This ensures that strings that look like arrays skip the
speculative JSON parsing entirely, avoiding wasted parse operations on the hot
path.
…tening Remove a redundant full-depth clone of JSON maps in `process_resource_spans` by consuming the vector instead of borrowing, allowing direct moves into `Value::Object`. Additionally, refactor `flatten_attributes` and `insert_json_from_value` to pass protobuf values by reference, preventing deep clones of nested arrays and KvLists before serde conversion. Add capacity hints to Vec and Map initializations based on known payload lengths. Profiling (143 requests, ~597k span records): - Hotpath total time: 57.10s → 43.67s (23.5% faster) - Hotpath P95 latency: 742.39ms → 489.68ms (34.0% faster) - Total allocations: 12.2 GB → 8.4 GB (31.1% reduction) - `process_resource_spans` CPU: 5.49% → 2.22% (56.0% reduction)
c425fde to
7f6549f
Compare
Fixes #XXXX.
Description
Optimizes OTel ingestion hotpaths by eliminating unnecessary heap allocations, redundant parsing, and excessive cloning across Logs, Metrics, and Traces.
Logs: Avoids
serde_json::from_stron non-JSON string attributes (~99% of cases) by checking first/last bytes for{/}before parsing. Also passes resource map by reference instead of cloning per record.Metrics: Replaces
.to_string()with a 32-byte stack buffer (StackBuf) for formatting non-string attribute values incompute_series_hash. The 32-byte limit covers max OTeli64/f64string representations. FixesMap::with_capacityinflatten_number_data_pointsto account for exemplar fields (+4 per exemplar).Traces: Eliminates a redundant full-depth clone of JSON maps in
process_resource_spansby consuming vectors instead of borrowing, enabling direct moves intoValue::Object. AddsVecandMapcapacity hints based on known payload lengths. Computes the ingestion date once per request instead of per ScopeSpanBenchmarks:
flatten_log_recordflatten_scope_logThis PR has: