Otel optimization by ygndotgg · Pull Request #1678 · parseablehq/parseable

ygndotgg · 2026-06-14T04:30:14Z

Fixes #XXXX.

Description

Optimizes OTel ingestion hotpaths by eliminating unnecessary heap allocations, redundant parsing, and excessive cloning across Logs, Metrics, and Traces.

Logs: Avoids serde_json::from_str on non-JSON string attributes (~99% of cases) by checking first/last bytes for {/} before parsing. Also passes resource map by reference instead of cloning per record.
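A minimal sketch of the byte check described above, assuming an object-only guard; the function name `looks_like_json_object` is hypothetical and not taken from the PR. A `true` result only means the string is worth handing to the real parser, since text such as `{not json}` also passes.

```rust
/// Cheap pre-check: could `s` plausibly hold a JSON object?
/// Only the first and last bytes of the trimmed string are inspected,
/// so no parser state is set up for ordinary text attributes.
fn looks_like_json_object(s: &str) -> bool {
    let bytes = s.trim().as_bytes();
    bytes.len() >= 2 && bytes[0] == b'{' && bytes[bytes.len() - 1] == b'}'
}
```

A caller would run `serde_json::from_str` only when this returns `true`, and still treat a parse error as "not JSON".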

Metrics: Replaces .to_string() with a 32-byte stack buffer (StackBuf) for formatting non-string attribute values in compute_series_hash. The 32-byte limit covers max OTel i64/f64 string representations. Fixes Map::with_capacity in flatten_number_data_points to account for exemplar fields (+4 per exemplar).
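The idea of a stack buffer that implements `std::fmt::Write` can be sketched as follows. This is an illustration under assumptions, not the PR's code: the field names and the `format_i64` helper are hypothetical, the sketch returns an error on overflow (a choice made here), and it only exercises integers.

```rust
use std::fmt::Write;

/// Fixed-size, stack-allocated text buffer (hypothetical layout).
struct StackBuf {
    data: [u8; 32],
    len: usize,
}

impl StackBuf {
    fn new() -> Self {
        StackBuf { data: [0u8; 32], len: 0 }
    }

    /// The bytes written so far.
    fn as_bytes(&self) -> &[u8] {
        &self.data[..self.len]
    }
}

impl Write for StackBuf {
    fn write_str(&mut self, s: &str) -> std::fmt::Result {
        let bytes = s.as_bytes();
        let end = self.len + bytes.len();
        // Assumption of this sketch: refuse writes that do not fit.
        if end > self.data.len() {
            return Err(std::fmt::Error);
        }
        self.data[self.len..end].copy_from_slice(bytes);
        self.len = end;
        Ok(())
    }
}

/// Formats an integer without a heap allocation.
fn format_i64(n: i64) -> StackBuf {
    let mut buf = StackBuf::new();
    // i64::MIN needs 20 bytes, so this write always fits in 32.
    write!(buf, "{}", n).expect("i64 fits in 32 bytes");
    buf
}
```

The resulting bytes can be fed straight into a hasher via `buf.as_bytes()`, which is where a `.to_string()` call would otherwise allocate.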

Traces: Eliminates a redundant full-depth clone of JSON maps in process_resource_spans by consuming vectors instead of borrowing, enabling direct moves into Value::Object. Adds Vec and Map capacity hints based on known payload lengths. Computes the ingestion date once per request instead of per ScopeSpan.
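To illustrate consuming a vector instead of borrowing it, here is a small sketch that uses `BTreeMap<String, String>` as a stand-in for the JSON maps. The `Record` alias and `merge_by_move` are hypothetical names, and the merge order (resource fields written into each span record) is an assumption of the sketch.

```rust
use std::collections::BTreeMap;

type Record = BTreeMap<String, String>;

/// Merges resource-level fields into every span record.
/// Taking `spans` by value lets each record be moved into the output
/// instead of being deep-cloned from a borrowed slice.
fn merge_by_move(resource: &Record, spans: Vec<Record>) -> Vec<Record> {
    // Capacity hint: the output has exactly one entry per input record.
    let mut out = Vec::with_capacity(spans.len());
    for mut span in spans {
        // Only the (small) resource fields are copied per record.
        for (k, v) in resource {
            span.insert(k.clone(), v.clone());
        }
        out.push(span); // moved, not cloned
    }
    out
}
```

With a borrowed `&[Record]` the loop would have to call `span.clone()` before inserting, which copies every key and value of every span.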

Benchmarks:

Path	Metric	Before	After	Δ
Logs (593k records)	`flatten_log_record`	28.73µs	22.55µs	-21.5%
Logs (593k records)	`flatten_scope_log`	4.47ms	3.83ms	-14.3%
Metrics (1.4M+ points)	hash alloc/call	640 B	256 B	-60.0%
Metrics (high load)	total hash allocs	705.9 MB	363.4 MB	-48.5%
Traces (~597k spans)	Hotpath total time	57.10s	43.67s	-23.5%
Traces (~597k spans)	P95 request latency	742.39ms	489.68ms	-34.0%
Traces (~597k spans)	Total allocations	12.2 GB	8.4 GB	-31.1%

This PR has:

been tested to ensure log, metric, and trace ingestion and query works.
added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
added documentation for new or modified features or behaviors.

…tpaths

- Introduce `StackBuf` in `compute_series_hash` to format non-string attribute values (int/float/bool) directly to a 32-byte stack array instead of calling `.to_string()`. This removes transient heap allocations in the ingestion path. The 32-byte limit is derived from the max string representation of OTel i64/f64 values.
- Fix `Map::with_capacity` in `flatten_number_data_points` to accurately account for exemplar fields (+4 per exemplar).

coderabbitai · 2026-06-14T04:30:28Z

Walkthrough

Three internal OTEL processing files are updated to improve performance and clarity. logs.rs preallocates Map capacities, adds a byte-heuristic guard before JSON body parsing, and refactors scope/resource merging to inject a shared resource_base_json accumulator. metrics.rs replaces the Cow-based label-pair hashing with a sorted-key, type-explicit loop using a new stack-allocated StackBuf number formatter and widened map capacity. traces.rs threads date computation once through the flattening pipeline and preallocates span-record vectors.

Changes

OTEL Log, Metric, and Trace Processing Optimizations

Layer / File(s)	Summary
Log record and scope-log flattening optimizations `src/otel/logs.rs`	`flatten_log_record` initializes `Map` with `with_capacity` sized to a constant plus attribute count, and gates JSON body parsing behind a trimmed first/last-byte heuristic to skip `serde_json::from_str` for non-JSON strings. `flatten_scope_log` accepts a `resource_base_json` reference, clones it as the per-scope accumulator, inserts `scope_log_schema_url`, and preallocates the output `Vec`. `process_resource_logs` builds `resource_log_json` once, passes it in, inserts `schema_url`, and appends result objects without post-flatten cloning.
Metric series hash rewrite with StackBuf `src/otel/metrics.rs`	Removes `Cow` import. Adds a private 32-byte `StackBuf` with `std::fmt::Write` for allocation-free number formatting. `compute_series_hash` now collects and sorts label keys (excluding `OTEL_METRICS_KNOWN_FIELDS`), then hashes each key/value using type-explicit serialization (`String`, `Bool`, `Number` via `StackBuf`, `"null"` for others) with preserved delimiters and terminator bytes. `flatten_number_data_points` widens `Map` capacity to include exemplar space.
Trace flattening date threading and preallocation `src/otel/traces.rs`	`flatten_otel_traces_protobuf` and `flatten_otel_traces` compute `date` once at the top level and thread it through `process_resource_spans` into `flatten_scope_span`, eliminating redundant `chrono::Utc::now()` calls. Resource/span merge loop switches to move-based iteration. `flatten_span_record` preallocates vector capacity using the total count of events and links.
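The date threading summarized in the last row can be pictured with a sketch like the one below. The function names and the `clock` parameter are hypothetical; `clock` stands in for a call such as `chrono::Utc::now()` so that the example needs only the standard library.

```rust
/// Stamps every span with a date that the caller computed once.
fn flatten_scopes(scopes: &[Vec<&str>], date: &str) -> Vec<(String, String)> {
    // Preallocate for the total number of spans across all scopes.
    let total = scopes.iter().map(Vec::len).sum::<usize>();
    let mut out = Vec::with_capacity(total);
    for scope in scopes {
        for span in scope {
            out.push((span.to_string(), date.to_string()));
        }
    }
    out
}

/// Hypothetical entry point: the clock is read once per request,
/// then the value is passed down by reference.
fn flatten_request(
    scopes: &[Vec<&str>],
    mut clock: impl FnMut() -> String,
) -> Vec<(String, String)> {
    let date = clock();
    flatten_scopes(scopes, &date)
}
```

Passing a counting closure as `clock` shows that it runs once no matter how many scopes the request carries.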

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

parseablehq/parseable#1648: Modifies src/otel/metrics.rs to change the physical-series hashing logic and _series_hash computation, directly related to this PR's compute_series_hash rewrite.
parseablehq/parseable#1522: Introduced JSON body flattening in flatten_log_record, which this PR refines by adding a heuristic guard before attempting serde_json parsing.
parseablehq/parseable#1391: Introduced the process_resource_logs/flatten_scope_log shared flattening flow that this PR refactors to inject resource_base_json and propagate schema_url.

Suggested reviewers

nikhilsinhaparseable

Poem

🐇 Hop, hop, no more wasted heap tonight,
A StackBuf scurries bytes with furry delight!
JSON bodies checked with a byte-sniff glance,
Schema URLs join the resource dance.
Sorted keys hash true, no Cow in sight—
This bunny's OTEL dreams compile just right! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	❓ Inconclusive	The title 'Otel optimization' is vague and generic, using a non-descriptive term that doesn't convey the specific optimizations (heap allocation reduction, JSON parsing optimization, stack buffer usage).	Consider a more specific title like 'Optimize OTel ingestion: reduce heap allocations and parsing overhead' to clearly communicate the key improvements.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description check	✅ Passed	The PR description is comprehensive, well-structured, and covers all required sections with detailed information about changes, benchmarks, and testing status.


coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/otel/metrics.rs (1)

38-84: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Exclude histogram/summary sample fields from __series_hash identity.

At Line 132, filtering depends on OTEL_METRICS_KNOWN_FIELDS, but min, max, and data_point_quantile_values are currently not in the known-field list. Since these are sample-level fields (not labels), they should not affect the physical-series hash.

Suggested fix

-pub const OTEL_METRICS_KNOWN_FIELD_LIST: [&str; 37] = [
+pub const OTEL_METRICS_KNOWN_FIELD_LIST: [&str; 40] = [
@@
     "data_point_sum",
+    "min",
+    "max",
@@
     "data_point_quantile_values_quantile",
     "data_point_quantile_values_value",
+    "data_point_quantile_values",

Also applies to: 130-133, 337-338, 450-472
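Why the known-field list matters for series identity can be shown with a small sketch. It assumes a sorted-key hash over label pairs and uses the standard library's `DefaultHasher`; the three-entry `KNOWN_FIELDS` array is an illustrative subset, not the real `OTEL_METRICS_KNOWN_FIELD_LIST`, and `series_hash` is a hypothetical stand-in for `compute_series_hash`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::Hasher;

/// Illustrative subset of sample-level fields to exclude.
const KNOWN_FIELDS: [&str; 3] = ["data_point_sum", "min", "max"];

/// Hashes label key/value pairs, skipping known sample-level fields.
fn series_hash(record: &BTreeMap<&str, &str>) -> u64 {
    let mut hasher = DefaultHasher::new();
    // BTreeMap iterates in sorted key order, so insertion order
    // does not influence the hash.
    for (key, value) in record {
        if KNOWN_FIELDS.contains(key) {
            continue;
        }
        hasher.write(key.as_bytes());
        hasher.write(b"=");
        hasher.write(value.as_bytes());
        hasher.write(b"\0"); // terminator between pairs
    }
    hasher.finish()
}
```

Two records that differ only in `min` hash to the same series here. If `min` were missing from the list, every new minimum would produce a new series identity.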

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/otel/metrics.rs` around lines 38 - 84, The OTEL_METRICS_KNOWN_FIELD_LIST
constant is missing sample-level fields that should not contribute to the
physical-series hash identity. Add the three missing fields "min", "max", and
"data_point_quantile_values" to the OTEL_METRICS_KNOWN_FIELD_LIST array in
src/otel/metrics.rs. These sample-level fields (as opposed to label fields) must
be included in the known-fields list so that filtering logic correctly excludes
them from the __series_hash computation. This change will ensure that the hash
only incorporates metric name and label key/value pairs, not per-sample
attributes.

🧹 Nitpick comments (2)

src/otel/logs.rs (1)

151-160: ⚡ Quick win

Tighten the fast-path to objects only.

Lines 151-160 still send [...] strings through serde_json::from_str, but the next guard drops every non-object parse. That means array-shaped or bracketed payloads still pay the parse cost with no flattened output, which undercuts the hot-path optimization here.

♻️ Proposed tweak

-                let looks_like_json = trimmed.len() >= 2
-                    && matches!(
-                        (trimmed.as_bytes().first(), trimmed.as_bytes().last()),
-                        (Some(b'{'), Some(b'}')) | (Some(b'['), Some(b']'))
-                    );
+                let looks_like_json = trimmed.len() >= 2
+                    && matches!(
+                        (trimmed.as_bytes().first(), trimmed.as_bytes().last()),
+                        (Some(b'{'), Some(b'}'))
+                    );

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/otel/logs.rs` around lines 151 - 160, The looks_like_json check in the
matches! macro accepts both object patterns (curly braces) and array patterns
(square brackets), but the subsequent guard at parsed.is_object() will reject
arrays anyway, causing unnecessary parse overhead for array-shaped payloads.
Remove the array pattern case (Some(b'['), Some(b']')) from the matches! macro
so that only true object-shaped payloads proceed to the serde_json parsing step,
tightening the fast-path optimization to only handle the cases that will
actually produce flattened output.

src/otel/metrics.rs (1)

110-117: ⚡ Quick win

Fail fast on StackBuf overflow instead of silently truncating hash input.

At Line 113-Line 116, over-capacity writes are truncated but still treated as success. If the numeric formatting invariant ever shifts, distinct values can hash identically.

Suggested hardening

 impl std::fmt::Write for StackBuf {
     fn write_str(&mut self, s: &str) -> std::fmt::Result {
         let bytes = s.as_bytes();
-        let end = std::cmp::min(self.len + bytes.len(), 32);
-        self.data[self.len..end].copy_from_slice(&bytes[..end - self.len]);
-        self.len = end;
-        Ok(())
+        let end = self.len.checked_add(bytes.len()).ok_or(std::fmt::Error)?;
+        if end > self.data.len() {
+            return Err(std::fmt::Error);
+        }
+        self.data[self.len..end].copy_from_slice(bytes);
+        self.len = end;
+        Ok(())
     }
 }
@@
                 Value::Number(n) => {
                     // Zero- Alloc number formatting
                     let mut buf = StackBuf::new();
-                    let _ = std::fmt::Write::write_fmt(&mut buf, format_args!("{}", n));
-                    hasher.write(buf.as_bytes());
+                    if std::fmt::Write::write_fmt(&mut buf, format_args!("{}", n)).is_ok() {
+                        hasher.write(buf.as_bytes());
+                    } else {
+                        hasher.write(n.to_string().as_bytes());
+                    }
                 }

Also applies to: 158-163
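The collision risk can be reproduced with a small sketch. The `Buf` type and `format_into` helper are hypothetical, and long strings stand in for an over-long numeric representation; the returned `Vec` exists only to make the comparison easy.

```rust
use std::fmt::Write;

/// 32-byte stack buffer; `strict` selects fail-fast over truncation.
struct Buf {
    data: [u8; 32],
    len: usize,
    strict: bool,
}

impl Buf {
    fn new(strict: bool) -> Self {
        Buf { data: [0; 32], len: 0, strict }
    }

    fn as_bytes(&self) -> &[u8] {
        &self.data[..self.len]
    }
}

impl Write for Buf {
    fn write_str(&mut self, s: &str) -> std::fmt::Result {
        let bytes = s.as_bytes();
        let room = self.data.len() - self.len;
        if bytes.len() > room && self.strict {
            return Err(std::fmt::Error); // fail fast
        }
        // Non-strict mode silently keeps only what fits.
        let n = bytes.len().min(room);
        self.data[self.len..self.len + n].copy_from_slice(&bytes[..n]);
        self.len += n;
        Ok(())
    }
}

/// Returns the buffered bytes, or `None` if the strict buffer overflowed.
fn format_into(strict: bool, value: &str) -> Option<Vec<u8>> {
    let mut buf = Buf::new(strict);
    write!(buf, "{}", value).ok()?;
    Some(buf.as_bytes().to_vec())
}
```

In truncating mode, two inputs that share their first 32 bytes yield identical buffers and would therefore hash identically. In strict mode the caller gets `None` and can fall back to an allocating path.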

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/otel/metrics.rs` around lines 110 - 117, The write_str method in the
StackBuf implementation silently truncates input when it exceeds the buffer
capacity instead of failing, which could cause hash collisions if formatting
invariants change. Modify the write_str method to check if the incoming bytes
would exceed the 32-byte capacity before writing, and return
Err(std::fmt::Error) if they would, rather than silently truncating the data.
This check should be applied at both locations where StackBuf's write_str is
implemented (the primary location around line 110-117 and the sibling location
around line 158-163).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/otel/logs.rs`:
- Around line 284-296: The schema_url is being inserted into resource_log_json
after flatten_scope_log has already cloned and extended vec_resource_logs_json
with those clones. Since the clones are created before schema_url is added, the
flattened logs miss this field. Move the schema_url insertion (the code that
calls resource_log_json.insert with "schema_url") to occur before the
flatten_scope_log function call, so that when flatten_scope_log clones
resource_log_json, the clones will already contain the schema_url field.
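The ordering issue described in this comment can be sketched as follows, with `BTreeMap<String, String>` standing in for the JSON map. The names `flatten_scopes` and `process_resource` and the `scope_name` field are hypothetical.

```rust
use std::collections::BTreeMap;

type Record = BTreeMap<String, String>;

/// Clones the resource-level base once per scope. Anything missing
/// from `base` at call time is missing from every output record.
fn flatten_scopes(base: &Record, scopes: &[&str]) -> Vec<Record> {
    scopes
        .iter()
        .map(|scope| {
            let mut rec = base.clone();
            rec.insert("scope_name".to_string(), scope.to_string());
            rec
        })
        .collect()
}

fn process_resource(schema_url: &str, scopes: &[&str]) -> Vec<Record> {
    let mut base = Record::new();
    // Insert before flattening so that every clone carries the field.
    base.insert("schema_url".to_string(), schema_url.to_string());
    flatten_scopes(&base, scopes)
}
```

Swapping the two statements in `process_resource` would compile just as well, but the returned records would then lack `schema_url`.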



ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f91053d2-4e27-4c03-9fe0-cad19ade614c

📥 Commits

Reviewing files that changed from the base of the PR and between cc0dd62 and ed632b1.

📒 Files selected for processing (2)

src/otel/logs.rs
src/otel/metrics.rs

Avoid expensive serde_json::from_str on non-JSON string attributes by checking first/last bytes before attempting parse. This eliminates the lexer initialization, character scanning, and error unwind path for ~99% of string attributes.

Also reduce clone operations by passing resource map as reference to flatten_scope_log instead of cloning per log record.

Benchmarks (593k log records):
- flatten_log_record: 28.73µs → 22.55µs (21.5% faster)
- flatten_scope_log: 4.47ms → 3.83ms (14.3% faster)
- Allocations: unchanged (same data, fewer operations)

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/otel/logs.rs (1)

224-246: ⚠️ Potential issue | 🟠 Major

Add field-collision protection for reserved schema_url fields.

At Line 101 (resource attributes) and Line 59 (scope attributes), insert_attributes() allows any attribute key to be inserted without filtering. Because schema_url (Line 281) and scope_log_schema_url (Line 239) are set before the scope-to-log merge at Line 73, a log record attribute with key schema_url or scope_log_schema_url will overwrite the canonical resource/scope metadata via .extend().

This mirrors the intentional protection already in place for p_log_category (Lines 62–65), which inserts after attributes to prevent override. Apply the same pattern: either filter reserved keys in insert_attributes() or re-insert canonical values after the log-record merge. Additionally, add a regression test that verifies a log attribute named schema_url does not corrupt the resource-level schema URL.
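One of the two options named above, re-inserting canonical values after the merge, might look like the following sketch. `BTreeMap<String, String>` stands in for the JSON map, and `merge_protected` and `RESERVED` are hypothetical names.

```rust
use std::collections::BTreeMap;

type Record = BTreeMap<String, String>;

const RESERVED: [&str; 2] = ["schema_url", "scope_log_schema_url"];

/// Merges log attributes into `base`, then restores the canonical
/// values of reserved fields so user attributes cannot override them.
fn merge_protected(base: &Record, attrs: Record) -> Record {
    let mut out = base.clone();
    out.extend(attrs); // user attributes may overwrite any key here
    for key in RESERVED {
        if let Some(canonical) = base.get(key) {
            out.insert(key.to_string(), canonical.clone());
        }
    }
    out
}
```

A regression test along the lines the comment asks for would pass an attribute named `schema_url` and assert that the merged record still holds the resource-level value.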
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/otel/logs.rs` around lines 224 - 246, Protect reserved schema metadata
fields from being overwritten by user attributes. The `insert_attributes()`
function currently allows any attribute key without filtering, and since
`schema_url` and `scope_log_schema_url` are inserted before the log-record merge
via `.extend()`, a log attribute with these keys will corrupt the canonical
metadata. Apply the same protection pattern used for `p_log_category` by either
filtering out reserved keys (`schema_url` and `scope_log_schema_url`) in the
`insert_attributes()` function itself, or by re-inserting these canonical values
immediately after the `.extend()` call to ensure they cannot be overwritten.
Additionally, add a regression test that verifies a log attribute named
`schema_url` does not overwrite the resource-level schema URL, ensuring the fix
remains effective.

🧹 Nitpick comments (1)

src/otel/logs.rs (1)

151-160: ⚡ Quick win

Avoid array-shaped speculative parsing in the object-only flattening path.

Line 154 accepts [...], but Line 159 immediately requires parsed.is_object(). That still pays parse cost for array-looking strings on the hot path.

[Suggestion: keep the guard object-only to skip unnecessary serde_json::from_str calls.]

♻️ Proposed diff

-                let looks_like_json = trimmed.len() >= 2
+                let looks_like_json_object = trimmed.len() >= 2
                     && matches!(
                         (trimmed.as_bytes().first(), trimmed.as_bytes().last()),
-                        (Some(b'{'), Some(b'}')) | (Some(b'['), Some(b']'))
+                        (Some(b'{'), Some(b'}'))
                     );
                 // Skip speculative JSON parsing unless the body looks like structured JSON
-                if looks_like_json
+                if looks_like_json_object
                     && let Ok(parsed) = serde_json::from_str::<Value>(s)
                     && parsed.is_object()
                     && let Ok(flattened_values) = generic_flattening(&parsed)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/otel/logs.rs` around lines 151 - 160, The JSON detection logic in the
`looks_like_json` check on line 154 accepts both object and array patterns, but
the subsequent condition on line 159 with `parsed.is_object()` filters to
objects only, meaning array-shaped strings incur unnecessary JSON parsing cost.
To fix this, modify the pattern matching in the `looks_like_json` check to only
accept the object pattern (curly braces) and remove the array pattern (square
brackets) from the `matches!` comparison. This ensures that strings that look
like arrays skip the speculative JSON parsing entirely, avoiding wasted parse
operations on the hot path.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 52dbfb94-3538-4193-a5aa-aa615bbfe58a

📥 Commits

Reviewing files that changed from the base of the PR and between ed632b1 and 9655f2f.

📒 Files selected for processing (1)

src/otel/logs.rs

…tening

Remove a redundant full-depth clone of JSON maps in `process_resource_spans` by consuming the vector instead of borrowing, allowing direct moves into `Value::Object`. Additionally, refactor `flatten_attributes` and `insert_json_from_value` to pass protobuf values by reference, preventing deep clones of nested arrays and KvLists before serde conversion. Add capacity hints to Vec and Map initializations based on known payload lengths.

Profiling (143 requests, ~597k span records):
- Hotpath total time: 57.10s → 43.67s (23.5% faster)
- Hotpath P95 latency: 742.39ms → 489.68ms (34.0% faster)
- Total allocations: 12.2 GB → 8.4 GB (31.1% reduction)
- `process_resource_spans` CPU: 5.49% → 2.22% (56.0% reduction)

coderabbitai Bot requested changes Jun 14, 2026

nitisht requested a review from parmesant June 14, 2026 04:49

coderabbitai Bot previously approved these changes Jun 14, 2026

ygndotgg dismissed coderabbitai[bot]’s stale review via 9655f2f June 14, 2026 05:00

ygndotgg force-pushed the otel-optimization branch from ed632b1 to 9655f2f Compare June 14, 2026 05:00

coderabbitai Bot reviewed Jun 14, 2026

coderabbitai Bot previously approved these changes Jun 14, 2026

ygndotgg dismissed coderabbitai[bot]’s stale review via c425fde June 14, 2026 07:17

coderabbitai Bot previously approved these changes Jun 14, 2026

ygndotgg dismissed coderabbitai[bot]’s stale review via 7f6549f June 14, 2026 07:52

ygndotgg force-pushed the otel-optimization branch from c425fde to 7f6549f Compare June 14, 2026 07:52

coderabbitai Bot approved these changes Jun 14, 2026

ygndotgg wants to merge 3 commits into parseablehq:main from ygndotgg:otel-optimization