fix(opentelemetry lib): decode missing scope, schema_url, and resource fields#24905
fix(opentelemetry lib): decode missing scope, schema_url, and resource fields#24905szibis wants to merge 12 commits intovectordotdev:masterfrom
Conversation
- Add changelog fragment for vectordotdev#24905 - Document new log output fields in source CUE: scope.schema_url, schema_url (resource-level), resource_dropped_attributes_count - Add comprehensive trace output field documentation to source CUE, including all span fields, scope fields, schema_url, and resource_dropped_attributes_count (previously undocumented)
Extract scope.schema_url, resource schema_url, resource_dropped_attributes_count, and scope.dropped_attributes_count in the native-to-OTLP encode path. These fields are produced by the decode fix in vectordotdev#24905 — the encode now reads them when present and falls back to defaults (empty/0) when absent, ensuring full round-trip fidelity once vectordotdev#24905 merges while remaining backward-compatible before it does. Also fixes schema_url mapping: root "schema_url" now correctly maps to ResourceLogs/ResourceSpans.schema_url (resource level), while "scope.schema_url" maps to ScopeLogs/ScopeSpans.schema_url (scope level).
… tags Update decompose_metric_tags to handle 4 special tags as proto-level structural fields rather than generic attributes: - resource.dropped_attributes_count → Resource.dropped_attributes_count - resource.schema_url → ResourceMetrics.schema_url - scope.dropped_attributes_count → InstrumentationScope.dropped_attributes_count - scope.schema_url → ScopeMetrics.schema_url This ensures round-trip fidelity with fix/otlp-decode-missing-fields (vectordotdev#24905) once merged, while remaining backward-compatible (graceful defaults of 0 / empty string) before that PR merges.
|
Hi @szibis, you have quite a few OTEL PRs open: https://github.com/vectordotdev/vector/pulls?q=sort%3Aupdated-desc+is%3Apr+is%3Aopen+author%3Aszibis+ Can you please list the order in which you want me to review them here? Even better, I would mark all but one as draft so I can keep filtering them without need to exchange comments here. |
@pront Sorry for that, but I just discovered this gaps and avoiding one big PR addon.
|
pront
left a comment
There was a problem hiding this comment.
I think the new resources.schema_url / resources.dropped_attributes_count field handling introduces a backwards-compatibility issue with Vector namespace logs: they’re written into the same resources object that already contains arbitrary OTLP resource attributes, so valid incoming attributes can now be silently overwritten.
From my local test:
{
"service.name": "checkout",
"schema_url": "tenant-defined-value",
"dropped_attributes_count": "user-payload"
}With resource metadata:
{
"schema_url": "https://resource.schema",
"dropped_attributes_count": 7
}The emitted event was:
{
"otel_resources": {
"service.name": "checkout",
"schema_url": "https://resource.schema",
"dropped_attributes_count": 7
}
}So the original resource attributes schema_url = "tenant-defined-value" and dropped_attributes_count = "user-payload" were lost.
Repro config:
data_dir: "/tmp/vector-pr-24905-data"
sources:
otel:
type: opentelemetry
use_otlp_decoding: false
log_namespace: true
grpc:
address: "127.0.0.1:43171"
http:
address: "127.0.0.1:43181"
transforms:
expose_meta:
type: remap
inputs:
- otel.logs
source: |
.otel_resources = %opentelemetry.resources
.otel_scope = %opentelemetry.scope
.otel_timestamp = %opentelemetry.timestamp
sinks:
out:
type: console
inputs:
- expose_meta
target: stdout
encoding:
codec: jsonWe should preserve %opentelemetry.resources as the raw user-supplied resource attributes.
Also, we have the same type of issue with Metrics: resource.* / scope.* tag collisions.
Generally, if a field was not literally present as a user attribute, it should not be inserted into the raw attribute map. We should not place synthetic or derived metadata into a namespace that is also used for raw user payload.
@pront All fixed |
Local Integration & Edge Case TestingSet up a local test pipeline — Vector OTLP source on ports 4317/4318, piped through an OTLP sink to a second Vector instance on 4319/4320 for full roundtrip testing. Used grpcurl with proto files from That's how I caught the actual bug — passthrough output showed Wrote 15 unit tests in PerformanceRan 10k trace events through the roundtrip pipeline in batches of 100 via grpcurl — ~5k events/s, zero errors or panics. These numbers mostly reflect grpcurl client overhead — a compiled gRPC client would be significantly faster. The important takeaway is stability: no crashes, no data corruption, no memory issues under sustained load. |
|
Note on testing methodology: For the local integration and performance tests, all 3 PRs (#24905, #24621, #24897) were merged locally into a single test branch ( |
…elds The OTLP decode path drops several protobuf fields during conversion to Vector events. This causes silent data loss and breaks round-trip fidelity when events are later re-encoded to OTLP format. Fields now decoded: Logs: - ScopeLogs.schema_url → scope.schema_url / %opentelemetry.scope.schema_url - ResourceLogs.schema_url → schema_url / %opentelemetry.resources.schema_url - Resource.dropped_attributes_count → resource_dropped_attributes_count Traces: - ScopeSpans.scope (name, version, attributes, dropped_attributes_count) - ScopeSpans.schema_url → scope.schema_url - ResourceSpans.schema_url → schema_url - Resource.dropped_attributes_count → resource_dropped_attributes_count Metrics: - scope.dropped_attributes_count → tag - ScopeMetrics.schema_url → scope.schema_url tag - ResourceMetrics.schema_url → resource.schema_url tag - Resource.dropped_attributes_count → resource.dropped_attributes_count tag Closes vectordotdev#24904 Relates to vectordotdev#15500
- Add changelog fragment for vectordotdev#24905 - Document new log output fields in source CUE: scope.schema_url, schema_url (resource-level), resource_dropped_attributes_count - Add comprehensive trace output field documentation to source CUE, including all span fields, scope fields, schema_url, and resource_dropped_attributes_count (previously undocumented)
…ions Remove redundant .clone() calls in metrics tag building (format! only borrows), eliminate Value clone for observed_timestamp by keeping it as DateTime<Utc> (Copy), and remove unnecessary resource.clone() where self is already consumed by value. Add inline documentation for intentional Legacy vs Vector namespace path asymmetry on schema_url and resource_dropped_attributes_count fields.
… resources overwrite In Vector namespace, the resources insert (kv_list_into_value for attributes) overwrites the entire "resources" metadata key. Moving resource_schema_url insert after the resources insert ensures it is not lost. Also: - Add Vector namespace combined test to verify schema_url survives alongside resource attributes - Reformat changelog to 80-100 char lines
…nt passing Replace the repeated 5-argument pattern (resource, scope, metric_name, scope_schema_url, resource_schema_url) across all convert_* methods with a single MetricContext struct. This also removes the per-function clone boilerplate since ctx is moved into each closure directly.
Per OTLP spec, time_unix_nano == 0 means the timestamp is unset/unknown. Previously all 5 metric types (Sum, Gauge, Histogram, ExponentialHistogram, Summary) converted 0 to Some(epoch), which is semantically incorrect. Now returns None when time_unix_nano is 0, consistent with the existing log decode behavior.
Move resource_schema_url and resource_dropped_attributes_count to
their own metadata paths instead of nesting them under the "resources"
namespace which holds user-supplied resource attributes.
For logs (Vector namespace): metadata now stored at flat paths like
%opentelemetry.resource_schema_url instead of
%opentelemetry.resources.schema_url, preventing collision when users
have resource attributes named "schema_url" or
"dropped_attributes_count".
For metrics: metadata tags now use underscore-separated names
(resource_schema_url, scope_dropped_attributes_count) instead of
dot-separated (resource.schema_url, scope.dropped_attributes_count)
to avoid colliding with user attribute tags that follow the
"resource.{key}" / "scope.{key}" format.
Also simplifies test section comment separators per review feedback.
…e tests Per OTLP spec, time_unix_nano == 0 means "unset". Previously spans decoded 0 as epoch (1970-01-01T00:00:00Z). This adds a nanos_to_value() helper that returns Value::Null for 0 and safely handles u64→i64 overflow (year 2262+). Adds 15 new tests covering: - Zero timestamp decode (span start/end + span events) - u64::MAX overflow protection - All span kinds (0, 3, 4, 5) - Multiple events and links per span - Multiple spans per scope - Status variants (unset, error, missing) - Trace state preservation - Unicode span names - Invalid start > end timestamps
0c07143 to
cdb8c90
Compare
Stash start_time_unix_nano from OTLP data points into metric metadata at %vector.otlp.start_time_unix_nano during decode. This enables the encode path to restore the original value on roundtrip instead of hardcoding 0. Only non-zero values are stored (zero means "not set" in OTLP). All 5 metric types updated: Sum, Gauge, Histogram, ExpHistogram, Summary.
…idecar Preserve original OTLP attribute types (IntValue, BoolValue, DoubleValue) during metric decode by storing them as VRL Values in EventMetadata at %vector.otlp.metric_sidecar. The sidecar includes resource/scope/data-point attributes as typed VRL objects, scope metadata fields, and a tags fingerprint for staleness detection on the encode side.
Read the typed attribute sidecar from %vector.otlp.metric_sidecar (stashed by PR vectordotdev#24905 during decode) and emit original OTLP types (IntValue, BoolValue, DoubleValue) instead of StringValue. Fingerprint-based staleness detection ensures correctness: if tags were mutated by transforms, the sidecar is ignored and the encoder falls back to string-based tag decomposition.
Replace kv_list_into_value() with pb_value_to_typed_value() in
build_otlp_sidecar_data(). Each attribute value is now stored as a
single-key Object named after the OTLP variant (e.g. {"int_value": 42},
{"string_value": "x"}, {"bytes_value": <bytes>}). This preserves the
StringValue/BytesValue distinction and handles ArrayValue/KvlistValue
recursively, so the encoder can reconstruct the exact protobuf variant.
…code tests Verify the kind-wrapper approach correctly stores BytesValue (distinct from StringValue), ArrayValue (with recursively wrapped elements), and KvlistValue (nested key-value structure) in the metric sidecar.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4889ee8f19
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if start_time_unix_nano > 0 { | ||
| metric.metadata_mut().value_mut().insert( | ||
| path!("vector", "otlp", "start_time_unix_nano"), | ||
| Value::Integer(start_time_unix_nano as i64), |
There was a problem hiding this comment.
Preserve full fixed64 start_time_unix_nano range
start_time_unix_nano is an OTLP fixed64, but this code stores it as Value::Integer(start_time_unix_nano as i64), which wraps values above i64::MAX into negative numbers. In those inputs (e.g., far-future but still valid fixed64 timestamps), the decoded metadata no longer contains the original start time, so downstream consumers of %vector.otlp.start_time_unix_nano will read a corrupted value instead of the source timestamp.
Useful? React with 👍 / 👎.
|
Checks are failing. I recommend adding the following to your local dev env: https://github.com/vectordotdev/vector/blob/master/CONTRIBUTING.md?plain=1#L119-L160 |
Summary
The OTLP decode path silently drops several protobuf fields during conversion to Vector events, causing data loss and breaking round-trip fidelity (OTLP → Vector → OTLP). This PR fixes all missing fields and adds metadata preservation for lossless OTLP metric roundtrips.
1. Missing Fields
Before this PR
scope.namescope.versionscope.attributesscope.dropped_attributes_countScopeX.schema_urlResourceX.schema_urlresource.dropped_attributes_countAfter this PR
scope.namescope.versionscope.attributesscope.dropped_attributes_countScopeX.schema_urlResourceX.schema_urlresource.dropped_attributes_countstart_time_unix_nano(metrics)Field mapping
Logs (Legacy / Vector namespace):
ScopeLogs.schema_url→scope.schema_url/%opentelemetry.scope.schema_urlResourceLogs.schema_url→schema_url/%opentelemetry.resources.schema_urlResource.dropped_attributes_count→resource_dropped_attributes_count/%opentelemetry.resources.dropped_attributes_countTraces (always at event root):
ScopeSpans.scope.*→scope.name,scope.version,scope.attributes,scope.dropped_attributes_countScopeSpans.schema_url→scope.schema_urlResourceSpans.schema_url→schema_urlResource.dropped_attributes_count→resource_dropped_attributes_countMetrics (as tags, following existing
resource.*/scope.*pattern):scope_dropped_attributes_count,scope_schema_url,resource_schema_url,resource_dropped_attributes_count2.
start_time_unix_nanoPreservationOTLP metrics carry
start_time_unix_nanoon every data point, but Vector's metric model has only one timestamp. Previously this was silently dropped.Now stored in
EventMetadataat%vector.otlp.start_time_unix_nano:stash_start_time()during decoderesolve_start_time()3. Typed Metric Attribute Sidecar
OTLP attributes carry typed values (
IntValue,BoolValue,DoubleValue), but Vector'sMetricTagsmodel stores everything as strings. Previously, all attributes becameStringValueon re-encode.Now stashed in
EventMetadataat%vector.otlp.metric_sidecar:resource_attributeskv_list_into_value()scope_attributesdata_point_attributesscope_namescope_versionresource_dropped_attributes_countscope_dropped_attributes_counttags_fingerprintBorrow-before-consume pattern: The sidecar borrows
&Resource/&InstrumentationScopebeforebuild_metric_tags()consumes them, avoiding cloning entire structures.Staleness detection: The encode side (PR #24897) recomputes the fingerprint from current tags. If tags were mutated by transforms, the sidecar is ignored and the encoder falls back to string-based decomposition.
StringValueStringValueStringValue(correct fallback)start_time_unix_nanofrom OTLPRelated
Test plan
start_time_unix_nanopreservation (4 metric types + zero-not-stored)