fix(opentelemetry lib): decode missing scope, schema_url, and resource fields #24905

Open
szibis wants to merge 12 commits into vectordotdev:master from szibis:fix/otlp-decode-missing-fields
Conversation

@szibis (Contributor) commented Mar 12, 2026

Summary

The OTLP decode path silently drops several protobuf fields during conversion to Vector events, causing data loss and breaking round-trip fidelity (OTLP → Vector → OTLP). This PR fixes all missing fields and adds metadata preservation for lossless OTLP metric roundtrips.

1. Missing Fields

Before this PR

Field Logs Traces Metrics
scope.name ✅ (tag)
scope.version ✅ (tag)
scope.attributes ✅ (tags)
scope.dropped_attributes_count
ScopeX.schema_url
ResourceX.schema_url
resource.dropped_attributes_count

After this PR

Field Logs Traces Metrics
scope.name ✅ (tag)
scope.version ✅ (tag)
scope.attributes ✅ (tags)
scope.dropped_attributes_count ✅ (tag)
ScopeX.schema_url ✅ (tag)
ResourceX.schema_url ✅ (tag)
resource.dropped_attributes_count ✅ (tag)
start_time_unix_nano (metrics) ✅ (metadata)
Typed attribute preservation (metrics) ✅ (sidecar)

Field mapping

Logs (Legacy / Vector namespace):

  • ScopeLogs.schema_url → scope.schema_url / %opentelemetry.scope.schema_url
  • ResourceLogs.schema_url → schema_url / %opentelemetry.resources.schema_url
  • Resource.dropped_attributes_count → resource_dropped_attributes_count / %opentelemetry.resources.dropped_attributes_count

Traces (always at event root):

  • ScopeSpans.scope.* → scope.name, scope.version, scope.attributes, scope.dropped_attributes_count
  • ScopeSpans.schema_url → scope.schema_url
  • ResourceSpans.schema_url → schema_url
  • Resource.dropped_attributes_count → resource_dropped_attributes_count

Metrics (as tags, following existing resource.* / scope.* pattern):

  • scope_dropped_attributes_count, scope_schema_url, resource_schema_url, resource_dropped_attributes_count

2. start_time_unix_nano Preservation

OTLP metrics carry start_time_unix_nano on every data point, but Vector's metric model has only one timestamp. Previously this was silently dropped.

Now stored in EventMetadata at %vector.otlp.start_time_unix_nano:

3. Typed Metric Attribute Sidecar

OTLP attributes carry typed values (IntValue, BoolValue, DoubleValue), but Vector's MetricTags model stores everything as strings. Previously, all attributes became StringValue on re-encode.

Now stashed in EventMetadata at %vector.otlp.metric_sidecar:

Sidecar Field | Content
resource_attributes | VRL Object preserving original types via kv_list_into_value()
scope_attributes | VRL Object preserving original types
data_point_attributes | VRL Object preserving original types
scope_name | String
scope_version | String
resource_dropped_attributes_count | Integer
scope_dropped_attributes_count | Integer
tags_fingerprint | Hash of stringified tags for staleness detection

Borrow-before-consume pattern: The sidecar borrows &Resource / &InstrumentationScope before build_metric_tags() consumes them, avoiding cloning entire structures.
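
For illustration, the borrow-before-consume ordering can be sketched in isolation. This is a minimal sketch, not the PR's code: `Resource`, `AttrValue`, `build_sidecar`, and `decode` are hypothetical stand-ins, and only the name `build_metric_tags` comes from the description above.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for the generated OTLP `Resource` message.
#[derive(Debug)]
pub struct Resource {
    pub attributes: Vec<(String, AttrValue)>,
    pub dropped_attributes_count: u32,
}

// Hypothetical typed attribute value (a small subset of OTLP AnyValue).
#[derive(Debug, Clone, PartialEq)]
pub enum AttrValue {
    Str(String),
    Int(i64),
    Bool(bool),
}

// Reads the resource through a shared borrow and copies only the
// attribute pairs the sidecar keeps.
pub fn build_sidecar(resource: &Resource) -> BTreeMap<String, AttrValue> {
    resource.attributes.iter().cloned().collect()
}

// Takes the resource by value and flattens every attribute to a string tag.
pub fn build_metric_tags(resource: Resource) -> BTreeMap<String, String> {
    resource
        .attributes
        .into_iter()
        .map(|(k, v)| {
            let s = match v {
                AttrValue::Str(s) => s,
                AttrValue::Int(i) => i.to_string(),
                AttrValue::Bool(b) => b.to_string(),
            };
            (format!("resource.{k}"), s)
        })
        .collect()
}

pub fn decode(resource: Resource) -> (BTreeMap<String, AttrValue>, BTreeMap<String, String>) {
    let sidecar = build_sidecar(&resource); // borrow first
    let tags = build_metric_tags(resource); // then move; no up-front clone of the struct
    (sidecar, tags)
}

fn main() {
    let resource = Resource {
        attributes: vec![("replicas".to_string(), AttrValue::Int(3))],
        dropped_attributes_count: 0,
    };
    let (sidecar, tags) = decode(resource);
    println!("sidecar = {sidecar:?}, tags = {tags:?}");
}
```

Swapping the two calls in `decode` would not compile, because `resource` is moved into `build_metric_tags`; the ordering is what lets the sidecar be built without cloning the whole `Resource`.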

Staleness detection: The encode side (PR #24897) recomputes the fingerprint from current tags. If tags were mutated by transforms, the sidecar is ignored and the encoder falls back to string-based decomposition.

Scenario | Before | After
OTLP roundtrip (no transforms) | All attrs → StringValue | Original types preserved
OTLP after tag-mutating transform | StringValue | StringValue (correct fallback)
start_time_unix_nano from OTLP | Lost (hardcoded 0) | Preserved
Native metric (no OTLP source) | String tags | Same (backward compatible)
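
The fingerprint check can be sketched roughly as follows. This is an assumption-laden illustration: the PR does not say which hash is used, so `DefaultHasher` is a stand-in, and `Tags`, `Sidecar`, and `sidecar_is_fresh` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

// Stringified metric tags. A BTreeMap iterates in key order, so the
// fingerprint does not depend on insertion order.
pub type Tags = BTreeMap<String, String>;

// Hash over every key/value pair (hash function is an assumption).
pub fn tags_fingerprint(tags: &Tags) -> u64 {
    let mut hasher = DefaultHasher::new();
    for (k, v) in tags {
        k.hash(&mut hasher);
        v.hash(&mut hasher);
    }
    hasher.finish()
}

// Hypothetical sidecar holding only the fingerprint taken at decode time.
pub struct Sidecar {
    pub tags_fingerprint: u64,
}

// The encoder would trust the typed sidecar only if the tags are unchanged.
pub fn sidecar_is_fresh(sidecar: &Sidecar, current: &Tags) -> bool {
    sidecar.tags_fingerprint == tags_fingerprint(current)
}

fn main() {
    let mut tags = Tags::new();
    tags.insert("resource.replicas".to_string(), "3".to_string());
    let sidecar = Sidecar { tags_fingerprint: tags_fingerprint(&tags) };
    println!("fresh before transform: {}", sidecar_is_fresh(&sidecar, &tags));

    // A transform rewrites a tag, so the stored fingerprint no longer matches.
    tags.insert("resource.replicas".to_string(), "4".to_string());
    println!("fresh after transform: {}", sidecar_is_fresh(&sidecar, &tags));
}
```

`DefaultHasher` output is not guaranteed stable across Rust releases, so a real implementation that persists fingerprints between processes would probably want a fixed algorithm.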

Related

Test plan

  • 31 unit tests for missing fields (12 logs, 11 traces, 8 metrics)
  • 5 unit tests for start_time_unix_nano preservation (4 metric types + zero-not-stored)
  • 5 unit tests for typed sidecar (typed resource attrs, typed dp attrs, scope metadata, fingerprint validity, empty sidecar omission)
  • Tests verify both presence of new fields and absence when empty/zero
  • Tests cover Legacy and Vector namespace for logs
  • Combined tests verify all new fields work together with existing fields
  • Integration test with OTLP collector for round-trip verification

@szibis szibis requested a review from a team as a code owner March 12, 2026 08:06
szibis added a commit to szibis/vector that referenced this pull request Mar 12, 2026
- Add changelog fragment for vectordotdev#24905
- Document new log output fields in source CUE: scope.schema_url,
  schema_url (resource-level), resource_dropped_attributes_count
- Add comprehensive trace output field documentation to source CUE,
  including all span fields, scope fields, schema_url, and
  resource_dropped_attributes_count (previously undocumented)
@szibis szibis requested a review from a team as a code owner March 12, 2026 08:14
@github-actions bot added the "domain: external docs" label Mar 12, 2026
szibis added a commit to szibis/vector that referenced this pull request Mar 12, 2026


Extract scope.schema_url, resource schema_url, resource_dropped_attributes_count,
and scope.dropped_attributes_count in the native-to-OTLP encode path. These fields
are produced by the decode fix in vectordotdev#24905 — the encode now reads them when present
and falls back to defaults (empty/0) when absent, ensuring full round-trip fidelity
once vectordotdev#24905 merges while remaining backward-compatible before it does.

Also fixes schema_url mapping: root "schema_url" now correctly maps to
ResourceLogs/ResourceSpans.schema_url (resource level), while "scope.schema_url"
maps to ScopeLogs/ScopeSpans.schema_url (scope level).
szibis added a commit to szibis/vector that referenced this pull request Mar 12, 2026
… tags

Update decompose_metric_tags to handle 4 special tags as proto-level
structural fields rather than generic attributes:
- resource.dropped_attributes_count → Resource.dropped_attributes_count
- resource.schema_url → ResourceMetrics.schema_url
- scope.dropped_attributes_count → InstrumentationScope.dropped_attributes_count
- scope.schema_url → ScopeMetrics.schema_url

This ensures round-trip fidelity with fix/otlp-decode-missing-fields
(vectordotdev#24905) once merged, while remaining backward-compatible (graceful
defaults of 0 / empty string) before that PR merges.
@szibis changed the title from "fix(opentelemetry): decode missing scope, schema_url, and resource fields" to "fix(opentelemetry lib): decode missing scope, schema_url, and resource fields" Mar 12, 2026
@pront (Member) commented Mar 12, 2026

Hi @szibis, you have quite a few OTEL PRs open: https://github.com/vectordotdev/vector/pulls?q=sort%3Aupdated-desc+is%3Apr+is%3Aopen+author%3Aszibis+

Can you please list the order in which you want me to review them here? Even better, I would mark all but one as draft so I can keep filtering them without need to exchange comments here.

@szibis (Contributor, Author) commented Mar 12, 2026

> Hi @szibis, you have quite a few OTEL PRs open: https://github.com/vectordotdev/vector/pulls?q=sort%3Aupdated-desc+is%3Apr+is%3Aopen+author%3Aszibis+
>
> Can you please list the order in which you want me to review them here? Even better, I would mark all but one as draft so I can keep filtering them without need to exchange comments here.

@pront Sorry about that. I only just discovered these gaps and wanted to avoid one big PR. Suggested review order:

  1. fix(opentelemetry lib): decode missing scope, schema_url, and resource fields #24905 - this PR, which decodes the missing scope fields. It is the full OTLP format baseline for all later PRs.
  2. feat(opentelemetry sink): add automatic native log and trace to OTLP conversion #24621 - adds auto-conversion in the sink for logs and traces.
  3. feat(opentelemetry sink): add native metric to OTLP conversion #24897 - builds on the log and trace conversion to add auto-conversion for metrics.

cswatt previously approved these changes Mar 12, 2026
@szibis szibis requested a review from pront March 12, 2026 18:44
@pront (Member) left a comment

I think the new resources.schema_url / resources.dropped_attributes_count field handling introduces a backwards-compatibility issue with Vector namespace logs: they’re written into the same resources object that already contains arbitrary OTLP resource attributes, so valid incoming attributes can now be silently overwritten.

From my local test:

{
  "service.name": "checkout",
  "schema_url": "tenant-defined-value",
  "dropped_attributes_count": "user-payload"
}

With resource metadata:

{
  "schema_url": "https://resource.schema",
  "dropped_attributes_count": 7
}

The emitted event was:

{
  "otel_resources": {
    "service.name": "checkout",
    "schema_url": "https://resource.schema",
    "dropped_attributes_count": 7
  }
}

So the original resource attributes schema_url = "tenant-defined-value" and dropped_attributes_count = "user-payload" were lost.

Repro config:

data_dir: "/tmp/vector-pr-24905-data"

sources:
  otel:
    type: opentelemetry
    use_otlp_decoding: false
    log_namespace: true
    grpc:
      address: "127.0.0.1:43171"
    http:
      address: "127.0.0.1:43181"

transforms:
  expose_meta:
    type: remap
    inputs:
      - otel.logs
    source: |
      .otel_resources = %opentelemetry.resources
      .otel_scope = %opentelemetry.scope
      .otel_timestamp = %opentelemetry.timestamp

sinks:
  out:
    type: console
    inputs:
      - expose_meta
    target: stdout
    encoding:
      codec: json

We should preserve %opentelemetry.resources as the raw user-supplied resource attributes.

Also, we have the same type of issue with Metrics: resource.* / scope.* tag collisions.

Generally, if a field was not literally present as a user attribute, it should not be inserted into the raw attribute map. We should not place synthetic or derived metadata into a namespace that is also used for raw user payload.
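
The collision is easy to reproduce outside Vector. The sketch below is only an illustration with hypothetical names (`Value`, `Object`, `decode_nested`, `decode_flat`, `LogMetadata`); it contrasts writing derived metadata into the raw attribute map with keeping it on separate keys.

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified value type; Vector's real `Value` is richer.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Str(String),
    Int(i64),
}

pub type Object = BTreeMap<String, Value>;

// Problematic shape: derived metadata is inserted into the same map that
// holds raw user attributes, so same-named attributes are overwritten.
pub fn decode_nested(user_attrs: &Object, schema_url: &str, dropped: i64) -> Object {
    let mut resources = user_attrs.clone();
    resources.insert("schema_url".into(), Value::Str(schema_url.into()));
    resources.insert("dropped_attributes_count".into(), Value::Int(dropped));
    resources
}

// Collision-free shape: the raw map is left untouched and derived
// metadata lives in its own fields.
#[derive(Debug, PartialEq)]
pub struct LogMetadata {
    pub resources: Object,
    pub resource_schema_url: String,
    pub resource_dropped_attributes_count: i64,
}

pub fn decode_flat(user_attrs: &Object, schema_url: &str, dropped: i64) -> LogMetadata {
    LogMetadata {
        resources: user_attrs.clone(),
        resource_schema_url: schema_url.to_string(),
        resource_dropped_attributes_count: dropped,
    }
}

fn main() {
    let mut user = Object::new();
    user.insert("service.name".into(), Value::Str("checkout".into()));
    user.insert("schema_url".into(), Value::Str("tenant-defined-value".into()));

    println!("nested: {:?}", decode_nested(&user, "https://resource.schema", 7));
    println!("flat:   {:?}", decode_flat(&user, "https://resource.schema", 7));
}
```

In the nested variant the tenant's `schema_url` attribute is replaced by the resource-level schema URL, which matches the repro output above; in the flat variant both values survive.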

@szibis (Contributor, Author) commented Mar 13, 2026

> We should preserve %opentelemetry.resources as the raw user-supplied resource attributes.
>
> Also, we have the same type of issue with Metrics: resource.* / scope.* tag collisions.
>
> Generally, if a field was not literally present as a user attribute, it should not be inserted into the raw attribute map. We should not place synthetic or derived metadata into a namespace that is also used for raw user payload.

@pront All fixed

@szibis szibis requested review from cswatt and pront March 13, 2026 21:03
@szibis (Contributor, Author) commented Mar 14, 2026

Local Integration & Edge Case Testing

Set up a local test pipeline — Vector OTLP source on ports 4317/4318, piped through an OTLP sink to a second Vector instance on 4319/4320 for full roundtrip testing. Used grpcurl with proto files from lib/opentelemetry-proto/src/proto/opentelemetry-proto/ to send payloads. Initially tried HTTP with curl but Vector's OTLP HTTP endpoint only accepts application/x-protobuf, not JSON, so gRPC was the only option.

That's how I caught the actual bug — passthrough output showed time_unix_nano = 0 stored as Unix epoch (1970-01-01) instead of Null. Per OTLP spec, 0 means "timestamp not set", so this was silently corrupting data. Fixed with a nanos_to_value() helper that treats 0 as unset and guards against u64→i64 overflow past year 2262.
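
A minimal sketch of what such a helper might look like, using `Option<i64>` in place of Vector's `Value` so it stands alone. Returning `None` on overflow is an assumption; the comment above only says the helper guards against it.

```rust
// Sketch of a `nanos_to_value`-style helper. `None` stands in for
// `Value::Null`; the overflow policy here is an assumption.
pub fn nanos_to_value(nanos: u64) -> Option<i64> {
    if nanos == 0 {
        // Per the OTLP spec, 0 means the timestamp was not set.
        return None;
    }
    // i64 nanoseconds run out in the year 2262; larger u64 inputs are
    // rejected rather than wrapped into negative numbers.
    i64::try_from(nanos).ok()
}

fn main() {
    println!("{:?}", nanos_to_value(0));
    println!("{:?}", nanos_to_value(1_700_000_000_000_000_000));
    println!("{:?}", nanos_to_value(u64::MAX));
}
```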

Wrote 15 unit tests in spans.rs, ran with cargo test -p opentelemetry-proto --lib spans — all green, clippy clean. Tests cover the fix plus edge cases like all SpanKind values, multiple events/links/spans, status codes, unicode, and overflow.

Performance

Ran 10k trace events through the roundtrip pipeline in batches of 100 via grpcurl — ~5k events/s, zero errors or panics. These numbers mostly reflect grpcurl client overhead — a compiled gRPC client would be significantly faster. The important takeaway is stability: no crashes, no data corruption, no memory issues under sustained load.

@szibis (Contributor, Author) commented Mar 14, 2026

Note on testing methodology: For the local integration and performance tests, all 3 PRs (#24905, #24621, #24897) were merged locally into a single test branch (test/otlp-all-prs-merged) and compiled into one binary. This let me test the full decode→encode→decode roundtrip across all signal types together, which is how they'll actually run in production. The unit tests in each PR are self-contained and run independently on their respective branches.

szibis added 5 commits March 15, 2026 15:40
…elds

The OTLP decode path drops several protobuf fields during conversion to
Vector events. This causes silent data loss and breaks round-trip fidelity
when events are later re-encoded to OTLP format.

Fields now decoded:

Logs:
- ScopeLogs.schema_url → scope.schema_url / %opentelemetry.scope.schema_url
- ResourceLogs.schema_url → schema_url / %opentelemetry.resources.schema_url
- Resource.dropped_attributes_count → resource_dropped_attributes_count

Traces:
- ScopeSpans.scope (name, version, attributes, dropped_attributes_count)
- ScopeSpans.schema_url → scope.schema_url
- ResourceSpans.schema_url → schema_url
- Resource.dropped_attributes_count → resource_dropped_attributes_count

Metrics:
- scope.dropped_attributes_count → tag
- ScopeMetrics.schema_url → scope.schema_url tag
- ResourceMetrics.schema_url → resource.schema_url tag
- Resource.dropped_attributes_count → resource.dropped_attributes_count tag

Closes vectordotdev#24904
Relates to vectordotdev#15500
- Add changelog fragment for vectordotdev#24905
- Document new log output fields in source CUE: scope.schema_url,
  schema_url (resource-level), resource_dropped_attributes_count
- Add comprehensive trace output field documentation to source CUE,
  including all span fields, scope fields, schema_url, and
  resource_dropped_attributes_count (previously undocumented)
…ions

Remove redundant .clone() calls in metrics tag building (format! only
borrows), eliminate Value clone for observed_timestamp by keeping it as
DateTime<Utc> (Copy), and remove unnecessary resource.clone() where self
is already consumed by value. Add inline documentation for intentional
Legacy vs Vector namespace path asymmetry on schema_url and
resource_dropped_attributes_count fields.
… resources overwrite

In Vector namespace, the resources insert (kv_list_into_value for
attributes) overwrites the entire "resources" metadata key. Moving
resource_schema_url insert after the resources insert ensures it is
not lost.

Also:
- Add Vector namespace combined test to verify schema_url survives
  alongside resource attributes
- Reformat changelog to 80-100 char lines
…nt passing

Replace the repeated 5-argument pattern (resource, scope, metric_name,
scope_schema_url, resource_schema_url) across all convert_* methods
with a single MetricContext struct. This also removes the per-function
clone boilerplate since ctx is moved into each closure directly.
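
Roughly, the refactor has this shape. The field names follow the commit message; the field types, `Metric`, `build_metric`, and the `convert_gauge` signature are hypothetical placeholders rather than the real code.

```rust
// Hypothetical placeholders for the generated protobuf structs.
#[derive(Debug, Clone, Default)]
pub struct Resource {
    pub attributes: Vec<(String, String)>,
}

#[derive(Debug, Clone, Default)]
pub struct Scope {
    pub name: String,
}

// One struct replaces the five arguments that were threaded through
// every convert_* method.
#[derive(Debug, Clone, Default)]
pub struct MetricContext {
    pub resource: Option<Resource>,
    pub scope: Option<Scope>,
    pub metric_name: String,
    pub scope_schema_url: String,
    pub resource_schema_url: String,
}

#[derive(Debug, PartialEq)]
pub struct Metric {
    pub name: String,
    pub value: f64,
    pub tags: Vec<(String, String)>,
}

fn build_metric(ctx: &MetricContext, value: f64) -> Metric {
    let mut tags = Vec::new();
    if let Some(scope) = &ctx.scope {
        tags.push(("scope.name".to_string(), scope.name.clone()));
    }
    if !ctx.resource_schema_url.is_empty() {
        tags.push(("resource_schema_url".to_string(), ctx.resource_schema_url.clone()));
    }
    Metric { name: ctx.metric_name.clone(), value, tags }
}

// `ctx` is moved into the closure once; each data point borrows it.
pub fn convert_gauge(points: Vec<f64>, ctx: MetricContext) -> Vec<Metric> {
    points.into_iter().map(move |value| build_metric(&ctx, value)).collect()
}

fn main() {
    let ctx = MetricContext {
        scope: Some(Scope { name: "my-lib".to_string() }),
        metric_name: "queue_depth".to_string(),
        resource_schema_url: "https://resource.schema".to_string(),
        ..Default::default()
    };
    println!("{:?}", convert_gauge(vec![1.0, 2.0], ctx));
}
```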
szibis added 3 commits March 15, 2026 15:41
Per OTLP spec, time_unix_nano == 0 means the timestamp is unset/unknown.
Previously all 5 metric types (Sum, Gauge, Histogram, ExponentialHistogram,
Summary) converted 0 to Some(epoch), which is semantically incorrect.
Now returns None when time_unix_nano is 0, consistent with the existing
log decode behavior.
Move resource_schema_url and resource_dropped_attributes_count to
their own metadata paths instead of nesting them under the "resources"
namespace which holds user-supplied resource attributes.

For logs (Vector namespace): metadata now stored at flat paths like
%opentelemetry.resource_schema_url instead of
%opentelemetry.resources.schema_url, preventing collision when users
have resource attributes named "schema_url" or
"dropped_attributes_count".

For metrics: metadata tags now use underscore-separated names
(resource_schema_url, scope_dropped_attributes_count) instead of
dot-separated (resource.schema_url, scope.dropped_attributes_count)
to avoid colliding with user attribute tags that follow the
"resource.{key}" / "scope.{key}" format.

Also simplifies test section comment separators per review feedback.
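
To see why the separator matters for metric tags, here is a small standalone sketch. `build_tags` is a hypothetical helper, and the `resource.{key}` format for user attributes is taken from the commit message above.

```rust
use std::collections::BTreeMap;

// User resource attributes become "resource.{key}" tags; the resource
// schema URL is then written under `synthetic_key`.
pub fn build_tags(
    resource_attrs: &[(&str, &str)],
    resource_schema_url: &str,
    synthetic_key: &str,
) -> BTreeMap<String, String> {
    let mut tags = BTreeMap::new();
    for (k, v) in resource_attrs {
        tags.insert(format!("resource.{k}"), v.to_string());
    }
    if !resource_schema_url.is_empty() {
        tags.insert(synthetic_key.to_string(), resource_schema_url.to_string());
    }
    tags
}

fn main() {
    let attrs = [("schema_url", "tenant-defined-value")];

    // Dot-separated name: collides with the user attribute's tag.
    println!("{:?}", build_tags(&attrs, "https://resource.schema", "resource.schema_url"));

    // Underscore-separated name: both tags survive.
    println!("{:?}", build_tags(&attrs, "https://resource.schema", "resource_schema_url"));
}
```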
…e tests

Per OTLP spec, time_unix_nano == 0 means "unset". Previously spans
decoded 0 as epoch (1970-01-01T00:00:00Z). This adds a nanos_to_value()
helper that returns Value::Null for 0 and safely handles u64→i64
overflow (year 2262+).

Adds 15 new tests covering:
- Zero timestamp decode (span start/end + span events)
- u64::MAX overflow protection
- All span kinds (0, 3, 4, 5)
- Multiple events and links per span
- Multiple spans per scope
- Status variants (unset, error, missing)
- Trace state preservation
- Unicode span names
- Invalid start > end timestamps
@szibis (Contributor, Author) commented Mar 15, 2026

#24905 (OTLP decode refactor)

#24621 (OTLP encode logs/traces)

#24897 (OTLP encode metrics)

Each PR rebases on top of the previous one after merge.

@pront After each merge I will clean up and rebase to make this flow as easy as possible for the Vector team.

@szibis force-pushed the fix/otlp-decode-missing-fields branch from 0c07143 to cdb8c90 March 15, 2026 14:41
szibis added 2 commits March 15, 2026 16:35
Stash start_time_unix_nano from OTLP data points into metric metadata
at %vector.otlp.start_time_unix_nano during decode. This enables the
encode path to restore the original value on roundtrip instead of
hardcoding 0. Only non-zero values are stored (zero means "not set"
in OTLP).

All 5 metric types updated: Sum, Gauge, Histogram, ExpHistogram, Summary.
…idecar

Preserve original OTLP attribute types (IntValue, BoolValue, DoubleValue)
during metric decode by storing them as VRL Values in EventMetadata at
%vector.otlp.metric_sidecar.

The sidecar includes resource/scope/data-point attributes as typed VRL
objects, scope metadata fields, and a tags fingerprint for staleness
detection on the encode side.
szibis added a commit to szibis/vector that referenced this pull request Mar 15, 2026
Read the typed attribute sidecar from %vector.otlp.metric_sidecar
(stashed by PR vectordotdev#24905 during decode) and emit original OTLP types
(IntValue, BoolValue, DoubleValue) instead of StringValue.

Fingerprint-based staleness detection ensures correctness: if tags
were mutated by transforms, the sidecar is ignored and the encoder
falls back to string-based tag decomposition.
szibis added 2 commits March 15, 2026 18:20
Replace kv_list_into_value() with pb_value_to_typed_value() in
build_otlp_sidecar_data(). Each attribute value is now stored as a
single-key Object named after the OTLP variant (e.g. {"int_value": 42},
{"string_value": "x"}, {"bytes_value": <bytes>}). This preserves the
StringValue/BytesValue distinction and handles ArrayValue/KvlistValue
recursively, so the encoder can reconstruct the exact protobuf variant.
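
A self-contained sketch of the kind-wrapper idea follows. Both enums are hypothetical mirrors of the OTLP `AnyValue` variants and a VRL-like value, and the function only borrows the name `pb_value_to_typed_value`. The keys `int_value`, `string_value`, and `bytes_value` appear in the commit message; the remaining keys follow the OTLP proto field names and are an assumption here.

```rust
use std::collections::BTreeMap;

// Hypothetical mirror of the OTLP AnyValue variants.
#[derive(Debug, Clone, PartialEq)]
pub enum AnyValue {
    StringValue(String),
    BoolValue(bool),
    IntValue(i64),
    DoubleValue(f64),
    BytesValue(Vec<u8>),
    ArrayValue(Vec<AnyValue>),
    KvlistValue(Vec<(String, AnyValue)>),
}

// Simplified VRL-like value: strings and raw bytes share `Bytes`, which is
// why the variant name has to be recorded separately.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Bytes(Vec<u8>),
    Boolean(bool),
    Integer(i64),
    Float(f64),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

// Wraps a value in a single-key object named after its OTLP variant.
fn wrap(kind: &str, inner: Value) -> Value {
    Value::Object(BTreeMap::from([(kind.to_string(), inner)]))
}

pub fn pb_value_to_typed_value(v: &AnyValue) -> Value {
    match v {
        AnyValue::StringValue(s) => wrap("string_value", Value::Bytes(s.as_bytes().to_vec())),
        AnyValue::BoolValue(b) => wrap("bool_value", Value::Boolean(*b)),
        AnyValue::IntValue(i) => wrap("int_value", Value::Integer(*i)),
        AnyValue::DoubleValue(d) => wrap("double_value", Value::Float(*d)),
        AnyValue::BytesValue(b) => wrap("bytes_value", Value::Bytes(b.clone())),
        // Containers recurse so every nested element keeps its own wrapper.
        AnyValue::ArrayValue(items) => wrap(
            "array_value",
            Value::Array(items.iter().map(pb_value_to_typed_value).collect()),
        ),
        AnyValue::KvlistValue(pairs) => wrap(
            "kvlist_value",
            Value::Object(
                pairs
                    .iter()
                    .map(|(k, v)| (k.clone(), pb_value_to_typed_value(v)))
                    .collect(),
            ),
        ),
    }
}

fn main() {
    let nested = AnyValue::ArrayValue(vec![
        AnyValue::IntValue(42),
        AnyValue::StringValue("x".to_string()),
    ]);
    println!("{:?}", pb_value_to_typed_value(&nested));
}
```

Without the wrapper, `StringValue("x")` and `BytesValue(b"x")` would both decode to the same `Bytes` value and the encoder could not tell them apart.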
…code tests

Verify the kind-wrapper approach correctly stores BytesValue (distinct
from StringValue), ArrayValue (with recursively wrapped elements), and
KvlistValue (nested key-value structure) in the metric sidecar.
@pront (Member) commented Mar 16, 2026

@codex review

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4889ee8f19

if start_time_unix_nano > 0 {
    metric.metadata_mut().value_mut().insert(
        path!("vector", "otlp", "start_time_unix_nano"),
        Value::Integer(start_time_unix_nano as i64),

P2: Preserve full fixed64 start_time_unix_nano range

start_time_unix_nano is an OTLP fixed64, but this code stores it as Value::Integer(start_time_unix_nano as i64), which wraps values above i64::MAX into negative numbers. In those inputs (e.g., far-future but still valid fixed64 timestamps), the decoded metadata no longer contains the original start time, so downstream consumers of %vector.otlp.start_time_unix_nano will read a corrupted value instead of the source timestamp.
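
The wraparound is easy to demonstrate in isolation. The sketch below also shows one hypothetical way to keep the full range (falling back to a decimal string); `StoredStart` and `encode_start_time` are illustrative names, not something this PR implements.

```rust
// The `as` cast reinterprets the bits, so values above i64::MAX turn negative.
pub fn lossy_cast(start_time_unix_nano: u64) -> i64 {
    start_time_unix_nano as i64
}

// Hypothetical lossless representation: an integer when the value fits,
// otherwise the decimal string of the original u64.
#[derive(Debug, PartialEq)]
pub enum StoredStart {
    Integer(i64),
    Decimal(String),
}

pub fn encode_start_time(start_time_unix_nano: u64) -> Option<StoredStart> {
    match start_time_unix_nano {
        0 => None, // zero means "not set" in OTLP, so nothing is stored
        n => Some(
            i64::try_from(n)
                .map(StoredStart::Integer)
                .unwrap_or_else(|_| StoredStart::Decimal(n.to_string())),
        ),
    }
}

fn main() {
    println!("lossy:    {}", lossy_cast(u64::MAX));
    println!("lossless: {:?}", encode_start_time(u64::MAX));
}
```

Whether a string fallback, a skipped value, or a saturating cast is the right policy is a design decision for the PR; the sketch only shows that the plain cast silently changes the value.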

@pront (Member) commented Mar 16, 2026

Checks are failing. I recommend adding the following to your local dev env: https://github.com/vectordotdev/vector/blob/master/CONTRIBUTING.md?plain=1#L119-L160

Labels

domain: external docs, domain: opentelemetry

Development

Successfully merging this pull request may close these issues.

OpenTelemetry source: trace decode drops scope, schema_url fields

3 participants