Improve AWS S3 source error logging and diagnostics#24959
saliagadotcom wants to merge 8 commits into vectordotdev:master
Conversation
…ing and non-retryable message deletion
…nd detect total parse failures
I have read the CLA Document and I hereby sign the CLA. You can retrigger this bot by commenting `recheck` in this Pull Request. Posted by the CLA Assistant Lite bot.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 25b6c6236a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
src/sources/aws_s3/sqs.rs
Outdated
    error_kind,
    actionable_message: &actionable_message,
});
self.state.delete_failed_message && !class.is_retryable()
Avoid deleting S3 messages on unclassified client errors
This branch deletes failed S3 notifications whenever delete_failed_message is enabled and class.is_retryable() is false, but the new classifier maps all non-whitelisted 4xx responses to RequestError (and other unmatched cases to Unknown), both treated as non-retryable. That means transient or newly introduced AWS 4xx error codes can now cause immediate SQS message deletion instead of retry, leading to permanent data loss for affected objects. Restrict auto-delete to explicitly terminal errors (for example AccessDenied/NoSuchKey/NoSuchBucket) and keep RequestError/Unknown retryable.
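The reviewer's suggestion can be sketched as follows. This is a minimal, self-contained illustration with hypothetical names (`ErrorClass`, `classify_code`, `should_delete`), not the PR's actual code: only explicitly terminal AWS error codes permit auto-deletion, while unrecognized errors stay retryable.

```rust
// Hypothetical sketch of the suggested policy: auto-delete the SQS
// notification only for explicitly terminal AWS error codes; treat
// everything unrecognized (including generic 4xx) as retryable so a
// transient error can never cause permanent data loss.
#[derive(Debug, PartialEq)]
enum ErrorClass {
    Terminal,  // e.g. AccessDenied, NoSuchKey, NoSuchBucket
    Retryable, // throttling, connectivity, unclassified errors, ...
}

fn classify_code(aws_error_code: Option<&str>) -> ErrorClass {
    match aws_error_code {
        Some("AccessDenied") | Some("NoSuchKey") | Some("NoSuchBucket") => ErrorClass::Terminal,
        // Unmatched codes (and missing codes) default to retryable.
        _ => ErrorClass::Retryable,
    }
}

fn should_delete(delete_failed_message: bool, class: &ErrorClass) -> bool {
    delete_failed_message && *class == ErrorClass::Terminal
}

fn main() {
    assert!(should_delete(true, &classify_code(Some("NoSuchKey"))));
    assert!(!should_delete(true, &classify_code(Some("SomeNew4xxCode"))));
    assert!(!should_delete(false, &classify_code(Some("AccessDenied"))));
}
```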
So a message should only be deleted in two scenarios: when it is successfully processed, and when the sink rejects the batch and delete_failed_message is enabled. I will revert this.
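The author's reverted policy can be sketched as a simple decision function. The names here (`ProcessingOutcome`, `should_delete_message`) are hypothetical, illustrating only the two deletion scenarios described above.

```rust
// Hypothetical sketch of the deletion policy the author describes:
// an SQS message is deleted only after successful processing, or when
// the sink rejects the batch and delete_failed_message is enabled.
enum ProcessingOutcome {
    Success,
    SinkRejected,
    OtherFailure,
}

fn should_delete_message(outcome: &ProcessingOutcome, delete_failed_message: bool) -> bool {
    match outcome {
        ProcessingOutcome::Success => true,
        ProcessingOutcome::SinkRejected => delete_failed_message,
        // All other failures leave the message on the queue for retry.
        ProcessingOutcome::OtherFailure => false,
    }
}

fn main() {
    assert!(should_delete_message(&ProcessingOutcome::Success, false));
    assert!(should_delete_message(&ProcessingOutcome::SinkRejected, true));
    assert!(!should_delete_message(&ProcessingOutcome::SinkRejected, false));
    assert!(!should_delete_message(&ProcessingOutcome::OtherFailure, true));
}
```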
Summary
The AWS S3 source has opaque error logging across several failure modes. Users see generic messages like "Failed to process SQS message: service error" with no AWS error codes, no actionable guidance, and incorrect error type classification (`PARSER_FAILED` for network/auth errors). The retry backoff loop logs at `trace!` level, making persistent failures invisible in production.
Changes
- Error classification utilities (`src/aws/error.rs`, new file): reusable `extract_error_context()` and `classify_error()` utilities that pull AWS error codes, HTTP status, request IDs, and dispatch failure kind from any `SdkError`, classifying errors into Auth/NotFound/Throttling/Connectivity/Configuration/ServiceError.
- S3 object fetch failures (`sqs.rs`, `aws_sqs.rs`): new `S3ObjectGetFailed` internal event with actionable messages per error kind (NoSuchKey, AccessDenied, NoSuchBucket, etc.). Non-retryable errors now delete the SQS message when `delete_failed_message = true`, stopping infinite retry loops for deleted or inaccessible objects. `S3ObjectProcessingFailed` promoted from `debug!` to `warn!`, with key and error fields added.
- Startup logging (`mod.rs`): log `client_concurrency`, `compression`, `multiline`, `delete_failed_message`, and `acknowledgements` at startup alongside existing fields. Improved Snafu display strings for `ConfigMissing` and `InvalidNumberOfMessages`.
- Region mismatch warning (`sqs.rs`): now includes both the event region and the configured region, with remediation guidance.
- Deserialization diagnostics (`sqs.rs`): S3 object bucket/key added to deserialize error warnings. Detects when all frames fail deserialization (0 events produced) and emits a dedicated warning.
- SQS loop diagnostics (`sqs.rs`): backoff retry log promoted from `trace!` to `warn!`, with `queue_url` added. Poison message detection via `ApproximateReceiveCount > 5`. S3 test event promoted from `debug!` to `info!`. SNS envelope unwrapping logged at `debug!`.
Vector configuration
Standard AWS S3 source configuration with SQS-based notifications:
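A minimal sketch of such a configuration, with placeholder bucket/queue values; option names follow Vector's `aws_s3` source with SQS notifications, and `delete_failed_message` is the option discussed in this PR:

```toml
# Hypothetical example; region, queue URL, and component name are placeholders.
[sources.s3_logs]
type = "aws_s3"
region = "us-east-1"
compression = "auto"

[sources.s3_logs.sqs]
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"
delete_message = true
delete_failed_message = true
```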
How did you test this PR?
- `make check-clippy`
- `make check-fmt`
- `make test`
Is this a breaking change?
Does this PR include user facing changes?
If not, add the `no-changelog` label to this PR.