WIP: feat(transport): TUS upload for large attachments#1545

Draft
jpnurmi wants to merge 25 commits into jpnurmi/feat/http-retry from jpnurmi/feat/tus

Conversation

@jpnurmi
Collaborator

@jpnurmi jpnurmi commented Feb 27, 2026

  • Large attachments (>=100 MB) are stored as attachment-ref envelope items
  • Database layer stores large attachment files separately in db/attachments/<uuid>/
  • Transport layer streams each attachment-ref to the server via TUS before sending the envelope

See also:

#skip-changelog (wip)

jpnurmi and others added 2 commits February 27, 2026 17:12
Make sentry__envelope_materialize() a public internal function that
callers must invoke explicitly before reading from raw envelopes.
Read-only accessors (get_header, get_event, get_event_id, get_item_count,
get_item, get_transaction) no longer auto-materialize and return null/0
for raw envelopes. Mutating functions (set_header, add_item) still
auto-materialize.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
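
The materialization contract described in the commit above can be illustrated with a toy model. This is a minimal sketch only: `toy_envelope_t`, `toy_envelope_materialize` and `toy_envelope_get_item_count` are hypothetical stand-ins, not the SDK's types or functions, and the "parsing" is faked by counting lines.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: the real envelope type is not shown in this conversation. */
typedef struct {
    bool is_raw;       /* true while only the serialized bytes are held */
    size_t item_count; /* meaningful only once materialized */
    const char *raw;   /* serialized, unparsed payload */
} toy_envelope_t;

/* Hypothetical stand-in for an explicit materialize step.
 * Returns 0 on success, 1 on failure. */
static int
toy_envelope_materialize(toy_envelope_t *env)
{
    if (!env) {
        return 1;
    }
    if (!env->is_raw) {
        return 0; /* already materialized, nothing to do */
    }
    if (!env->raw) {
        return 1;
    }
    /* Parsing is faked: each newline-terminated line counts as one item. */
    size_t n = 0;
    for (const char *p = env->raw; *p; p++) {
        if (*p == '\n') {
            n++;
        }
    }
    env->item_count = n;
    env->is_raw = false;
    return 0;
}

/* Read-only accessor: never materializes, reports 0 for raw envelopes. */
static size_t
toy_envelope_get_item_count(const toy_envelope_t *env)
{
    return (env && !env->is_raw) ? env->item_count : 0;
}
```

A caller that forgets the explicit materialize step sees an item count of 0 rather than triggering a hidden parse, which is the behavior the commit message describes for read-only accessors.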
@jpnurmi jpnurmi force-pushed the jpnurmi/feat/tus branch 2 times, most recently from da2ce2d to 0946169 on March 2, 2026 09:32
@jpnurmi
Collaborator Author

jpnurmi commented Mar 2, 2026

@sentry review

@jpnurmi
Collaborator Author

jpnurmi commented Mar 2, 2026

@cursor review

@jpnurmi
Collaborator Author

jpnurmi commented Mar 2, 2026

@cursor review

@jpnurmi
Collaborator Author

jpnurmi commented Mar 2, 2026

@cursor review

}
}
return 0;
}

Debug function writes directly to stderr bypassing logging

Low Severity

The new debug_function added as a CURLOPT_DEBUGFUNCTION callback writes HTTP traffic details (headers, JSON bodies) directly to stderr via fprintf, bypassing the SDK's SENTRY_DEBUG/SENTRY_WARN logging infrastructure. Given the PR is titled "WIP", this appears to be development debugging code — it uses non-standard => / <= prefixes and selectively prints JSON-looking payloads, which is useful during development but inconsistent with the rest of the codebase.


@jpnurmi
Collaborator Author

jpnurmi commented Mar 3, 2026

@cursor review

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: copy_file_range fallback misses common filesystem errors
    • Expanded the copy_file_range fallback condition to also break on EXDEV, EOPNOTSUPP, and EINVAL so the read/write fallback handles these recoverable cases.

To push these changes, comment:

@cursor push 76476611c4
Preview (76476611c4)
diff --git a/src/path/sentry_path_unix.c b/src/path/sentry_path_unix.c
--- a/src/path/sentry_path_unix.c
+++ b/src/path/sentry_path_unix.c
@@ -362,7 +362,8 @@
             goto done;
         } else if (errno == EAGAIN || errno == EINTR) {
             continue;
-        } else if (errno == ENOSYS) {
+        } else if (errno == ENOSYS || errno == EXDEV || errno == EOPNOTSUPP
+            || errno == EINVAL) {
             break;
         } else {
             rv = 1;

rv = 1;
goto done;
}
}

copy_file_range fallback misses common filesystem errors

Medium Severity

The copy_file_range loop only falls through to the read/write fallback on ENOSYS. Other recoverable errors are treated as fatal failures: EXDEV (cross-filesystem copy, returned before Linux 5.3 and again since Linux 5.19 when the two filesystems are of different types), EOPNOTSUPP (the filesystem doesn't support the operation), and EINVAL. As a result, sentry__path_copy returns an error in cases where the read/write loop would have succeeded. Since this function copies user-owned attachments into the database directory, cross-filesystem scenarios are realistic.
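
The error handling discussed in this comment can be sketched as a small errno classifier plus a plain read/write copy loop. This is an illustrative sketch under assumptions, not the code in `sentry_path_unix.c`: `classify_copy_errno`, `copy_action_t` and `copy_fd_read_write` are hypothetical names, and the `copy_file_range` call itself is left out so the example stays portable across POSIX systems.

```c
#include <errno.h>
#include <unistd.h>

/* What the copy loop should do after a failed fast-path call. */
typedef enum { COPY_RETRY, COPY_FALLBACK, COPY_FATAL } copy_action_t;

/* Hypothetical helper: maps an errno from the fast path to an action. */
static copy_action_t
classify_copy_errno(int err)
{
    switch (err) {
    case EAGAIN:
    case EINTR:
        return COPY_RETRY; /* transient, try the same call again */
    case ENOSYS:
    case EXDEV:
    case EOPNOTSUPP:
    case EINVAL:
        return COPY_FALLBACK; /* fast path unusable, use read/write */
    default:
        return COPY_FATAL;
    }
}

/* Portable fallback: copies src_fd to dst_fd until EOF.
 * Returns 0 on success, 1 on error. */
static int
copy_fd_read_write(int src_fd, int dst_fd)
{
    char buf[16384];
    for (;;) {
        ssize_t n = read(src_fd, buf, sizeof(buf));
        if (n == 0) {
            return 0; /* end of input */
        }
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            return 1;
        }
        ssize_t off = 0;
        while (off < n) { /* handle short writes */
            ssize_t w = write(dst_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR) {
                    continue;
                }
                return 1;
            }
            off += w;
        }
    }
}
```

Keeping the classification in one function makes it easy to see which errors are considered recoverable, and the fallback loop works on any pair of descriptors regardless of filesystem.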


jpnurmi and others added 16 commits March 3, 2026 09:58
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add body_path field to prepared HTTP request for streaming file uploads
without loading the entire file into memory. Implement streaming in both
curl (CURLOPT_UPLOAD + read callback) and WinHTTP (WinHttpWriteData
chunked loop) backends.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
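
For the curl side of the streaming change above, a read callback that pulls the body from an open file could look roughly like this. It is a sketch, not the PR's implementation: `file_read_cb` is a hypothetical name, and the wiring through `CURLOPT_UPLOAD`, `CURLOPT_READFUNCTION` and `CURLOPT_READDATA` is only described in comments so the block compiles without libcurl.

```c
#include <stdio.h>

/* Has the shape libcurl expects for a CURLOPT_READFUNCTION callback:
 * fill `buffer` with at most size * nitems bytes and return the number of
 * bytes written; returning 0 signals the end of the upload.
 *
 * Assumed wiring (not shown here): the FILE * is passed via
 * CURLOPT_READDATA and CURLOPT_UPLOAD is enabled on the handle.
 * A production callback would likely also check ferror() and abort the
 * transfer instead of silently ending it. */
static size_t
file_read_cb(char *buffer, size_t size, size_t nitems, void *userdata)
{
    FILE *f = (FILE *)userdata;
    return fread(buffer, 1, size * nitems, f);
}
```

Because the callback only ever holds one buffer's worth of data, memory use stays constant regardless of the attachment size.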
Add location field to HTTP response for TUS protocol support, and
extract http_send_request/http_send_envelope from http_send_task.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add attachment_ref content type for large attachments that exceed a size
threshold. Instead of inlining the file contents, the envelope item
stores a JSON payload with the file path for deferred handling by the
transport layer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
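
The idea in the commit above (reference the file instead of inlining it once it crosses a size threshold) can be sketched as follows. Everything here is an assumption for illustration: the threshold is read as 100 MiB, the JSON key `path` is invented, and `is_large_attachment` and `write_attachment_ref_payload` are hypothetical helpers rather than SDK functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumption: "100 MB" is read as 100 MiB; the PR text does not say which. */
#define LARGE_ATTACHMENT_THRESHOLD ((size_t)100 * 1024 * 1024)

static bool
is_large_attachment(size_t file_size)
{
    return file_size >= LARGE_ATTACHMENT_THRESHOLD;
}

/* Appends n bytes of s to out, keeping out NUL-terminated.
 * Returns 1 if the buffer is too small. */
static int
buf_append(char *out, size_t out_len, size_t *pos, const char *s, size_t n)
{
    if (*pos + n >= out_len) {
        return 1;
    }
    memcpy(out + *pos, s, n);
    *pos += n;
    out[*pos] = '\0';
    return 0;
}

/* Writes {"path":"<escaped path>"} into out. The key name is hypothetical.
 * Returns 0 on success, 1 if out is too small. */
static int
write_attachment_ref_payload(const char *path, char *out, size_t out_len)
{
    size_t pos = 0;
    if (buf_append(out, out_len, &pos, "{\"path\":\"", 9)) {
        return 1;
    }
    for (const char *p = path; *p; p++) {
        unsigned char c = (unsigned char)*p;
        char esc[8];
        size_t n;
        if (c == '"' || c == '\\') {
            esc[0] = '\\'; /* JSON escape for quote and backslash */
            esc[1] = (char)c;
            n = 2;
        } else if (c < 0x20) {
            n = (size_t)snprintf(esc, sizeof(esc), "\\u%04x", (unsigned)c);
        } else {
            esc[0] = (char)c;
            n = 1;
        }
        if (buf_append(out, out_len, &pos, esc, n)) {
            return 1;
        }
    }
    return buf_append(out, out_len, &pos, "\"}", 2);
}
```

Escaping matters mostly for Windows paths, where every backslash has to be doubled to keep the payload valid JSON.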
Write large attachment files to a stable db/attachments/<event-id>/
directory during envelope persistence. On restart, resolve attachment-ref
paths so that previously persisted attachments are available for upload.
Avoid materialization in signal handler context for crash safety.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement TUS (resumable upload) protocol for large attachment files.
When an envelope contains attachment-ref items, the transport resolves
each ref by uploading the file via TUS, replacing the ref with the
server-returned URL. Includes 404 auto-disable for servers without TUS
support and cleanup of attachment files after successful upload.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
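
For readers unfamiliar with TUS, the request headers involved can be sketched as below. The header names and the `application/offset+octet-stream` content type come from the TUS 1.0.0 protocol; the helper names, the `tus_state_t` struct, and the split into a creation request followed by a PATCH are assumptions, since the conversation does not show how the transport actually builds its requests.

```c
#include <stdbool.h>
#include <stdio.h>

#define TUS_VERSION "1.0.0"

/* Headers for the creation request (POST). On success the server is expected
 * to answer 201 with a Location header naming the upload URL.
 * Returns 0 on success, 1 if out is too small. */
static int
tus_format_creation_headers(
    char *out, size_t out_len, unsigned long long upload_length)
{
    int n = snprintf(out, out_len,
        "Tus-Resumable: " TUS_VERSION "\r\n"
        "Upload-Length: %llu\r\n",
        upload_length);
    return (n < 0 || (size_t)n >= out_len) ? 1 : 0;
}

/* Headers for a data request (PATCH) that sends bytes starting at offset. */
static int
tus_format_patch_headers(char *out, size_t out_len, unsigned long long offset)
{
    int n = snprintf(out, out_len,
        "Tus-Resumable: " TUS_VERSION "\r\n"
        "Upload-Offset: %llu\r\n"
        "Content-Type: application/offset+octet-stream\r\n",
        offset);
    return (n < 0 || (size_t)n >= out_len) ? 1 : 0;
}

/* Hypothetical transport state for the "404 auto-disable" behavior. */
typedef struct {
    bool disabled;
} tus_state_t;

/* A 404 on creation is taken to mean the server has no TUS endpoint, so
 * later uploads are skipped. */
static void
tus_note_creation_status(tus_state_t *state, int http_status)
{
    if (http_status == 404) {
        state->disabled = true;
    }
}
```

Remembering the 404 in transport state avoids repeating a doomed request for every subsequent large attachment.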
…gging

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When write_large_attachment converts an inline large attachment to an
attachment-ref, the item payload is raw binary. sentry__value_from_json
returns null for non-JSON data, causing subsequent sentry_value_set_by_key
calls to silently fail and producing "null" as the serialized payload.

Fall back to a fresh object when JSON parsing fails.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
sentry_value_as_string returns "" not NULL for missing values, so the
null check alone would never trigger.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
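
The pitfall in the commit above applies to any accessor that returns an empty string instead of NULL for a missing value. A minimal sketch of a check that covers both cases, with `string_is_missing` as a hypothetical helper name:

```c
#include <stdbool.h>
#include <stddef.h>

/* Treats both NULL and "" as "no value". A bare NULL check would miss the
 * empty string that an accessor returns for an absent key. */
static bool
string_is_missing(const char *s)
{
    return s == NULL || s[0] == '\0';
}
```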
…e_attachment

sentry__path_dir can return NULL on allocation failure. Passing NULL to
sentry__path_remove causes a NULL dereference in sentry__path_is_dir.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…chment file

The TUS upload path unconditionally deleted the file after upload via
remove_large_attachment(). For user-supplied attachments in the normal
send path, this destroyed the user's original file. Only temp copies
under db/attachments/ (created by the crash persistence path) should
be cleaned up.

Replace remove_large_attachment() with sentry__db_remove_attachment()
which verifies the file lives under <db_path>/attachments/ before
deleting. Store database_path in the transport state to avoid locking
options on the bgworker thread.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
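
The containment check described above (only delete files that live under `<db_path>/attachments/`) could be approximated with a prefix comparison that also requires a path separator at the boundary. This is a sketch under assumptions: `path_is_under_dir` is a hypothetical helper, it handles only `/` separators, and it does not resolve `..` components or symlinks, which a real implementation may need to consider.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if `path` names something strictly inside `dir`.
 * Purely lexical: no normalization, no symlink resolution. */
static bool
path_is_under_dir(const char *path, const char *dir)
{
    if (!path || !dir) {
        return false;
    }
    size_t dir_len = strlen(dir);
    /* Ignore trailing separators on dir; an empty or root dir is rejected. */
    while (dir_len > 0 && dir[dir_len - 1] == '/') {
        dir_len--;
    }
    if (dir_len == 0) {
        return false;
    }
    if (strncmp(path, dir, dir_len) != 0) {
        return false;
    }
    /* Require a separator right after the prefix so that
     * "/db/attachments-old" does not match "/db/attachments". */
    return path[dir_len] == '/' && path[dir_len + 1] != '\0';
}
```

The separator check is the important part: a plain `strncmp` prefix test would also accept sibling directories whose names merely start with the same characters.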
…runs

If sentry__envelope_materialize() fails, the envelope is left in a
partially-deserialized state. Previously this zombie envelope was still
passed to sentry__capture_envelope, sending garbage data while the
crash data was irrecoverably lost. Now free the envelope on failure
so it is not sent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
envelope_add_attachment_ref had a hardcoded "attachment" type, causing
sentry__envelope_add_from_path to silently ignore its type parameter
for large files. Pass type through so the function contract is honored.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ites

Two places manually set item->payload, item->payload_len, and the
"length" header instead of calling the existing helper.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
jpnurmi and others added 6 commits March 3, 2026 10:21
Three places manually free resp.retry_after, resp.x_sentry_rate_limits,
and resp.location. Extract into a helper to avoid divergence when new
response fields are added.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
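
The cleanup helper mentioned in the commit above might look like the following. The struct is a toy: only the three field names `retry_after`, `x_sentry_rate_limits` and `location` come from the commit message, while `toy_http_response_t`, `status_code` and `toy_http_response_cleanup` are hypothetical.

```c
#include <stdlib.h>

/* Toy response type; the real struct layout is not shown in this thread. */
typedef struct {
    int status_code;
    char *retry_after;
    char *x_sentry_rate_limits;
    char *location;
} toy_http_response_t;

/* Frees every heap-allocated field in one place and resets the pointers,
 * so calling it twice is harmless and new fields only need adding here. */
static void
toy_http_response_cleanup(toy_http_response_t *resp)
{
    if (!resp) {
        return;
    }
    free(resp->retry_after);
    free(resp->x_sentry_rate_limits);
    free(resp->location);
    resp->retry_after = NULL;
    resp->x_sentry_rate_limits = NULL;
    resp->location = NULL;
}
```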
TUS requests allocate their own header array with TUS_MAX_HTTP_HEADERS,
so the envelope request MAX_HTTP_HEADERS does not need to be inflated.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
resolve_large_attachment called sentry__path_create_dir_all per
attachment item. Move the directory creation into the caller so it
happens once before the loop.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Eliminates duplicated #ifdef SENTRY_PLATFORM_WINDOWS path-to-string
conversion in envelope_add_attachment_ref and
sentry__envelope_item_set_attachment_ref. Also switches
envelope_add_attachment_ref from hand-rolled jsonwriter to
sentry_value_t approach.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rename sentry__envelope_item_set_attachment_ref to
sentry__envelope_item_set_attachment_ref_path for consistency with the
existing getter. Add sentry__envelope_item_set_attachment_ref_location
in the envelope module, replacing tus_resolve_item in the transport.
This keeps JSON payload manipulation encapsulated in the envelope layer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Previously the db copy was deleted unconditionally after the TUS upload
attempt, making retries impossible on network or server errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jpnurmi jpnurmi force-pushed the jpnurmi/feat/tus branch from a69528d to 44d6ecc on March 3, 2026 09:27
The test creates a file using an absolute path from opts->run->run_path.
When UTF8_TEST_CWD=1, this path contains Thai characters that cannot be
represented in the Windows ANSI codepage, causing fopen() to fail. Use
_wfopen() with the wide-string path on Windows.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>