Skip to content

Conversation

@corwinjoy
Copy link
Contributor

Description

Add support for absolute file paths and full path URIs to hold parquet file names in the delta logstore.

Related Issue(s)

Closes #865

Documentation

These kind of paths occur when creating shallow clones via Java. In addition there may be other cases where we wish to specify the files as full URIs.

@github-actions github-actions bot added the binding/rust Issues for the Rust crate label Dec 3, 2025
@corwinjoy
Copy link
Contributor Author

corwinjoy commented Dec 3, 2025

This is a draft PR for discussion.

  1. I have added tests to check for absolute paths under a local file system, S3, Azure and GCP.
  2. Existing tests check for escape characters in paths and escape characters in partitions when optimizing.
  3. A caveat, I used AI to help code this so it is a bit verbose and I may have missed something.
  4. The details get a bit messy here because we have re-encoding both by the object_store::Path and in the JSON serialization of the logstore paths.
  5. Better might be to create some kind of wrapper around object_store::Path to better control operations but it's a bit tough because of the use of structures like ObjectMeta which directly use an object_store::Path. Still, if you see a good way to do this we could try it.
  6. Finally, allowing full file path usage adds a potential security risk so this could use some discussion. I have put this behind a flag for filesystems, this may not be the best approach. Also, there is a similar potential risk for bucket stores.

Overall I think this gives a good picture of what would be needed for full path support.

@corwinjoy
Copy link
Contributor Author

@hntd187 @adamreeve tagging due to related discussion on shallow clones.

// TODO. May be a security risk to allow absolute paths outside the bucket/account.
// When need_bucket_root_store is true, the code registers a bucket-root object
// store that could access ANY object in the bucket, not just those in the
// table directory. This has similar security implications to the file-root store above.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note. I have not added an environment variable to allow this like I did for filesystems. I'm not as sure this is necessary since object stores have more detailed bucket permissions. But, I guess it could still be a problem for delta-rs running as a superuser with malicious delta tables.

@corwinjoy
Copy link
Contributor Author

Also tagging @roeap since he has done some related work in the past:
#3456

@codecov
Copy link

codecov bot commented Dec 8, 2025

Codecov Report

❌ Patch coverage is 5.90551% with 239 lines in your changes missing coverage. Please review.
✅ Project coverage is 26.05%. Comparing base (49f089d) to head (0ae4f81).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
crates/core/src/logstore/storage/utils.rs 2.75% 141 Missing ⚠️
crates/core/src/delta_datafusion/table_provider.rs 0.00% 77 Missing ⚠️
crates/core/src/table/mod.rs 0.00% 14 Missing ⚠️
crates/core/src/logstore/mod.rs 50.00% 4 Missing and 1 partial ⚠️
crates/core/src/kernel/snapshot/iterators.rs 85.71% 0 Missing and 1 partial ⚠️
crates/core/src/operations/write/writer.rs 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3963      +/-   ##
==========================================
- Coverage   26.28%   26.05%   -0.23%     
==========================================
  Files         124      124              
  Lines       19839    20058     +219     
  Branches    19839    20058     +219     
==========================================
+ Hits         5214     5226      +12     
- Misses      14254    14460     +206     
- Partials      371      372       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

.add(b'}')
.add(b'%')
// NOTE: '%' is NOT included here to avoid double-encoding.
// Paths already contain valid percent-encoded characters from partition values.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this may represent an existing bug in the coding.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

binding/rust Issues for the Rust crate

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

Absolute path reading

1 participant