Skip to content

Conversation

@jackye1995
Copy link
Contributor

@github-actions
Copy link
Contributor

github-actions bot commented Feb 2, 2026

Code Review

P1 - Missing Test Coverage

The PR adds two new public APIs without tests:

  1. The storage_options_provider parameter for lance.dataset()
  2. The new_file_session() method on LanceDataset

Per the project guidelines, code without tests should not be merged. Please add tests that verify:

  • A custom storage_options_provider takes precedence over namespace-created provider
  • new_file_session() correctly inherits the provider and storage options

P1 - Serialization Concern

LanceDataset uses __reduce__ for pickling (lines 488-499) but _storage_options_provider is not included in the serialization. This means after deserialization:

  • latest_storage_options() won't be able to refresh credentials (relies on Rust-side provider)
  • new_file_session() will create sessions without a provider (_storage_options_provider will be None)

If this is intentional (providers can't be serialized), it should be documented. If not, consider how deserialized datasets should handle credential refresh.


The implementation itself looks correct - precedence logic in __init__.py and the new_file_session() factory method properly delegate to existing infrastructure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working python

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants