Make shared string builders fallible end-to-end with `try_*` APIs #23223
Open
kosiew wants to merge 7 commits into
- Enhanced BulkNullStringArrayBuilder to implement a fallible `try_*` contract.
- Introduced compatibility wrappers for infallible methods.
- Added new methods to StringViewArrayBuilder:
  - `try_append_byte_map`
  - `try_append_with`
- Implemented fallible long-view part validation and error capture with rollback for the writer.
- Included test-only failing bulk builder and added success + overflow tests to ensure robustness.

- Reused actual conversion error in `write_str` and `write_char`
- Simplified rollback error path in `try_append_with`
- Moved failing test helper types into the tests module
- Deduplicated failing test error via `failing_overflow()`

…for improved test reuse
- Moved FailingStringWriter and FailingBulkNullStringArrayBuilder so that downstream crate module tests can reuse them via `crate::strings::FailingBulkNullStringArrayBuilder`.
- Updated visibility to `#[cfg(test)] pub(crate)` in `datafusion/functions/src/strings.rs`.

…in StringViewArrayBuilder

…as pub(crate) for test reuse

- Removed `append_byte_map` and `append_with` from `GenericStringArrayBuilder`
- Removed `append_byte_map` and `append_with` from `StringViewArrayBuilder`
- Retained trait default methods on `BulkNullStringArrayBuilder`
- Updated documentation to fix broken `Self::...` links
Which issue does this PR close?
Rationale for this change
This change moves the shared string builder infrastructure to fallible `try_*` APIs so overflow conditions can be propagated as `DataFusionError` instead of requiring panic-based handling. This provides a consistent error propagation path for downstream UDF migrations while preserving the existing infallible APIs as compatibility wrappers where appropriate.

This PR is preparatory work for migrating downstream string UDFs onto fallible append/write APIs end-to-end. The follow-up work will cover direct row emitters such as `chr`, `uuid`, `initcap`, and `substr`; helper-driven writers such as `overlay`, `reverse`, and `translate`; the larger `split_part` migration, including its index-normalization helpers; and output-amplifying functions such as `repeat`, `lpad`, and `rpad`, where oversized output must return `DataFusionError` rather than panic.

What changes are included in this PR?

- Add shared helpers for validating `StringView` length, offset, and buffer index values against Arrow's `i32::MAX` limits.
- Introduce fallible `try_*` methods throughout `BulkNullStringArrayBuilder`:
  - `try_append_value`
  - `try_append_placeholder`
  - `try_append_with`
  - `try_append_byte_map`
- Keep the existing infallible `append_*` methods as compatibility shims that delegate to the corresponding `try_*` methods and panic on overflow.
- Convert `GenericStringArrayBuilder::append_with` to reuse the fallible implementation instead of duplicating logic.
- Refactor `StringViewArrayBuilder` to: `try_append_with` and `try_append_byte_map`,
- Add shared test-only utilities (`FailingBulkNullStringArrayBuilder` and `FailingStringWriter`) to support overflow propagation tests in this and downstream modules.
- Prepare shared string builder APIs for follow-up UDF migrations covering direct append call sites, helper-driven row writers, `split_part` index handling, and output-amplifying functions such as `repeat`, `lpad`, and `rpad`.

Are these changes tested?
Yes.
This PR adds the following tests:
- `bulk_try_append_methods`
- `string_view_builder_try_append_with_and_byte_map_success_path`
- `string_view_builder_rejects_long_view_part_overflow`
- `failing_bulk_builder_propagates_try_append_errors`

It also continues to exercise existing string builder tests.
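For readers unfamiliar with this style of test, the sketch below shows one way a failing writer and a rollback-on-error append could be exercised. It is only an illustration using the standard library: `AlwaysFailWriter`, `write_greeting`, and `RowBuffer` are hypothetical names and are not the `FailingStringWriter` or builder types added in this PR, whose exact signatures are not shown here.

```rust
use std::fmt::{self, Write};

/// Hypothetical test double: a writer that rejects every write.
pub struct AlwaysFailWriter;

impl Write for AlwaysFailWriter {
    fn write_str(&mut self, _s: &str) -> fmt::Result {
        Err(fmt::Error)
    }
}

/// Hypothetical row emitter, generic over the writer so that a test
/// can inject a failing writer and observe the propagated error.
pub fn write_greeting<W: Write>(out: &mut W, name: &str) -> fmt::Result {
    out.write_str("hello ")?;
    out.write_str(name)
}

/// Hypothetical row buffer (not a DataFusion type). Rows share one
/// contiguous string, and `row_ends` records where each row stops.
pub struct RowBuffer {
    data: String,
    row_ends: Vec<usize>,
}

impl RowBuffer {
    pub fn new() -> Self {
        Self { data: String::new(), row_ends: Vec::new() }
    }

    /// Let `f` write one row. If `f` fails, discard any bytes it wrote
    /// so the buffer is left exactly as it was before the call.
    pub fn try_append_with<F>(&mut self, f: F) -> fmt::Result
    where
        F: FnOnce(&mut String) -> fmt::Result,
    {
        let start = self.data.len();
        match f(&mut self.data) {
            Ok(()) => {
                self.row_ends.push(self.data.len());
                Ok(())
            }
            Err(e) => {
                // Roll back the partial row before reporting the error.
                self.data.truncate(start);
                Err(e)
            }
        }
    }

    pub fn rows(&self) -> usize {
        self.row_ends.len()
    }

    pub fn as_str(&self) -> &str {
        &self.data
    }
}
```

A test in this style would append one good row, attempt a row whose closure writes some bytes and then fails, and assert that the error is returned and the buffer still holds only the first row.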
Are there any user-facing changes?
No user-facing behavior is intended. This is shared internal infrastructure that enables downstream code to propagate overflow errors through fallible APIs while preserving the existing infallible compatibility methods.
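The split between a fallible `try_*` method and an infallible compatibility method that delegates to it might look roughly like the following. This is a minimal sketch under stated assumptions: `SketchBuilder`, `SketchError`, and `checked_offset` are hypothetical stand-ins written against the standard library only, and the real builders return `DataFusionError` rather than this toy error type.

```rust
use std::fmt;

/// Hypothetical stand-in for the error type returned by the real builders.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    /// The requested byte length does not fit in an `i32`.
    Overflow(usize),
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::Overflow(len) => {
                write!(f, "byte length {len} exceeds i32::MAX")
            }
        }
    }
}

impl std::error::Error for SketchError {}

/// Hypothetical helper: validate a length against the `i32::MAX` limit.
pub fn checked_offset(len: usize) -> Result<i32, SketchError> {
    i32::try_from(len).map_err(|_| SketchError::Overflow(len))
}

/// Hypothetical builder (not a DataFusion type): values are stored in
/// one byte buffer, with `i32` end offsets delimiting each value.
pub struct SketchBuilder {
    data: Vec<u8>,
    offsets: Vec<i32>,
}

impl SketchBuilder {
    pub fn new() -> Self {
        // The leading 0 is the start offset of the first value.
        Self { data: Vec::new(), offsets: vec![0] }
    }

    /// Fallible append: validate the new end offset before mutating,
    /// so a failed call leaves the builder unchanged.
    pub fn try_append_value(&mut self, value: &str) -> Result<(), SketchError> {
        let new_len = self
            .data
            .len()
            .checked_add(value.len())
            .ok_or(SketchError::Overflow(usize::MAX))?;
        let end = checked_offset(new_len)?;
        self.data.extend_from_slice(value.as_bytes());
        self.offsets.push(end);
        Ok(())
    }

    /// Infallible compatibility wrapper: delegates to the fallible
    /// method and panics on overflow.
    pub fn append_value(&mut self, value: &str) {
        self.try_append_value(value)
            .expect("string builder offset overflow");
    }

    /// Number of values appended so far.
    pub fn len(&self) -> usize {
        self.offsets.len() - 1
    }
}
```

Keeping the validation in one fallible method means the panicking wrapper and the error-returning path cannot drift apart, which appears to be the motivation for the shim approach described in this PR.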
LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed.