Skip to content

Make shared string builders fallible end-to-end with try_* APIs#23223

Open
kosiew wants to merge 7 commits into
apache:mainfrom
kosiew:refactor-byte-accounting-05-22688
Open

Make shared string builders fallible end-to-end with try_* APIs#23223
kosiew wants to merge 7 commits into
apache:mainfrom
kosiew:refactor-byte-accounting-05-22688

Conversation

@kosiew

@kosiew kosiew commented Jun 28, 2026

Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Rationale for this change

This change moves the shared string builder infrastructure to fallible try_* APIs so overflow conditions can be propagated as DataFusionError instead of requiring panic-based handling. This provides a consistent error propagation path for downstream UDF migrations while preserving the existing infallible APIs as compatibility wrappers where appropriate.

This PR is preparatory work for migrating downstream string UDFs onto fallible append/write APIs end-to-end. The follow-up work will cover direct row emitters such as chr, uuid, initcap, and substr; helper-driven writers such as overlay, reverse, and translate; the larger split_part migration, including its index-normalization helpers; and output-amplifying functions such as repeat, lpad, and rpad, where oversized output must return DataFusionError rather than panic.

What changes are included in this PR?

  • Add shared helpers for validating StringView length, offset, and buffer index values against Arrow's i32::MAX limits.

  • Introduce fallible try_* methods throughout BulkNullStringArrayBuilder:

    • try_append_value
    • try_append_placeholder
    • try_append_with
    • try_append_byte_map
  • Keep the existing infallible append_* methods as compatibility shims that delegate to the corresponding try_* methods and panic on overflow.

  • Convert GenericStringArrayBuilder::append_with to reuse the fallible implementation instead of duplicating logic.

  • Refactor StringViewArrayBuilder to:

    • validate long-view metadata through shared helpers,
    • add fallible try_append_with and try_append_byte_map,
    • improve spill-path error handling and rollback so intermediate state is restored on failure.
  • Add shared test-only utilities (FailingBulkNullStringArrayBuilder and FailingStringWriter) to support overflow propagation tests in this and downstream modules.

  • Prepare shared string builder APIs for follow-up UDF migrations covering direct append call sites, helper-driven row writers, split_part index handling, and output-amplifying functions such as repeat, lpad, and rpad.

Are these changes tested?

Yes.

This PR adds the following tests:

  • bulk_try_append_methods
  • string_view_builder_try_append_with_and_byte_map_success_path
  • string_view_builder_rejects_long_view_part_overflow
  • failing_bulk_builder_propagates_try_append_errors

It also continues to exercise existing string builder tests.

Are there any user-facing changes?

No user-facing behavior is intended. This is shared internal infrastructure that enables downstream code to propagate overflow errors through fallible APIs while preserving the existing infallible compatibility methods.

LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed.

kosiew added 5 commits June 28, 2026 17:37
- Enhanced BulkNullStringArrayBuilder to implement a fallible `try_*` contract.
- Introduced compatibility wrappers for infallible methods.
- Added new methods to StringViewArrayBuilder:
- `try_append_byte_map`
- `try_append_with`
- Implemented fallible long-view part validation and error capture with rollback for the writer.
- Included test-only failing bulk builder and added success + overflow tests to ensure robustness.
- Reused actual conversion error in `write_str` and `write_char`
- Simplified rollback error path in `try_append_with`
- Moved failing test helper types into the tests module
- Deduplicated failing test error via `failing_overflow()`
…for improved test reuse

- Moved FailingStringWriter and FailingBulkNullStringArrayBuilder to allow for downstream crate module tests to reuse them via `crate::strings::FailingBulkNullStringArrayBuilder`.
- Updated visibility to `#[cfg(test)] pub(crate)` in `datafusion/functions/src/strings.rs`.
@github-actions github-actions Bot added the functions Changes to functions implementation label Jun 28, 2026
kosiew added 2 commits June 28, 2026 21:14
- Removed `append_byte_map` and `append_with` from `GenericStringArrayBuilder`
- Removed `append_byte_map` and `append_with` from `StringViewArrayBuilder`
- Retained trait default methods on `BulkNullStringArrayBuilder`
- Updated documentation to fix broken `Self::...` links
@kosiew kosiew marked this pull request as ready for review June 28, 2026 13:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant