chore: Add substr() benchmarks, refactor #20803
neilconway wants to merge 3 commits into apache:main from
  argument(
      name = "start_pos",
-     description = "Character position to start the substring at. The first character in the string has a position of 1."
+     description = "Character position to start the substring at. The first character in the string has a position of 1. If the start position is less than 1, it is treated as if it is before the start of the string and the (absolute) number of characters before position 1 is subtracted from `length` (if given). For example, `substr('abc', -3, 6)` returns `'ab'`."
Is this a new behaviour? As far as I know this doesn't match PostgreSQL's behaviour (as described in the doc on get_true_start_end), where a negative start is not allowed. I think for PG the recommended approach there is to use the right(..) function; for DuckDB it's string slices.
@Omega359 Thanks for the comment! This is not new behavior. The behavior for negative start values actually matches what PostgreSQL implements (and what the SQL spec dictates), even though personally that behavior doesn't seem particularly useful to me. get_true_start_end does allow negative start values; maybe you're thinking of negative count values, which are not allowed.
Wow, ok. That behaviour is seriously weird. I would have expected negative start to be from the end of the string, not invisible characters before the start. TIL something.
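To make the rule discussed in this thread concrete, here is a minimal Rust sketch of the SQL-style semantics described in the `start_pos` doc string. `sql_substr` is a hypothetical helper written for illustration; it is not DataFusion's `get_true_start_end` or `substr` implementation, and its error message is made up. It treats positions before 1 as virtual characters that consume part of the count, and rejects negative counts.

```rust
/// Hypothetical illustration only (not DataFusion's code): SQL-style substring
/// with a 1-based `start` and an optional `count`, both counted in characters.
fn sql_substr(s: &str, start: i64, count: Option<i64>) -> Result<String, String> {
    // A negative count is an error; a negative start is allowed.
    if let Some(c) = count {
        if c < 0 {
            return Err("negative substring length not allowed".to_string());
        }
    }
    // Exclusive end position in 1-based coordinates; `None` means "to the end".
    let end = count.map(|c| start.saturating_add(c));
    // Positions before 1 do not exist in the string, so the real window starts
    // at 1; the characters "before" position 1 still use up part of `count`.
    let first = start.max(1);
    let chars = s.chars().skip((first - 1) as usize);
    Ok(match end {
        // The whole window lies before the start of the string.
        Some(e) if e <= first => String::new(),
        Some(e) => chars.take((e - first) as usize).collect(),
        None => chars.collect(),
    })
}
```

Under this sketch, `sql_substr("abc", -3, Some(6))` covers positions -3 through 2, of which only positions 1 and 2 exist, so it yields `"ab"`, the same result as the example in the doc string.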
Which issue does this PR close?
N/A
Rationale for this change
I'd like to optimize `substr` for scalar `start`/`count` inputs, but the code would benefit from some refactoring and cleanup first. I also added benchmarks for `substr` with scalar args.
What changes are included in this PR?
- Refactor `string_view_substr` and `string_substr` to use a single loop
- Refactor `get_true_start_end` to validate its own inputs, clean up UTF8 path
- Add benchmarks for scalar `start` and `count` arguments
Are these changes tested?
Yes.
Are there any user-facing changes?
No, other than an error message wording change.
AI usage
Multiple AI tools were used to iterate on this PR. I have reviewed and understand the resulting code.