feat: Implement regexp_extract #20811

Open
KonaeAkira wants to merge 1 commit into apache:main from KonaeAkira:regexp_extract

KonaeAkira commented Mar 8, 2026

Which issue does this PR close?

Rationale for this change

No particular reason. This PR was done as an exercise.

What changes are included in this PR?

Implements the regexp_extract function from pyspark:
https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.regexp_extract.html

Are these changes tested?

Tested in regexp_extract.slt.

Are there any user-facing changes?

Yes, a new UDF is made available to the user.

github-actions bot added the documentation, sqllogictest, and functions labels Mar 8, 2026
KonaeAkira (Author) commented

Should regex flags be added as a 4th (optional) argument?

KonaeAkira marked this pull request as ready for review March 8, 2026 22:46
Omega359 (Contributor) commented Mar 9, 2026

I haven't had a chance to review this PR yet; however, I know of a few previous PRs related to this function:

#20308
#19934
#14282

    pattern_opts: &impl StringArrayType<'b>,
    idx_opts: &Int64Array,
) -> Result<ArrayRef> {
    let mut results = GenericStringBuilder::<OffsetSize>::new();
Contributor

Related to #20585: perhaps pass in a StringLikeArrayBuilder instead of forcing the return type to Utf8/LargeUtf8.
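
To illustrate the suggestion, here is a minimal stdlib-only sketch of a kernel that writes into a caller-supplied builder instead of constructing a fixed builder type itself. Everything named here (`StringSink`, `VecSink`, `OffsetSink`, `extract_after_prefix`) is hypothetical and is not the `StringLikeArrayBuilder` API from DataFusion; prefix stripping stands in for the regex match so the example needs no external crates. It assumes the Spark-style convention of emitting an empty string when nothing matches.

```rust
/// Hypothetical stand-in for a string builder abstraction.
/// This is NOT DataFusion's StringLikeArrayBuilder; it only shows the shape of the idea.
trait StringSink {
    fn append_value(&mut self, value: &str);
    fn append_null(&mut self);
}

/// Toy sink that keeps one owned string per row.
#[derive(Default)]
struct VecSink {
    values: Vec<Option<String>>,
}

impl StringSink for VecSink {
    fn append_value(&mut self, value: &str) {
        self.values.push(Some(value.to_string()));
    }
    fn append_null(&mut self) {
        self.values.push(None);
    }
}

/// Toy sink that packs all rows into one buffer plus offsets,
/// loosely resembling an offset-based string array layout.
struct OffsetSink {
    data: String,
    offsets: Vec<usize>,
    validity: Vec<bool>,
}

impl OffsetSink {
    fn new() -> Self {
        // The first offset marks the start of row 0.
        Self { data: String::new(), offsets: vec![0], validity: Vec::new() }
    }

    /// Returns the string stored at `row`, or None if that row is null.
    fn get(&self, row: usize) -> Option<&str> {
        if self.validity[row] {
            Some(&self.data[self.offsets[row]..self.offsets[row + 1]])
        } else {
            None
        }
    }
}

impl StringSink for OffsetSink {
    fn append_value(&mut self, value: &str) {
        self.data.push_str(value);
        self.offsets.push(self.data.len());
        self.validity.push(true);
    }
    fn append_null(&mut self) {
        // A null row occupies zero bytes but still needs an end offset.
        self.offsets.push(self.data.len());
        self.validity.push(false);
    }
}

/// Kernel that is generic over the output sink. The caller decides the
/// output representation; the kernel never names a concrete builder.
/// Prefix stripping is a placeholder for the real regex capture logic.
fn extract_after_prefix<S: StringSink>(inputs: &[Option<&str>], prefix: &str, out: &mut S) {
    for input in inputs {
        match input {
            None => out.append_null(),
            // No match yields an empty string (assumed Spark-style behavior).
            Some(s) => out.append_value(s.strip_prefix(prefix).unwrap_or("")),
        }
    }
}
```

With this shape, the same kernel body can fill either sink, so supporting another output representation would mean adding another `StringSink` implementation rather than another copy of the kernel.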

Development

Successfully merging this pull request may close these issues.

regexp_extract func from Spark

2 participants