[fix](be) Fix concat_ws nullable array handling #64703

Merged: HappenLee merged 1 commit into apache:master from HappenLee:fix-concat-ws-null-array on Jun 23, 2026
@HappenLee

Contributor

What problem does this PR solve?

Issue Number: N/A

Related PR: N/A

Problem Summary: concat_ws has a BE execution path for a single array argument. When an array row was itself NULL, the executor still walked the nested array data and could return values from nested storage instead of treating the NULL row as empty input. In addition, when the optimizer rewrite is disabled, calls with multiple array arguments can reach this BE array path, where they were silently executed using only the first array argument. This change keeps the concat_ws return nullability unchanged, skips nested data for NULL array rows, and rejects array-form concat_ws calls unless the executor receives exactly a separator plus one array argument.

Release note

Fix wrong concat_ws results for nullable array inputs and return an error for unsupported multiple-array execution without optimizer rewrite.

Check List (For Author)

  • Test: Regression test and build
    • ./build.sh --be
    • ./run-regression-test.sh --run --conf /tmp/doris-regression-conf-run-path.groovy -d nereids_function_p0/scalar_function -s nereids_scalar_fn_concat_ws -forceGenOut
    • ./run-regression-test.sh --run --conf /tmp/doris-regression-conf-run-path.groovy -d nereids_function_p0/scalar_function -s nereids_scalar_fn_concat_ws
  • Behavior changed: Yes. concat_ws skips nested data for NULL array rows, and unsupported multiple-array BE execution now returns INVALID_ARGUMENT instead of silently consuming only the first array.
  • Does this need documentation: No
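
The NULL-row behavior described above can be illustrated with a minimal sketch. It is not the Doris BE code: it assumes a simplified array column made of a flattened element list, per-row end offsets, and a per-row null map, and it uses a single constant separator instead of a separator column. The function name and the `skip_null_rows` switch are hypothetical and exist only to contrast the old and new behavior.

```python
from typing import List, Optional


def concat_ws_array(
    sep: Optional[str],
    nested: List[Optional[str]],  # flattened elements of all array rows
    offsets: List[int],           # offsets[i] = exclusive end of row i in `nested`
    row_is_null: List[bool],      # True where the array row itself is NULL
    skip_null_rows: bool = True,  # hypothetical switch: False mimics the old behavior
) -> List[Optional[str]]:
    results: List[Optional[str]] = []
    start = 0
    for end, is_null in zip(offsets, row_is_null):
        if sep is None:
            # Result nullability follows the separator only.
            results.append(None)
        elif is_null and skip_null_rows:
            # A NULL array row is treated as empty input; nested storage is not read.
            results.append("")
        else:
            # NULL elements inside a non-NULL array are skipped.
            results.append(sep.join(v for v in nested[start:end] if v is not None))
        start = end
    return results


# Row 1 is NULL, but nested storage still holds a leftover value for it.
nested = ["a", "b", "stale", "c"]
offsets = [2, 3, 4]
row_is_null = [False, True, False]

fixed = concat_ws_array("-", nested, offsets, row_is_null)
old = concat_ws_array("-", nested, offsets, row_is_null, skip_null_rows=False)
```

In this model the old path leaks the leftover nested value for the NULL row, while the fixed path returns an empty string for it and leaves the other rows unchanged.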

@hello-stephen

Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@HappenLee force-pushed the fix-concat-ws-null-array branch from 870dbc0 to aa7bf6f on June 22, 2026 13:00
@HappenLee

Contributor Author

run buildall

@hello-stephen

Contributor
TPC-H: Total hot run time: 29245 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit aa7bf6f2cad9a4a59467a9262818889bd9596a93, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17706	4065	3993	3993
q2	2005	321	189	189
q3	10311	1517	828	828
q4	4677	483	341	341
q5	7566	895	565	565
q6	184	172	133	133
q7	766	832	607	607
q8	9317	1593	1726	1593
q9	5751	4520	4485	4485
q10	6762	1842	1528	1528
q11	446	284	241	241
q12	631	423	292	292
q13	18097	3390	2792	2792
q14	266	255	239	239
q15	q16	791	773	705	705
q17	912	989	932	932
q18	7321	5864	5663	5663
q19	1349	1378	1122	1122
q20	490	395	264	264
q21	5876	2576	2432	2432
q22	429	373	301	301
Total cold run time: 101653 ms
Total hot run time: 29245 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4358	4252	4256	4252
q2	336	368	229	229
q3	4632	4949	4389	4389
q4	2057	2141	1368	1368
q5	4456	4374	4326	4326
q6	227	175	130	130
q7	1725	1809	1915	1809
q8	2572	2241	2180	2180
q9	8264	8334	7987	7987
q10	4871	4776	4308	4308
q11	572	441	385	385
q12	757	801	531	531
q13	3338	3594	2949	2949
q14	300	306	272	272
q15	q16	752	750	650	650
q17	1358	1343	1323	1323
q18	7795	7375	7380	7375
q19	1197	1124	1150	1124
q20	2221	2227	1937	1937
q21	5250	4522	4393	4393
q22	534	451	403	403
Total cold run time: 57572 ms
Total hot run time: 52320 ms

@hello-stephen

Contributor
TPC-DS: Total hot run time: 173283 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit aa7bf6f2cad9a4a59467a9262818889bd9596a93, data reload: false

query5	4339	637	482	482
query6	430	204	175	175
query7	4867	549	307	307
query8	363	210	189	189
query9	8748	4036	4057	4036
query10	460	304	256	256
query11	5999	2312	2134	2134
query12	162	100	99	99
query13	1265	586	457	457
query14	6385	5419	5070	5070
query14_1	4393	4389	4420	4389
query15	208	198	178	178
query16	1014	454	454	454
query17	940	702	591	591
query18	2462	481	351	351
query19	210	202	146	146
query20	111	108	104	104
query21	209	141	122	122
query22	13714	13582	13469	13469
query23	17620	16711	16202	16202
query23_1	16378	16380	16331	16331
query24	7514	1780	1340	1340
query24_1	1332	1329	1341	1329
query25	582	467	385	385
query26	1311	313	171	171
query27	2706	563	339	339
query28	4438	2031	2063	2031
query29	1110	605	474	474
query30	306	240	200	200
query31	1125	1080	964	964
query32	109	60	58	58
query33	516	312	248	248
query34	1188	1183	649	649
query35	761	783	678	678
query36	1395	1345	1253	1253
query37	147	103	82	82
query38	1904	1718	1660	1660
query39	923	922	894	894
query39_1	873	905	887	887
query40	219	126	99	99
query41	78	73	79	73
query42	90	87	85	85
query43	316	320	278	278
query44	1425	780	777	777
query45	190	189	176	176
query46	1102	1228	745	745
query47	2352	2299	2219	2219
query48	385	426	294	294
query49	620	475	348	348
query50	1045	353	260	260
query51	4350	4261	4199	4199
query52	77	83	72	72
query53	250	257	181	181
query54	290	215	192	192
query55	72	69	63	63
query56	225	225	211	211
query57	1457	1395	1290	1290
query58	230	204	203	203
query59	1556	1683	1431	1431
query60	274	248	218	218
query61	176	148	140	140
query62	691	646	589	589
query63	229	187	194	187
query64	2486	775	611	611
query65	4918	4794	4800	4794
query66	1783	459	338	338
query67	29773	29708	29630	29630
query68	3070	1671	954	954
query69	413	295	272	272
query70	1094	939	985	939
query71	289	239	207	207
query72	2976	2600	2285	2285
query73	876	819	434	434
query74	5123	4938	4783	4783
query75	2627	2597	2243	2243
query76	2292	1169	784	784
query77	357	377	280	280
query78	12684	12620	12016	12016
query79	1362	1130	766	766
query80	856	474	430	430
query81	498	286	239	239
query82	589	151	119	119
query83	357	272	245	245
query84	319	142	116	116
query85	900	508	439	439
query86	417	307	286	286
query87	1843	1822	1765	1765
query88	3718	2779	2783	2779
query89	427	368	333	333
query90	1800	187	179	179
query91	171	159	133	133
query92	65	57	53	53
query93	1492	1569	873	873
query94	615	337	266	266
query95	682	376	430	376
query96	1092	841	345	345
query97	2728	2680	2644	2644
query98	221	206	204	204
query99	1170	1181	1039	1039
Total cold run time: 258348 ms
Total hot run time: 173283 ms

@hello-stephen

Contributor
ClickBench: Total hot run time: 25.33 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit aa7bf6f2cad9a4a59467a9262818889bd9596a93, data reload: false

query1	0.01	0.01	0.01
query2	0.09	0.05	0.05
query3	0.26	0.14	0.14
query4	1.61	0.15	0.13
query5	0.24	0.23	0.21
query6	1.26	1.05	1.06
query7	0.04	0.00	0.01
query8	0.10	0.04	0.04
query9	0.38	0.32	0.32
query10	0.56	0.57	0.55
query11	0.19	0.14	0.14
query12	0.18	0.14	0.14
query13	0.47	0.48	0.48
query14	1.01	1.03	1.01
query15	0.61	0.59	0.59
query16	0.33	0.33	0.32
query17	1.16	1.09	1.09
query18	0.22	0.21	0.21
query19	2.03	1.93	1.94
query20	0.02	0.01	0.02
query21	15.46	0.23	0.12
query22	4.71	0.05	0.05
query23	16.12	0.31	0.12
query24	2.95	0.46	0.32
query25	0.11	0.06	0.05
query26	0.72	0.21	0.16
query27	0.04	0.04	0.04
query28	3.52	0.91	0.54
query29	12.50	4.33	3.50
query30	0.27	0.15	0.15
query31	2.78	0.62	0.31
query32	3.22	0.60	0.48
query33	3.16	3.26	3.25
query34	15.45	4.19	3.54
query35	3.53	3.48	3.56
query36	0.55	0.44	0.43
query37	0.09	0.06	0.07
query38	0.05	0.04	0.04
query39	0.04	0.03	0.03
query40	0.18	0.16	0.15
query41	0.09	0.03	0.03
query42	0.04	0.03	0.03
query43	0.04	0.03	0.03
Total cold run time: 96.39 s
Total hot run time: 25.33 s

@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (11/11) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 74.00% (28361/38327)
Line Coverage 58.00% (309524/533649)
Region Coverage 54.70% (258657/472872)
Branch Coverage 56.09% (112445/200465)

@HappenLee

Contributor Author

/review

@github-actions (bot) left a comment

Contributor

Automated review completed for PR #64703. I did not find a blocking correctness issue to raise inline.

Critical checkpoint conclusions:

  • Goal/test coverage: The PR fixes nullable array-row handling in BE concat_ws array mode and adds regression coverage for the default rewritten path, a filtered nullable-array case, and the disabled-rewrite error path.
  • Scope: The code change is narrowly limited to BE concat_ws array execution and the matching regression test/output.
  • Concurrency/lifecycle/transactions/persistence: Not involved.
  • Configuration/session behavior: No new config. The existing disable_nereids_expression_rules path was reviewed because the test uses it.
  • Compatibility/parallel paths: Multi-array concat_ws remains supported through the normal Nereids CONCATWS_MULTI_ARRAY_TO_ONE rewrite. The direct BE multi-array shape previously returned wrong data by ignoring later arrays; this PR turns that unsupported shape into an explicit InvalidArgument.
  • Error handling: The new BE arity guard returns Status through the existing function execution path.
  • Nullability/data correctness: The output remains separator-nullability driven, and NULL array rows now skip nested data and produce the existing Doris empty-string behavior.
  • Tests/results/style: The added regression output matches the added queries, uses deterministic ordering where multiple rows are returned, and git diff --check is clean.
  • Performance/observability: No new hot-path concern or observability requirement found for this small branch.
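
The arity guard mentioned in the checkpoints above can be sketched as follows. This is an illustration, not the BE implementation: `InvalidArgument` stands in for the INVALID_ARGUMENT status, Python lists stand in for array arguments, and every function name here is hypothetical. The `guard` flag exists only to show the previous behavior next to the new one.

```python
from typing import List, Optional, Sequence


class InvalidArgument(ValueError):
    """Stand-in for the INVALID_ARGUMENT status returned by the executor."""


def check_array_form_arity(args: Sequence[object]) -> None:
    # Array form is valid only as: separator + exactly one array argument.
    has_array = any(isinstance(a, list) for a in args[1:])
    if has_array and len(args) != 2:
        raise InvalidArgument(
            f"array-form concat_ws expects separator plus one array, got {len(args)} arguments"
        )


def concat_ws_array_form(args: Sequence[object], guard: bool = True) -> str:
    if guard:
        check_array_form_arity(args)
    sep = args[0]
    first_array: List[Optional[str]] = args[1]
    # Only the first array is consumed; without the guard, later arrays are ignored.
    return sep.join(v for v in first_array if v is not None)


single = concat_ws_array_form(["-", ["a", "b"]])
silently_truncated = concat_ws_array_form(["-", ["a", "b"], ["c"]], guard=False)

try:
    concat_ws_array_form(["-", ["a", "b"], ["c"]])
    rejected = False
except InvalidArgument:
    rejected = True
```

With the guard off, the second array is dropped without any signal, which is the wrong-result shape the review describes. With the guard on, the same call fails explicitly, and the single-array call is unaffected.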

Subagent conclusions:

  • optimizer-rewrite: OPT-NONE; no optimizer/rewrite candidate was substantiated.
  • tests-session-config: proposed TEST-001 about disabled CONCATWS_MULTI_ARRAY_TO_ONE; main review dismissed it with evidence because the disabled-rule direct BE path was already semantically wrong before this PR and now fails explicitly.
  • Convergence round 1: both live subagents replied NO_NEW_VALUABLE_FINDINGS for the same final ledger/comment set.

User focus: No additional user-provided review focus was supplied.

CI note: The listed macOS BE UT failure is from the runner using Java 25 while the job requires JDK 17; compile, Linux BE UT, formatter, CheckStyle, and P0 regression checks shown for the PR are passing or unrelated/pending.

@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (11/11) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 74.00% (28361/38327)
Line Coverage 58.00% (309501/533649)
Region Coverage 54.72% (258751/472872)
Branch Coverage 56.09% (112441/200465)

@github-actions (bot) added the approved label (indicates a PR has been approved by one committer) on Jun 23, 2026
@github-actions

Contributor

PR approved by at least one committer and no changes requested.

@github-actions

Contributor

PR approved by anyone and no changes requested.

@HappenLee merged commit cebf71c into apache:master on Jun 23, 2026
31 of 32 checks passed
github-actions Bot pushed a commit that referenced this pull request Jun 23, 2026
yiguolei pushed a commit that referenced this pull request Jun 23, 2026
…64725)

Cherry-picked from #64703

Co-authored-by: HappenLee <happenlee@selectdb.com>
Labels: approved, dev/4.1.3-merged, reviewed
