[feat](Variant) Support NestedGroup public config #64680

Open
eldenmoon wants to merge 1 commit into apache:master from eldenmoon:branch-master-nested-group-public

@eldenmoon

Member

cherry-pick #64679

@hello-stephen

Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Copilot AI left a comment

Contributor

Pull request overview

Note

Copilot couldn't run its full agentic review because no GitHub Actions runner was available. Make sure your repository has a runner available to run Copilot's review, or add a copilot-setup-steps.yml file specifying one with the runs-on attribute. See the docs for more details.

Adds public-facing support for Variant NestedGroup configuration, including SQL parsing/serialization updates and build-time module toggles for BE.

Changes:

  • Allow variant_enable_nested_group property in VARIANT type definitions and serialize it back via toSql().
  • Add/adjust parser and type unit tests to validate NestedGroup property handling and conflicts.
  • Introduce BE build flags and default provider behavior/tests for NestedGroup module enablement.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| regression-test/suites/variant_p0/test_variant_search_subcolumn.groovy | Adds context comment documenting the Variant SEARCH subcolumn binding requirement. |
| fe/fe-type/src/main/java/org/apache/doris/catalog/VariantType.java | Serializes variant_enable_nested_group into VARIANT properties when enabled. |
| fe/fe-core/src/test/java/org/apache/doris/nereids/parser/NereidsParserTest.java | Updates tests to accept NestedGroup property and assert conflict behavior with doc mode. |
| fe/fe-core/src/test/java/org/apache/doris/catalog/TypeTest.java | Updates VariantType.toSql() expectations to include NestedGroup property. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/parser/LogicalPlanBuilder.java | Changes NestedGroup handling from "rejected" to "force-disable other variant options". |
| build.sh | Adds nested_group feature flag visibility in build feature summary. |
| be/test/storage/segment/nested_group_provider_test.cpp | Strengthens default write-provider negative-path coverage and expected errors/messages. |
| be/test/CMakeLists.txt | Adds conditional inclusion of NestedGroup UT sources when module is enabled. |
| be/src/storage/segment/variant/nested_group_provider.cpp | Makes default write provider explicitly return NotSupported instead of OK. |
| be/src/storage/CMakeLists.txt | Swaps in NestedGroup module sources and removes default provider when enabled. |
| be/src/common/config.cpp | Adds variant_nested_group_max_depth and changes default conflict behavior flag. |
| be/CMakeLists.txt | Introduces ENABLE_NESTED_GROUP option and required module dir variable. |

Comment on lines 5331 to 5337
     if (enableNestedGroup) {
-        throw new NotSupportedException(
-                "variant_enable_nested_group is not supported now");
+        enableVariantDocMode = false;
+        variantMaxSubcolumnsCount = 0;
+        enableTypedPathsToSparse = false;
+        variantMaxSparseColumnStatisticsSize = 0;
+        variantSparseHashShardCount = 0;
     }

Comment thread be/test/CMakeLists.txt Outdated
Comment on lines +57 to +61
+if (ENABLE_NESTED_GROUP)
+    file(GLOB_RECURSE NESTED_GROUP_UT_FILES CONFIGURE_DEPENDS
+        "${CMAKE_CURRENT_SOURCE_DIR}/${NESTED_GROUP_MODULE_DIR}/*.cpp")
+    list(APPEND UT_FILES ${NESTED_GROUP_UT_FILES})
+endif()

Comment thread be/src/common/config.cpp
Comment on lines +1175 to +1179
+// Maximum depth of nested arrays to track with NestedGroup
+// Reserved for future use when NestedGroup expansion moves to storage layer
+// Deeper arrays will be stored as JSONB
+DEFINE_mInt32(variant_nested_group_max_depth, "10");
+DEFINE_mBool(variant_nested_group_discard_scalar_on_conflict, "true");

@eldenmoon

Member Author

run buildall

@hello-stephen

Contributor
TPC-H: Total hot run time: 29211 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ad26bcd426fd8d369fc7479293898129decdabb7, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17657	4018	4090	4018
q2	2005	313	203	203
q3	10302	1427	820	820
q4	4692	465	342	342
q5	7491	852	595	595
q6	181	171	136	136
q7	783	848	631	631
q8	9327	1495	1560	1495
q9	5864	4530	4547	4530
q10	6758	1804	1539	1539
q11	445	272	245	245
q12	630	425	318	318
q13	18103	3521	2774	2774
q14	270	267	243	243
q15	q16	789	780	706	706
q17	987	944	942	942
q18	7039	5892	5659	5659
q19	1295	1321	1138	1138
q20	504	411	261	261
q21	6075	2712	2315	2315
q22	444	359	301	301
Total cold run time: 101641 ms
Total hot run time: 29211 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4341	4310	4289	4289
q2	345	350	233	233
q3	4646	4986	4401	4401
q4	2057	2142	1396	1396
q5	4456	4343	4318	4318
q6	241	182	135	135
q7	1725	1714	1941	1714
q8	2573	2205	2201	2201
q9	8103	8304	7985	7985
q10	4855	4749	4276	4276
q11	581	417	391	391
q12	782	785	558	558
q13	3226	3648	2989	2989
q14	306	303	282	282
q15	q16	713	722	648	648
q17	1360	1321	1327	1321
q18	7918	7388	7400	7388
q19	1231	1162	1107	1107
q20	2219	2182	1974	1974
q21	5276	4575	4447	4447
q22	525	453	400	400
Total cold run time: 57479 ms
Total hot run time: 52453 ms

@hello-stephen

Contributor
TPC-DS: Total hot run time: 173181 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ad26bcd426fd8d369fc7479293898129decdabb7, data reload: false

query5	4358	623	482	482
query6	438	188	176	176
query7	4862	553	322	322
query8	356	206	197	197
query9	8740	4116	4107	4107
query10	486	304	260	260
query11	5942	2358	2143	2143
query12	162	106	99	99
query13	1301	620	433	433
query14	6397	5425	5090	5090
query14_1	4394	4425	4430	4425
query15	207	198	180	180
query16	987	454	436	436
query17	953	706	583	583
query18	2435	493	363	363
query19	205	194	148	148
query20	109	108	103	103
query21	223	143	118	118
query22	13718	13645	13471	13471
query23	17499	16615	16187	16187
query23_1	16258	16296	16276	16276
query24	7535	1809	1311	1311
query24_1	1345	1339	1343	1339
query25	579	470	407	407
query26	1322	312	171	171
query27	2779	584	328	328
query28	4534	2052	2073	2052
query29	1109	616	521	521
query30	314	245	202	202
query31	1105	1080	964	964
query32	118	67	61	61
query33	539	325	286	286
query34	1193	1159	643	643
query35	747	772	680	680
query36	1359	1391	1216	1216
query37	154	111	90	90
query38	1882	1713	1653	1653
query39	926	923	907	907
query39_1	876	873	864	864
query40	218	121	99	99
query41	64	65	62	62
query42	90	93	87	87
query43	328	326	291	291
query44	1436	780	783	780
query45	195	185	178	178
query46	1054	1214	756	756
query47	2398	2322	2242	2242
query48	398	428	302	302
query49	635	466	345	345
query50	1032	360	263	263
query51	4327	4302	4287	4287
query52	80	82	71	71
query53	250	268	187	187
query54	264	217	194	194
query55	74	69	64	64
query56	230	221	232	221
query57	1455	1364	1288	1288
query58	243	220	207	207
query59	1592	1671	1463	1463
query60	290	229	228	228
query61	153	153	149	149
query62	694	646	594	594
query63	235	200	191	191
query64	2531	802	611	611
query65	4789	4796	4825	4796
query66	1785	474	338	338
query67	29813	29733	29626	29626
query68	3073	1542	969	969
query69	412	315	262	262
query70	1075	979	927	927
query71	300	239	212	212
query72	2930	2615	2280	2280
query73	811	813	448	448
query74	5095	4944	4760	4760
query75	2624	2594	2228	2228
query76	2318	1183	825	825
query77	354	380	279	279
query78	12343	12351	11865	11865
query79	1377	1183	778	778
query80	584	482	372	372
query81	456	278	243	243
query82	564	156	121	121
query83	349	275	248	248
query84	303	141	114	114
query85	872	518	408	408
query86	364	310	279	279
query87	1831	1829	1782	1782
query88	3713	2807	2787	2787
query89	420	392	328	328
query90	1921	190	188	188
query91	177	176	130	130
query92	67	61	58	58
query93	1607	1622	973	973
query94	537	347	301	301
query95	705	395	430	395
query96	1134	824	345	345
query97	2725	2677	2613	2613
query98	213	210	199	199
query99	1176	1136	1033	1033
Total cold run time: 258006 ms
Total hot run time: 173181 ms

@hello-stephen

Contributor
ClickBench: Total hot run time: 25.22 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit ad26bcd426fd8d369fc7479293898129decdabb7, data reload: false

query1	0.00	0.00	0.00
query2	0.10	0.05	0.05
query3	0.25	0.14	0.13
query4	1.61	0.13	0.12
query5	0.24	0.21	0.22
query6	1.22	1.04	1.07
query7	0.03	0.01	0.00
query8	0.06	0.04	0.04
query9	0.40	0.31	0.31
query10	0.54	0.56	0.54
query11	0.20	0.14	0.15
query12	0.20	0.15	0.14
query13	0.47	0.48	0.47
query14	1.00	1.04	1.01
query15	0.62	0.59	0.60
query16	0.31	0.32	0.31
query17	1.14	1.10	1.09
query18	0.22	0.20	0.21
query19	2.06	2.02	1.95
query20	0.02	0.02	0.01
query21	15.44	0.21	0.14
query22	4.98	0.05	0.05
query23	16.14	0.31	0.12
query24	2.93	0.41	0.31
query25	0.13	0.05	0.04
query26	0.70	0.20	0.16
query27	0.03	0.04	0.04
query28	3.47	0.98	0.56
query29	12.51	4.32	3.43
query30	0.27	0.14	0.17
query31	2.77	0.60	0.31
query32	3.22	0.61	0.49
query33	3.27	3.24	3.21
query34	15.57	4.21	3.55
query35	3.52	3.56	3.52
query36	0.56	0.46	0.43
query37	0.08	0.07	0.06
query38	0.05	0.04	0.04
query39	0.04	0.03	0.03
query40	0.18	0.16	0.16
query41	0.09	0.03	0.03
query42	0.04	0.02	0.02
query43	0.04	0.04	0.03
Total cold run time: 96.72 s
Total hot run time: 25.22 s

@hello-stephen

Contributor

FE UT Coverage Report

Increment line coverage 100.00% (5/5) 🎉
Increment coverage report
Complete coverage report

@hello-stephen

Contributor

FE Regression Coverage Report

Increment line coverage 0.00% (0/5) 🎉
Increment coverage report
Complete coverage report

@eldenmoon

Member Author

run buildall

@eldenmoon force-pushed the branch-master-nested-group-public branch from ad26bcd to 78c41df on June 22, 2026 16:33

@hello-stephen

Contributor
TPC-H: Total hot run time: 29467 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 78c41dfd70d84ba80d5ec7c17dfb04060d03d614, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17781	4025	4028	4025
q2	2008	320	195	195
q3	10331	1438	852	852
q4	4679	466	343	343
q5	7579	857	572	572
q6	183	174	138	138
q7	810	844	634	634
q8	9347	1659	1622	1622
q9	5910	4551	4518	4518
q10	6767	1761	1555	1555
q11	435	272	247	247
q12	631	429	286	286
q13	18103	3507	2712	2712
q14	266	268	244	244
q15	q16	778	776	708	708
q17	999	1000	1007	1000
q18	6992	5757	5726	5726
q19	1316	1198	1050	1050
q20	502	418	266	266
q21	5976	2551	2472	2472
q22	427	363	302	302
Total cold run time: 101820 ms
Total hot run time: 29467 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4364	4222	4251	4222
q2	340	368	240	240
q3	4685	4994	4447	4447
q4	2076	2148	1391	1391
q5	4497	4305	4330	4305
q6	236	182	131	131
q7	1732	1869	1846	1846
q8	2565	2301	2291	2291
q9	8209	8516	8021	8021
q10	4777	4828	4309	4309
q11	567	454	383	383
q12	743	777	535	535
q13	3233	3595	2984	2984
q14	286	305	286	286
q15	q16	699	741	638	638
q17	1362	1361	1343	1343
q18	7988	7403	7325	7325
q19	1161	1172	1110	1110
q20	2224	2211	1933	1933
q21	5255	4587	4443	4443
q22	520	449	400	400
Total cold run time: 57519 ms
Total hot run time: 52583 ms

@hello-stephen

Contributor
TPC-DS: Total hot run time: 171864 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 78c41dfd70d84ba80d5ec7c17dfb04060d03d614, data reload: false

query5	4311	636	461	461
query6	438	189	174	174
query7	4817	568	314	314
query8	361	220	197	197
query9	8784	4107	4084	4084
query10	429	326	251	251
query11	5890	2350	2110	2110
query12	159	102	97	97
query13	1281	629	410	410
query14	6365	5406	5110	5110
query14_1	4409	4404	4429	4404
query15	207	203	180	180
query16	1022	480	449	449
query17	1113	709	580	580
query18	2564	484	357	357
query19	217	191	147	147
query20	110	111	105	105
query21	223	143	124	124
query22	13609	13615	13416	13416
query23	17334	16578	16143	16143
query23_1	16268	16320	16272	16272
query24	7547	1761	1285	1285
query24_1	1308	1317	1328	1317
query25	602	469	401	401
query26	1336	337	171	171
query27	2606	605	358	358
query28	4436	2057	2028	2028
query29	1082	654	498	498
query30	297	243	199	199
query31	1114	1071	956	956
query32	110	60	61	60
query33	538	324	259	259
query34	1201	1148	680	680
query35	749	811	675	675
query36	1360	1418	1269	1269
query37	152	105	90	90
query38	1901	1711	1599	1599
query39	921	932	886	886
query39_1	864	868	868	868
query40	219	118	98	98
query41	62	60	62	60
query42	86	86	85	85
query43	327	321	283	283
query44	1405	773	778	773
query45	191	176	171	171
query46	1078	1212	752	752
query47	2354	2378	2237	2237
query48	399	436	294	294
query49	637	447	354	354
query50	950	362	274	274
query51	4323	4351	4236	4236
query52	82	81	69	69
query53	250	259	194	194
query54	282	206	197	197
query55	71	68	64	64
query56	219	226	209	209
query57	1418	1397	1314	1314
query58	235	212	211	211
query59	1584	1660	1419	1419
query60	281	248	225	225
query61	145	150	150	150
query62	710	651	591	591
query63	231	186	207	186
query64	2506	768	609	609
query65	4887	4731	4747	4731
query66	1740	443	329	329
query67	29639	28984	29543	28984
query68	3155	1621	958	958
query69	421	309	271	271
query70	1107	972	957	957
query71	284	239	208	208
query72	3016	2641	2328	2328
query73	850	819	451	451
query74	5131	4935	4784	4784
query75	2626	2589	2216	2216
query76	2318	1187	803	803
query77	367	374	275	275
query78	12371	12497	11841	11841
query79	1366	1251	738	738
query80	584	475	374	374
query81	452	280	240	240
query82	569	157	120	120
query83	353	273	247	247
query84	251	144	114	114
query85	859	512	419	419
query86	363	314	299	299
query87	1835	1817	1774	1774
query88	3693	2801	2781	2781
query89	429	380	332	332
query90	1945	188	179	179
query91	170	161	139	139
query92	66	62	53	53
query93	1517	1450	891	891
query94	553	346	320	320
query95	694	475	340	340
query96	1069	841	336	336
query97	2703	2681	2558	2558
query98	212	205	199	199
query99	1217	1190	1030	1030
Total cold run time: 256975 ms
Total hot run time: 171864 ms

@hello-stephen

Contributor
ClickBench: Total hot run time: 25.19 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 78c41dfd70d84ba80d5ec7c17dfb04060d03d614, data reload: false

query1	0.01	0.01	0.01
query2	0.09	0.05	0.05
query3	0.26	0.14	0.13
query4	1.60	0.13	0.15
query5	0.24	0.22	0.23
query6	1.28	1.08	1.08
query7	0.04	0.01	0.01
query8	0.06	0.04	0.03
query9	0.37	0.36	0.30
query10	0.56	0.56	0.54
query11	0.18	0.14	0.14
query12	0.18	0.14	0.14
query13	0.48	0.47	0.48
query14	1.02	1.01	1.01
query15	0.60	0.60	0.60
query16	0.32	0.33	0.31
query17	1.08	1.12	1.12
query18	0.23	0.21	0.22
query19	2.04	1.96	1.90
query20	0.02	0.01	0.01
query21	15.41	0.22	0.14
query22	4.75	0.05	0.05
query23	16.14	0.30	0.12
query24	3.00	0.40	0.33
query25	0.11	0.05	0.05
query26	0.73	0.21	0.14
query27	0.03	0.04	0.03
query28	3.55	0.91	0.52
query29	12.50	4.38	3.48
query30	0.27	0.16	0.15
query31	2.77	0.60	0.31
query32	3.22	0.60	0.49
query33	3.21	3.29	3.17
query34	15.54	4.27	3.54
query35	3.53	3.55	3.48
query36	0.56	0.46	0.43
query37	0.08	0.06	0.06
query38	0.05	0.04	0.04
query39	0.04	0.02	0.02
query40	0.19	0.16	0.14
query41	0.08	0.03	0.04
query42	0.04	0.02	0.02
query43	0.04	0.04	0.04
Total cold run time: 96.5 s
Total hot run time: 25.19 s

@hello-stephen

Contributor

FE Regression Coverage Report

Increment line coverage 0.00% (0/5) 🎉
Increment coverage report
Complete coverage report

@hello-stephen

Contributor

BE UT Coverage Report

Increment line coverage 50.00% (2/4) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 54.30% (21316/39253)
Line Coverage 37.95% (203688/536793)
Region Coverage 33.94% (159753/470736)
Branch Coverage 34.95% (69972/200214)

@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 50.00% (2/4) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 74.05% (28383/38327)
Line Coverage 58.02% (309603/533641)
Region Coverage 54.91% (259647/472871)
Branch Coverage 56.17% (112608/200465)

@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 50.00% (2/4) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 74.07% (28387/38327)
Line Coverage 58.02% (309627/533641)
Region Coverage 54.93% (259771/472871)
Branch Coverage 56.18% (112623/200465)

@hello-stephen

Contributor

FE Regression Coverage Report

Increment line coverage 0.00% (0/91) 🎉
Increment coverage report
Complete coverage report

@eldenmoon

Member Author

skip check_coverage

@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 50.00% (2/4) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 74.06% (28384/38327)
Line Coverage 58.02% (309604/533641)
Region Coverage 54.92% (259690/472871)
Branch Coverage 56.18% (112616/200465)

@eldenmoon

Member Author

skip check_coverage

@eldenmoon

Member Author

/review

@eldenmoon

Member Author

skip check_coverage

@hello-stephen

Contributor

FE Regression Coverage Report

Increment line coverage 0.00% (0/660) 🎉
Increment coverage report
Complete coverage report

@github-actions (bot) added the approved label on Jun 23, 2026

@github-actions

Contributor

PR approved by at least one committer and no changes requested.

@github-actions

Contributor

PR approved by anyone and no changes requested.

@eldenmoon

Member Author

/review

@github-actions (bot) left a comment

Contributor

Review complete. I found three issues that should be fixed before merge.

Critical checkpoint conclusions:

  • Goal and tests: the PR makes NestedGroup configuration public, but the current tests do not prove DDL round-trip correctness or the default create-and-write path.
  • Scope: the code is small, but FE acceptance and BE provider availability are not aligned.
  • Concurrency and lifecycle: no new concurrency, locking, transaction, or lifecycle hazard was found in the changed paths.
  • Configuration: the new BE config is missing the normal public declaration; the existing review thread already covers its missing validator.
  • Compatibility: nested-group VARIANT SQL serialization can produce DDL that the parser rejects, and default BE builds can accept table metadata they cannot write.
  • Parallel paths: FE parser/catalog/thrift/PB propagation and BE default writer paths were checked; the default provider rejects writes.
  • Test coverage: parser/provider tests were added, but they skip or miss the failing round-trip and create-vs-insert boundaries.
  • User focus: no additional user-provided review focus was supplied.

Subagent conclusions: OR-1 became inline comment M-1; TSC-2 and TSC-3 became inline comments M-2 and M-3; TSC-1 was dismissed as duplicate of existing parser conflict thread discussion_r3450648268. Convergence round 2 ended with both subagents reporting NO_NEW_VALUABLE_FINDINGS for this final comment set.

        sb.append("\"variant_sparse_hash_shard_count\" = \"")
                .append(String.valueOf(Math.max(1, variantSparseHashShardCount))).append("\"");
    }
    if (enableNestedGroup) {

Contributor

When nested group is enabled the parser zeros variantMaxSparseColumnStatisticsSize above, but this catalog toSql() path still emits that value in the non-doc property set before adding variant_enable_nested_group. A table created with VARIANT<PROPERTIES("variant_enable_nested_group" = "true")> will therefore print variant_max_sparse_column_statistics_size = "0", and re-parsing that SQL fails because PropertyAnalyzer.analyzeVariantMaxSparseColumnStatisticsSize rejects explicit values below 1. Please serialize a nested-group property set that omits the disabled sparse/doc properties, or otherwise keeps all emitted values parser-valid, and add a DDL round-trip test for this type.
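
The round-trip concern above can be illustrated with a minimal, self-contained sketch. It is not the Doris implementation: the class `NestedGroupToSqlSketch`, its helper methods, and the reduced two-property model are hypothetical, and `parserAccepts` only stands in for the "explicit values below 1 are rejected" rule described in the comment. The idea shown is that a nested-group type serializes only `variant_enable_nested_group`, so the emitted property set stays valid when parsed again.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch, not Doris code: models only two VARIANT properties.
class NestedGroupToSqlSketch {
    static final String NESTED_GROUP = "variant_enable_nested_group";
    static final String SPARSE_STATS_SIZE = "variant_max_sparse_column_statistics_size";

    // Builds the property set to print. When nested group is enabled, the
    // zeroed sparse property is omitted instead of being emitted as "0".
    static Map<String, String> serialize(boolean enableNestedGroup, int sparseStatsSize) {
        Map<String, String> props = new LinkedHashMap<>();
        if (enableNestedGroup) {
            props.put(NESTED_GROUP, "true");
            return props;
        }
        props.put(SPARSE_STATS_SIZE, String.valueOf(sparseStatsSize));
        return props;
    }

    // Stand-in for the parser-side rule: an explicit size below 1 is rejected.
    static boolean parserAccepts(Map<String, String> props) {
        String size = props.get(SPARSE_STATS_SIZE);
        return size == null || Integer.parseInt(size) >= 1;
    }

    // Renders the properties in VARIANT<PROPERTIES(...)> form.
    static String toSql(Map<String, String> props) {
        StringJoiner joiner = new StringJoiner(", ", "VARIANT<PROPERTIES(", ")>");
        for (Map.Entry<String, String> e : props.entrySet()) {
            joiner.add("\"" + e.getKey() + "\" = \"" + e.getValue() + "\"");
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = serialize(true, 0);
        System.out.println(toSql(props));
        System.out.println("parser accepts: " + parserAccepts(props));
    }
}
```

A real DDL round-trip test would presumably feed the `toSql()` output back through the actual parser rather than a stand-in check.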

     if (enableNestedGroup) {
-        throw new NotSupportedException(
-                "variant_enable_nested_group is not supported now");
+        enableVariantDocMode = false;

Contributor

This now lets FE create a table whose VARIANT column has variant_enable_nested_group=true, but the default BE build still cannot write such a column. FE propagates the flag into tablet schema, and VariantColumnWriterImpl::finalize() calls the nested-group provider whenever the flag is true; the default provider's prepare/prepare_with_built_groups implementations now return NotSupported("NestedGroup write path is not available in this build"). That means a user can successfully create the table and only hit an insert/load failure later, while the new BE tests skip this path when the provider is unavailable. Please either keep DDL/session acceptance gated until the active BE write path is available, provide a safe fallback write behavior, or add an explicit negative end-to-end test that documents this create-vs-insert boundary.
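
One of the options named above, keeping DDL acceptance gated until the write path exists, might look roughly like the following sketch. Everything here is hypothetical: `NestedGroupDdlGateSketch`, the boolean capability flag, and the error text are illustrative only, and how FE would actually learn what the BE build supports is not specified in this thread.

```java
// Hypothetical sketch, not Doris code: fail at CREATE time instead of at INSERT time.
class NestedGroupDdlGateSketch {
    // Rejects the DDL when the column asks for nested group but the
    // (assumed) backend capability flag says the write path is missing.
    static void checkCreate(boolean columnEnablesNestedGroup, boolean backendSupportsWrite) {
        if (columnEnablesNestedGroup && !backendSupportsWrite) {
            throw new UnsupportedOperationException(
                    "variant_enable_nested_group requires a BE build with the NestedGroup write path");
        }
    }

    // Convenience wrapper that reports the decision as a boolean.
    static boolean createAllowed(boolean columnEnablesNestedGroup, boolean backendSupportsWrite) {
        try {
            checkCreate(columnEnablesNestedGroup, backendSupportsWrite);
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(createAllowed(true, false));
        System.out.println(createAllowed(true, true));
        System.out.println(createAllowed(false, false));
    }
}
```

The trade-off is that the user sees the failure at table creation, where it is cheap to correct, rather than on the first load.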

Comment thread be/src/common/config.cpp
+// Maximum depth of nested arrays to track with NestedGroup
+// Reserved for future use when NestedGroup expansion moves to storage layer
+// Deeper arrays will be stored as JSONB
+DEFINE_mInt32(variant_nested_group_max_depth, "10");

Contributor

This new config is defined and registered, but it is not declared in config.h next to the other variant configs. DECLARE_mInt32 is what exposes config::... values to BE code outside config.cpp, so any code that tries to use config::variant_nested_group_max_depth through the normal config header will fail to compile even though the runtime config name exists. Please add the matching DECLARE_mInt32(variant_nested_group_max_depth); near the adjacent variant declarations.

@github-actions (bot) left a comment

Contributor

Automated review found two additional blocking issues beyond the existing inline threads.

Critical checkpoint conclusions:

  • Goal and tests: the PR makes nested-group VARIANT SQL-reachable, but tests do not cover CTAS/output-schema preservation of an explicit nested-group cast or session-default conflict handling.
  • Scope: the code changes are small, but FE parser/type semantics now expose a storage property to optimizer and schema-derivation paths.
  • Concurrency and lifecycle: no new concurrency, locking, or special lifecycle issue found in the changed paths.
  • Configuration: BE config declaration/validation concerns are already covered by existing threads; a separate session-default propagation issue is commented inline.
  • Compatibility and parallel paths: the default BE write-path boundary is already covered by an existing thread; the optimizer/CTAS path below is a distinct parallel FE path that can lose the property.
  • Data correctness: accepted inline issues can create the wrong VARIANT column properties. No additional storage data-correctness issue was substantiated beyond existing threads.
  • Test/result hygiene: git diff --check was clean. I did not run Doris builds or test suites in this review runner.
  • Observability: no additional observability issue found.

User focus: no additional user-provided review focus was present.

Subagent conclusions: optimizer-rewrite produced OR-1, accepted as M-1. tests-session-config produced TSC-1, accepted as M-2. Existing config, BE write-support, explicit conflict, and toSql() round-trip concerns were deduplicated against existing review threads. Convergence round 1 ended with both live subagents replying NO_NEW_VALUABLE_FINDINGS for the same final ledger/comment set.

     if (enableNestedGroup) {
-        throw new NotSupportedException(
-                "variant_enable_nested_group is not supported now");
+        enableVariantDocMode = false;

Contributor

Now that variant_enable_nested_group=true can be parsed into a Nereids VariantType, explicit casts to that type need to remain semantically visible. Today SimplifyCastRule removes CAST(v AS VARIANT<PROPERTIES("variant_enable_nested_group"="true")>) whenever the child is any otherwise-equal VARIANT, because VariantType.equals() and hashCode() do not include enableNestedGroup. In CTAS/schema-from-query paths, the output column type is taken from the rewritten slot via s.getDataType().conversion(), so the cast can be optimized away and the created table loses the requested nested-group property. Please either include enableNestedGroup in Nereids VariantType equality/hash semantics, or otherwise prevent cast simplification from dropping explicit VARIANT property casts, and add a test that covers this through CTAS or output schema derivation.
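
A small sketch of the first suggested fix, assuming a simplified type with only two fields. `VariantTypeSketch` and `castIsRedundant` are hypothetical stand-ins for the Nereids `VariantType` and the cast-simplification check, which are not reproduced here. Once `enableNestedGroup` takes part in `equals()` and `hashCode()`, a cast that only changes that property is no longer treated as a no-op.

```java
import java.util.Objects;

// Hypothetical sketch, not Doris code: a two-field stand-in for a VARIANT type.
final class VariantTypeSketch {
    final int maxSubcolumnsCount;
    final boolean enableNestedGroup;

    VariantTypeSketch(int maxSubcolumnsCount, boolean enableNestedGroup) {
        this.maxSubcolumnsCount = maxSubcolumnsCount;
        this.enableNestedGroup = enableNestedGroup;
    }

    // enableNestedGroup is part of equality, so types that differ only in
    // that flag are distinct.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof VariantTypeSketch)) {
            return false;
        }
        VariantTypeSketch other = (VariantTypeSketch) o;
        return maxSubcolumnsCount == other.maxSubcolumnsCount
                && enableNestedGroup == other.enableNestedGroup;
    }

    // hashCode covers the same fields as equals, as the Object contract requires.
    @Override
    public int hashCode() {
        return Objects.hash(maxSubcolumnsCount, enableNestedGroup);
    }

    // Assumed simplification rule: a cast is removable only if the child
    // already has exactly the target type.
    static boolean castIsRedundant(VariantTypeSketch child, VariantTypeSketch target) {
        return child.equals(target);
    }

    public static void main(String[] args) {
        VariantTypeSketch plain = new VariantTypeSketch(0, false);
        VariantTypeSketch nested = new VariantTypeSketch(0, true);
        System.out.println(castIsRedundant(plain, nested));
        System.out.println(castIsRedundant(nested, nested));
    }
}
```

With the flag left out of `equals()`, the first call would report the cast as redundant, which is the path by which CTAS could lose the property.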

-        throw new NotSupportedException(
-                "variant_enable_nested_group is not supported now");
+        enableVariantDocMode = false;
+        variantMaxSubcolumnsCount = 0;

Contributor

This normalization should validate the final VARIANT property combination after session defaults and explicit properties have both been applied. Right now the validator only sees the raw property map, so conflicts where one side comes from a default are silently rewritten here. For example, with default_variant_enable_nested_group=true, a column declared as VARIANT<PROPERTIES("variant_enable_doc_mode"="true")> reaches this branch with both booleans true and the explicit doc-mode request is forced back to false; with default_variant_enable_doc_mode=true, an explicit nested-group property does the same. A positive explicit variant_max_subcolumns_count is also zeroed when nested group comes from the session default. Please validate enableNestedGroup against the final doc/sparse/subcolumn settings before normalizing them, and add parser tests that cover the session-default cases.
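
The ordering requested above (merge first, validate second, normalize last) can be sketched as follows. This is an illustration under assumptions, not the Doris parser: `VariantPropertyMergeSketch` is hypothetical, session defaults are keyed by property name for brevity rather than by their `default_...` session-variable names, and treating every conflict as an error regardless of which side supplied the value is one possible policy.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Doris code: validate the final property combination.
class VariantPropertyMergeSketch {
    // Explicit column properties override session defaults.
    static Map<String, String> merge(Map<String, String> sessionDefaults,
                                     Map<String, String> explicit) {
        Map<String, String> merged = new HashMap<>(sessionDefaults);
        merged.putAll(explicit);
        return merged;
    }

    // Checks the merged result, so a conflict is caught even when one side
    // of it came from a session default.
    static boolean isValid(Map<String, String> finalProps) {
        boolean nested = Boolean.parseBoolean(
                finalProps.getOrDefault("variant_enable_nested_group", "false"));
        boolean docMode = Boolean.parseBoolean(
                finalProps.getOrDefault("variant_enable_doc_mode", "false"));
        int maxSubcolumns = Integer.parseInt(
                finalProps.getOrDefault("variant_max_subcolumns_count", "0"));
        if (nested && docMode) {
            return false;
        }
        return !(nested && maxSubcolumns > 0);
    }

    public static void main(String[] args) {
        Map<String, String> defaults = Map.of("variant_enable_nested_group", "true");
        Map<String, String> explicit = Map.of("variant_enable_doc_mode", "true");
        System.out.println(isValid(merge(defaults, explicit)));
        System.out.println(isValid(merge(Map.of(), Map.of("variant_enable_nested_group", "true"))));
    }
}
```

Parser tests for the session-default cases could then assert on the rejection instead of on a silently rewritten property.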

Labels

approved, dev/4.1.3-merged, reviewed

5 participants