branch-4.1: [fix](cloud-compaction) prevent EMPTY_CUMULATIVE / BASE-CUMU races on the same tablet #64619 #64702
Open
github-actions[bot] wants to merge 1 commit into
… the same tablet (#64619)

Bug
---

Precondition: the BE config enable_parallel_cumu_compaction is set to true.

On the meta-service, start_compaction_job only rejected a new job when its type strictly equalled an in-flight job's type. This left two races:

1. EMPTY_CUMULATIVE was treated as a different type from CUMULATIVE. While a real CUMULATIVE [v_lo, v_hi] was still running, an EMPTY_CUMULATIVE could be accepted and committed, advancing cumulative_point past v_hi. A subsequent BASE compaction could then pull rowsets in [v_lo, v_hi] as input and race with the in-flight CUMULATIVE on the same rowsets.
2. With check_input_versions_range=true, BASE and CUMULATIVE were never cross-checked against each other, so overlapping input ranges across the two types could be accepted concurrently.

Fix
---

* Normalize EMPTY_CUMULATIVE to CUMULATIVE for conflict detection so they belong to the same conflict family.
* Extend the version-range conflict check to the whole rowset compaction family (BASE / CUMULATIVE / EMPTY_CUMULATIVE / FULL) instead of same-type only. Non-overlapping ranges across types are still allowed.
* Keep version_in_compaction notification scoped to the same family so BE retry semantics are unchanged.

Behaviour matrix (new -> active, OK = accept, BUSY = JOB_TABLET_BUSY)
---------------------------------------------------------------------

| new vs active | before | after |
|---|---|---|
| EMPTY_CUMU vs CUMU | OK (race) | BUSY |
| CUMU vs EMPTY_CUMU | OK | BUSY |
| BASE vs CUMU overlap | OK (race) | BUSY |
| CUMU vs BASE overlap | OK (race) | BUSY |
| BASE vs CUMU disjoint | OK | OK (unchanged) |

same-type / FULL / STOP_TOKEN / idempotent same-id: unchanged

Tests
-----

* EmptyCumulativeBlockedByCumulativeTest
* BaseCumulativeCrossTypeConflictTest

The cluster log:

```
1. start cc
-------------- start cc(42326-42474) ------------
RuntimeLogger I20260616 06:05:58.687474 1715 cloud_cumulative_compaction.cpp:111] start CloudCumulativeCompaction, tablet_id=1763693245218, range=[42326-42474]|job_id=7c02be46-86a3-43b7-9687-9c93b1f3affe|input_rowsets=5|input_rows=427170|input_segments=5|input_rowsets_data_size=52916937|input_rowsets_index_size=0|input_rowsets_total_size=52916937|tablet_max_version=42475|cumulative_point=42326|num_rowsets=27|cumu_num_rowsets=6
-------------- meta service records this cc --------------
I20260616 06:05:58.687247 3747094 meta_service_helper.h:174] begin start_tablet_job remote_caller=10.2.18.57:52036 original_client_ip=10.2.18.57:9050 request=cloud_unique_id: "1:1753070360:jYdIZgSo" job { idx { table_id: 1753072815281 index_id: 1753072815282 partition_id: 1763693245203 tablet_id: 1763693245218 } compaction { initiator: "10.2.18.57:9050" type: CUMULATIVE input_versions: 42326 input_versions: 42474 base_compaction_cnt: 91 cumulative_compaction_cnt: 3590 id: "7c02be46-86a3-43b7-9687-9c93b1f3affe" expiration: 1781647558 lease: 1781561238 check_input_versions_range: true } } request_ip: "10.2.18.57:9050"
meta_service.10.2.16.38.INFO:I20260616 06:05:58.688797 3747093 meta_service_job.cpp:272] (1753070360)compaction job to save job={"initiator":"10.2.18.57:9050","type":"CUMULATIVE","input_versions":["42326","42474"],"base_compaction_cnt":"91","cumulative_compaction_cnt":"3590","id":"7c02be46-86a3-43b7-9687-9c93b1f3affe","expiration":"1781647558","lease":"1781561238","check_input_versions_range":true}

2. DELETE triggers the increase of the cumulative point.
--------- meta service records the EMPTY_CUMULATIVE job. The cumulative point has advanced to 42476 ----------
I20260616 06:05:58.792258 1621135 meta_service_helper.h:174] begin start_tablet_job remote_caller=10.2.18.57:33586 original_client_ip=10.2.18.57:9050 request=cloud_unique_id: "1:1753070360:jYdIZgSo" job { idx { table_id: 1753072815281 index_id: 1753072815282 partition_id: 1763693245203 tablet_id: 1763693245218 } compaction { initiator: "10.2.18.57:9050" type: EMPTY_CUMULATIVE base_compaction_cnt: 91 cumulative_compaction_cnt: 3590 id: "92277d99-b14f-42df-bfd8-c26e75ff8052" lease: 1781561178 } } request_ip: "10.2.18.57:9050"
I20260616 06:05:58.793903 1621138 meta_service_job.cpp:272] (1753070360)compaction job to save job={"initiator":"10.2.18.57:9050","type":"EMPTY_CUMULATIVE","base_compaction_cnt":"91","cumulative_compaction_cnt":"3590","id":"92277d99-b14f-42df-bfd8-c26e75ff8052","lease":"1781561178"}
I20260616 06:05:58.796713 1621137 meta_service_helper.h:174] begin finish_tablet_job remote_caller=10.2.18.57:60878 original_client_ip=10.2.18.57:9050 request=cloud_unique_id: "1:1753070360:jYdIZgSo" action: COMMIT job { idx { table_id: 1753072815281 index_id: 1753072815282 partition_id: 1763693245203 tablet_id: 1763693245218 } compaction { initiator: "10.2.18.57:9050" type: EMPTY_CUMULATIVE input_cumulative_point: 42326 output_cumulative_point: 42476 base_compaction_cnt: 91 cumulative_compaction_cnt: 3590 id: "92277d99-b14f-42df-bfd8-c26e75ff8052" lease: 1781561178 } } request_ip: "10.2.18.57:9050"
--------- BE logs the EMPTY_CUMULATIVE --------------
RuntimeLogger I20260616 06:05:58.801268 1715 cloud_cumulative_compaction.cpp:533] do empty cumulative compaction to update cumulative point|job_id=92277d99-b14f-42df-bfd8-c26e75ff8052|tablet_id=1763693245218|input_cumulative_point=42326|output_cumulative_point=42476
RuntimeLogger I20260616 06:05:58.801329 1715 cloud_cumulative_compaction.cpp:539] tablet stats=idx { table_id: 1753072815281 index_id: 1753072815282 partition_id: 1763693245203 tablet_id: 1763693245218 } data_size: 25302189990 num_rows: 303319387 num_rowsets: 27 num_segments: 43 base_compaction_cnt: 91 cumulative_compaction_cnt: 3591 cumulative_point: 42476 last_base_compaction_time_ms: 1781539519000 last_cumu_compaction_time_ms: 1781561158000 index_size: 0 segment_size: 25302189990
RuntimeLogger W20260616 06:05:58.801383 1715 cloud_storage_engine.cpp:529] failed to submit compaction task for tablet: 1763693245218, err: [E-2010]cumulative compaction meet delete version

3. start bc
-------------- BE records the base compaction (2 ~ 42431)
RuntimeLogger I20260616 06:06:01.042435 1715 cloud_base_compaction.cpp:84] start CloudBaseCompaction, tablet_id=1763693245218, range=[2-42431]|job_id=a8e92687-2211-4d74-82aa-d6e99c3fc360|input_rowsets=21|input_rows=303315005|input_segments=39|input_rowsets_data_size=25294029764|input_rowsets_index_size=0|input_rowsets_total_size=25294029764
-------------- meta service records the bc --------------
I20260616 06:06:01.043434 3747099 meta_service_helper.h:174] begin start_tablet_job remote_caller=10.2.18.57:52174 original_client_ip=10.2.18.57:9050 request=cloud_unique_id: "1:1753070360:jYdIZgSo" job { idx { table_id: 1753072815281 index_id: 1753072815282 partition_id: 1763693245203 tablet_id: 1763693245218 } compaction { initiator: "10.2.18.57:9050" type: BASE input_versions: 2 input_versions: 42431 base_compaction_cnt: 91 cumulative_compaction_cnt: 3591 id: "a8e92687-2211-4d74-82aa-d6e99c3fc360" expiration: 1781647561 lease: 1781561241 } } request_ip: "10.2.18.57:9050"

4. cc complete
-------------- meta service records the cc and drops rowsets whose versions are between 42326 and 42474
RuntimeLogger I20260616 06:06:13.279381 1708 cloud_cumulative_compaction.cpp:208] finish CloudCumulativeCompaction, tablet_id=1763693245218, cost=14587ms, range=[42326-42474]|job_id=7c02be46-86a3-43b7-9687-9c93b1f3affe|input_rowsets=5|input_rows=427170|input_segments=5|input_rowsets_data_size=52916937|input_rowsets_index_size=0|input_rowsets_total_size=52916937|output_rows=427170|output_segments=1|output_rowset_data_size=45147495|output_rowset_index_size=0|output_rowset_total_size=45147495|tablet_max_version=42475|cumulative_point=42476|num_rowsets=23|cumu_num_rowsets=0|local_read_time_us=1407|remote_read_time_us=0|local_read_bytes=5717690|remote_read_bytes=0
I20260616 06:06:13.269352 3747098 meta_service_helper.h:174] begin finish_tablet_job remote_caller=10.2.18.57:52070 original_client_ip=10.2.18.57:9050 request=cloud_unique_id: "1:1753070360:jYdIZgSo" action: COMMIT job { idx { table_id: 1753072815281 index_id: 1753072815282 partition_id: 1763693245203 tablet_id: 1763693245218 } compaction { initiator: "10.2.18.57:9050" type: CUMULATIVE input_cumulative_point: 42476 output_cumulative_point: 42475 num_input_rowsets: 5 num_input_segments: 5 num_output_rowsets: 1 num_output_segments: 1 size_input_rowsets: 52916937 size_output_rowsets: 45147495 num_input_rows: 427170 num_output_rows: 427170 input_versions: 42326 input_versions: 42474 output_versions: 42474 output_rowset_ids: "020000000008d8252c4a4e54ca9c6c96f347dc1417ad5db8" txn_id: 8054901679135295045 id: "7c02be46-86a3-43b7-9687-9c93b1f3affe" index_size_input_rowsets: 0 segment_size_input_rowsets: 52916937 index_size_output_rowsets: 0 segment_size_output_rowsets: 45147495 } } request_ip: "10.2.18.57:9050"

5. bc complete
-------------- bc completes and generates new rs (2-42431) --------------
RuntimeLogger I20260616 06:54:02.973222 1699 cloud_base_compaction.cpp:293] finish CloudBaseCompaction, tablet_id=1763693245218, cost=2881925ms range=[2-42431]|job_id=a8e92687-2211-4d74-82aa-d6e99c3fc360|input_rowsets=21|input_rows=303315005|input_segments=39|input_rowsets_data_size=25294029764|input_rowsets_index_size=0|input_rowsets_total=25294029764|output_rows=303315005|output_segments=24|output_rowset_data_size=25176822135|output_rowset_index_size=0|output_rowset_total_size=25176822135|local_read_time_us=1319383|remote_read_time_us=0|local_read_bytes=1154464868|remote_read_bytes=0
```

The rowset version range (2-42431) generated by bc conflicts with the version range (42326-42474) generated by cc.

The compaction info:

```json
{
  "rowsets": [
    "[0-1] 0 DATA NONOVERLAPPING 0200000000000000ffffffffffea4868ffffffffffffffe8 0",
    "[2-42431] 24 DATA NONOVERLAPPING 020000000008d8262c4a4e54ca9c6c96f347dc1417ad5db8 23.45 GB",
    "[42326-42474] 1 DATA NONOVERLAPPING 020000000008d8252c4a4e54ca9c6c96f347dc1417ad5db8 43.06 MB",
    "[42475-42475] 0 DELETE OVERLAP_UNKNOWN 020000000008d8242c4a4e54ca9c6c96f347dc1417ad5db8 0"
  ],
  "missing_rowsets": [
    "[42432-42325]"
  ]
}
```

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

### Release note

None

### Check List (For Author)

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [x] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason? -->
- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->
- Does this need documentation?
    - [ ] No.
    - [ ] Yes. <!-- Add document PR link here. eg: apache/doris-website#1214 -->

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->

---------

Co-authored-by: liutang123 <liulijia1029@google.com>
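The conflict rules listed under Fix and in the behaviour matrix can be modelled with a small sketch. This is a simplified illustration, not the meta-service code: `JobType`, `Job`, `normalize`, `ranges_overlap`, and `is_busy` are hypothetical names, FULL, STOP_TOKEN, and the idempotent same-id path are left out, and the sketch assumes that a job without check_input_versions_range is compared by normalized type only.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-ins for the compaction job fields; not the real protobuf types.
enum class JobType { BASE, CUMULATIVE, EMPTY_CUMULATIVE };

struct Job {
    JobType type;
    bool check_range;  // models check_input_versions_range
    int64_t lo;        // first input version (inclusive), used only if check_range
    int64_t hi;        // last input version (inclusive), used only if check_range
};

// EMPTY_CUMULATIVE joins the CUMULATIVE conflict family.
JobType normalize(JobType t) {
    return t == JobType::EMPTY_CUMULATIVE ? JobType::CUMULATIVE : t;
}

// Two inclusive ranges overlap when the larger start is not past the smaller end.
bool ranges_overlap(const Job& a, const Job& b) {
    return std::max(a.lo, b.lo) <= std::min(a.hi, b.hi);
}

// Returns true when the incoming job should be rejected (JOB_TABLET_BUSY).
bool is_busy(const Job& incoming, const Job& active) {
    if (incoming.check_range && active.check_range) {
        // Cross-type range check: any overlap is a conflict, disjoint ranges pass.
        return ranges_overlap(incoming, active);
    }
    // Assumption of this sketch: without range checking on both sides,
    // jobs conflict only when their normalized types match.
    return normalize(incoming.type) == normalize(active.type);
}
```

With the version ranges from the cluster log as sample values, `is_busy` rejects an EMPTY_CUMULATIVE against the active CUMULATIVE [42326, 42474], and also rejects a range-checked BASE [2, 42431]; a BASE whose range ends below 42326 is still accepted.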
Contributor
run buildall
Contributor
Cloud UT Coverage Report
Increment line coverage
Increment coverage report
Cherry-picked from #64619