branch-4.1: [fix](cloud-compaction) prevent EMPTY_CUMULATIVE / BASE-CUMU races on the same tablet #64619 #64702

Open
github-actions[bot] wants to merge 1 commit into branch-4.1 from auto-pick-64619-branch-4.1

@github-actions

Contributor

Cherry-picked from #64619

… the same tablet (#64619)

Bug
---
Observed with the BE config enable_parallel_cumu_compaction set to true.

---
On the meta-service, start_compaction_job only rejected a new job when
its type strictly equalled an in-flight job's type. This left two races:

1. EMPTY_CUMULATIVE was treated as a different type from CUMULATIVE.
While a real CUMULATIVE [v_lo, v_hi] was still running, an
EMPTY_CUMULATIVE could be accepted and committed, advancing
cumulative_point past v_hi. A subsequent BASE compaction could then pull
rowsets in [v_lo, v_hi] as input and race with the in-flight CUMULATIVE
on the same rowsets.
2. With check_input_versions_range=true, BASE and CUMULATIVE were never
cross-checked against each other, so overlapping input ranges across the
two types could be accepted concurrently.

Fix
---
* Normalize EMPTY_CUMULATIVE to CUMULATIVE for conflict detection so
they belong to the same conflict family.
* Extend the version-range conflict check to the whole rowset compaction
family (BASE / CUMULATIVE / EMPTY_CUMULATIVE / FULL) instead of
same-type only. Non-overlapping ranges across types are still allowed.
* Keep version_in_compaction notification scoped to the same family so
BE retry semantics are unchanged.
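
The rules above can be sketched roughly as follows. This is a simplified Python model, not the meta-service code: `Job`, `normalize`, `overlaps`, and `decide` are hypothetical names, and falling back to a family-only check when either job carries no input versions is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Rowset compaction family named in the fix description.
ROWSET_FAMILY = {"BASE", "CUMULATIVE", "EMPTY_CUMULATIVE", "FULL"}


@dataclass(frozen=True)
class Job:
    job_id: str
    type: str
    # Inclusive [start, end] input versions; None when the job carries none
    # (as with the EMPTY_CUMULATIVE request in the cluster log).
    input_versions: Optional[Tuple[int, int]] = None


def normalize(job_type: str) -> str:
    # EMPTY_CUMULATIVE shares a conflict family with CUMULATIVE.
    return "CUMULATIVE" if job_type == "EMPTY_CUMULATIVE" else job_type


def overlaps(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    # Two inclusive ranges intersect when each starts before the other ends.
    return a[0] <= b[1] and b[0] <= a[1]


def decide(new: Job, active: Job, check_input_versions_range: bool = True) -> str:
    if new.job_id == active.job_id:
        return "OK"  # idempotent re-submission of the same job
    both_ranged = new.input_versions is not None and active.input_versions is not None
    in_family = new.type in ROWSET_FAMILY and active.type in ROWSET_FAMILY
    if check_input_versions_range and both_ranged and in_family:
        # Cross-type range check: only overlapping inputs conflict.
        busy = overlaps(new.input_versions, active.input_versions)
        return "BUSY" if busy else "OK"
    # Assumption: without comparable ranges, fall back to the family check.
    return "BUSY" if normalize(new.type) == normalize(active.type) else "OK"


cc = Job("cc", "CUMULATIVE", (42326, 42474))
print(decide(Job("empty", "EMPTY_CUMULATIVE"), cc))   # BUSY
print(decide(Job("bc", "BASE", (2, 42431)), cc))      # BUSY
print(decide(Job("bc", "BASE", (2, 100)), cc))        # OK
```

The version ranges in the usage lines are taken from the cluster log in this description; the disjoint BASE range `(2, 100)` is made up to exercise the non-overlapping case.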

Behaviour matrix (new -> active, OK = accept, BUSY = JOB_TABLET_BUSY)
---------------------------------------------------------------------
                         before            after
EMPTY_CUMU vs CUMU       OK   (race)       BUSY
CUMU       vs EMPTY_CUMU OK                BUSY
BASE  vs CUMU  overlap   OK   (race)       BUSY
CUMU  vs BASE  overlap   OK   (race)       BUSY
BASE  vs CUMU  disjoint  OK                OK   (unchanged)
same-type / FULL / STOP_TOKEN / idempotent same-id : unchanged

Tests
-----
* EmptyCumulativeBlockedByCumulativeTest
* BaseCumulativeCrossTypeConflictTest

The cluster log:
```
1. start cc
-------------- start cc(42326-42474)  ------------
RuntimeLogger I20260616 06:05:58.687474  1715 cloud_cumulative_compaction.cpp:111] start CloudCumulativeCompaction, tablet_id=1763693245218, range=[42326-42474]|job_id=7c02be46-86a3-43b7-9687-9c93b1f3affe|input_rowsets=5|input_rows=427170|input_segments=5|input_rowsets_data_size=52916937|input_rowsets_index_size=0|input_rowsets_total_size=52916937|tablet_max_version=42475|cumulative_point=42326|num_rowsets=27|cumu_num_rowsets=6

-------------- meta service records this cc --------------
I20260616 06:05:58.687247 3747094 meta_service_helper.h:174] begin start_tablet_job remote_caller=10.2.18.57:52036 original_client_ip=10.2.18.57:9050 request=cloud_unique_id: "1:1753070360:jYdIZgSo" job { idx { table_id: 1753072815281 index_id: 1753072815282 partition_id: 1763693245203 tablet_id: 1763693245218 } compaction { initiator: "10.2.18.57:9050" type: CUMULATIVE input_versions: 42326 input_versions: 42474 base_compaction_cnt: 91 cumulative_compaction_cnt: 3590 id: "7c02be46-86a3-43b7-9687-9c93b1f3affe" expiration: 1781647558 lease: 1781561238 check_input_versions_range: true } } request_ip: "10.2.18.57:9050"
meta_service.10.2.16.38.INFO:I20260616 06:05:58.688797 3747093 meta_service_job.cpp:272] (1753070360)compaction job to save job={"initiator":"10.2.18.57:9050","type":"CUMULATIVE","input_versions":["42326","42474"],"base_compaction_cnt":"91","cumulative_compaction_cnt":"3590","id":"7c02be46-86a3-43b7-9687-9c93b1f3affe","expiration":"1781647558","lease":"1781561238","check_input_versions_range":true}

2. DELETE triggers the increase of the cumulative point.
--------- meta service records the EMPTY_CUMULATIVE job. The cumulative point has advanced to 42476 ----------
I20260616 06:05:58.792258 1621135 meta_service_helper.h:174] begin start_tablet_job remote_caller=10.2.18.57:33586 original_client_ip=10.2.18.57:9050 request=cloud_unique_id: "1:1753070360:jYdIZgSo" job { idx { table_id: 1753072815281 index_id: 1753072815282 partition_id: 1763693245203 tablet_id: 1763693245218 } compaction { initiator: "10.2.18.57:9050" type: EMPTY_CUMULATIVE base_compaction_cnt: 91 cumulative_compaction_cnt: 3590 id: "92277d99-b14f-42df-bfd8-c26e75ff8052" lease: 1781561178 } } request_ip: "10.2.18.57:9050"
I20260616 06:05:58.793903 1621138 meta_service_job.cpp:272] (1753070360)compaction job to save job={"initiator":"10.2.18.57:9050","type":"EMPTY_CUMULATIVE","base_compaction_cnt":"91","cumulative_compaction_cnt":"3590","id":"92277d99-b14f-42df-bfd8-c26e75ff8052","lease":"1781561178"}
I20260616 06:05:58.796713 1621137 meta_service_helper.h:174] begin finish_tablet_job remote_caller=10.2.18.57:60878 original_client_ip=10.2.18.57:9050 request=cloud_unique_id: "1:1753070360:jYdIZgSo" action: COMMIT job { idx { table_id: 1753072815281 index_id: 1753072815282 partition_id: 1763693245203 tablet_id: 1763693245218 } compaction { initiator: "10.2.18.57:9050" type: EMPTY_CUMULATIVE input_cumulative_point: 42326 output_cumulative_point: 42476 base_compaction_cnt: 91 cumulative_compaction_cnt: 3590 id: "92277d99-b14f-42df-bfd8-c26e75ff8052" lease: 1781561178 } } request_ip: "10.2.18.57:9050"
--------- BE logs the EMPTY_CUMULATIVE --------------
RuntimeLogger I20260616 06:05:58.801268  1715 cloud_cumulative_compaction.cpp:533] do empty cumulative compaction to update cumulative point|job_id=92277d99-b14f-42df-bfd8-c26e75ff8052|tablet_id=1763693245218|input_cumulative_point=42326|output_cumulative_point=42476
RuntimeLogger I20260616 06:05:58.801329  1715 cloud_cumulative_compaction.cpp:539] tablet stats=idx { table_id: 1753072815281 index_id: 1753072815282 partition_id: 1763693245203 tablet_id: 1763693245218 } data_size: 25302189990 num_rows: 303319387 num_rowsets: 27 num_segments: 43 base_compaction_cnt: 91 cumulative_compaction_cnt: 3591 cumulative_point: 42476 last_base_compaction_time_ms: 1781539519000 last_cumu_compaction_time_ms: 1781561158000 index_size: 0 segment_size: 25302189990
RuntimeLogger W20260616 06:05:58.801383  1715 cloud_storage_engine.cpp:529] failed to submit compaction task for tablet: 1763693245218, err: [E-2010]cumulative compaction meet delete version


3. start bc
-------------- BE records the base compaction (2 ~ 42431) --------------
RuntimeLogger I20260616 06:06:01.042435  1715 cloud_base_compaction.cpp:84] start CloudBaseCompaction, tablet_id=1763693245218, range=[2-42431]|job_id=a8e92687-2211-4d74-82aa-d6e99c3fc360|input_rowsets=21|input_rows=303315005|input_segments=39|input_rowsets_data_size=25294029764|input_rowsets_index_size=0|input_rowsets_total_size=25294029764
-------------- meta service records the bc --------------
I20260616 06:06:01.043434 3747099 meta_service_helper.h:174] begin start_tablet_job remote_caller=10.2.18.57:52174 original_client_ip=10.2.18.57:9050 request=cloud_unique_id: "1:1753070360:jYdIZgSo" job { idx { table_id: 1753072815281 index_id: 1753072815282 partition_id: 1763693245203 tablet_id: 1763693245218 } compaction { initiator: "10.2.18.57:9050" type: BASE input_versions: 2 input_versions: 42431 base_compaction_cnt: 91 cumulative_compaction_cnt: 3591 id: "a8e92687-2211-4d74-82aa-d6e99c3fc360" expiration: 1781647561 lease: 1781561241 } } request_ip: "10.2.18.57:9050"

4. cc complete
-------------- meta service records the cc and drops the rowsets whose versions are between 42326 and 42474 --------------
RuntimeLogger I20260616 06:06:13.279381  1708 cloud_cumulative_compaction.cpp:208] finish CloudCumulativeCompaction, tablet_id=1763693245218, cost=14587ms, range=[42326-42474]|job_id=7c02be46-86a3-43b7-9687-9c93b1f3affe|input_rowsets=5|input_rows=427170|input_segments=5|input_rowsets_data_size=52916937|input_rowsets_index_size=0|input_rowsets_total_size=52916937|output_rows=427170|output_segments=1|output_rowset_data_size=45147495|output_rowset_index_size=0|output_rowset_total_size=45147495|tablet_max_version=42475|cumulative_point=42476|num_rowsets=23|cumu_num_rowsets=0|local_read_time_us=1407|remote_read_time_us=0|local_read_bytes=5717690|remote_read_bytes=0

I20260616 06:06:13.269352 3747098 meta_service_helper.h:174] begin finish_tablet_job remote_caller=10.2.18.57:52070 original_client_ip=10.2.18.57:9050 request=cloud_unique_id: "1:1753070360:jYdIZgSo" action: COMMIT job { idx { table_id: 1753072815281 index_id: 1753072815282 partition_id: 1763693245203 tablet_id: 1763693245218 } compaction { initiator: "10.2.18.57:9050" type: CUMULATIVE input_cumulative_point: 42476 output_cumulative_point: 42475 num_input_rowsets: 5 num_input_segments: 5 num_output_rowsets: 1 num_output_segments: 1 size_input_rowsets: 52916937 size_output_rowsets: 45147495 num_input_rows: 427170 num_output_rows: 427170 input_versions: 42326 input_versions: 42474 output_versions: 42474 output_rowset_ids: "020000000008d8252c4a4e54ca9c6c96f347dc1417ad5db8" txn_id: 8054901679135295045 id: "7c02be46-86a3-43b7-9687-9c93b1f3affe" index_size_input_rowsets: 0 segment_size_input_rowsets: 52916937 index_size_output_rowsets: 0 segment_size_output_rowsets: 45147495 } } request_ip: "10.2.18.57:9050"


5. bc complete
-------------- bc completes and generates a new rowset (2-42431) --------------
RuntimeLogger I20260616 06:54:02.973222  1699 cloud_base_compaction.cpp:293] finish CloudBaseCompaction, tablet_id=1763693245218, cost=2881925ms range=[2-42431]|job_id=a8e92687-2211-4d74-82aa-d6e99c3fc360|input_rowsets=21|input_rows=303315005|input_segments=39|input_rowsets_data_size=25294029764|input_rowsets_index_size=0|input_rowsets_total=25294029764|output_rows=303315005|output_segments=24|output_rowset_data_size=25176822135|output_rowset_index_size=0|output_rowset_total_size=25176822135|local_read_time_us=1319383|remote_read_time_us=0|local_read_bytes=1154464868|remote_read_bytes=0
```
The version range (2-42431) generated by the base compaction (bc) conflicts with
the version range (42326-42474) generated by the cumulative compaction (cc).
The compaction info:
```json
{
"rowsets": [
        "[0-1] 0 DATA NONOVERLAPPING 0200000000000000ffffffffffea4868ffffffffffffffe8 0",
        "[2-42431] 24 DATA NONOVERLAPPING 020000000008d8262c4a4e54ca9c6c96f347dc1417ad5db8 23.45 GB",
        "[42326-42474] 1 DATA NONOVERLAPPING 020000000008d8252c4a4e54ca9c6c96f347dc1417ad5db8 43.06 MB",
        "[42475-42475] 0 DELETE OVERLAP_UNKNOWN 020000000008d8242c4a4e54ca9c6c96f347dc1417ad5db8 0"
],
"missing_rowsets": [
        "[42432-42325]"
    ]
}
```
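
The overlap can be checked mechanically from the `rowsets` entries above. The following is a minimal Python sketch; `parse_ranges` and `find_overlaps` are hypothetical helper names, not part of Doris. The inverted `missing_rowsets` entry `[42432-42325]` is consistent with a gap computed as (previous end + 1, next start - 1) across the two overlapping rowsets, but that reading is an inference from the numbers rather than something stated here.

```python
import re
from typing import List, Tuple

Range = Tuple[int, int]
ROWSET_RE = re.compile(r"^\[(\d+)-(\d+)\]")


def parse_ranges(rowsets: List[str]) -> List[Range]:
    """Extract the inclusive [start-end] version range from each rowset line."""
    ranges = []
    for line in rowsets:
        m = ROWSET_RE.match(line)
        if m is None:
            raise ValueError(f"unrecognized rowset line: {line!r}")
        ranges.append((int(m.group(1)), int(m.group(2))))
    return sorted(ranges)


def find_overlaps(ranges: List[Range]) -> List[Tuple[Range, Range]]:
    """Pair each range with the widest earlier range it intersects."""
    found = []
    widest = None  # earlier range with the largest end version seen so far
    for cur in sorted(ranges):
        if widest is not None and cur[0] <= widest[1]:
            found.append((widest, cur))
        if widest is None or cur[1] > widest[1]:
            widest = cur
    return found


rowsets = [
    "[0-1] 0 DATA NONOVERLAPPING 0200000000000000ffffffffffea4868ffffffffffffffe8 0",
    "[2-42431] 24 DATA NONOVERLAPPING 020000000008d8262c4a4e54ca9c6c96f347dc1417ad5db8 23.45 GB",
    "[42326-42474] 1 DATA NONOVERLAPPING 020000000008d8252c4a4e54ca9c6c96f347dc1417ad5db8 43.06 MB",
    "[42475-42475] 0 DELETE OVERLAP_UNKNOWN 020000000008d8242c4a4e54ca9c6c96f347dc1417ad5db8 0",
]

for prev, cur in find_overlaps(parse_ranges(rowsets)):
    # A "gap" between overlapping neighbours comes out inverted (start > end).
    gap = (prev[1] + 1, cur[0] - 1)
    print(prev, cur, gap)  # (2, 42431) (42326, 42474) (42432, 42325)
```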

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

### Release note

None

### Check List (For Author)

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [x] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason?  -->

- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
    - [ ] Yes. <!-- Add document PR link here. eg: apache/doris-website#1214 -->

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->

---------

Co-authored-by: liutang123 <liulijia1029@google.com>
@github-actions github-actions Bot requested a review from yiguolei as a code owner June 22, 2026 12:43
@yiguolei

Contributor

run buildall

@hello-stephen

Contributor

Cloud UT Coverage Report

Increment line coverage 100.00% (21/21) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
|---|---|
| Function Coverage | 77.28% (1881/2434) |
| Line Coverage | 64.30% (33740/52469) |
| Region Coverage | 64.79% (17381/26828) |
| Branch Coverage | 53.93% (9291/17228) |
