docs(table-design): add Merge-on-Write concept page; make Unique Key Model task-first #3939

Merged: dataroaring merged 9 commits into apache:master from dataroaring:docs-mow-canonical-page on Jun 18, 2026

@dataroaring

Contributor

What

  • Add a dedicated Merge-on-Write page under Table Design › Data Model, as the single canonical source for how the merge-on-write and merge-on-read implementations work, their read/write trade-offs, how to choose, and how to enable each.
  • Rewrite the Unique Key Model page to be task-first: it now leads with CREATE TABLE and an upsert example, then links to the new page for the underlying mechanism. The merge-on-write / merge-on-read architecture explanation that previously sat between the intro and the first runnable example has moved to the new page.

Why

On the current Unique Key Model page, a reader who just wants to create a unique-key table has to read the "How It Works" implementation comparison and the update-semantics tables before reaching the first CREATE TABLE. The merge-on-write mechanism is also re-explained inline across many loading and update pages, with no canonical home to link to.

Separating the explanation (merge-on-write) from the how-to (creating and upserting a unique-key table) follows the Diátaxis split: the task page gets you to a runnable example quickly, and the concept page can be linked from everywhere merge-on-write is referenced.

Scope

  • English only.
  • Applies to both the current docs (docs/) and versioned_docs/version-4.x; both sidebars updated.
  • New doc id: table-design/data-model/merge-on-write, placed right after unique in the Table Models category.

Validation

  • version-4.x-sidebars.json parses as valid JSON.
  • Cross-link targets (data-operate/update/update-of-unique-model) and the referenced image exist in both trees.
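
The sidebar checks above can be sketched as a small script. This is a minimal illustration, not the validation actually run for this PR: it assumes a Docusaurus-style sidebar layout (string doc ids, `category` objects with `items`), and the sample JSON and the full id `table-design/data-model/unique` are hypothetical stand-ins. Only `table-design/data-model/merge-on-write` comes from the description.

```python
import json


def collect_doc_ids(node):
    """Collect doc ids, in order, from a Docusaurus-style sidebar structure."""
    ids = []
    if isinstance(node, str):
        ids.append(node)  # shorthand form: a bare doc id
    elif isinstance(node, list):
        for item in node:
            ids.extend(collect_doc_ids(item))
    elif isinstance(node, dict):
        kind = node.get("type")
        if kind == "doc":
            ids.append(node["id"])
        elif kind == "category":
            ids.extend(collect_doc_ids(node.get("items", [])))
        elif kind is None:
            # Top-level mapping: sidebar name -> list of items.
            for value in node.values():
                ids.extend(collect_doc_ids(value))
        # Other item types (for example links) carry no doc id and are skipped.
    return ids


def placed_right_after(sidebar_text, new_id, previous_id):
    """Parse the sidebar JSON and check that new_id directly follows previous_id."""
    sidebar = json.loads(sidebar_text)  # raises json.JSONDecodeError if invalid
    ids = collect_doc_ids(sidebar)
    if previous_id not in ids:
        return False
    position = ids.index(previous_id)
    return position + 1 < len(ids) and ids[position + 1] == new_id


# Hypothetical sample; the real file is version-4.x-sidebars.json.
SAMPLE_SIDEBAR = """
{"docs": [{"type": "category", "label": "Table Models", "items": [
  "table-design/data-model/overview",
  "table-design/data-model/unique",
  "table-design/data-model/merge-on-write",
  "table-design/data-model/aggregate"]}]}
"""

ok = placed_right_after(
    SAMPLE_SIDEBAR,
    "table-design/data-model/merge-on-write",
    "table-design/data-model/unique",  # assumed full id of the unique page
)
```

Running the same function over the real sidebar file's text would cover both the "parses as valid JSON" check and the placement of the new doc id in one pass.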

…Model task-first

Split the merge-on-write / merge-on-read explanation out of the Unique Key
Model page into a dedicated Merge-on-Write page, the single source for how
the two implementations work, their trade-offs, and how to choose.

Rewrite the Unique Key Model page to lead with the task (CREATE TABLE and
upsert first) and link to the new page for the underlying mechanism, instead
of front-loading architecture before the first runnable example.

Applies to both the current docs and version-4.x; sidebars updated in both.
…y Model

Add the Chinese (zh-CN) Merge-on-Write page and rewrite the Chinese Unique Key Model page
task-first to match the English changes, linking to Merge-on-Write for the
mechanism. Current docs and version-4.x.

@dataroaring

Contributor Author

Update: added the Chinese counterparts, so this PR is now bilingual (EN + zh-CN), covering both current and version-4.x. The "English only" note in the description is superseded.

Shorter declarative sentences, active voice (Doris as the actor), and plainer
wording across the Merge-on-Write and Unique Key Model pages. Meaning unchanged.
… comparison

The column duplicated the Query performance row (predicate pushdown is the
reason MoW queries are faster) and introduced an unexplained term in a scan
table. It remains explained in the prose and the capabilities list. EN + zh-CN.

Old rows are not physically removed at write time. Merge-on-write marks the
previous version in a delete bitmap; queries skip marked rows via the bitmap,
and compaction physically removes them later. Fixes the inaccurate 'only the
latest row remains in storage'. EN + zh-CN.

The table showed only Query performance, hiding the write-side cost and making
merge-on-write look strictly better. Add Write performance (MoW: Moderate,
MoR: High) so the table reflects the actual read/write trade-off. EN + zh-CN.
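
The delete-bitmap behavior described in the commit messages above can be illustrated with a toy model. This is a sketch for intuition only, not Doris's implementation: the class name, the in-memory list standing in for storage, and the key-to-position index are all assumptions made for the example.

```python
class ToyMowTable:
    """Toy merge-on-write table: writes mark old versions, reads skip them."""

    def __init__(self):
        self.rows = []              # append-only "physical storage" of (key, value)
        self.delete_bitmap = set()  # positions of rows superseded by a newer write
        self.latest = {}            # key -> position of its newest row (assumed index)

    def upsert(self, key, value):
        # Write-time work: find the previous version and mark it in the bitmap.
        previous = self.latest.get(key)
        if previous is not None:
            self.delete_bitmap.add(previous)
        self.rows.append((key, value))
        self.latest[key] = len(self.rows) - 1

    def scan(self):
        # Read path: skip marked rows; no merging of versions is needed.
        return [row for pos, row in enumerate(self.rows)
                if pos not in self.delete_bitmap]

    def compact(self):
        # Compaction: physically drop marked rows and rebuild the index.
        self.rows = self.scan()
        self.delete_bitmap.clear()
        self.latest = {key: pos for pos, (key, _) in enumerate(self.rows)}


table = ToyMowTable()
table.upsert(1, "a")
table.upsert(2, "b")
table.upsert(1, "c")                 # supersedes the row at position 0
stored_before = len(table.rows)      # old row is still physically present
visible = table.scan()
table.compact()
stored_after = len(table.rows)       # old row is gone after compaction
```

The extra lookup and bitmap update inside `upsert` stand in for the write-side cost, while `scan` shows why reads stay cheap: superseded rows are filtered by position rather than merged by key.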
…eads'

Plainer, more common phrasing for the merge-on-read recommendation.

- Drop the predicate-pushdown bullet from 'Capabilities That Require
  Merge-on-Write' (it is a performance behavior already covered in the prose,
  not an opt-in feature); leave partial column update as the one capability.
- Remove 'frequently' from the Unique Key Model opener; the model fits any
  update-by-primary-key workload, not only high-frequency ones. EN + zh-CN.

- MoW: simpler opening sentence, active voice for the property line.
- Unique: upsert behavior stated as load-method-independent (INSERT is just the
  example; Stream/Broker/Routine Load behave the same), and tightened the Notes.
EN + zh-CN.

@dataroaring merged commit 60b323e into apache:master on Jun 18, 2026
3 checks passed