-
Notifications
You must be signed in to change notification settings - Fork 442
[lake] Use file to store lake snapshot #2037
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
210cc06 to
e73f2f8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR refactors lake table snapshot storage to use a file-based approach instead of storing complete snapshot data in ZooKeeper. The change introduces a version 2 storage format where ZooKeeper stores only the metadata file path while the actual snapshot data is persisted in remote storage, improving scalability and reducing ZK load.
- Introduced
LakeTablewrapper class supporting both legacy (version 1: full data in ZK) and new (version 2: file path in ZK) storage formats - Moved
LakeTableSnapshotandLakeTableSnapshotJsonSerdeto newlakepackage for better organization - Updated
upsertLakeTableSnapshotmethod signature to requireTablePathfor generating remote file paths
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| fluss-server/src/main/java/org/apache/fluss/server/zk/data/lake/LakeTable.java | New wrapper class for lake table snapshot information supporting both version 1 (legacy) and version 2 (file-based) storage formats |
| fluss-server/src/main/java/org/apache/fluss/server/zk/data/lake/LakeTableJsonSerde.java | New JSON serializer/deserializer for LakeTable with automatic version detection and backward compatibility |
| fluss-server/src/main/java/org/apache/fluss/server/zk/data/lake/LakeTableUtils.java | Utility class for storing lake table snapshots to remote files |
| fluss-server/src/main/java/org/apache/fluss/server/zk/data/lake/LakeTableSnapshot.java | Moved from parent package to lake subpackage; license header formatting updated |
| fluss-server/src/main/java/org/apache/fluss/server/zk/data/lake/LakeTableSnapshotJsonSerde.java | Moved from parent package to lake subpackage; added static helper methods toJson/fromJson |
| fluss-server/src/main/java/org/apache/fluss/server/zk/data/ZkData.java | Updated LakeTableZNode to encode/decode LakeTable instead of LakeTableSnapshot |
| fluss-server/src/main/java/org/apache/fluss/server/zk/ZooKeeperClient.java | Updated upsertLakeTableSnapshot to accept tablePath parameter and store snapshots to files; added getLakeTable method |
| fluss-server/src/main/java/org/apache/fluss/server/coordinator/CoordinatorEventProcessor.java | Updated to retrieve and pass tablePath when upserting lake table snapshots |
| fluss-common/src/main/java/org/apache/fluss/utils/FlussPaths.java | Added remoteLakeTableSnapshotPath method for generating remote lake snapshot file paths |
| fluss-server/src/main/java/org/apache/fluss/server/replica/ReplicaManager.java | Updated import to use new lake package location |
| fluss-server/src/main/java/org/apache/fluss/server/utils/ServerRpcMessageUtils.java | Updated import to use new lake package location |
| fluss-server/src/main/java/org/apache/fluss/server/entity/CommitLakeTableSnapshotData.java | Updated import to use new lake package location |
| fluss-server/src/main/java/org/apache/fluss/server/RpcServiceBase.java | Updated import to use new lake package location |
| fluss-server/src/test/java/org/apache/fluss/server/zk/ZooKeeperClientTest.java | Added integration test verifying backward compatibility when upgrading from version 1 to version 2 format |
| fluss-server/src/test/java/org/apache/fluss/server/zk/data/LakeTableSnapshotJsonSerdeTest.java | Updated imports to use new lake package location |
| fluss-server/src/test/java/org/apache/fluss/server/replica/CommitLakeTableSnapshotITCase.java | Updated import to use new lake package location |
| fluss-flink/fluss-flink-common/src/test/java/org/apache/fluss/flink/tiering/committer/TieringCommitOperatorTest.java | Updated expected snapshot IDs to reflect multiple commits with incremental IDs |
| fluss-flink/fluss-flink-common/src/test/java/org/apache/fluss/flink/tiering/source/enumerator/TieringSourceEnumeratorTest.java | Fixed snapshot ID management to use incremental IDs for each partition commit |
| fluss-flink/fluss-flink-common/src/test/java/org/apache/fluss/flink/tiering/TestingLakeTieringFactory.java | Modified testingLakeCommitter field to be mutable to support lazy initialization |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
fluss-server/src/main/java/org/apache/fluss/server/zk/ZooKeeperClient.java
Outdated
Show resolved
Hide resolved
fluss-server/src/main/java/org/apache/fluss/server/zk/data/lake/LakeTableUtils.java
Outdated
Show resolved
Hide resolved
fluss-server/src/main/java/org/apache/fluss/server/coordinator/CoordinatorEventProcessor.java
Outdated
Show resolved
Hide resolved
fluss-server/src/main/java/org/apache/fluss/server/zk/ZooKeeperClient.java
Outdated
Show resolved
Hide resolved
fluss-server/src/main/java/org/apache/fluss/server/zk/data/lake/LakeTable.java
Outdated
Show resolved
Hide resolved
fluss-server/src/main/java/org/apache/fluss/server/zk/data/lake/LakeTable.java
Outdated
Show resolved
Hide resolved
e73f2f8 to
08a2dcb
Compare
wuchong
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Since the
getLatestLakeSnapshotRPC adds one more remote storage IO, it may blocks TabletService/CoordinatorService requests queue. I suggested to having anioExecutorin TabletServer as well, and move the IO operation into theioExecutorand return the future. - The current serialized json data of
LakeTableSnapshotis not compacted. We should design a new serialization format to reduce the size (avoid duplicated information in the json). - I suggest it's best to consider the DV table issue together, it should involve modifying the same
laketableznode format.
fluss-server/src/main/java/org/apache/fluss/server/coordinator/CoordinatorEventProcessor.java
Outdated
Show resolved
Hide resolved
fluss-server/src/main/java/org/apache/fluss/server/coordinator/CoordinatorEventProcessor.java
Show resolved
Hide resolved
fluss-server/src/main/java/org/apache/fluss/server/zk/data/lake/LakeTableJsonSerde.java
Outdated
Show resolved
Hide resolved
fluss-server/src/test/java/org/apache/fluss/server/zk/ZooKeeperClientTest.java
Outdated
Show resolved
Hide resolved
fluss-common/src/main/java/org/apache/fluss/utils/FlussPaths.java
Outdated
Show resolved
Hide resolved
fluss-server/src/main/java/org/apache/fluss/server/coordinator/CoordinatorEventProcessor.java
Outdated
Show resolved
Hide resolved
eb52656 to
edefb0b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 27 out of 27 changed files in this pull request and generated 9 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
fluss-server/src/main/java/org/apache/fluss/server/zk/data/lake/LakeTable.java
Outdated
Show resolved
Hide resolved
fluss-server/src/main/java/org/apache/fluss/server/zk/data/lake/LakeTableHelper.java
Outdated
Show resolved
Hide resolved
fluss-server/src/main/java/org/apache/fluss/server/coordinator/CoordinatorServer.java
Show resolved
Hide resolved
fluss-server/src/main/java/org/apache/fluss/server/zk/data/lake/LakeTable.java
Outdated
Show resolved
Hide resolved
fluss-server/src/test/java/org/apache/fluss/server/zk/data/lake/LakeTableHelperTest.java
Outdated
Show resolved
Hide resolved
fluss-server/src/main/java/org/apache/fluss/server/zk/data/lake/LakeTableHelper.java
Outdated
Show resolved
Hide resolved
fluss-server/src/main/java/org/apache/fluss/server/zk/data/lake/LakeTableSnapshotJsonSerde.java
Outdated
Show resolved
Hide resolved
fluss-server/src/main/java/org/apache/fluss/server/zk/data/lake/LakeTableSnapshotJsonSerde.java
Outdated
Show resolved
Hide resolved
edefb0b to
256a8da
Compare
9fd284f to
0902fe0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 45 out of 45 changed files in this pull request and generated 12 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
fluss-server/src/main/java/org/apache/fluss/server/zk/data/lake/LakeTableSnapshotJsonSerde.java
Outdated
Show resolved
Hide resolved
fluss-server/src/main/java/org/apache/fluss/server/zk/data/lake/LakeTableSnapshotJsonSerde.java
Show resolved
Hide resolved
fluss-common/src/main/java/org/apache/fluss/lake/committer/BucketOffset.java
Show resolved
Hide resolved
fluss-server/src/test/java/org/apache/fluss/server/zk/data/lake/LakeTableHelperTest.java
Outdated
Show resolved
Hide resolved
fluss-server/src/main/java/org/apache/fluss/server/zk/data/lake/LakeTableHelper.java
Outdated
Show resolved
Hide resolved
fluss-server/src/main/java/org/apache/fluss/server/zk/data/lake/LakeTableSnapshotJsonSerde.java
Outdated
Show resolved
Hide resolved
fluss-server/src/main/java/org/apache/fluss/server/zk/data/lake/LakeTable.java
Show resolved
Hide resolved
fluss-server/src/main/java/org/apache/fluss/server/zk/data/lake/LakeTableSnapshotJsonSerde.java
Outdated
Show resolved
Hide resolved
...uss-flink-common/src/test/java/org/apache/fluss/flink/tiering/TestingLakeTieringFactory.java
Show resolved
Hide resolved
fluss-server/src/main/java/org/apache/fluss/server/zk/data/lake/LakeTableSnapshot.java
Show resolved
Hide resolved
…ld contain readable offset and tiered offset
0902fe0 to
6ac2141
Compare
|
@wuchong Hi, I already addressed your comment.
|
Purpose
Linked issue: close #2060
Brief change log
Tests
API and Format
Documentation