
Added support of ApacheArrow compression #599

Merged
alex268 merged 5 commits into ydb-platform:release_v2.4.0 from alex268:release_v2.4.0 on Feb 25, 2026

Conversation

@alex268 (Member) commented Feb 25, 2026

No description provided.


Copilot AI left a comment


Pull request overview

This PR adds Apache Arrow compression support for query result streaming and for building compressed Arrow batches for bulk upserts, alongside some API refactoring around Arrow bulk-upsert data.

Changes:

  • Add configurable Apache Arrow compression for query result parts (LZ4 frame / Zstd) and a helper handler to decode compressed Arrow record batches.
  • Extend Arrow bulk upsert writer to build compressed batches and rename/move Arrow bulk-upsert payload types.
  • Refactor BulkUpsertData from an interface + impl to a concrete base class, with Arrow payload extending it.
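The compression-aware handler mentioned in the first change follows a familiar shape: a base handler exposes an overridable factory for its batch loader, and a subclass swaps in a decompressing one. Below is a minimal JDK-only sketch of that shape. All names here (`PartsHandler`, `CompressedPartsHandler`, `BatchLoader`, `createLoader()`, `gzip()`) are hypothetical and are not the SDK's `ArrowPartsHandler`/`CompressedArrowPartsHandler` API; GZIP from `java.util.zip` stands in for the LZ4 frame and Zstd codecs, which the JDK does not provide.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class LoaderFactoryDemo {
    /** Hypothetical loader: turns a raw batch body into decoded bytes. */
    interface BatchLoader {
        byte[] load(byte[] body);
    }

    /** Hypothetical base handler: treats every part as uncompressed by default. */
    static class PartsHandler {
        // Overridable factory method: subclasses swap in a different loader.
        protected BatchLoader createLoader() {
            return body -> body; // uncompressed: pass the bytes through unchanged
        }

        public String onPart(byte[] body) {
            return new String(createLoader().load(body), StandardCharsets.UTF_8);
        }
    }

    /** Hypothetical subclass: GZIP stands in for LZ4 frame / Zstd in this sketch. */
    static class CompressedPartsHandler extends PartsHandler {
        @Override
        protected BatchLoader createLoader() {
            return body -> {
                try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(body))) {
                    return in.readAllBytes();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            };
        }
    }

    /** Helper used only to produce a compressed part for the demo. */
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = "batch-1".getBytes(StandardCharsets.UTF_8);
        System.out.println(new PartsHandler().onPart(raw));                // batch-1
        System.out.println(new CompressedPartsHandler().onPart(gzip(raw))); // batch-1
    }
}
```

Keeping the decode step behind one protected method leaves the part-processing logic in the base class untouched; only loader construction differs between the plain and compressed paths.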

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 1 comment.

Summary per file:

  • table/src/test/java/tech/ydb/table/query/arrow/ApacheArrowWriterTest.java: Updates Arrow writer tests and adds a compression-related test.
  • table/src/test/java/tech/ydb/table/integration/ReadTableTest.java: Updates bulk upsert construction to the new BulkUpsertData API.
  • table/src/test/java/tech/ydb/table/integration/BulkUpsertTest.java: Updates the Arrow writer import after the package move.
  • table/src/test/java/tech/ydb/table/integration/AllTypesRecord.java: Refactors random record generation and updates the Arrow writer import.
  • table/src/main/java/tech/ydb/table/values/Value.java: Fixes Javadoc link formatting for a deprecated method.
  • table/src/main/java/tech/ydb/table/query/arrow/ApacheArrowWriter.java: Adds a compressed batch building API and updates Arrow field creation.
  • table/src/main/java/tech/ydb/table/query/arrow/ApacheArrowData.java: Renames/moves the Arrow bulk upsert payload and integrates it with the new BulkUpsertData base class.
  • table/src/main/java/tech/ydb/table/query/BulkUpsertProtoData.java: Removes the proto-specific BulkUpsertData implementation (now folded into BulkUpsertData).
  • table/src/main/java/tech/ydb/table/query/BulkUpsertData.java: Converts BulkUpsertData into a concrete class holding proto rows.
  • table/src/main/java/tech/ydb/table/Session.java: Updates the default bulk upsert path to use the new BulkUpsertData constructor.
  • table/pom.xml: Adds arrow-compression as a test dependency.
  • query/src/test/java/tech/ydb/query/result/arrow/ArrowValueReaderTest.java: Updates Arrow field construction to use FieldType.
  • query/src/test/java/tech/ydb/query/impl/ApacheArrowTest.java: Adds integration tests covering compressed Arrow query results and binary copy scenarios.
  • query/src/test/java/tech/ydb/query/impl/AllTypesRecord.java: Updates the Arrow writer import after the package move.
  • query/src/main/java/tech/ydb/query/settings/ExecuteQuerySettings.java: Adds ApacheArrowFormatMode to configure Arrow result compression.
  • query/src/main/java/tech/ydb/query/settings/ApacheArrowFormatMode.java: Introduces the Arrow compression mode configuration object.
  • query/src/main/java/tech/ydb/query/result/arrow/CompressedArrowPartsHandler.java: Adds a handler that can decode compressed Arrow record batches.
  • query/src/main/java/tech/ydb/query/result/arrow/ArrowPartsHandler.java: Adds an overridable loader factory to support compressed loaders.
  • query/src/main/java/tech/ydb/query/impl/SessionImpl.java: Maps Arrow compression settings into the gRPC request (ArrowFormatSettings).
  • query/pom.xml: Adds an optional arrow-compression dependency for compressed Arrow support.
  • pom.xml: Adds Arrow compression dependency version management.
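Per the file list, `ExecuteQuerySettings` gains an `ApacheArrowFormatMode` that `SessionImpl` maps into the gRPC `ArrowFormatSettings`. A minimal sketch of such an immutable mode object follows, assuming a codec choice plus an optional level. `Codec`, `FormatMode`, its factory methods and `toRequestFields()` are hypothetical; the string map only stands in for setting fields on a real protobuf builder, and whether the server accepts a level for each codec is an assumption this PR does not confirm.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

public class ArrowModeDemo {
    /** Hypothetical codec enum; LZ4 frame and Zstd are the codecs named in this PR. */
    enum Codec { NONE, LZ4_FRAME, ZSTD }

    /** Hypothetical immutable mode object, loosely modeled on ApacheArrowFormatMode. */
    static final class FormatMode {
        private final Codec codec;
        private final Integer level; // in this sketch, null means "no explicit level"

        private FormatMode(Codec codec, Integer level) {
            this.codec = Objects.requireNonNull(codec);
            this.level = level;
        }

        static FormatMode uncompressed() { return new FormatMode(Codec.NONE, null); }
        static FormatMode lz4Frame() { return new FormatMode(Codec.LZ4_FRAME, null); }
        static FormatMode zstd(int level) { return new FormatMode(Codec.ZSTD, level); }

        /** Stand-in for filling a request builder; a real client would set proto fields. */
        Map<String, String> toRequestFields() {
            Map<String, String> fields = new LinkedHashMap<>();
            if (codec != Codec.NONE) {
                fields.put("codec", codec.name());
                if (level != null) {
                    fields.put("level", level.toString());
                }
            }
            return fields;
        }
    }

    public static void main(String[] args) {
        System.out.println(FormatMode.uncompressed().toRequestFields()); // {}
        System.out.println(FormatMode.lz4Frame().toRequestFields());     // {codec=LZ4_FRAME}
        System.out.println(FormatMode.zstd(3).toRequestFields());        // {codec=ZSTD, level=3}
    }
}
```

Static factories keep invalid combinations (for example, an uncompressed mode carrying a level) out of reach of callers, and the mapping to the wire format lives in one place.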
Comments suppressed due to low confidence (3)

table/src/test/java/tech/ydb/table/query/arrow/ApacheArrowWriterTest.java:74

  • The test method is named zstdCompressionTest, but it creates an LZ4_FRAME codec. This looks like a mismatch (either the test name or the codec type should be changed). Also consider asserting that the compressed batch can be deserialized with a compression-aware VectorLoader and yields the original value, not just that the raw bytes differ.

table/src/main/java/tech/ydb/table/query/arrow/ApacheArrowData.java:20

  • ApacheArrowData has to call super((TypedValue) null), leaving the base BulkUpsertData state invalid. Consider refactoring BulkUpsertData to avoid requiring a nullable rows value for non-proto upsert payloads (e.g., make BulkUpsertData abstract with applyToRequest abstract, or add a protected no-rows constructor).

table/src/main/java/tech/ydb/table/query/arrow/ApacheArrowWriter.java:1

  • Changing the package of a public API type (tech.ydb.table.query.ApacheArrowWriter -> tech.ydb.table.query.arrow.ApacheArrowWriter) is a breaking change for downstream consumers. If backward compatibility is required, consider leaving a deprecated shim type in the old package that delegates to the new implementation (or at least document this break in the changelog/release notes).
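The first option suggested for ApacheArrowData (an abstract base with an abstract `applyToRequest`) can be sketched as follows. This is a hypothetical JDK-only illustration: `UpsertData`, `RowsData`, `ArrowData` and `Request` are made-up stand-ins, not the SDK's classes, and the Arrow payload is assumed to carry serialized schema and batch bytes. The point is that no subclass has to pass a placeholder `null` to its parent.

```java
import java.util.ArrayList;
import java.util.List;

public class BulkUpsertSketch {
    /** Hypothetical stand-in for the gRPC request builder. */
    static final class Request {
        final List<String> parts = new ArrayList<>();
    }

    /** Abstract base: no nullable rows field; each payload fills the request itself. */
    abstract static class UpsertData {
        abstract void applyToRequest(Request request);
    }

    /** Row-based payload (the role the proto-rows class plays). */
    static final class RowsData extends UpsertData {
        private final List<String> rows;

        RowsData(List<String> rows) { this.rows = List.copyOf(rows); }

        @Override
        void applyToRequest(Request request) {
            request.parts.add("rows:" + rows.size());
        }
    }

    /** Arrow payload: assumed to carry serialized schema and record-batch bytes. */
    static final class ArrowData extends UpsertData {
        private final byte[] schema;
        private final byte[] batch;

        ArrowData(byte[] schema, byte[] batch) {
            this.schema = schema.clone();
            this.batch = batch.clone();
        }

        @Override
        void applyToRequest(Request request) {
            request.parts.add("arrow:" + schema.length + "+" + batch.length);
        }
    }

    public static void main(String[] args) {
        Request req = new Request();
        new RowsData(List.of("a", "b")).applyToRequest(req);
        new ArrowData(new byte[4], new byte[16]).applyToRequest(req);
        System.out.println(req.parts); // [rows:2, arrow:4+16]
    }
}
```

With this shape, the base class holds no state that one subtype leaves unset, so a payload cannot reach `applyToRequest` half-initialized.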


@alex268 merged commit 420cfdf into ydb-platform:release_v2.4.0 on Feb 25, 2026
10 checks passed
