RISC-V: Add RVV vectorized FindMatchLength optimization #233
Open
zhanchangbao-sanechips wants to merge 1 commit into google:main from
Add vectorized match length computation using RVV instructions. Processes 16 bytes in parallel with __riscv_vle8_v_u8m1 and __riscv_vmsne_vv_u8m1_b8.
Summary
This PR adds a RISC-V Vector (RVV) optimization for the FindMatchLength() function in the Snappy compression library. The optimization uses RVV instructions to compare 16 bytes in parallel, which improves compression performance on RISC-V platforms.
Motivation
The Snappy compression algorithm spends a significant portion of its compression time in FindMatchLength(). On RISC-V platforms with RVV support, this critical path can be accelerated by using vector instructions to compare bytes in parallel.
Changes Made
- FindMatchLength() to process 16-byte blocks in parallel
- RVV intrinsics: __riscv_vsetvl_e8m1(), __riscv_vle8_v_u8m1(), __riscv_vmsne_vv_u8m1_b8(), __riscv_vfirst_m_b8()
Implementation Details
The RVV optimization is placed between SNAPPY_PREFETCH and the scalar 8-byte loop, so the vector path handles 16-byte blocks first and the scalar loop handles whatever remains.
This layered approach is intended to keep performance good across all input sizes while maintaining code clarity.
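The layering described above can be sketched as follows. This is a minimal illustration, not the code in this PR: MatchLengthSketch is a hypothetical helper with a simplified signature (Snappy's real FindMatchLength() has a different interface), and the scalar tail here is a plain byte loop rather than Snappy's 8-byte loop. Only the four intrinsics and the __riscv && SNAPPY_HAVE_RVV guard are taken from the PR text. On builds without RVV the vector block is compiled out and only the scalar loop runs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// SNAPPY_HAVE_RVV is the guard named in this PR. Defaulting it to 0 is an
// assumption of this sketch so that it compiles without the build system.
#ifndef SNAPPY_HAVE_RVV
#define SNAPPY_HAVE_RVV 0
#endif

#if defined(__riscv) && SNAPPY_HAVE_RVV
#include <riscv_vector.h>
#endif

// Hypothetical helper: returns the number of leading bytes that are equal in
// s1 and s2, never reading s2 at or beyond s2_limit. The caller must ensure
// that s1 has at least (s2_limit - s2) readable bytes.
inline size_t MatchLengthSketch(const uint8_t* s1, const uint8_t* s2,
                                const uint8_t* s2_limit) {
  assert(s2 <= s2_limit);
  const size_t total = static_cast<size_t>(s2_limit - s2);
  size_t matched = 0;

#if defined(__riscv) && SNAPPY_HAVE_RVV
  // Vector path: runs only while at least 16 bytes remain.
  while (total - matched >= 16) {
    // Request 16 elements; hardware may grant fewer, so advance by vl.
    const size_t vl = __riscv_vsetvl_e8m1(16);
    const vuint8m1_t a = __riscv_vle8_v_u8m1(s1 + matched, vl);
    const vuint8m1_t b = __riscv_vle8_v_u8m1(s2 + matched, vl);
    // Mask bit i is set where a[i] != b[i].
    const vbool8_t diff = __riscv_vmsne_vv_u8m1_b8(a, b, vl);
    // Index of the first set mask bit, or -1 if the blocks are identical.
    const long first = __riscv_vfirst_m_b8(diff, vl);
    if (first >= 0) return matched + static_cast<size_t>(first);
    matched += vl;
  }
#endif

  // Scalar tail: finishes whatever the vector path did not cover.
  while (matched < total && s1[matched] == s2[matched]) ++matched;
  return matched;
}
```

Advancing by the granted vl instead of a fixed 16 keeps the sketch correct even on an implementation whose maximum vector length for e8/m1 is below 16 elements.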
Performance Results
Test Environment
ZFlat (Compression) - Key Improvements
Other Operations
Key Observations
Test Repeatability
Three independent test runs confirm consistent and reproducible results:
Stability: 21 out of 24 test cases showed <1% variance across all three runs, indicating high test-retest reliability.
Compatibility and Portability
RISC-V with RVV Support
- __riscv && SNAPPY_HAVE_RVV
RISC-V without RVV Support
Non-RISC-V Platforms (x86_64, ARM64, etc.)
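The three platform cases in this section come down to one compile-time condition. The sketch below shows one way such a guard could select a path. The names kUseRvvMatchLength and MatchLengthBlockBytes are hypothetical and exist only for illustration, and defaulting SNAPPY_HAVE_RVV to 0 is an assumption. The 16-byte and 8-byte figures are the block sizes mentioned in this PR for the vector path and the scalar loop.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: treat an undefined SNAPPY_HAVE_RVV as "RVV not available".
#ifndef SNAPPY_HAVE_RVV
#define SNAPPY_HAVE_RVV 0
#endif

// True only on RISC-V builds that also enable RVV; false on RISC-V without
// RVV and on every non-RISC-V target (x86_64, ARM64, etc.).
#if defined(__riscv) && SNAPPY_HAVE_RVV
constexpr bool kUseRvvMatchLength = true;
#else
constexpr bool kUseRvvMatchLength = false;
#endif

// Hypothetical helper: bytes compared per step on the selected path.
constexpr size_t MatchLengthBlockBytes() {
  return kUseRvvMatchLength ? 16 : 8;
}
```

Because the condition is evaluated by the preprocessor, targets that do not satisfy it never see the RVV header or intrinsics.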
Testing
- snappy_unittest passes all tests
- snappy_benchmark verified on RISC-V hardware (Banana Pi K1)
Checklist
Future Work (Out of Scope for This PR)
- MemCopy64 operations (separate PR)
Screenshots
Unit Tests - All Pass
Benchmark - Before Optimization
Benchmark - After Optimization