perf: Improve Xor method performance by ~20% for big sets #1

Merged: KernelPryanic merged 1 commit into KernelPryanic:main from romshark:main on Feb 16, 2025

@romshark (Contributor) commented Feb 16, 2025

Processing larger bitsets in batches of 8 per loop iteration is more efficient on modern CPUs.
I assume this is related to instruction-level parallelism.
The same technique can be applied to most other bitset methods and functions.
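
For illustration, the batching idea could look roughly like the sketch below. It is not the code from this PR: it assumes a bitset backed by a plain `[]uint64`, and `xorBatched` and `xorSimple` are hypothetical names. The main loop does eight independent XORs per iteration and a tail loop handles the leftover words.

```go
package main

// xorSimple is the word-at-a-time baseline.
// Hypothetical helper, not part of the bitmask package.
func xorSimple(dst, src []uint64) {
	n := len(dst)
	if len(src) < n {
		n = len(src)
	}
	for i := 0; i < n; i++ {
		dst[i] ^= src[i]
	}
}

// xorBatched XORs src into dst in place, eight words per loop iteration.
// Hypothetical helper: it assumes the bitset is stored as a []uint64.
func xorBatched(dst, src []uint64) {
	n := len(dst)
	if len(src) < n {
		n = len(src)
	}
	i := 0
	// Main loop: the eight XORs do not depend on each other, so the CPU
	// is free to overlap them. Re-slicing to a fixed length of 8 may also
	// help the compiler drop per-element bounds checks.
	for ; i+8 <= n; i += 8 {
		d := dst[i : i+8 : i+8]
		s := src[i : i+8 : i+8]
		d[0] ^= s[0]
		d[1] ^= s[1]
		d[2] ^= s[2]
		d[3] ^= s[3]
		d[4] ^= s[4]
		d[5] ^= s[5]
		d[6] ^= s[6]
		d[7] ^= s[7]
	}
	// Tail loop: fewer than eight words remain.
	for ; i < n; i++ {
		dst[i] ^= src[i]
	}
}
```

Both functions should produce identical results for any input length; only the loop structure differs. Whether the batched form is actually faster depends on the CPU and compiler version, so it is worth benchmarking before applying it elsewhere.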

goos: darwin
goarch: arm64
pkg: github.com/KernelPryanic/bitmask
cpu: Apple M1 Max
                    │   old.txt   │              new.txt               │
                    │   sec/op    │   sec/op     vs base               │
BitSet_Xor/empty-10   2.498n ± 4%   2.493n ± 3%        ~ (p=0.372 n=6)
BitSet_Xor/5-10       2.491n ± 1%   2.492n ± 1%        ~ (p=0.729 n=6)
BitSet_Xor/10k-10     76.10n ± 1%   49.79n ± 1%  -34.57% (p=0.002 n=6)
BitSet_Xor/1m-10      8.453µ ± 0%   5.112µ ± 1%  -39.52% (p=0.002 n=6)
geomean               44.73n        35.46n       -20.72%

                    │   old.txt    │              new.txt               │
                    │     B/op     │    B/op     vs base                │
BitSet_Xor/empty-10   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
BitSet_Xor/5-10       0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
BitSet_Xor/10k-10     0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
BitSet_Xor/1m-10      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
geomean                          ²               +0.00%               ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                    │   old.txt    │              new.txt               │
                    │  allocs/op   │ allocs/op   vs base                │
BitSet_Xor/empty-10   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
BitSet_Xor/5-10       0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
BitSet_Xor/10k-10     0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
BitSet_Xor/1m-10      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
geomean                          ²               +0.00%               ²
¹ all samples are equal
² summaries must be >0 to compute geomean

@KernelPryanic KernelPryanic merged commit 63daa84 into KernelPryanic:main Feb 16, 2025
2 of 3 checks passed