tr: add ASCII range translation fast path by parasol-aser · Pull Request #12118 · uutils/coreutils

parasol-aser · 2026-05-01T23:30:19Z

What

Adds a narrow fast path for bytewise ASCII range translations such as:

tr 'a-z' 'A-Z'

The change detects translation tables that modify one contiguous ASCII range by a constant wrapping delta, then processes that range with an AVX2 range compare plus masked add on x86/x86_64 hosts that support AVX2. Other translations continue to use the existing single-byte or table-lookup paths, and non-AVX2 hosts use the scalar fallback.
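The detection step can be sketched as follows. This is a minimal illustration, not the code in operation.rs: the `AsciiRangeDelta` struct and both function names are hypothetical, and the sketch assumes that only the source range has to lie within ASCII.

```rust
/// Hypothetical description of a translation that adds `delta` (wrapping)
/// to every byte in `lo..=hi` and leaves all other bytes unchanged.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AsciiRangeDelta {
    pub lo: u8,
    pub hi: u8,
    pub delta: u8,
}

/// Returns a 256-entry table that maps every byte to itself.
pub fn identity_table() -> [u8; 256] {
    let mut table = [0u8; 256];
    for (i, slot) in table.iter_mut().enumerate() {
        *slot = i as u8;
    }
    table
}

/// Returns `Some` only if `table` changes exactly one contiguous run of
/// ASCII bytes and every byte in that run moves by the same wrapping delta.
pub fn detect_ascii_range_delta(table: &[u8; 256]) -> Option<AsciiRangeDelta> {
    // First and last byte values that the table actually changes.
    let lo = (0..=255u8).find(|&b| table[b as usize] != b)?;
    let hi = (0..=255u8).rev().find(|&b| table[b as usize] != b)?;
    if hi > 0x7f {
        return None; // assumption: the changed range must stay within ASCII
    }
    let delta = table[lo as usize].wrapping_sub(lo);
    // `delta` is non-zero here, so an unchanged byte inside lo..=hi (a gap)
    // fails this check as well.
    let uniform = (lo..=hi).all(|b| table[b as usize] == b.wrapping_add(delta));
    uniform.then_some(AsciiRangeDelta { lo, hi, delta })
}
```

For the 'a-z' to 'A-Z' table this yields lo = b'a', hi = b'z', and delta = 0xE0 (that is, -32 modulo 256). A table that is the identity, has a gap inside the range, or touches a byte above 0x7f returns `None` and would stay on the existing paths.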

Why

Before this change, tr 'a-z' 'A-Z' mapped every byte through a scalar 256-byte translation table. This is a common case, and it can be handled more directly by checking whether each byte falls within the translated ASCII range and adding the fixed delta.
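To make the contrast concrete, here is a scalar sketch of the two approaches. The function names are illustrative and are not taken from the uutils sources.

```rust
/// Table-lookup path: every byte indexes a 256-entry translation table.
pub fn translate_with_table(buf: &mut [u8], table: &[u8; 256]) {
    for b in buf.iter_mut() {
        *b = table[*b as usize];
    }
}

/// Range path: bytes in `lo..=hi` get `delta` added (wrapping);
/// all other bytes are left untouched.
pub fn translate_range_scalar(buf: &mut [u8], lo: u8, hi: u8, delta: u8) {
    for b in buf.iter_mut() {
        let v = *b;
        if (lo..=hi).contains(&v) {
            *b = v.wrapping_add(delta);
        }
    }
}
```

For 'a-z' to 'A-Z', calling `translate_range_scalar(buf, b'a', b'z', b'A'.wrapping_sub(b'a'))` produces the same bytes as the table path. The range form needs no memory lookup per byte, which is what makes it a natural fit for vectorization.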

Measurements

Environment:

CPU: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
OS: Linux x86_64
Rust: rustc 1.92.0
Candidate branch: perf/P002
Baseline commit: 4b5a2af7a916910bfeaf46b298a963d8a038565a
hyperfine was not installed, so this used /usr/bin/time, 2 warmups, and 12 measured runs.

Input was corpus/large_text.txt repeated 16 times, 1,342,178,256 bytes total.

/usr/bin/time -f '%e %M' ./runs/P002-rerun/bin/tr-baseline 'a-z' 'A-Z' < runs/P002-rerun/input/large_text_x16.txt > /dev/null
/usr/bin/time -f '%e %M' ./runs/P002-rerun/bin/tr-p002 'a-z' 'A-Z' < runs/P002-rerun/input/large_text_x16.txt > /dev/null
/usr/bin/time -f '%e %M' /usr/bin/tr 'a-z' 'A-Z' < runs/P002-rerun/input/large_text_x16.txt > /dev/null

implementation	mean wall time	stddev	throughput
uutils baseline	1.068 s	0.129 s	1198.1 MiB/s
uutils candidate	0.421 s	0.029 s	3041.6 MiB/s
GNU tr	1.251 s	0.209 s	1023.3 MiB/s

The candidate is 2.54x faster than the uutils baseline on this workload, a 60.6% wall-time reduction. The earlier 80 MiB pipeline benchmark also showed a 53.8% reduction, from 0.065 s to 0.030 s.
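The derived figures can be rechecked from the mean wall times with a few helper functions. The function names are illustrative. Throughput recomputed from the rounded means lands within about 1 MiB/s of the table values; the small gap is presumably due to rounding of the reported means.

```rust
const MIB: f64 = 1024.0 * 1024.0;

/// Throughput in MiB/s for `bytes` processed in `seconds`.
pub fn throughput_mib_s(bytes: u64, seconds: f64) -> f64 {
    bytes as f64 / MIB / seconds
}

/// How many times faster the candidate is than the baseline.
pub fn speedup(baseline_s: f64, candidate_s: f64) -> f64 {
    baseline_s / candidate_s
}

/// Wall-time reduction of the candidate relative to the baseline, in percent.
pub fn reduction_percent(baseline_s: f64, candidate_s: f64) -> f64 {
    (1.0 - candidate_s / baseline_s) * 100.0
}
```

With the numbers above, `speedup(1.068, 0.421)` is about 2.54 and `reduction_percent(1.068, 0.421)` is about 60.6; `reduction_percent(0.065, 0.030)` gives about 53.8 for the 80 MiB pipeline benchmark.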

Correctness

For the 1.3 GB input, baseline uutils, candidate uutils, and GNU tr produced the same transformed output SHA256:

6f2d6cb371ca0b423a90a5690ee7f6dac0be6a7d889f308ff5b15f2957e853db

Tests

cargo test --release --test tests -- --nocapture --test-threads=1 test_tr::test_ascii_range_translate_alignment_boundaries
cargo clippy --release -p uu_tr -- -D warnings
cargo fmt --check --package uu_tr

The regression test covers 0-, 1-, 31-, 32-, and 33-byte inputs around the 32-byte AVX2 vector width, all byte values, and a UTF-8/non-ASCII boundary case, with GNU parity when GNU tr is available.
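The shape of such a boundary check can be sketched as a small differential harness. This is not the test in the PR: `BOUNDARY_LENS`, `sample_input`, and `first_mismatch` are hypothetical names, and the harness compares two in-process functions rather than invoking the tr binary or GNU tr.

```rust
/// Input lengths that straddle one 32-byte AVX2 vector.
pub const BOUNDARY_LENS: [usize; 5] = [0, 1, 31, 32, 33];

/// Repeating pattern that mixes lowercase letters, the bytes just outside
/// 'a'..='z', a two-byte UTF-8 sequence, NUL, and 0xff.
pub fn sample_input(len: usize) -> Vec<u8> {
    b"az{`\xc3\xa9 mid\x00\xff".iter().copied().cycle().take(len).collect()
}

/// Runs `fast` and `reference` on each boundary-length input and then on
/// all 256 byte values. Returns the index of the first input on which the
/// two disagree, or `None` if they always agree.
pub fn first_mismatch(
    fast: impl Fn(&mut [u8]),
    reference: impl Fn(&mut [u8]),
) -> Option<usize> {
    let all_bytes: Vec<u8> = (0..=255u8).collect();
    let inputs = BOUNDARY_LENS
        .iter()
        .map(|&n| sample_input(n))
        .chain(std::iter::once(all_bytes));
    for (i, input) in inputs.enumerate() {
        let mut a = input.clone();
        let mut b = input;
        fast(&mut a);
        reference(&mut b);
        if a != b {
            return Some(i);
        }
    }
    None
}
```

A kernel that mishandles the first full vector or the tail after it would be reported at the 32- or 33-byte input (index 3 or 4).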

Caveats

The speedup is from the AVX2 path on this x86_64 host. Non-AVX2 targets use the scalar fallback and should be behavior-preserving, but I did not benchmark those targets here.
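For readers who want to see the dispatch shape, here is a hedged sketch of an AVX2 range-compare and masked-add kernel with a runtime feature check and scalar fallback. It is not the code in simd.rs: the function names are hypothetical, and the sketch assumes `lo <= hi <= 0x7f` so that signed byte compares classify every non-ASCII byte as out of range.

```rust
/// Scalar fallback: add `delta` (wrapping) to bytes in `lo..=hi`.
pub fn translate_range_scalar(buf: &mut [u8], lo: u8, hi: u8, delta: u8) {
    for b in buf.iter_mut() {
        let v = *b;
        if (lo..=hi).contains(&v) {
            *b = v.wrapping_add(delta);
        }
    }
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn translate_range_avx2(buf: &mut [u8], lo: u8, hi: u8, delta: u8) {
    #[cfg(target_arch = "x86")]
    use std::arch::x86::*;
    #[cfg(target_arch = "x86_64")]
    use std::arch::x86_64::*;

    let mut chunks = buf.chunks_exact_mut(32);
    unsafe {
        // Signed compares: bytes >= 0x80 are negative as i8, so they can
        // never satisfy `v > lo - 1` when lo is ASCII.
        let below_lo = _mm256_set1_epi8((lo as i8).wrapping_sub(1));
        let hi_v = _mm256_set1_epi8(hi as i8);
        let delta_v = _mm256_set1_epi8(delta as i8);
        for chunk in &mut chunks {
            let p = chunk.as_mut_ptr() as *mut __m256i;
            let v = _mm256_loadu_si256(p as *const __m256i);
            let ge_lo = _mm256_cmpgt_epi8(v, below_lo); // v >= lo
            let gt_hi = _mm256_cmpgt_epi8(v, hi_v); // v > hi
            let in_range = _mm256_andnot_si256(gt_hi, ge_lo); // lo <= v <= hi
            let shifted = _mm256_add_epi8(v, delta_v);
            // Take the shifted byte where the mask is set, else the original.
            _mm256_storeu_si256(p, _mm256_blendv_epi8(v, shifted, in_range));
        }
    }
    // Fewer than 32 bytes remain; finish them with the scalar loop.
    translate_range_scalar(chunks.into_remainder(), lo, hi, delta);
}

/// Uses the AVX2 kernel when the CPU supports it and the range is ASCII,
/// otherwise the scalar loop.
pub fn translate_range(buf: &mut [u8], lo: u8, hi: u8, delta: u8) {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if hi <= 0x7f && std::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified at run time just above.
            unsafe { translate_range_avx2(buf, lo, hi, delta) };
            return;
        }
    }
    translate_range_scalar(buf, lo, hi, delta);
}
```

On targets other than x86/x86_64 the AVX2 function is compiled out and `translate_range` reduces to the scalar loop, so the output is identical and only the speed differs.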

Files: src/uu/tr/src/operation.rs, src/uu/tr/src/simd.rs.
Mechanism: detect translation tables that change one contiguous ASCII range by a constant wrapping delta, then process those chunks with an AVX2 range-compare and masked add kernel with scalar fallback. The existing single-byte and table-lookup paths remain for non-affine translations.
Predicted delta: tr/tr_lower_to_upper_large_text_stdout_discarded should improve by 10-20% versus the 0.065 s baseline on AVX2 hosts.

Test coverage: ASCII range translation for a-z to A-Z at input lengths 0, 1, 31, 32, and 33 (around the 32-byte AVX2 vector width), plus all byte values and a UTF-8 boundary/non-ASCII case. Assertions cover exit code success, empty stderr, byte-exact stdout, and GNU tr parity when a GNU tr binary is available on PATH. Test command used in this repo: cargo test --release --test tests -- --nocapture --test-threads=1 test_tr::test_ascii_range_translate_alignment_boundaries. The requested package-scoped command cargo test --release -p uu_tr --test test_tr -- --nocapture --test-threads=1 test_ascii_range_translate_alignment_boundaries is unavailable because uu_tr has no test_tr target.

github-actions · 2026-05-02T04:58:00Z

GNU testsuite comparison:

Skip an intermittent issue tests/cut/bounded-memory (fails in this run but passes in the 'main' branch)
Note: The gnu test tests/cp/link-heap is now being skipped but was previously passing.
Note: The gnu test tests/rm/many-dir-entries-vs-OOM is now being skipped but was previously passing.
Note: The gnu test tests/env/env-signal-handler was skipped on 'main' but is now failing.

The intrinsics blendv, cmpgt, loadu, and storeu introduced in the AVX2 range translation kernel are flagged by cspell. Annotate the file so the style/spelling CI job stays green.

parasol-aser · 2026-05-03T02:09:18Z

You need to add spell-checker: ignore

@oech3 added, thanks.

sylvestre · 2026-05-07T07:10:40Z

Could you please write a benchmark that can be run with codspeed,
and run the benchmark with hyperfine rather than time? Thanks.

parasol-aser · 2026-05-07T13:33:03Z

@sylvestre done in ffc64b6:

codspeed bench at src/uu/tr/benches/tr_bench.rs (divan), uu_tr added to .github/workflows/benchmarks.yml. Run locally: cargo bench -p uu_tr.
Re-ran with hyperfine (3 warmups, 20 runs) on the 1.3 GB input:

Command	Mean [s]	Throughput	Relative
uutils baseline	1.023 ± 0.029	1251 MiB/s	2.24×
uutils candidate	0.456 ± 0.028	2807 MiB/s	1.00×
GNU tr	1.087 ± 0.019	1177 MiB/s	2.38×

PR description updated with the full table and bench details.

oech3 · 2026-05-07T13:53:27Z

Would you split the benchmark out into a separate PR, without the unused import?

parasol-aser · 2026-05-07T14:10:40Z

@oech3 split per your suggestion: bench moved to #12189 (against main, fixed the unused-import that the Windows clippy job was flagging by gating the uses with the same #[cfg(unix)] as the bench fns). This PR is now perf-only.
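The gating pattern described here looks roughly like the sketch below. It is illustrative only: `write_sample` is a hypothetical stand-in for a bench helper, and the actual imports in #12189 are not shown in this thread.

```rust
// The import is used only by Unix-only code, so it carries the same cfg.
// Without the attribute, non-Unix builds would report it as unused.
#[cfg(unix)]
use std::io::Write;

/// Hypothetical Unix-only helper standing in for a bench function.
#[cfg(unix)]
pub fn write_sample(out: &mut Vec<u8>, repeats: usize) {
    for _ in 0..repeats {
        out.write_all(b"hello world\n").unwrap();
    }
}
```

Keeping the `use` and its consumers under one condition means clippy with `-D warnings` sees either both or neither on every platform.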

@sylvestre — codspeed coverage will land via #12189.

jeffhuang added 2 commits May 1, 2026 22:36

parasol-aser marked this pull request as ready for review May 2, 2026 03:12


tr: add spell-checker ignore for AVX2 intrinsic names

14453fe


parasol-aser force-pushed the perf/P002 branch from ffc64b6 to 14453fe Compare May 7, 2026 14:03

parasol-aser mentioned this pull request May 7, 2026

tr: add codspeed benchmark #12189

Open

parasol-aser wants to merge 3 commits into uutils:main from parasol-aser:perf/P002