Statistical post-hoc testing for Rust
Pure Rust • Zero required dependencies • No unsafe code
You're comparing three drug dosages, five marketing campaigns, or four manufacturing processes. A simple t-test won't work - running multiple pairwise t-tests inflates your false positive rate. With 5 groups, that's 10 comparisons, and your chance of a spurious "significant" result climbs from 5% to nearly 40%.
Post-hoc tests solve this. They control the family-wise error rate, giving you honest answers about which groups truly differ - not statistical noise.
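The arithmetic behind that 40% figure: with m independent comparisons, each run at level α, the probability of at least one false positive is 1 − (1 − α)^m. A quick standalone sketch (the helper function here is illustrative, not part of any crate):

```rust
// Familywise error rate for m independent comparisons at level alpha:
// P(at least one false positive) = 1 - (1 - alpha)^m
fn familywise_error_rate(alpha: f64, m: u32) -> f64 {
    1.0 - (1.0 - alpha).powi(m as i32)
}

fn main() {
    let k: u32 = 5;              // number of groups
    let m = k * (k - 1) / 2;     // 10 pairwise comparisons among 5 groups
    let fwer = familywise_error_rate(0.05, m);
    println!("{m} comparisons at alpha = 0.05 -> FWER = {:.1}%", fwer * 100.0);
    // prints "10 comparisons at alpha = 0.05 -> FWER = 40.1%"
}
```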
ANOVA tells you something differs. Post-hoc tests tell you what.
Medicine & Clinical Trials - Compare patient outcomes across treatment arms. A hospital testing three pain medications needs to know not just that outcomes differ, but which drug works best and by how much, while controlling for multiple comparisons that could lead to approving an ineffective treatment.
Agriculture & Food Science - Compare crop yields across fertilizer types, or taste scores across formulations. The Tukey test was originally developed by John Tukey at Princeton for exactly these kinds of agricultural experiments - its design is purpose-built for "which of these treatments actually made a difference?"
Manufacturing & Quality Control - Compare defect rates or tolerances across production lines, shifts, or suppliers. When a factory runs five machines making the same part, Dunnett's test can flag which machines deviate from the reference line without false alarms.
Software & A/B Testing - Compare conversion rates, latency, or engagement metrics across multiple variants. With 4+ variants, pairwise t-tests give misleading results; Tukey HSD or Games-Howell gives you defensible answers.
Education & Social Science - Compare test scores across teaching methods, survey responses across demographics, or behavioral outcomes across intervention groups. These fields routinely analyze 3-10 groups and need post-hoc tests that reviewers and journals accept.
Environmental Science - Compare pollution levels across sites, species counts across habitats, or water quality across treatment methods. Ragged data (unequal sample sizes) is the norm here, which is why Tukey-Kramer and Games-Howell matter.
Rust has been ranked the most admired programming language in the Stack Overflow Developer Survey for nine consecutive years (2016–2024). Developer adoption is accelerating - Rust is now used in the Linux kernel, the Windows kernel, the Android platform, and major infrastructure at AWS, Microsoft, Google, and Meta. It is no longer a niche systems language; it is becoming the default choice for any new software where correctness, performance, and long-term maintainability matter.
The statistical computing ecosystem hasn't kept up. tukey_test is part of closing that gap.
R, Python, and Mathematica are great for interactive analysis in a notebook. But when you're building something that runs statistical tests in production - a clinical trial pipeline, a manufacturing quality system, a real-time A/B testing platform - those tools create serious problems:
- You have to shell out to another language mid-pipeline, adding subprocess overhead, serialization, and a fragile runtime dependency
- Python and R's numerical output can vary by OS, version, and BLAS library - meaning results on your laptop may not match production
- Deploying a Python + scipy + numpy stack (or an R environment) on edge hardware, in Docker, or in WebAssembly is genuinely painful
This is the gap tukey_test fills. Post-hoc statistical tests have essentially zero native Rust coverage. If you need Tukey HSD in a Rust service today, your options have been: reimplement it yourself, shell out to R, or change languages. None of those is acceptable in a serious production codebase.
1. Correctness you can trust
Rust's type system and borrow checker eliminate whole categories of bugs - buffer overruns, data races, silent memory corruption - that have caused real errors in scientific software. Reproducibility is a well-documented crisis in research. A statistically correct implementation that cannot silently corrupt memory is a different class of software from a Python script that happens to produce the right answer most of the time.
2. Deterministic results everywhere
Rust produces bit-for-bit identical output across platforms and operating systems. For auditable systems - FDA regulatory submissions, financial reporting, clinical trial analysis - that determinism is not a nice-to-have, it's a requirement. R and Python cannot make that guarantee.
3. Zero-overhead integration
A Rust service can call tukey_hsd() directly in the same process, with no FFI, no subprocess, no round-trip serialization. For high-frequency data processing or embedded applications - real-time sensor analysis, manufacturing QC on the line - that's the difference between feasible and not.
4. Simple, self-contained deployment
A Rust binary with zero required dependencies ships as a single executable. No runtime to install, no package manager to invoke, no version conflicts. Compare that to deploying a full R or Python scientific stack on edge hardware or in a serverless function.
The intended user is not a statistician choosing between tools for a one-off analysis - use R or Python for that. This crate is for the Rust engineer who already knows what statistical test they need and is now building the system that runs it reliably, repeatedly, and at scale. That person has had no good option until now.
| Test | When to Use | Function |
|---|---|---|
| One-way ANOVA | Check if any group means differ overall | one_way_anova() |
| Tukey HSD | All pairwise comparisons (equal variances) | tukey_hsd() |
| Games-Howell | All pairwise comparisons (unequal variances) | games_howell() |
| Dunnett's test | Compare treatments against a single control | dunnett() |
| Levene's test | Test equality of variances (Brown-Forsythe variant) | levene_test() |
| q critical value | Studentized range distribution lookup | q_critical() |
| Dunnett critical value | Dunnett distribution lookup | dunnett_critical() |
| Studentized range CDF | Exact p-values for post-hoc tests | ptukey_cdf() |
All tests support unequal group sizes. Tukey HSD applies the Tukey-Kramer adjustment automatically.
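For unequal group sizes, the Tukey-Kramer form replaces the single common n in the standard error with the harmonic combination of the two group sizes: SE = sqrt((MSE / 2)(1/n_i + 1/n_j)). A standalone sketch (hand-rolled helpers, not the crate's API) reproducing the q statistic for the example data used throughout this README:

```rust
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

// Tukey-Kramer q statistic for one pair, given the pooled within-group
// mean square (MSE) from the one-way ANOVA.
fn q_statistic(gi: &[f64], gj: &[f64], mse: f64) -> f64 {
    let diff = (mean(gi) - mean(gj)).abs();
    // Tukey-Kramer SE; reduces to sqrt(MSE / n) when n_i == n_j
    let se = (mse / 2.0 * (1.0 / gi.len() as f64 + 1.0 / gj.len() as f64)).sqrt();
    diff / se
}

fn main() {
    let a = [6.0, 8.0, 4.0, 5.0, 3.0, 4.0];
    let b = [8.0, 12.0, 9.0, 11.0, 6.0, 8.0];
    let mse = 4.4556; // MSE from the ANOVA table for this data
    println!("q = {:.4}", q_statistic(&a, &b, mse)); // q = 4.6418
}
```

With equal sizes n_i = n_j = n, the expression under the square root collapses to MSE / n, which is the classic Tukey HSD standard error.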
All pairwise comparison results include an exact p-value computed via ptukey_cdf. ANOVA results include η² (eta-squared) and ω² (omega-squared) effect sizes. q_critical supports any α and any number of groups k (not limited to the table range).
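Of the two effect sizes, η² has the simplest definition - the between-group sum of squares divided by the total sum of squares - so the reported value can be checked by hand. A standalone sketch (not the crate's implementation):

```rust
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

// Eta-squared: the share of total variability attributable to group
// membership, computed from the ANOVA sums of squares.
fn eta_squared(groups: &[Vec<f64>]) -> f64 {
    let all: Vec<f64> = groups.iter().flatten().copied().collect();
    let grand = mean(&all);
    let ss_total: f64 = all.iter().map(|x| (x - grand).powi(2)).sum();
    let ss_between: f64 = groups
        .iter()
        .map(|g| g.len() as f64 * (mean(g) - grand).powi(2))
        .sum();
    ss_between / ss_total
}

fn main() {
    let data = vec![
        vec![6.0, 8.0, 4.0, 5.0, 3.0, 4.0],
        vec![8.0, 12.0, 9.0, 11.0, 6.0, 8.0],
        vec![13.0, 9.0, 11.0, 8.0, 12.0, 14.0],
    ];
    println!("eta^2 = {:.3}", eta_squared(&data)); // eta^2 = 0.637
}
```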
Add to your Cargo.toml:

```toml
[dependencies]
tukey_test = "0.2"
```

The most common workflow - test for an overall effect, then find which pairs differ:
```rust
use tukey_test::{one_way_anova, tukey_hsd};

let data = vec![
    vec![6.0, 8.0, 4.0, 5.0, 3.0, 4.0],    // Group A
    vec![8.0, 12.0, 9.0, 11.0, 6.0, 8.0],  // Group B
    vec![13.0, 9.0, 11.0, 8.0, 12.0, 14.0], // Group C
];

// Step 1: Is there an overall difference?
let anova = one_way_anova(&data).unwrap();
println!("{anova}");
// F = 13.18, p = 0.0005

// Step 2: Which pairs differ?
let result = tukey_hsd(&data, 0.05).unwrap();
for pair in result.significant_pairs() {
    println!("Groups {} and {} differ (q = {:.4})",
        pair.group_i, pair.group_j, pair.q_statistic);
}
```

Games-Howell - all pairwise comparisons when group variances are unequal:

```rust
use tukey_test::games_howell;

let data = vec![
    vec![4.0, 5.0, 3.0, 4.0, 6.0],
    vec![20.0, 30.0, 25.0, 35.0, 28.0], // high variance
    vec![5.0, 7.0, 6.0, 4.0, 5.0],
];

let result = games_howell(&data, 0.05).unwrap();
println!("{result}");
```

Dunnett's test - compare each treatment against a single control:

```rust
use tukey_test::dunnett;

let data = vec![
    vec![10.0, 12.0, 11.0, 9.0, 10.0], // control (group 0)
    vec![15.0, 17.0, 14.0, 16.0, 15.0], // treatment A
    vec![11.0, 13.0, 10.0, 12.0, 11.0], // treatment B
];

let result = dunnett(&data, 0, 0.05).unwrap();
for t in result.significant_treatments() {
    println!("Treatment {} differs from control (t = {:.4})",
        t.treatment, t.t_statistic);
}
```

Levene's test - check the equal-variance assumption before choosing between tukey_hsd() and games_howell():

```rust
use tukey_test::levene_test;

let data = vec![
    vec![5.0, 6.0, 5.5, 5.2, 5.8], // group A (low variance)
    vec![1.0, 9.0, 3.0, 8.0, 5.0], // group B (high variance)
];

let result = levene_test(&data, 0.05).unwrap();
if result.significant {
    println!("Unequal variances detected - use games_howell()");
} else {
    println!("Variances are homogeneous - tukey_hsd() is appropriate");
}
```

Every pairwise comparison result includes an exact p-value computed via the studentized range CDF:
```rust
use tukey_test::tukey_hsd;

let data = vec![
    vec![6.0, 8.0, 4.0, 5.0, 3.0, 4.0],
    vec![8.0, 12.0, 9.0, 11.0, 6.0, 8.0],
    vec![13.0, 9.0, 11.0, 8.0, 12.0, 14.0],
];

let result = tukey_hsd(&data, 0.05).unwrap();
for pair in &result.comparisons {
    println!("Groups ({}, {}): p = {:.4}", pair.group_i, pair.group_j, pair.p_value);
}
```

ANOVA results include η² (eta-squared) and ω² (omega-squared) for practical effect size:
```rust
use tukey_test::one_way_anova;

let data = vec![
    vec![6.0, 8.0, 4.0, 5.0, 3.0, 4.0],
    vec![8.0, 12.0, 9.0, 11.0, 6.0, 8.0],
    vec![13.0, 9.0, 11.0, 8.0, 12.0, 14.0],
];

let result = one_way_anova(&data).unwrap();
println!("η² = {:.3}, ω² = {:.3}", result.eta_squared, result.omega_squared);
// η² = 0.637, ω² = 0.589
```

All test functions accept any type implementing AsRef<[f64]> for groups - no need to convert into Vec<Vec<f64>>:
```rust
use tukey_test::one_way_anova;

// Slices
let data: &[&[f64]] = &[&[1.0, 2.0, 3.0], &[4.0, 5.0, 6.0]];
one_way_anova(data).unwrap();

// Fixed-size arrays
let data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]];
one_way_anova(&data).unwrap();
```

Read data from CSV files in wide format (each column is a group). Headers are auto-detected and skipped. Columns can have unequal lengths. Values of NaN, Infinity, and -Infinity are rejected with a ParseError.
```csv
control,treatment_a,treatment_b
10,15,11
12,17,13
11,14,10
9,16,12
10,15,11
```

```rust
use tukey_test::{parse_csv_file, tukey_hsd};

let groups = parse_csv_file("experiment.csv").unwrap();
let result = tukey_hsd(&groups, 0.05).unwrap();
println!("{result}");
```

You can also parse from any std::io::Read source:
```rust
use tukey_test::parse_csv;

let csv_data = "a,b\n1,4\n2,5\n3,6\n";
let groups = parse_csv(csv_data.as_bytes()).unwrap();
```

Serialize any result to JSON, CSV, or any serde-supported format:
```toml
[dependencies]
tukey_test = { version = "0.2", features = ["serde"] }
```

The crate also ships as a command-line tool.
```sh
git clone https://github.com/TechieTeee/Rust_Tukey_Test.git
cd Rust_Tukey_Test
cargo build --release
```

```sh
# Inline - separate groups with --
tukey_test hsd 0.05 6 8 4 5 3 4 -- 8 12 9 11 6 8 -- 13 9 11 8 12 14

# From a CSV file
tukey_test hsd 0.05 --file experiment.csv

# From stdin (pipe-friendly)
cat data.csv | tukey_test hsd 0.05 --file -
```

```sh
# One-way ANOVA
tukey_test anova 6 8 4 5 3 4 -- 8 12 9 11 6 8 -- 13 9 11 8 12 14
tukey_test anova --file data.csv

# Tukey HSD (alpha = 0.05)
tukey_test hsd 0.05 6 8 4 5 3 4 -- 8 12 9 11 6 8 -- 13 9 11 8 12 14
tukey_test hsd 0.05 --file data.csv

# Games-Howell (alpha = 0.05)
tukey_test games-howell 0.05 4 5 3 4 6 -- 20 30 25 35 28 -- 5 7 6 4 5
tukey_test games-howell 0.05 --file data.csv

# Dunnett (alpha = 0.05, control = group 0)
tukey_test dunnett 0.05 0 10 12 11 9 10 -- 15 17 14 16 15 -- 11 13 10 12 11
tukey_test dunnett 0.05 0 --file data.csv

# Critical value lookup
tukey_test q 3 15 0.05
```

Sample output:

```text
One-Way ANOVA

Source        SS         df    MS        F         p-value
--------------------------------------------------------------------
Between       117.4444   2     58.7222   13.1796   0.000497
Within        66.8333    15    4.4556
Total         184.2778   17
```

```text
Tukey HSD Test Results (alpha = 0.05, df = 15, MSE = 4.4556, q_critical = 3.6700)

Comparison   Mean Diff   q-stat   Significant   CI
------------------------------------------------------------------------
( 0, 1)      4.0000      4.6418   Yes           [-7.1626, -0.8374]
( 0, 2)      6.1667      7.1561   Yes           [-9.3292, -3.0041]
( 1, 2)      2.1667      2.5143   No            [-5.3292, 0.9959]
```

```text
Dunnett's Test Results (alpha = 0.05, control = group 0, df = 12, d_critical = 2.3800)

Treatment      Mean Diff   t-stat   Significant   CI
--------------------------------------------------------------------------
Group 1 vs 0   5.0000      6.9338   Yes           [3.2838, 6.7162]
Group 2 vs 0   1.0000      1.3868   No            [-0.7162, 2.7162]
```
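The t statistics in the Dunnett output above are ordinary two-sample t values computed with the MSE pooled across all groups; only the critical value (d_critical) comes from the Dunnett distribution. A standalone sketch reproducing them (hand-rolled helpers, not the crate's API):

```rust
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

// MSE pooled across all k groups: sum of within-group squared deviations
// divided by the total within-group degrees of freedom (N - k).
fn pooled_mse(groups: &[Vec<f64>]) -> f64 {
    let (mut ss, mut df) = (0.0, 0usize);
    for g in groups {
        let m = mean(g);
        ss += g.iter().map(|x| (x - m).powi(2)).sum::<f64>();
        df += g.len() - 1;
    }
    ss / df as f64
}

// Dunnett-style t statistic for one treatment against the control.
fn dunnett_t(treatment: &[f64], control: &[f64], mse: f64) -> f64 {
    let se = (mse * (1.0 / treatment.len() as f64 + 1.0 / control.len() as f64)).sqrt();
    (mean(treatment) - mean(control)) / se
}

fn main() {
    let control = vec![10.0, 12.0, 11.0, 9.0, 10.0];
    let a = vec![15.0, 17.0, 14.0, 16.0, 15.0];
    let b = vec![11.0, 13.0, 10.0, 12.0, 11.0];
    let mse = pooled_mse(&[control.clone(), a.clone(), b.clone()]);
    println!("t(A) = {:.4}", dunnett_t(&a, &control, mse)); // t(A) = 6.9338
    println!("t(B) = {:.4}", dunnett_t(&b, &control, mse)); // t(B) = 1.3868
}
```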
```text
Got multiple groups to compare?
|
|-- Start with one_way_anova()
|     p < 0.05?  -->  significant overall difference
|     eta_squared / omega_squared  -->  practical effect size
|
|-- Want all pairwise comparisons?
|   |
|   |-- Check variances first: levene_test()
|   |     |-- Not significant (homogeneous)?  -->  tukey_hsd()
|   |     +-- Significant (unequal)?          -->  games_howell()
|   |
|   +-- All pairwise results include exact p-values
|
+-- Comparing treatments to a control?
      +-- dunnett()
```
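The levene_test() branch above uses the Brown-Forsythe variant, which is simply a one-way ANOVA run on the absolute deviations from each group's median (medians make the test robust to non-normal data). A standalone sketch (hand-rolled, not the crate's code):

```rust
// Median of a slice (copy, sort, pick the middle).
fn median(xs: &[f64]) -> f64 {
    let mut v = xs.to_vec();
    v.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let n = v.len();
    if n % 2 == 1 { v[n / 2] } else { (v[n / 2 - 1] + v[n / 2]) / 2.0 }
}

// Brown-Forsythe W: an ordinary F statistic with (k - 1, N - k) degrees
// of freedom, computed on z_ij = |x_ij - median(group i)|.
fn brown_forsythe_w(groups: &[Vec<f64>]) -> f64 {
    let z: Vec<Vec<f64>> = groups
        .iter()
        .map(|g| {
            let m = median(g);
            g.iter().map(|x| (x - m).abs()).collect()
        })
        .collect();

    let n_total: usize = z.iter().map(|g| g.len()).sum();
    let k = z.len();
    let grand = z.iter().flatten().sum::<f64>() / n_total as f64;

    // Between-group and within-group sums of squares on the z values
    let mut ss_between = 0.0;
    let mut ss_within = 0.0;
    for g in &z {
        let gm = g.iter().sum::<f64>() / g.len() as f64;
        ss_between += g.len() as f64 * (gm - grand).powi(2);
        ss_within += g.iter().map(|x| (x - gm).powi(2)).sum::<f64>();
    }

    (ss_between / (k - 1) as f64) / (ss_within / (n_total - k) as f64)
}

fn main() {
    // Same data as the Levene example above
    let data = vec![
        vec![5.0, 6.0, 5.5, 5.2, 5.8], // low variance
        vec![1.0, 9.0, 3.0, 8.0, 5.0], // high variance
    ];
    println!("W = {:.3}", brown_forsythe_w(&data)); // W = 9.146
}
```

A large W relative to the F critical value is what makes the example above report unequal variances.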
This crate aims to be the go-to resource for post-hoc and related statistical tests in Rust. Future plans include:
- Scheffe's test - the most conservative post-hoc test, for complex contrasts
- Bonferroni / Holm corrections - general-purpose p-value adjustment for multiple comparisons
- Welch's ANOVA - one-way ANOVA without equal variance assumption
- Cohen's d - pairwise effect size for individual comparisons
- Dunnett's test for any k/df - remove table-based limits via numerical multivariate-t integration (currently requires df ≥ 5 and k ≤ 9)
- Games-Howell non-integer df - fix Welch-Satterthwaite df truncation for more accurate p-values at small sample sizes
- `ptukey_cdf` extended validation - comprehensive accuracy benchmarks against R across the full (k, df, q) grid, especially for k > 20 and df < 5
- `no_std` / `f32` support - for embedded and WASM scientific computing targets
These items are targeted for 0.3.0.
Contributions, feature requests, and bug reports are welcome!
Rust 1.56+ (edition 2021). Install from rust-lang.org.