A workload catalogue + harness for measuring cljw against other runtimes.
Each benchmarks/NN_<name>/ directory holds a meta.yaml (name / category /
expected_output / description) and one source file per language
(bench.clj, bench.c, bench.zig, Bench.java, bench.py, bench.rb,
bench.js, bench.go). expected_output doubles as a correctness oracle —
a runtime that prints the wrong value is shown as a SKIP, never timed.
The cross-language table below is generated from the measurement YAML by
gen_cross_table.py, never hand-maintained — a hand-curated table drifts from the data (the lesson from cw's predecessor).
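
The oracle rule described above (wrong output means SKIP, never a timing) could be sketched roughly as follows. This is an illustration only, not the harness's actual code: the function name `check_oracle`, the whitespace-insensitive comparison, and treating a non-zero exit or a timeout as SKIP are all assumptions.

```python
import subprocess
import sys


def check_oracle(cmd, expected_output, timeout_s=60):
    """Run cmd once; return "OK" if stdout matches the oracle, else "SKIP"."""
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except (OSError, subprocess.TimeoutExpired):
        # Could not launch, or hung: there is nothing valid to time.
        return "SKIP"
    if proc.returncode != 0:
        # Assumption: a failing exit status also disqualifies the run.
        return "SKIP"
    # Assumption: leading/trailing whitespace is ignored in the comparison.
    if proc.stdout.strip() != str(expected_output).strip():
        return "SKIP"
    return "OK"


if __name__ == "__main__":
    # Stand-in "runtime": the current Python interpreter printing a value.
    cmd = [sys.executable, "-c", "print(6 * 7)"]
    print(check_oracle(cmd, "42"))
    print(check_oracle(cmd, "41"))
```

Only a run that returns "OK" would go on to be timed; anything else is reported as SKIP so a fast but wrong runtime never appears in the table.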

    # cljw-only: per-bench millisecond table (builds ReleaseSafe, uses hyperfine)
    bash bench/run_bench.sh                 # all native workloads
    bash bench/run_bench.sh --bench=sieve   # one workload
    bash bench/run_bench.sh --no-wasm       # skip the -Dwasm FFI workloads

    # Cross-language comparison → measurement YAML
    bash bench/compare_langs.sh --yaml=bench/cross-lang-latest.yaml

    # Regenerate the Markdown table from the YAML
    yq -o=json bench/cross-lang-latest.yaml | python3 bench/gen_cross_table.py

The cross-language harness compiles each language on the fly and auto-skips any
toolchain that is absent (command -v guard); run it from inside nix develop
for the pinned, reproducible toolchains. cljw is built ReleaseSafe (the
shipped build); timing is hyperfine, reported as cold-start wall-clock
(process launch → exit, startup included).
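
As a rough illustration of the two mechanisms just described, the sketch below shows a Python analogue of a `command -v` guard and a hyperfine invocation whose JSON export is converted to microseconds. The helper names are hypothetical, and using the `mean` field as the reported statistic is an assumption; the warmup and run counts are the ones listed under Conditions.

```python
import json
import shutil


def toolchain_present(tool):
    """Python analogue of a `command -v` guard: True if tool is on PATH."""
    return shutil.which(tool) is not None


def hyperfine_cmd(bench_cmd, json_path, warmup=5, runs=10):
    """Build a hyperfine invocation that times the whole process run."""
    return [
        "hyperfine",
        "--warmup", str(warmup),
        "--runs", str(runs),
        "--export-json", json_path,
        bench_cmd,
    ]


def mean_us(export_json_text):
    """hyperfine's JSON export is in seconds; return the mean in whole µs."""
    result = json.loads(export_json_text)["results"][0]
    # Assumption: the mean is the reported statistic.
    return round(result["mean"] * 1_000_000)


if __name__ == "__main__":
    # Hypothetical command and output path, for illustration only.
    if toolchain_present("hyperfine"):
        print(" ".join(hyperfine_cmd("./bench_binary", "out.json")))
    else:
        print("hyperfine not found; skipping")
```

Because hyperfine launches the command as a fresh process for every run, the number it reports already includes startup, which is what the cold-start metric calls for.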
Cold-start wall-clock (process launch → exit), in µs; lower is better. Columns: ClojureWasm, then Python / Ruby / Node.js / Babashka, then Java / Go / C. Only cold-start is published because it is the one metric that compares uniformly across languages. A startup-subtracted "compute" table is intentionally omitted: for the fast languages the per-run compute sits below process-spawn noise (~3 ms ± 10%), so subtracting startup would report noise, not signal.
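
The noise argument can be made concrete with a little arithmetic. The sketch below takes the ~3 ms ± 10% spawn figure quoted above and shows the range a "total minus nominal startup" number could land in. The function name and the example compute values are hypothetical, chosen only to illustrate the point.

```python
def recovered_compute_range(true_compute_us, spawn_us=3000.0, rel_noise=0.10):
    """Range of (total - nominal spawn) when spawn jitters by ±rel_noise."""
    jitter_us = spawn_us * rel_noise  # about 300 µs for 3 ms ± 10%
    return (true_compute_us - jitter_us, true_compute_us + jitter_us)


if __name__ == "__main__":
    # Hypothetical fast-language workload: 50 µs of real compute.
    low, high = recovered_compute_range(50.0)
    print(f"50 µs workload: recovered value anywhere in [{low}, {high}] µs")

    # Hypothetical slower workload: 20 000 µs of real compute.
    low, high = recovered_compute_range(20_000.0)
    print(f"20000 µs workload: recovered value in [{low}, {high}] µs")
```

For the small workload the interval straddles zero, so the subtracted figure is dominated by spawn jitter; for the large one the same jitter is a small relative error.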
Conditions: MacBook Pro (Mac16,8), Apple M4 Pro, 12-core (8P+4E), 48 GB RAM, macOS 26.5 (25F71), hyperfine 5 warmup + 10 runs, 2026-06-24.
| Benchmark | ClojureWasm | Python | Ruby | Node.js | Babashka | Java | Go | C |
|---|---|---|---|---|---|---|---|---|
| fib_recursive | 23503 | 25615 | 36206 | 48851 | 26225 | 24986 | 2555 | 2140 |
| fib_loop | 3546 | 18711 | 34380 | 49948 | 13843 | 26226 | 3139 | 3228 |
| tak | 11434 | 19054 | 34719 | 47189 | 15794 | 26181 | 2849 | 3101 |
| arith_loop | 38003 | 59859 | 55943 | 51623 | 51997 | 28567 | 4161 | 827 |
| map_filter_reduce | 11702 | 18656 | 33838 | 48024 | 13248 | 25797 | 3484 | 1049 |
| vector_ops | 6989 | 15602 | 32880 | 47261 | 12647 | 26178 | 5155 | 2374 |
| map_ops | 4408 | 14300 | 32015 | 47221 | 12849 | 26636 | 5041 | 825 |
| list_build | 4866 | 23680 | 31758 | 47516 | 11634 | 23584 | 4984 | 2959 |
| sieve | 20054 | 18097 | 34168 | 50351 | 16635 | 29745 | 2434 | 264 |
| nqueens | 20451 | 21384 | 39818 | 49237 | 23222 | 24990 | 3870 | 1629 |
| atom_swap | 5324 | 16091 | 32937 | 48559 | 13307 | 26408 | 4283 | 1423 |
| gc_stress | 22734 | 33736 | 42374 | 51323 | 33861 | 38145 | 9968 | 3341 |
| lazy_chain | 8961 | 19591 | 33628 | 50266 | 14100 | 23685 | 2169 | 2203 |
| transduce | 8849 | 18893 | 33743 | 50917 | 13956 | 28296 | 2544 | 13 |
| keyword_lookup | 12776 | 22675 | 35916 | 48568 | 19623 | 29946 | 2021 | 1576 |
| protocol_dispatch | 5898 | 17585 | 33875 | 48032 | — | 28233 | 4878 | 1716 |
| nested_update | 9847 | 16844 | 34572 | 51146 | 13978 | 27579 | 3986 | 1244 |
| string_ops | 23163 | 31598 | 38890 | 50590 | 21241 | 32016 | 3667 | 3395 |
| multimethod_dispatch | 6056 | 17790 | 33182 | 49329 | 13020 | 27382 | 5585 | 2981 |
| real_workload | 12541 | 19697 | 37310 | 49283 | 15693 | 32933 | 1166 | 1373 |
| gc_alloc_rate | 38032 | — | — | 52104 | 35213 | — | — | — |
| json_parse | 35372 | 36176 | 46773 | 56555 | — | — | — | — |
| gc_large_heap | 34647 | — | — | 54982 | 32237 | — | — | — |
| bigint_factorial | 21087 | 20824 | 45240 | 52074 | 16670 | 34572 | 4403 | — |
| ratio_sum | 28217 | — | — | — | 37202 | — | — | — |
| stm_refs | 37714 | — | — | — | 44392 | — | — | — |
| regex_count | 20801 | 24137 | 44836 | 50880 | 19061 | 35606 | 5833 | 16871 |
| sort | 7275 | 20719 | 34590 | 49259 | 13388 | 30158 | 6502 | 1900 |
| destructure | 51360 | — | — | — | 45291 | — | — | — |
| edn_roundtrip | 28368 | — | — | — | 105447 | — | — | — |
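
Since the table above is generated rather than hand-edited, a minimal sketch of the rendering step may help when reading it. This is not the actual gen_cross_table.py: the function name and the record shape (a benchmark name plus a mapping from column to microseconds) are assumptions. Cells with no measurement are rendered with the same dash marker the table uses.

```python
def render_table(columns, rows):
    """Render rows as a Markdown table.

    rows is a list of (benchmark_name, {column: microseconds or None});
    this record shape is hypothetical.
    """
    lines = [
        "| Benchmark | " + " | ".join(columns) + " |",
        "|" + "---|" * (len(columns) + 1),
    ]
    for name, cells in rows:
        rendered = [
            "—" if cells.get(col) is None else str(cells[col])
            for col in columns
        ]
        lines.append("| " + name + " | " + " | ".join(rendered) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    columns = ["ClojureWasm", "C"]
    rows = [
        ("sieve", {"ClojureWasm": 20054, "C": 264}),
        ("ratio_sum", {"ClojureWasm": 28217}),  # no C measurement
    ]
    print(render_table(columns, rows))
```

Keeping rendering as a pure function of the measurement data is what lets the table be regenerated on every run instead of drifting from the YAML.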
- A few ClojureWasm facts (no comparison intended): cold-start folds in loading the embedded AOT-compiled clojure.core (ADR-0056), so the table is the real time-to-first-eval. For cljw (a tree-walking + bytecode-VM interpreter) that is mostly genuine work; for the compiled languages a small-workload run is mostly process spawn. cljw is built ReleaseSafe (optimised, all safety checks on).
- Reproducibility. Toolchains are pinned in flake.nix; run from inside nix develop for the full, reproducible column set. The harness auto-skips a language whose toolchain is absent.