Releases: JustAResearcher/Latency-Based-GPU-Algorithm
GPUx v0.1.5 — Community Testing Release
First release for community testing. ASIC-resistant, latency-bound proof-of-work for GPUs; proposed replacement for Cuckaroo29 (C29) in Tari (XTM).
What's in this release
- Frozen algorithm spec (`ALGORITHM_SPEC.md`)
- C reference implementation (`spec/`), the bit-exact authority
- CUDA mining kernel + GPU Argon2id DAG generator (`cuda/`)
- Cross-platform benchmark harness (`bench/`)
- Pre-built Windows binary (multi-arch fat binary, sm_75 → sm_120)
Pre-built binaries
| Platform | Binary | Coverage |
|---|---|---|
| Windows x64, CUDA 13.2+ | `gpux_miner-v0.1.5-windows-x64-cuda13-multiarch.exe` | Turing (RTX 20-series, GTX 16-series, T4) → Blackwell (RTX 50-series), all CMP cards |
The Windows binary is a CUDA fat binary containing compiled code for sm_75, sm_80, sm_86, sm_89, sm_90, and sm_120, plus PTX for forward compatibility. One file runs on any supported GPU.
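
As a rough illustration of how a fat binary like this is matched to a device, the sketch below applies the general CUDA compatibility rules: machine code built for `sm_XY` runs on devices with the same major version and an equal or higher minor version, and embedded PTX can be JIT-compiled for devices at or above the PTX's target. The function name `load_path` and the `ptx_target` default are assumptions for illustration only; these notes do not say which architecture the embedded PTX targets.

```python
# Architectures listed in the release notes as compiled into the Windows binary.
SASS_TARGETS = [(7, 5), (8, 0), (8, 6), (8, 9), (9, 0), (12, 0)]


def load_path(cc, sass_targets=SASS_TARGETS, ptx_target=(12, 0)):
    """Classify how a device with compute capability `cc` would run the binary.

    ptx_target is an ASSUMPTION: the notes only say that PTX is included.
    """
    major, minor = cc
    # Machine code is compatible within a major version, for equal or newer minor.
    matches = [t for t in sass_targets if t[0] == major and t[1] <= minor]
    if matches:
        return "sass", max(matches)
    # Otherwise the driver may JIT-compile the PTX for a newer device.
    if cc >= ptx_target:
        return "ptx-jit", ptx_target
    return "unsupported", None


print(load_path((8, 9)))  # RTX 40-series class device
print(load_path((7, 0)))  # Volta predates sm_75
```

Under these assumptions, a device older than sm_75 has no usable code path, which matches the Turing-and-newer coverage stated in the table.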
Linux: build from source with `cd cuda && make` after installing CUDA Toolkit 12.6+ or 13+. A Linux x64 binary will ship in v0.1.6 once CI is set up.
Verified baseline
- NVIDIA RTX 5090 (sm_120) @ stock: 1.46 MH/s at 410 W (~3.6 kH/W)
- DAG generation: 32.5 s for 2 GiB (one-time per epoch)
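
The derived figures above follow directly from the quoted measurements. A minimal sketch of the arithmetic, handy for computing the same ratios from your own run (the helper names are illustrative, not part of the harness):

```python
# Figures quoted in the release notes for an RTX 5090 at stock settings.
HASHRATE_HS = 1.46e6       # 1.46 MH/s
POWER_W = 410.0            # watts
DAG_BYTES = 2 * 1024**3    # 2 GiB
DAG_SECONDS = 32.5


def efficiency_kh_per_watt(hashrate_hs, power_w):
    """Hashes per second per watt, expressed in kH/W."""
    return hashrate_hs / power_w / 1e3


def dag_throughput_mib_s(dag_bytes, seconds):
    """Average DAG generation rate in MiB/s."""
    return dag_bytes / 1024**2 / seconds


print(f"{efficiency_kh_per_watt(HASHRATE_HS, POWER_W):.2f} kH/W")   # 3.56 kH/W
print(f"{dag_throughput_mib_s(DAG_BYTES, DAG_SECONDS):.1f} MiB/s")  # 63.0 MiB/s
```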
Known limitations
- v0.1.5 is CUDA-only. AMD / Intel support via OpenCL planned for v0.2.
- No light-verifier (Merkle DAG witness) yet — full nodes need the 2 GiB DAG. Out of scope for this round.
- Tari multi-algo integration not wired — that comes after community testing settles the algorithm.
Tester quick-start
    # Windows
    git clone https://github.com/JustAResearcher/Latency-Based-GPU-Algorithm.git
    cd Latency-Based-GPU-Algorithm
    .\bench\run_bench.ps1

    # Linux
    git clone https://github.com/JustAResearcher/Latency-Based-GPU-Algorithm.git
    cd Latency-Based-GPU-Algorithm
    ./bench/run_bench.sh

The harness:
- Builds `gpux_miner` (or uses the pre-built `.exe` if present)
- Runs `verify`, which confirms your GPU produces bit-exact hashes vs the C reference
- Runs `bench 60` for 60 seconds of steady-state hashing
- Writes a JSON to `bench/results/`
Submit your JSON via PR or issue. We're collecting:
- Hashrate vs SM count vs VRAM vs power
- Cross-arch determinism evidence
- Any verify failures (bit-exact mismatches between GPU and reference)
See COMMUNITY_TESTING.md for the full protocol.
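
Before opening a PR or issue, it can help to confirm that every result file actually parses as JSON. The sketch below does only that. It assumes results are `*.json` files directly under `bench/results/` and makes no assumption about field names, since the schema is not described here. `load_results` is a hypothetical helper, not part of the harness.

```python
import json
from pathlib import Path


def load_results(results_dir="bench/results"):
    """Return {filename: parsed JSON} for every *.json file in results_dir.

    Raises ValueError naming the first file that fails to parse.
    """
    results = {}
    for path in sorted(Path(results_dir).glob("*.json")):
        try:
            results[path.name] = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            raise ValueError(f"{path} is not valid JSON: {exc}") from exc
    return results


if __name__ == "__main__":
    for name, data in load_results().items():
        # Print top-level keys only; the schema is defined by the harness.
        keys = sorted(data) if isinstance(data, dict) else type(data).__name__
        print(name, keys)
```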
What changed since v0.1
- DAG generation moved from ChaCha20 to Argon2id (RFC 9106). Prevents the "compute-don't-store" ASIC attack — recompute is now ~165× more expensive than reading from DRAM, forcing any competitive ASIC to ship with HBM/GDDR.
- GPU Argon2id port: epoch transition went from ~9.5 minutes (CPU) to 32.5 seconds (GPU).
- Verified primitives: BLAKE2b against RFC 7693, ChaCha20 against RFC 8439, AES round against FIPS-197 Appendix B, Argon2id against RFC 9106 / argon2 CLI vectors. All known-answer tests (KATs) pass.
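
For readers unfamiliar with known-answer tests, the idea is to hash a published input and compare the result against the published digest. The sketch below checks Python's `hashlib.blake2b` against the "abc" vector from RFC 7693, Appendix A. It illustrates the technique only; it is not the project's test code, and `run_kats` and `blake2b_512` are hypothetical helper names.

```python
import hashlib

# BLAKE2b-512("abc") from RFC 7693, Appendix A.
BLAKE2B_KATS = [
    (b"abc",
     "ba80a53f981c4d0d6a2797b69f12f6e94c212f14685ac4b74b12bb6fdbffa2d1"
     "7d87c5392aab792dc252d5de4533cc9518d38aa8dbf1925ab92386edd4009923"),
]


def run_kats(hash_fn, vectors):
    """Return a list of (message, got_hex) for every vector that fails."""
    failures = []
    for message, expected_hex in vectors:
        got = hash_fn(message).hex()
        if got != expected_hex:
            failures.append((message, got))
    return failures


def blake2b_512(message):
    """Unkeyed BLAKE2b with a 64-byte digest, as in the RFC example."""
    return hashlib.blake2b(message, digest_size=64).digest()


failures = run_kats(blake2b_512, BLAKE2B_KATS)
print("all KATs pass" if not failures else f"failures: {failures}")
```

The same pattern extends to the other primitives by swapping in a different `hash_fn` and the vectors from the corresponding specification.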
License
MIT. Bundled crypto primitives (BLAKE2b, ChaCha20, AES round, Argon2id) are public-domain or CC0+Apache-2.0. Fork it, audit it, break it.