Releases: JustAResearcher/Latency-Based-GPU-Algorithm

v0.1.5

10 May 02:11

GPUx v0.1.5 — Community Testing Release

First release for community testing. ASIC-resistant, latency-bound proof-of-work for GPUs; proposed replacement for Cuckaroo29 (C29) in Tari (XTM).

What's in this release

  • Frozen algorithm spec (ALGORITHM_SPEC.md)
  • C reference implementation (spec/) — bit-exact authority
  • CUDA mining kernel + GPU Argon2id DAG generator (cuda/)
  • Cross-platform benchmark harness (bench/)
  • Pre-built Windows binary (multi-arch fat binary, sm_75 → sm_120)

Pre-built binaries

Platform: Windows x64, CUDA 13.2+
Binary: gpux_miner-v0.1.5-windows-x64-cuda13-multiarch.exe
Coverage: Turing (RTX 20-series, GTX 16-series, T4) → Blackwell (RTX 50-series), all CMP cards

The Windows binary is a CUDA fat binary containing compiled code for sm_75, sm_80, sm_86, sm_89, sm_90, and sm_120, plus PTX for forward compatibility. One file runs on any supported GPU.
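
To make the coverage claim concrete, here is a minimal sketch of how a CUDA driver generally picks code from a fat binary. It relies on the documented CUDA rule that native code built for compute capability X.y runs only on devices of capability X.z with z ≥ y, while embedded PTX can be JIT-compiled for newer devices. The function name `select_code` is hypothetical, and the PTX target (`ptx_cc`) is an assumption: the release notes do not say which virtual architecture the embedded PTX was built for.

```python
# Architectures with native code in the Windows fat binary, per the release notes.
NATIVE_SM = [(7, 5), (8, 0), (8, 6), (8, 9), (9, 0), (12, 0)]


def select_code(device_cc, native=NATIVE_SM, ptx_cc=(12, 0)):
    """Return ("sass", cc), ("ptx", cc), or None for a device compute capability.

    ptx_cc is an ASSUMPTION (newest listed architecture); the release notes
    do not state the PTX target.
    """
    major, minor = device_cc
    # Native code is usable only within the same major version,
    # and only if it was built for an equal or lower minor version.
    candidates = [cc for cc in native if cc[0] == major and cc[1] <= minor]
    if candidates:
        return ("sass", max(candidates))
    # PTX can be JIT-compiled for devices at or above its virtual architecture.
    if ptx_cc is not None and device_cc >= ptx_cc:
        return ("ptx", ptx_cc)
    return None


if __name__ == "__main__":
    for cc in [(7, 5), (8, 9), (8, 7), (7, 0), (13, 0)]:
        print(cc, "->", select_code(cc))
```

Under these assumptions a Turing or Ada card gets native code, a pre-Turing card gets nothing, and a hypothetical future architecture falls back to PTX. Whether architectures that sit between the listed major versions are covered depends on the PTX version actually embedded in the binary.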

Linux: build from source with cd cuda && make after installing CUDA Toolkit 12.6+ or 13+. A Linux x64 binary will ship in v0.1.6 once CI is set up.

Verified baseline

  • NVIDIA RTX 5090 (sm_120) @ stock: 1.46 MH/s at 410 W (~3.6 kH/s per watt)
  • DAG generation: 32.5 s for 2 GiB (one-time per epoch)
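
The derived figures follow directly from the two baseline measurements. The sketch below reproduces the arithmetic so testers can compute the same metrics for their own cards; the function names are illustrative and not part of the harness.

```python
def efficiency_h_per_joule(hashrate_hs, power_w):
    """Hashes per joule, i.e. (hashes per second) divided by watts."""
    return hashrate_hs / power_w


def dag_throughput_mib_s(dag_bytes, seconds):
    """Average DAG generation rate in MiB/s."""
    return dag_bytes / (1024 ** 2) / seconds


if __name__ == "__main__":
    # Baseline numbers from the release notes (RTX 5090 at stock settings).
    eff = efficiency_h_per_joule(1.46e6, 410.0)
    rate = dag_throughput_mib_s(2 * 1024 ** 3, 32.5)
    print(f"{eff / 1e3:.2f} kH/s per watt")  # 3.56 kH/s per watt
    print(f"{rate:.1f} MiB/s")               # 63.0 MiB/s
```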

Known limitations

  • v0.1.5 is CUDA-only. AMD / Intel support via OpenCL planned for v0.2.
  • No light-verifier (Merkle DAG witness) yet — full nodes need the 2 GiB DAG. Out of scope for this round.
  • Tari multi-algo integration not wired — that comes after community testing settles the algorithm.

Tester quick-start

# Windows (PowerShell)
git clone https://github.com/JustAResearcher/Latency-Based-GPU-Algorithm.git
cd Latency-Based-GPU-Algorithm
.\bench\run_bench.ps1

# Linux
git clone https://github.com/JustAResearcher/Latency-Based-GPU-Algorithm.git
cd Latency-Based-GPU-Algorithm
./bench/run_bench.sh

The harness:

  1. Builds gpux_miner (or uses the pre-built .exe if present)
  2. Runs verify — confirms your GPU produces bit-exact hashes vs the C reference
  3. Runs bench 60 — 60 seconds of steady-state hashing
  4. Writes a JSON to bench/results/
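
Before submitting, it can help to confirm that the files written in step 4 are readable. The following is a minimal sketch that only assumes the `bench/results/` location named above; it makes no assumption about the JSON schema and simply reports the top-level keys of each file. The helper name `check_results` is hypothetical.

```python
import json
from pathlib import Path


def check_results(results_dir="bench/results"):
    """Map each *.json file name to its sorted top-level keys, or an error note."""
    report = {}
    for path in sorted(Path(results_dir).glob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError) as exc:
            # ValueError covers both malformed JSON and bad UTF-8.
            report[path.name] = f"unreadable: {exc}"
            continue
        if isinstance(data, dict):
            report[path.name] = sorted(data)
        else:
            report[path.name] = type(data).__name__
    return report


if __name__ == "__main__":
    for name, summary in check_results().items():
        print(name, "->", summary)
```

Run it from the repository root after a benchmark. A file reported as unreadable is worth regenerating before opening a PR or issue.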

Submit your JSON via PR or issue. We're collecting:

  • Hashrate vs SM count vs VRAM vs power
  • Cross-arch determinism evidence
  • Any verify failures (bit-exact mismatches between GPU and reference)
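
When reporting a verify failure, the position of the first differing byte is often the most useful detail. The sketch below shows one way to locate it when comparing a reference hash with a GPU hash. It illustrates what "bit-exact" means here and is not the harness's actual verify code; `first_mismatch` is a hypothetical helper.

```python
def first_mismatch(reference: bytes, candidate: bytes):
    """Return None if the inputs are identical, else (offset, ref_byte, cand_byte).

    If one input is a strict prefix of the other, the offset is the shorter
    length and the missing byte is reported as None.
    """
    for offset, (ref_byte, cand_byte) in enumerate(zip(reference, candidate)):
        if ref_byte != cand_byte:
            return (offset, ref_byte, cand_byte)
    if len(reference) != len(candidate):
        n = min(len(reference), len(candidate))
        ref_byte = reference[n] if n < len(reference) else None
        cand_byte = candidate[n] if n < len(candidate) else None
        return (n, ref_byte, cand_byte)
    return None


if __name__ == "__main__":
    ref = bytes.fromhex("00ff10")
    gpu = bytes.fromhex("00fe10")
    print(first_mismatch(ref, gpu))  # (1, 255, 254)
    print(first_mismatch(ref, ref))  # None
```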

See COMMUNITY_TESTING.md for the full protocol.

What changed since v0.1

  • DAG generation moved from ChaCha20 to Argon2id (RFC 9106). Prevents the "compute-don't-store" ASIC attack — recompute is now ~165× more expensive than reading from DRAM, forcing any competitive ASIC to ship with HBM/GDDR.
  • GPU Argon2id port: epoch transition went from ~9.5 minutes (CPU) to 32.5 seconds (GPU), roughly a 17× speedup.
  • Verified primitives: BLAKE2b against RFC 7693, ChaCha20 against RFC 8439, AES round against FIPS-197 Appendix B, Argon2id against RFC 9106 / argon2 CLI vectors. All known-answer tests (KATs) pass.
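
For readers unfamiliar with known-answer tests, here is a minimal sketch of the idea using Python's standard `hashlib` rather than the project's C code. It checks BLAKE2b-512 of "abc" against the test vector in RFC 7693, Appendix A. The `KATS` table layout and `run_kats` helper are illustrative and do not mirror the repository's test suite.

```python
import hashlib

# (name, function computing the digest, expected hex digest)
KATS = [
    (
        "BLAKE2b-512('abc'), RFC 7693 Appendix A",
        lambda: hashlib.blake2b(b"abc").hexdigest(),
        "ba80a53f981c4d0d6a2797b69f12f6e94c212f14685ac4b74b12bb6fdbffa2d1"
        "7d87c5392aab792dc252d5de4533cc9518d38aa8dbf1925ab92386edd4009923",
    ),
]


def run_kats(kats=KATS):
    """Return the names of vectors whose computed output differs from the expected one."""
    return [name for name, compute, expected in kats if compute() != expected]


if __name__ == "__main__":
    failures = run_kats()
    print("all KATs pass" if not failures else f"FAILED: {failures}")
```

The same pattern extends to the other primitives by adding rows whose expected values are copied from the cited standards.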

License

MIT. Bundled crypto primitives (BLAKE2b, ChaCha20, AES round, Argon2id) are public-domain or CC0+Apache-2.0. Fork it, audit it, break it.