GPUx v0.1.5 — Community Testing Release

First release for community testing. GPUx is an ASIC-resistant, latency-bound proof-of-work algorithm for GPUs, proposed as a replacement for Cuckaroo29 (C29) in Tari (XTM).

What's in this release

Frozen algorithm spec (ALGORITHM_SPEC.md)
C reference implementation (spec/) — bit-exact authority
CUDA mining kernel + GPU Argon2id DAG generator (cuda/)
Cross-platform benchmark harness (bench/)
Pre-built Windows binary (multi-arch fat binary, sm_75 → sm_120)

Pre-built binaries

Platform	Binary	Coverage
Windows x64, CUDA 13.2+	`gpux_miner-v0.1.5-windows-x64-cuda13-multiarch.exe`	Turing (RTX 20-series, GTX 16-series, T4) → Blackwell (RTX 50-series), all CMP cards

The Windows binary is a CUDA fat binary containing compiled code for sm_75, sm_80, sm_86, sm_89, sm_90, and sm_120, plus PTX for forward compatibility. One file runs on any supported GPU.
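To make the "one file runs on any supported GPU" claim concrete, the sketch below applies CUDA's general compatibility rules to the architecture list above. A cubin built for sm_XY runs on devices with the same major version and a minor version of at least Y, and embedded PTX can be JIT-compiled on any device at or above its virtual architecture. The function name `pick_code` is hypothetical, and the PTX target (compute_120) is an assumption: the release notes do not say which virtual architecture the PTX was built for.

```python
# Native (SASS) targets listed for the v0.1.5 Windows binary.
SASS_TARGETS = [(7, 5), (8, 0), (8, 6), (8, 9), (9, 0), (12, 0)]
# ASSUMPTION: PTX is embedded for the newest listed architecture.
PTX_TARGET = (12, 0)

def pick_code(cc, sass=SASS_TARGETS, ptx=PTX_TARGET):
    """Return which embedded image a device of compute capability `cc` could use.

    `cc` is a (major, minor) tuple. Returns None if nothing in the binary fits.
    """
    major, minor = cc
    # cubin rule: same major version, built-for minor <= device minor.
    candidates = [t for t in sass if t[0] == major and t[1] <= minor]
    if candidates:
        best = max(candidates)
        return f"sass:sm_{best[0]}{best[1]}"
    # PTX rule: JIT works on any device at or above the PTX virtual arch.
    if ptx is not None and cc >= ptx:
        return f"ptx-jit:compute_{ptx[0]}{ptx[1]}"
    return None

for cc in [(7, 5), (8, 6), (8, 9), (12, 0), (7, 0)]:
    print(cc, "->", pick_code(cc))
```

Under these assumptions a Volta device (7, 0) gets no usable image, which matches the stated Turing-and-newer coverage.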

Linux: build from source with `cd cuda && make` after installing CUDA Toolkit 12.6+ or 13+. A Linux x64 binary will ship in v0.1.6 once CI is set up.

Verified baseline

NVIDIA RTX 5090 (sm_120) @ stock: 1.46 MH/s at 410 W (~3.6 kH/W)
DAG generation: 32.5 s for 2 GiB (one-time per epoch)
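The derived figures can be checked with a few lines of arithmetic. This is only a sanity check on the numbers quoted above; the function names are illustrative and not part of the project.

```python
HASHRATE_HS = 1.46e6        # 1.46 MH/s, from the baseline above
POWER_W = 410               # board power in watts
DAG_BYTES = 2 * 1024**3     # 2 GiB
DAG_SECONDS = 32.5

def efficiency_khs_per_watt(hashrate_hs, power_w):
    """Hashrate per watt, in kH/s per W."""
    return hashrate_hs / power_w / 1e3

def dag_throughput_mib_s(n_bytes, seconds):
    """Average DAG generation throughput in MiB/s."""
    return n_bytes / seconds / 1024**2

print(f"{efficiency_khs_per_watt(HASHRATE_HS, POWER_W):.2f} kH/s per W")  # 3.56
print(f"{dag_throughput_mib_s(DAG_BYTES, DAG_SECONDS):.1f} MiB/s")        # 63.0
```

Testers submitting results for other cards can reuse the same two ratios to compare against the RTX 5090 baseline.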

Known limitations

v0.1.5 is CUDA-only. AMD / Intel support via OpenCL planned for v0.2.
No light-verifier (Merkle DAG witness) yet — full nodes need the 2 GiB DAG. Out of scope for this round.
Tari multi-algo integration not wired — that comes after community testing settles the algorithm.

Tester quick-start

# Windows
git clone https://github.com/JustAResearcher/Latency-Based-GPU-Algorithm.git
cd Latency-Based-GPU-Algorithm
.\bench\run_bench.ps1

# Linux
git clone https://github.com/JustAResearcher/Latency-Based-GPU-Algorithm.git
cd Latency-Based-GPU-Algorithm
./bench/run_bench.sh

The harness:

Builds gpux_miner (or uses the pre-built .exe if present)
Runs verify — confirms your GPU produces bit-exact hashes vs the C reference
Runs bench 60 — 60 seconds of steady-state hashing
Writes a JSON to bench/results/
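Before submitting, it can help to confirm that the harness actually wrote parseable JSON. The sketch below lists each file in `bench/results/` with its top-level keys. It deliberately assumes nothing about the result schema, which these notes do not describe; `summarize_results` is a hypothetical helper, not part of the harness.

```python
import json
from pathlib import Path

def summarize_results(results_dir="bench/results"):
    """Return {filename: sorted top-level keys} for each JSON file found.

    Files that cannot be read or parsed map to an 'unreadable: ...' string.
    """
    summary = {}
    for path in sorted(Path(results_dir).glob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError) as exc:
            summary[path.name] = f"unreadable: {exc}"
            continue
        # Only dicts have keys; report the type name for anything else.
        summary[path.name] = sorted(data) if isinstance(data, dict) else type(data).__name__
    return summary

if __name__ == "__main__":
    for name, keys in summarize_results().items():
        print(name, keys)
```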

Submit your JSON via PR or issue. We're collecting:

Hashrate vs SM count vs VRAM vs power
Cross-arch determinism evidence
Any verify failures (bit-exact mismatches between GPU and reference)
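When reporting a verify failure, the position of the first differing byte is often more useful than the two full hashes. A minimal sketch, assuming both digests are available as hex strings (the actual output format of `verify` is not described here, and `first_mismatch` is a hypothetical helper):

```python
def first_mismatch(gpu_hash: bytes, ref_hash: bytes):
    """Return the index of the first differing byte, or None if bit-exact.

    If one input is a strict prefix of the other, the mismatch index is the
    length of the shorter input.
    """
    for i, (g, r) in enumerate(zip(gpu_hash, ref_hash)):
        if g != r:
            return i
    if len(gpu_hash) != len(ref_hash):
        return min(len(gpu_hash), len(ref_hash))
    return None

# Made-up example digests, for illustration only.
ref = bytes.fromhex("00ff10aa")
gpu = bytes.fromhex("00ff11aa")
idx = first_mismatch(gpu, ref)
print("bit-exact" if idx is None else f"first mismatch at byte {idx}")
```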

See COMMUNITY_TESTING.md for the full protocol.

What changed since v0.1

DAG generation moved from ChaCha20 to Argon2id (RFC 9106). This prevents the "compute-don't-store" ASIC attack: recomputing DAG data on the fly is now ~165× more expensive than reading it from DRAM, so any competitive ASIC must ship with HBM/GDDR.
GPU Argon2id port: epoch transition went from ~9.5 minutes (CPU) to 32.5 seconds (GPU), roughly a 17× speedup.
Verified primitives: BLAKE2b against RFC 7693, ChaCha20 against RFC 8439, AES round against FIPS-197 Appendix B, Argon2id against RFC 9106 / argon2 CLI vectors. All known-answer tests (KATs) pass.
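For readers unfamiliar with known-answer tests, the sketch below shows the idea using the BLAKE2b-512 test vector for "abc" from RFC 7693, Appendix A. It exercises Python's `hashlib` implementation, not the project's C or CUDA code, so it illustrates the method only; `run_kat` is a hypothetical helper.

```python
import hashlib

# RFC 7693, Appendix A: BLAKE2b-512 digest of the ASCII string "abc".
BLAKE2B_ABC = (
    "ba80a53f981c4d0d6a2797b69f12f6e9"
    "4c212f14685ac4b74b12bb6fdbffa2d1"
    "7d87c5392aab792dc252d5de4533cc95"
    "18d38aa8dbf1925ab92386edd4009923"
)

def run_kat(message: bytes, expected_hex: str, hash_fn=hashlib.blake2b) -> bool:
    """Hash `message` and compare against the published digest."""
    return hash_fn(message).hexdigest() == expected_hex

print("BLAKE2b KAT:", "pass" if run_kat(b"abc", BLAKE2B_ABC) else "FAIL")
```

A port to another platform would run the same vectors through its own implementation and require an exact match.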

License

MIT. Bundled crypto primitives (BLAKE2b, ChaCha20, AES round, Argon2id) are public-domain or CC0+Apache-2.0. Fork it, audit it, break it.
