GPUx — ASIC-resistant PoW for GPUs

Status: v0.1.5 community testing
Target: Replacement for Cuckaroo29 (C29) in Tari (XTM)
Goal: GPU-native, ASIC-resistant proof-of-work; low power; cheap verifier.

What this is

GPUx is a candidate proof-of-work algorithm designed to make GPU mining durable against ASIC takeover. It combines random per-epoch programs, a 2 GiB random-access DAG, and a per-thread scratchpad to force any would-be ASIC into looking like a GPU — at which point the ASIC has no cost advantage.

Three artifacts in this repo:

Algorithm spec (ALGORITHM_SPEC.md) — formal definition.
Reference C implementation (spec/) — the authoritative semantics.
CUDA implementation + bench harness (cuda/, bench/) — what community testers run on their GPUs.

If you are a community tester, jump to COMMUNITY_TESTING.md.

If you are reviewing the algorithm, start with ALGORITHM_SPEC.md and then docs/DESIGN_RATIONALE.md.

Quick numbers (RTX 5090, unoptimized v0.1 kernel)

Metric	Value
Hashrate	~1.25 MH/s
DAG generation	2 GiB in ~30 ms (~65 GB/s)
Per-share verify	~0.5 ms (warm DAG)
GPU vs reference	bit-identical (5/5 KAT nonces)

These are baseline numbers from a reference port. Optimized kernels (warp-cooperative DAG access, shared-memory scratchpad, instruction reordering) are expected to multiply throughput 2–5× without changing consensus.
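
As a rough sanity check on the table above, the sketch below turns the quoted figures into derived rates. The helper names are made up for illustration, and the inputs (2 GiB, ~30 ms, ~1.25 MH/s) are the rounded values from the table, so the outputs are only as precise as those.

```c
#include <stdint.h>

/* DAG size as stated in this README: 2 GiB. */
#define DAG_BYTES (2ULL << 30)

/* Generation bandwidth in binary units (GiB/s). */
static double gib_per_second(uint64_t bytes, double millis) {
    return ((double)bytes / (double)(1ULL << 30)) / (millis / 1000.0);
}

/* Generation bandwidth in decimal units (GB/s, 1 GB = 1e9 bytes). */
static double gb_per_second(uint64_t bytes, double millis) {
    return ((double)bytes / 1e9) / (millis / 1000.0);
}

/* Aggregate time per hash across the whole GPU, in nanoseconds.
 * This is throughput, not the latency of a single nonce. */
static double ns_per_hash(double hashes_per_second) {
    return 1e9 / hashes_per_second;
}
```

With exactly 30 ms, `gib_per_second(DAG_BYTES, 30.0)` gives about 66.7 and `gb_per_second(DAG_BYTES, 30.0)` about 71.6, so the quoted ~65 GB/s is consistent with a generation time slightly above 30 ms. `ns_per_hash(1.25e6)` gives 800 ns of aggregate GPU time per hash.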

Why GPUx is hard for ASICs (one-screen summary)

ASICs win when the algorithm is small, homogeneous, and predictable. GPUx attacks each premise:

ASIC premise attacked	GPUx mechanism
Predictable kernel	Random program regenerated every 1024 blocks
Small kernel	256 ops × 64 iters = 16 384 ops/nonce, 12 distinct opcodes, 32 64-bit lanes
Cheap memory	2 GiB DAG with random dependent access (forces GDDR/HBM)
No cache	16 KiB per-thread scratchpad with R-M-W (forces L1-equivalent)
One datapath	Mix of 64-bit int ALU, MULHI, AES round, IEEE-754 FP32 FMA
Throughput parallel	Latency-bound dependent chains limit pipelining

Long-form analysis with comparisons to Ethash, ProgPoW, RandomX, Cuckaroo, and X16R is in docs/DESIGN_RATIONALE.md.
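
To make the table above concrete, here is a toy sketch of the loop shape it describes: a per-epoch random program, dependent reads from a DAG, and read-modify-write traffic on a per-thread scratchpad. This is not the GPUx algorithm. The PRNG (splitmix64), the four toy opcodes, the final fold, and the 512 KiB DAG are all stand-ins chosen for brevity; only the constants 1024, 256, 64, 32, and 16 KiB come from this README. ALGORITHM_SPEC.md and spec/gpux.c define the real behavior.

```c
#include <stddef.h>
#include <stdint.h>

/* Figures taken from this README. */
#define EPOCH_BLOCKS  1024u               /* program regenerated every 1024 blocks */
#define PROGRAM_OPS   256u
#define ITERATIONS    64u                 /* 256 ops x 64 iters = 16 384 ops/nonce */
#define LANES         32u
#define SCRATCH_WORDS (16u * 1024u / 8u)  /* 16 KiB of 64-bit words */

/* Illustrative only: the real DAG is 2 GiB, this toy one is 512 KiB. */
#define TOY_DAG_WORDS (1u << 16)

typedef struct { uint8_t opcode, dst, src; } toy_op;

/* splitmix64: a stand-in PRNG, NOT the generator GPUx specifies. */
static uint64_t splitmix64(uint64_t *s) {
    uint64_t z = (*s += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

static uint64_t toy_epoch(uint64_t height) { return height / EPOCH_BLOCKS; }

/* Derive a fresh random program from the epoch number. */
static void toy_make_program(uint64_t epoch, toy_op prog[PROGRAM_OPS]) {
    uint64_t s = epoch;
    for (unsigned i = 0; i < PROGRAM_OPS; i++) {
        uint64_t r = splitmix64(&s);
        prog[i].opcode = (uint8_t)(r % 4);  /* toy: 4 opcodes; the README lists 12 */
        prog[i].dst = (uint8_t)((r >> 8) % LANES);
        prog[i].src = (uint8_t)((r >> 16) % LANES);
    }
}

static void toy_fill_dag(uint64_t epoch, uint64_t *dag) {
    uint64_t s = ~epoch;
    for (size_t i = 0; i < TOY_DAG_WORDS; i++) dag[i] = splitmix64(&s);
}

static uint64_t toy_hash(const toy_op prog[PROGRAM_OPS], const uint64_t *dag,
                         uint64_t nonce) {
    uint64_t lane[LANES], scratch[SCRATCH_WORDS];
    uint64_t s = nonce;
    for (unsigned i = 0; i < LANES; i++) lane[i] = splitmix64(&s);
    for (unsigned i = 0; i < SCRATCH_WORDS; i++) scratch[i] = splitmix64(&s);

    for (unsigned it = 0; it < ITERATIONS; it++) {
        for (unsigned pc = 0; pc < PROGRAM_OPS; pc++) {
            toy_op op = prog[pc];
            uint64_t a = lane[op.dst], b = lane[op.src];
            switch (op.opcode) {
            case 0: a += b; break;
            case 1: a ^= (b << 13) | (b >> 51); break;   /* xor with rotated operand */
            case 2: a ^= dag[b % TOY_DAG_WORDS]; break;  /* address depends on prior result */
            default: {                                   /* scratchpad read-modify-write */
                uint64_t *slot = &scratch[b % SCRATCH_WORDS];
                *slot += a;
                a ^= *slot;
            } break;
            }
            lane[op.dst] = a;
        }
    }
    uint64_t out = 0;  /* toy fold of the lanes into one word */
    for (unsigned i = 0; i < LANES; i++) out = (out ^ lane[i]) * 0x100000001B3ULL;
    return out;
}
```

A caller would build the program and DAG once per epoch with `toy_make_program(toy_epoch(height), prog)` and `toy_fill_dag(toy_epoch(height), dag)`, then call `toy_hash(prog, dag, nonce)` per nonce. The point of the shape is that each DAG and scratchpad address depends on the previous op's result, so the accesses cannot be precomputed or streamed.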

Repo layout

gpux/
├── ALGORITHM_SPEC.md          formal algorithm spec
├── COMMUNITY_TESTING.md       how to run tests and submit results
├── README.md                  this file
├── Makefile                   builds reference + tests (Linux/WSL/macOS)
├── spec/                      reference C implementation
│   ├── gpux.h / gpux.c        algorithm reference (the source of truth)
│   ├── blake2b.c+h            embedded BLAKE2b reference
│   ├── chacha20.c+h           embedded ChaCha20 reference
│   ├── aes_round.c+h          embedded AES single-round reference
│   └── test_vectors.h         frozen KAT (regenerate with `make gen-kat`)
├── tests/
│   ├── smoke.c                primitive correctness (BLAKE2b, ChaCha20, AES, KAT generators)
│   ├── kat.c                  full hash KAT (allocates 2 GiB)
│   └── gen_kat.c              regenerate test_vectors.h
├── cuda/                      CUDA implementation
│   ├── gpux_kernel.cu         the mining kernel
│   ├── gpux_device.cuh        device-side BLAKE2b/ChaCha20/AES
│   ├── gpux_miner.cu          host driver: verify, bench, info
│   ├── Makefile               Linux/WSL build
│   └── build.bat              Windows build (vcvars + nvcc)
├── bench/                     community testing
│   ├── run_bench.ps1          Windows harness
│   ├── run_bench.sh           Linux harness
│   └── results/               per-GPU JSON results (created on first run)
└── docs/
    └── DESIGN_RATIONALE.md    why each design choice; ASIC-resistance argument

Building

Linux / WSL / macOS (reference + tests)

make smoke   # primitive tests, no DAG
make kat     # full KAT (allocates 2 GiB)

Linux / WSL (CUDA)

cd cuda && make
./gpux_miner verify
./gpux_miner bench 30

Windows (CUDA)

Requires Visual Studio 2022 BuildTools + CUDA 13.x.

cd cuda
.\build.bat
.\gpux_miner.exe verify
.\gpux_miner.exe bench 30

Or use the testing wrapper from the repository root:

.\bench\run_bench.ps1 -Seconds 60

Tari integration (proposed)

Tari's existing block header is hashed with BLAKE2b-256 to produce a 32-byte digest. To use GPUx as a PoW algorithm:

header_digest = BLAKE2b-256(serialized_block_header_excluding_nonce)
block_hash    = GPUx(header_digest, nonce)

Difficulty target and Tari's multi-algo selection layer integrate at the consensus boundary. See ALGORITHM_SPEC.md §11.
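
The two-step pipeline above can be sketched as a nonce search against a target. Several things here are assumptions rather than facts from this repo: the `hash_fn` signature is hypothetical (see spec/gpux.h for the real one), the README does not say how Tari compares a hash to its difficulty target, so a big-endian `hash <= target` rule is assumed, and `placeholder_pow` is a trivial stand-in that exists only so the sketch compiles and runs without the 2 GiB DAG.

```c
#include <stdint.h>
#include <string.h>

#define DIGEST_LEN 32

/* Hypothetical shape of the PoW call: block_hash = GPUx(header_digest, nonce). */
typedef void (*hash_fn)(const uint8_t header_digest[DIGEST_LEN], uint64_t nonce,
                        uint8_t out[DIGEST_LEN]);

/* ASSUMPTION: hash and target are compared as 256-bit big-endian integers.
 * memcmp on unsigned bytes is exactly that comparison. */
static int meets_target(const uint8_t hash[DIGEST_LEN],
                        const uint8_t target[DIGEST_LEN]) {
    return memcmp(hash, target, DIGEST_LEN) <= 0;
}

/* Scan nonces in [start, start + count). Returns 1 and stores the first
 * winning nonce in *found, or returns 0 if none meets the target. */
static int search_nonce(hash_fn fn, const uint8_t header_digest[DIGEST_LEN],
                        const uint8_t target[DIGEST_LEN],
                        uint64_t start, uint64_t count, uint64_t *found) {
    uint8_t h[DIGEST_LEN];
    for (uint64_t i = 0; i < count; i++) {
        fn(header_digest, start + i, h);
        if (meets_target(h, target)) {
            *found = start + i;
            return 1;
        }
    }
    return 0;
}

/* Stand-in for demonstration ONLY; this is not GPUx and has no security.
 * It copies the digest and overwrites the first byte from the nonce. */
static void placeholder_pow(const uint8_t header_digest[DIGEST_LEN],
                            uint64_t nonce, uint8_t out[DIGEST_LEN]) {
    memcpy(out, header_digest, DIGEST_LEN);
    out[0] = (uint8_t)(255u - (unsigned)(nonce & 0xFFu));
}
```

Because the header digest excludes the nonce, it is computed once per block template and reused for every nonce; only the PoW call sits in the hot loop. In a real integration the function pointer would wrap the reference or CUDA implementation instead of `placeholder_pow`.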

v0.1 status

License

MIT; see LICENSE. Bundled reference primitives (BLAKE2b, ChaCha20, AES round, Argon2id) are public-domain or CC0/Apache-2.0 licensed and keep those terms alongside the MIT license. The intent is full open-source auditability: fork it, break it, propose changes via PR, run your own benchmarks, and submit the results as JSON files in bench/results/.
