
@jayshah1819
Contributor

Features

Two binning strategies:

  • Even bins: Arithmetic binning for uniform ranges
  • Custom bins: Binary search for arbitrary bin edges
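The two strategies can be sketched on the CPU as follows. This is a minimal Python illustration, not the PR's kernel code: the function names `even_bin` and `custom_bin`, the half-open `[lo, hi)` range convention, and the choice to return `None` for out-of-range values are all assumptions made for the example.

```python
from bisect import bisect_right


def even_bin(x, lo, hi, num_bins):
    """Arithmetic binning: map x in [lo, hi) to one of num_bins equal-width bins.

    Returns None for out-of-range values (an assumption of this sketch).
    """
    if not (lo <= x < hi):
        return None
    idx = int((x - lo) * num_bins / (hi - lo))
    # Guard against floating-point rounding pushing x into a bin past the last one.
    return min(idx, num_bins - 1)


def custom_bin(x, edges):
    """Binary search: edges is sorted ascending; bin i covers [edges[i], edges[i+1]).

    Returns None for out-of-range values (an assumption of this sketch).
    """
    if not (edges[0] <= x < edges[-1]):
        return None
    # bisect_right returns the number of edges <= x; subtract 1 for the bin index.
    return bisect_right(edges, x) - 1
```

Even binning costs a constant number of arithmetic operations per element, while the custom-edge lookup costs O(log B) comparisons for B bins, which is the usual trade-off between the two approaches.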

Grid-stride loop pattern for processing large inputs
Shared memory optimization for ≤256 bins
Falls back to global atomics for larger bin counts
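As a rough illustration of the grid-stride pattern and the bin-count threshold, here is a CPU-side Python sketch. It only models the index arithmetic and the strategy choice; `grid_stride_indices`, `pick_strategy`, and `SHARED_BIN_LIMIT` are hypothetical names, and the real kernels may structure this differently.

```python
SHARED_BIN_LIMIT = 256  # threshold taken from the description above


def grid_stride_indices(thread_id, total_threads, n):
    """Indices one thread visits: thread_id, thread_id + total_threads, ...

    Together, all threads cover 0..n-1 exactly once, for any n.
    """
    return range(thread_id, n, total_threads)


def pick_strategy(num_bins):
    """Use workgroup-shared bins when they fit, else atomics on the global buffer."""
    return "shared" if num_bins <= SHARED_BIN_LIMIT else "global"
```

Because the stride equals the total number of launched threads, the launch size can stay fixed while the input length grows, which is why the pattern suits large inputs.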

Kernels

Init kernel: zeroes the output buffer
Sweep kernel: three-phase execution (initialize shared memory → process data → merge results)
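The two kernels can be modeled sequentially in Python to show the data flow. This is a sketch under stated assumptions, not the PR's implementation: it uses even binning over a half-open `[lo, hi)` range, drops out-of-range values, and represents each workgroup's shared memory as a plain list; on a GPU, phases 2 and 3 would use atomic adds. The names `init_kernel` and `sweep_kernel` and their parameters are hypothetical.

```python
def init_kernel(num_bins):
    """Init kernel: produce a zeroed output buffer."""
    return [0] * num_bins


def sweep_kernel(data, output, lo, hi, num_workgroups, workgroup_size):
    """Sweep kernel, simulated one workgroup at a time."""
    num_bins = len(output)
    total_threads = num_workgroups * workgroup_size
    for wg in range(num_workgroups):
        # Phase 1: initialize this workgroup's private ("shared memory") bins.
        local = [0] * num_bins
        # Phase 2: each thread walks the input with a grid-stride loop.
        for lane in range(workgroup_size):
            tid = wg * workgroup_size + lane
            for i in range(tid, len(data), total_threads):
                x = data[i]
                if lo <= x < hi:  # out-of-range values are dropped (assumption)
                    b = min(int((x - lo) * num_bins / (hi - lo)), num_bins - 1)
                    local[b] += 1
        # Phase 3: merge the private bins into the global output buffer.
        for b, count in enumerate(local):
            output[b] += count
    return output
```

For example, `sweep_kernel(values, init_kernel(5), 0.0, 10.0, 2, 4)` counts `values` into five bins of width 2. Merging once per workgroup means the global buffer receives at most one update per bin per workgroup, rather than one per input element.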

Benchmarking results (even and custom bins; trials: 20, inputLength: 2^27)

[Image: Screenshot 2025-11-18 at 8.21.27 PM]
