-
Notifications
You must be signed in to change notification settings - Fork 218
Use usize for indices and dimensions
#343
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Specifically: - `thread_idx*` - `block_idx*` - `block_dim*` - `grid_dim*` - `index*` This removes lots of `as u32`/`as usize` casts.
|
cc @FractalFir |
FractalFir
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some issues with the use of as casts(I fear the will truncate things unexpectedly), plus some general things I noticed whilst going over the PR.
| } | ||
| impl From<u32> for GridSize { | ||
| fn from(x: u32) -> GridSize { | ||
| impl From<usize> for GridSize { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe consider keeping the old impl, if that does not cause any issues(to allow casting from u32 too).
IDK if this would be of any worth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it but none of the existing example code requires it so it doesn't seem useful. We can add it back if necessary.
| pub fn thread_idx_x() -> usize { | ||
| // The range is derived from the `block_idx_x` range. | ||
| in_range!(core::arch::nvptx::_thread_idx_x, 0..1024) | ||
| in_range!(core::arch::nvptx::_thread_idx_x, 0..1024) as usize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated to current change, but I find the range here suspicious:
https://docs.nvidia.com/cuda/cuda-c-programming-guide/#thread-hierarchy
There is a limit to the number of threads per block, since all threads of a block are expected to reside on the same streaming multiprocessor core and must share the limited memory resources of that core. On current GPUs, a thread block may contain up to 1024 threads.
However, a kernel can be executed by multiple equally-shaped thread blocks, so that the total number of threads is equal to the number of threads per block times the number of blocks.
The "current GPUs" suggests this is not an API promise, but mearly the current highest value of this. Can this raise in the future? What happens then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See table 30 in this section for a more specific description of the limits here. You are right that it's not a guarantee, but the relevant numbers haven't changed from Compute Capability 5.0 all the way to 12.x.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this I realize that some of these comments have an error in them. I'll fix that.
| let idx = thread::index_1d() as usize; | ||
| let idx = thread::index_1d(); | ||
|
|
||
| if idx == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we have something like thread::first for this "is thread idx 0" check? Unrelated to the core changes, but would be nice to use that instead of using only one of the indices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Outside the scope of this PR, however.
| #[allow(improper_ctypes_definitions)] | ||
| pub unsafe fn add(a: &[T], b: &[T], c: *mut T) { | ||
| let i = thread::index_1d() as usize; | ||
| let i = thread::index_1d(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once again, unrelated to the core changes, but something I noticed while reviewing the code: Should we encourage people to use index_1d like this?
What happens if we have launch dimensions with values different from 1 on the Y / Z axis? (I belive this would be a data race then).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, and the comment on index talks about this here. But this is also outside the scope of this PR.
Instead of `u32`, because using `usize` for dimensions and indices is more natural in Rust and avoids lots of casts.
These types are very similar to the samed-named types from `cust`, which were changed in the previous commit. Currently these types are in a module that is commented out and marked as "WIP", but it makes sense to change them like the `cust` types in case they become used in the future. (Note: I temporarily uncommented the code to make sure the changes compile.)
The indices are derived from the dimensions.
0e90a08 to
191ca97
Compare
|
I added checking to the truncations, and added a new commit that fixes the incorrect comments. |
CUDA uses 32-bit unsigned integers for indices and dimensions, and rust-cuda currently follows suit. This is bad because Rust uses
usizefor indices and dimensions, which requires lots ofas u32/as usizecasts.This PR change the
u32indices/dimensions tousize. This makes rust-cuda nicer to use. E.g. withinexamples/the number ofas usizecasts drops from 14 to 0, and the number ofas u32casts drops from 10 to 2.