power of 2 question

s002wjh · November 24, 2015, 8:22pm

For multiplication, filters, etc., is it better for performance/efficiency to design around sizes that are powers of 2, rather than arbitrary sizes such as 100 or 20?

s002wjh · November 24, 2015, 9:25pm

Also, is there a 1D convolution filter in the CUDA libraries?
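Whichever library ends up fitting, a direct 1D convolution is short enough to write by hand as a custom kernel. The sketch below is plain C++ so it runs on a CPU. The function name `conv1d_valid` is hypothetical, and the outer loop stands in for the CUDA threads: in a kernel, each output index `i` would typically come from `blockIdx.x * blockDim.x + threadIdx.x`, guarded by `if (i < n_out)`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Direct 1D convolution, "valid" part only (output length n - k + 1).
// One outer-loop iteration per output element; on a GPU each iteration
// would be one thread.
std::vector<float> conv1d_valid(const std::vector<float>& x,
                                const std::vector<float>& h) {
    assert(!h.empty() && x.size() >= h.size());
    const std::size_t k = h.size();
    const std::size_t n_out = x.size() - k + 1;
    std::vector<float> y(n_out, 0.0f);
    for (std::size_t i = 0; i < n_out; ++i) {
        float acc = 0.0f;
        for (std::size_t j = 0; j < k; ++j)
            acc += x[i + j] * h[k - 1 - j];  // filter is flipped: true convolution
        y[i] = acc;
    }
    return y;
}
```

For example, convolving {1, 2, 3, 4} with {1, 1} gives {3, 5, 7}.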

tera · November 24, 2015, 9:26pm

If you intend to use FFT: yes.
Otherwise: potentially. Depends a lot on what you intend to do…
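As a rough illustration of the FFT case: for FFT-based convolution of a length-`n` signal with a length-`k` filter, both are usually zero-padded to at least `n + k - 1` samples so that the circular convolution the FFT computes matches the linear one, and rounding that length up to a power of two is a common choice. cuFFT's documentation describes sizes built from small prime factors (2, 3, 5, 7) as the best optimized, so a power of two is not the only fast option. The helper names below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// True if n is a power of two (exactly one bit set).
bool is_pow2(std::size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// Smallest power of two that is >= n (n must be >= 1).
std::size_t next_pow2(std::size_t n) {
    assert(n >= 1);
    std::size_t p = 1;
    while (p < n) p <<= 1;  // keep doubling until p reaches or passes n
    return p;
}

// Hypothetical helper: padded FFT length for linearly convolving a
// length-n signal with a length-k filter (no circular wrap-around).
std::size_t fft_conv_length(std::size_t n, std::size_t k) {
    assert(n >= 1 && k >= 1);
    return next_pow2(n + k - 1);
}
```

For example, a 1000-sample signal with a 25-tap filter fits in 1024 points, but a 26-tap filter already needs 2048.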

CudaaduC · November 24, 2015, 10:18pm

That is a vague question.

Do you mean: would it make a difference if an image were 1024 x 1024 vs. 1000 x 1000, assuming you could choose those dimensions?

In that case, if you were writing your own filter kernel, it may make the code easier to write, since 32 (the size of a warp) divides evenly into 1024.
It may also make it slightly faster since there is no remainder, but that difference would be very small.
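To put a number on "very small": a kernel is normally launched with enough blocks to cover every element (ceiling division), and the extra threads in the last block simply fail an `if (i < n)` bounds check. The sketch below (plain C++, hypothetical helper names) counts those idle threads.

```cpp
#include <cassert>
#include <cstddef>

// Blocks needed to cover n elements with blockSize threads each
// (ceiling division, the usual way a launch grid is sized).
std::size_t num_blocks(std::size_t n, std::size_t blockSize) {
    assert(blockSize > 0);
    return (n + blockSize - 1) / blockSize;
}

// Threads launched that have no element to work on; these are the ones
// a kernel's `if (i < n)` bounds check turns away.
std::size_t idle_threads(std::size_t n, std::size_t blockSize) {
    return num_blocks(n, blockSize) * blockSize - n;
}
```

With 256-thread blocks, a 1024 x 1024 image leaves no idle threads, while a 1000 x 1000 image leaves 192 idle threads out of 1,000,192 launched, roughly 0.02%.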

Probably the best approach is to make the workload/array size divisible by at least 32, or by a larger power of two. If you are using commercial/open-source libraries, it is probably not worth worrying about.
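One way to act on that advice, sketched under the assumption that you control the allocation size: check how large a power of two divides the size, and if it is less than 32, pad the array up to the next multiple of 32 (the padding elements are simply ignored by the kernel). Both helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Largest power of two that divides n (its lowest set bit); n must be > 0.
std::size_t largest_pow2_divisor(std::size_t n) {
    assert(n > 0);
    return n & (~n + 1);  // two's-complement trick, well-defined for unsigned types
}

// Round n up to the next multiple of m (m > 0), e.g. m = 32 for the warp size.
std::size_t round_up(std::size_t n, std::size_t m) {
    assert(m > 0);
    return (n + m - 1) / m * m;
}
```

For example, 1000 is divisible only by 8, while padding it to 1024 makes it a multiple of 32 (and of 256); the sizes 100 and 20 would pad to 128 and 32.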

s002wjh · November 25, 2015, 3:12pm

I looked through some example code, and it seems that block/thread sizes, address alignment, etc. often use power-of-2 values.
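A likely reason power-of-2 values show up for alignment in particular: when the alignment `a` is a power of two, `addr % a` equals `addr & (a - 1)`, so checking or rounding to an alignment is a single bitwise AND, and hardware alignment requirements are themselves powers of two (for example, a `float4` access in CUDA needs a 16-byte-aligned address). The helpers below are a hypothetical sketch in plain C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if addr is a multiple of align. align must be a power of two so the
// remainder can be taken with a mask instead of a division.
bool is_aligned(std::uintptr_t addr, std::size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

// Round addr up to the next multiple of align (a power of two).
std::uintptr_t align_up(std::uintptr_t addr, std::size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~static_cast<std::uintptr_t>(align - 1);
}
```

For example, address 260 is 4-byte aligned but not 8-byte aligned, and rounding it up to a 16-byte boundary gives 272.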

Also, is there any example or guide on parallelizing multiple nested loops on the GPU?
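A common first step (not the only one) is to flatten the nested loops so every iteration gets one flat index, then recover the loop indices with division and modulo. The plain-C++ sketch below does this for a hypothetical outer-product loop. In a CUDA kernel the flat loop would disappear and `idx` would come from `blockIdx.x * blockDim.x + threadIdx.x`; alternatively, a 2D grid can supply the row and column directly through the `.y` and `.x` components.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original nested form: out[r][c] = a[r] * b[c], stored row-major.
std::vector<float> outer_nested(const std::vector<float>& a,
                                const std::vector<float>& b) {
    std::vector<float> out(a.size() * b.size());
    for (std::size_t r = 0; r < a.size(); ++r)
        for (std::size_t c = 0; c < b.size(); ++c)
            out[r * b.size() + c] = a[r] * b[c];
    return out;
}

// Flattened form: one index per (r, c) pair. The iterations are independent,
// so each one could be handled by a separate GPU thread.
std::vector<float> outer_flat(const std::vector<float>& a,
                              const std::vector<float>& b) {
    const std::size_t rows = a.size(), cols = b.size();
    std::vector<float> out(rows * cols);
    for (std::size_t idx = 0; idx < rows * cols; ++idx) {
        const std::size_t r = idx / cols;  // recovered outer-loop index
        const std::size_t c = idx % cols;  // recovered inner-loop index
        assert(r < rows && c < cols);
        out[idx] = a[r] * b[c];
    }
    return out;
}
```

Both functions produce the same array; only the flattened one maps directly onto a 1D grid of threads.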

episteme · November 25, 2015, 11:16pm

blockDim (threads/block) → 32n (a multiple of the warp size); prefer 256 or 512 (maximum 1024)
blockDim.x : blockDim.y → 32:8 is better than 16:16


That said, it is a very SMALL difference.
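The blockDim rules of thumb in the post can be written down as a small check. This is a hypothetical plain-C++ sketch. It assumes threads of a 2D block are numbered `threadIdx.x + threadIdx.y * blockDim.x` and grouped into warps of 32 consecutive numbers, which is how CUDA forms warps. Under that numbering, `blockDim.x = 32` gives each warp one row of 32 consecutive elements (one contiguous segment for a row-major array), while 16:16 makes each warp straddle two rows, which usually means two separate memory segments per access.

```cpp
#include <cassert>

// Hypothetical 2D block shape (mirrors blockDim.x / blockDim.y).
struct Block2D { unsigned x, y; };

// Threads per block must be a positive multiple of the warp size (32)
// and must not exceed 1024.
bool valid_block(Block2D b) {
    const unsigned threads = b.x * b.y;
    return threads > 0 && threads <= 1024 && threads % 32 == 0;
}

// Number of block rows a single warp spans. Only covers widths that divide
// 32 or are multiples of 32: b.x = 32 gives one row per warp, b.x = 16 two.
unsigned rows_per_warp(Block2D b) {
    assert(b.x > 0 && (32 % b.x == 0 || b.x % 32 == 0));
    return b.x >= 32 ? 1u : 32u / b.x;
}
```

Both 32:8 and 16:16 are valid 256-thread blocks; the difference is only in how many rows each warp touches.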