Always check return codes of CUDA calls for errors. Do not use __syncthreads() in conditional code unless the condition is guaranteed to evaluate identically for all threads of each block. Run your program under cuda-memcheck to detect stray memory accesses. If your kernel dies for larger problem sizes, it might exceed the runtime limit and trigger the watchdog timer.
virt_to_phys((void *)kmalloc_area) >> PAGE_SHIFT,   /* PFN of the kmalloc()'d buffer */
resmem_hwaddr >> PAGE_SHIFT,                        /* PFN of the reserved memory region */
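Both expressions compute a page frame number (PFN): a physical address shifted right by `PAGE_SHIFT`. In a driver's mmap handler, the PFN is what `remap_pfn_range(vma, vma->vm_start, pfn, size, vma->vm_page_prot)` takes as its third argument. The userspace sketch below reproduces only the arithmetic. It assumes 4 KiB pages (`PAGE_SHIFT` = 12, as on x86-64). The names `SKETCH_PAGE_SHIFT`, `sketch_phys_to_pfn` and `sketch_page_align` are hypothetical stand-ins for the kernel's own macros.

```c
#include <stdint.h>

/* Hypothetical constants: assume 4 KiB pages, like PAGE_SHIFT/PAGE_SIZE on x86-64. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  ((uint64_t)1 << SKETCH_PAGE_SHIFT)

/* Page frame number of a physical address, as in `resmem_hwaddr >> PAGE_SHIFT`. */
static uint64_t sketch_phys_to_pfn(uint64_t phys)
{
    return phys >> SKETCH_PAGE_SHIFT;
}

/* Round a mapping length up to whole pages, mirroring the kernel's PAGE_ALIGN(). */
static uint64_t sketch_page_align(uint64_t len)
{
    return (len + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}
```

For example, a region starting at physical address 128 GiB has PFN 33554432 (0x2000000). A 4097-byte mapping has to be rounded up to 8192 bytes, because remap_pfn_range() maps whole pages.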
   | 1GB | 4GB | 16GB | 64GB | 128GB | 188GB
A: | 9.274 ms (1809.06 MB/s) | 11.503 ms (1458.51 MB/s) | 11.333 ms (1480.39 MB/s) | 9.326 ms (1798.97 MB/s) | 213.892 ms (78.43 MB/s) | 206.476 ms (81.25 MB/s)
B: | 4.494 ms (3733.25 MB/s) | 4.508 ms (3721.65 MB/s) | 4.706 ms (3565.07 MB/s) | 4.513 ms (3717.53 MB/s) | 111.657 ms (150.25 MB/s) | 112.231 ms (149.48 MB/s)
C: | 2.933 ms (5720.16 MB/s) | 2.949 ms (5689.12 MB/s) | 2.946 ms (5694.91 MB/s) | 2.951 ms (5685.27 MB/s) | 3.001 ms (5590.54 MB/s) | 3.022 ms (5551.69 MB/s)
A: | 4.255 ms (3942.94 MB/s) | 4.249 ms (3948.51 MB/s) | 4.257 ms (3941.09 MB/s) | 4.298 ms (3903.49 MB/s) | 208.269 ms (80.55 MB/s) | 200.627 ms (83.62 MB/s)
B: | 4.370 ms (3839.18 MB/s) | 4.411 ms (3803.50 MB/s) | 4.456 ms (3765.08 MB/s) | 4.391 ms (3820.82 MB/s) | 111.562 ms (150.38 MB/s) | 112.090 ms (149.67 MB/s)
C: | 2.937 ms (5712.37 MB/s) | 2.949 ms (5689.12 MB/s) | 2.949 ms (5689.12 MB/s) | 2.954 ms (5679.49 MB/s) | 2.976 ms (5637.51 MB/s) | 2.985 ms (5620.51 MB/s)
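The MB/s figures in the table are consistent with 16 MiB (16,777,216 bytes) moved per measurement and MB counted as 10^6 bytes. For example, 16,777,216 B / 9.274 ms ≈ 1809.06 MB/s. That transfer size is inferred from the numbers and is not stated in the post. The conversion can be sketched as follows; `SKETCH_TRANSFER_BYTES` and `sketch_mb_per_s` are hypothetical names.

```c
#include <stdint.h>

/* Assumption inferred from the table: 16 MiB per measured transfer. */
#define SKETCH_TRANSFER_BYTES ((uint64_t)16 * 1024 * 1024)

/* Throughput in MB/s (MB = 10^6 bytes) for `bytes` moved in `ms` milliseconds. */
static double sketch_mb_per_s(uint64_t bytes, double ms)
{
    return (double)bytes / (ms / 1000.0) / 1e6;
}
```

With the same formula, row C stays near 5.6 GB/s in every column, while rows A and B fall to roughly 80 MB/s and 150 MB/s in the 128GB and 188GB columns.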