I’m getting back into CUDA and I’ve made a simple testbed for a summing kernel (it just sums two buffers into a third). The kernel seems to be returning an “invalid argument” error, even though the arguments, on visual inspection, appear to match up. As far as I can tell, the grid and block sizes I’m passing are valid. After 1-2 hours of troubleshooting I’ve decided to just ask for help, as it’s probably something fundamental that I’m overlooking.
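In case it helps, here is a stripped-down sketch of the launch-and-check pattern I think should work (the kernel name, macro, and sizes here are placeholders for illustration, not my actual code). My understanding is that a launch error like “invalid argument” can also surface from a *later* runtime call, so checking both cudaGetLastError() right after the launch and the result of a synchronize seems safest:

```
#include <cstdio>
#include <cuda_runtime.h>

// Simple elementwise sum kernel: c[i] = a[i] + b[i]
__global__ void sumKernel(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

// Report any CUDA error with file/line context.
#define CUDA_CHECK(call)                                           \
    do {                                                           \
        cudaError_t err = (call);                                  \
        if (err != cudaSuccess)                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(err));                      \
    } while (0)

void launchSum(const float *d_a, const float *d_b, float *d_c, int n)
{
    int block = 256;
    int grid = (n + block - 1) / block;
    sumKernel<<<grid, block>>>(d_a, d_b, d_c, n);
    CUDA_CHECK(cudaGetLastError());      // catches launch-configuration errors
    CUDA_CHECK(cudaDeviceSynchronize()); // catches errors from the kernel itself
}
```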
You can pull all my code from here:
compile with
$ make clean program
and launch with
$ ./program
You should see something like this:
Device Number: 0
Device name: GeForce GTX 780
Compute Capability: 3.5
asyncEngineCount: 1
Total Global Memory (kB): 6442254
Max Device Memory Pitch (kB): 2147483
Max Grid Size (2147483647, 65535, 65535)
Max Block Size (1024, 1024, 64)
Device Number: 1
Device name: GeForce GTX 780
Compute Capability: 3.5
asyncEngineCount: 1
Total Global Memory (kB): 6438977
Max Device Memory Pitch (kB): 2147483
Max Grid Size (2147483647, 65535, 65535)
Max Block Size (1024, 1024, 64)
Total Input Size: 20.000 (MB), GPU Size: 20.000 (MB), Compute Chunk: 10.000 (MB), Total Array Size: 5000000, GPU Array Size: 5000000
Generating random number sets...
Number sets complete.
N Chunks: 2, Chunk Buffer Size: 10000000 (B)
Error (122): invalid argument
Error (122): invalid argument
100.00% complete
Testing 20 random entries for correctness...
Entry 1710655 -> 0.1365 + 0.8665 = 0.0000 ? 1.0030
Entry 2741888 -> 0.9434 + 0.0271 = 0.0000 ? 0.9705
Entry 4635666 -> 0.5155 + 0.5276 = 0.0000 ? 1.0431
Entry 1542590 -> 0.9620 + 0.3312 = 0.0000 ? 1.2932
Entry 1334456 -> 0.2502 + 0.6925 = 0.0000 ? 0.9428
Entry 2268829 -> 0.9961 + 0.7717 = 0.0000 ? 1.7678
Entry 3117315 -> 0.8368 + 0.9625 = 0.0000 ? 1.7992
Entry 2122969 -> 0.1726 + 0.2433 = 0.0000 ? 0.4159
Entry 4495006 -> 0.1719 + 0.6689 = 0.0000 ? 0.8408
Entry 201033 -> 0.5413 + 0.3923 = -202167757261742536184553579808544522240.0000 ? 0.9336
Entry 4862600 -> 0.8823 + 0.2407 = 0.0000 ? 1.1230
Entry 1214772 -> 0.7891 + 0.8005 = 0.0000 ? 1.5896
Entry 3072111 -> 0.5847 + 0.6357 = 0.0000 ? 1.2204
Entry 3261796 -> 0.6412 + 0.0355 = 0.0000 ? 0.6766
Entry 2621388 -> 0.8425 + 0.7116 = -nan ? 1.5541
Entry 1716700 -> 0.3450 + 0.3062 = 0.0000 ? 0.6513
Entry 4228020 -> 0.8023 + 0.3822 = 0.0000 ? 1.1845
Entry 3159745 -> 0.3746 + 0.1364 = 0.0000 ? 0.5110
Entry 4424746 -> 0.0623 + 0.6677 = 0.0000 ? 0.7299
Entry 714638 -> 0.4983 + 0.0809 = 0.0000 ? 0.5793
To my understanding, this should work without a problem. My kernel lives in a separate file, but I seem to have linked everything properly (a C++ wrapper for the kernel, etc.), and everything compiles without warnings or errors. Here is the compiler output you should see:
rm -f program lib/main.o lib/summer.o lib/program.so
rm -f -rf lib
g++ -Wall -ansi -pedantic -fPIC -std=c++11 -c main.cc -o lib/main.o -pthread -std=c++11
nvcc -c summer.cu -o lib/summer.o
g++ -Wall -ansi -pedantic -fPIC -std=c++11 -o program lib/main.o lib/summer.o -L /usr/local/cuda/lib64 -lcudart -pthread -std=c++11
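For reference, the wrapper arrangement is roughly the usual pattern below (a simplified sketch with illustrative names, not my exact files): the .cu file, compiled by nvcc, exposes a plain C++ function so that main.cc never sees any CUDA launch syntax, and g++ only needs the declaration plus -lcudart at link time:

```
// summer.cu -- compiled with nvcc
__global__ void sumKernel(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

// Plain C++ entry point callable from code compiled by g++.
void runSum(const float *d_a, const float *d_b, float *d_c, int n)
{
    int block = 256;
    int grid = (n + block - 1) / block;
    sumKernel<<<grid, block>>>(d_a, d_b, d_c, n);
}
```

```
// main.cc -- compiled with g++; only declares the wrapper
void runSum(const float *d_a, const float *d_b, float *d_c, int n);
```

Since both sides are C++, the declaration and definition must match exactly or the name mangling will differ and the link will fail, so a mismatch there would show up at link time rather than as a runtime error.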
I appreciate any help. Thanks.