When I run my kernel on a smaller data size, it returns the correct results. But when I increase the size, I get the error message:
the launch timed out and was terminated.
So does macOS also set up a watchdog timer for kernels executed on the device? And does the OS treat display and non-display GPUs the same?
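You can check per device whether the OS watchdog applies, rather than guessing. A minimal sketch, assuming the CUDA runtime is available (the `kernelExecTimeoutEnabled` field of `cudaDeviceProp` is nonzero when the OS can kill long-running kernels on that device):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        // Nonzero means a run-time limit (watchdog) is enforced on kernels,
        // which typically happens when the GPU also drives a display.
        printf("Device %d (%s): watchdog %s\n", dev, prop.name,
               prop.kernelExecTimeoutEnabled ? "enabled" : "disabled");
    }
    return 0;
}
```

A GPU that drives the display will usually report the watchdog as enabled; a dedicated compute GPU usually will not.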
How long does your kernel call execute? I've read about a 5 s timeout in other posts, and also read about it in the official docs - but please don't ask where :-(.
I'm having a similar problem: the kernel runs fine with small data, but my Mac just crashed when I fed in more data, about 300 KB (a really big amount of data :). From the printouts I had, the program stopped while trying to copy the data. I'm using the Thrust API to copy a vector to the device. I'm also surrounding the copy with CUDA events, and a synchronize call at the end, to time it.
Is there any way to turn on both GPUs - the Intel one for rendering the screen normally, and the NVIDIA one to run the CUDA program?
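For reference, here is a minimal sketch of the timing pattern described above (Thrust host-to-device copy bracketed by CUDA events, with a synchronize before reading the elapsed time); the ~300 KB size is taken from the post, everything else is an assumption:

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>

int main() {
    const size_t n = 300 * 1024 / sizeof(float);  // roughly 300 KB of floats
    thrust::host_vector<float> h(n, 1.0f);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    thrust::device_vector<float> d = h;   // host-to-device copy via Thrust
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);           // block until the copy has finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("copy of %zu floats took %f ms\n", n, ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}
```

If this pattern hangs or crashes at the copy, the problem is more likely driver/device state than the 300 KB transfer itself, which is tiny by GPU standards.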