Infinite loop in CUDA kernel

I wrote the CUDA kernel as follows:

__global__ void testkernel2(float *a, float *b, float *c, float *d)
{
    unsigned int idx = threadIdx.x + blockDim.x * blockIdx.x;

    // Intentionally never terminates.
    while (1) {
        a[idx] = 0;
        b[idx] = 0;
        c[idx] = 0;
        d[idx] = 0;
    }
}

But somehow the infinite loop does not run forever. Is there some kind of infinite-loop detector on the GPU that terminates its execution?

Nice :) the dead-code optimizer probably optimized it out.

Try to replace the = 0 with = threadIdx.x or some counter that you increment in the loop.
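Something like the following is what I mean, as a rough sketch only. The `limit` parameter and the `run_bounded` helper are my own additions (not in the original kernel) so that the loop can be tried without hanging the card; passing `limit == 0` gives the endless version.

```cuda
#include <cuda_runtime.h>

// Hypothetical variant of testkernel2: every pass stores a changing value,
// so the stores differ from one iteration to the next.
// limit == 0 means "loop forever"; a nonzero limit is an assumption added
// here so the kernel can be run safely.
__global__ void testkernel2_counter(float *a, float *b, float *c, float *d,
                                    unsigned int limit)
{
    unsigned int idx = threadIdx.x + blockDim.x * blockIdx.x;
    unsigned int counter = 0;

    while (limit == 0 || counter < limit) {
        a[idx] = counter;
        b[idx] = counter;
        c[idx] = counter;
        d[idx] = counter;
        ++counter;
    }
}

// Hypothetical host helper: runs one block of 32 threads with a finite
// limit and returns a[0], or -1.0f on any error (or if limit is 0).
float run_bounded(unsigned int limit)
{
    if (limit == 0) return -1.0f;   // refuse to launch the endless version

    const int n = 32;
    float *buf = nullptr;           // one allocation split into four arrays
    if (cudaMalloc(&buf, 4 * n * sizeof(float)) != cudaSuccess) return -1.0f;

    testkernel2_counter<<<1, n>>>(buf, buf + n, buf + 2 * n, buf + 3 * n, limit);

    float first = -1.0f;
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaError_t err = cudaMemcpy(&first, buf, sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(buf);
    return err == cudaSuccess ? first : -1.0f;
}
```

With a finite limit, the last value written is `limit - 1`, which is an easy way to check that the loop body really ran.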

Why do you want to do this, BTW? ;)

eyal

Thank you for the reply.

I managed to get it into an infinite loop.

But the machine hung. Just to confirm what I have read elsewhere in this forum: the GPU is capable of asynchronous operation, but CUDA itself currently schedules the instructions synchronously, so the CPU waits until the GPU completes its execution before it schedules other tasks (on the CPU). Correct?

The watchdog timer will kick you out if you run on the display GPU in Windows, or under X on Linux.
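To see whether a given card is subject to the watchdog, a query along these lines may help. This is a minimal sketch: `watchdog_enabled` is a made-up helper name, and it assumes the `cudaDevAttrKernelExecTimeout` device attribute reflects the run time limit on your driver.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical helper: returns 1 if a kernel run time limit applies to
// `device`, 0 if it does not, and -1 if the query itself fails
// (for example, when no CUDA device is present).
int watchdog_enabled(int device)
{
    int timeout = 0;
    cudaError_t err = cudaDeviceGetAttribute(
        &timeout, cudaDevAttrKernelExecTimeout, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "attribute query failed: %s\n",
                cudaGetErrorString(err));
        return -1;
    }
    return timeout ? 1 : 0;
}
```

When the limit does apply and a kernel runs too long, a later synchronizing call such as `cudaDeviceSynchronize()` should report `cudaErrorLaunchTimeout` instead of success.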

What happens after you launch the kernel depends on your code. The launch on the host is asynchronous, so the execution of your CPU code resumes as soon as the kernel is queued for execution. The queue depth depends on the driver (and possibly the card), but last time someone checked, it seemed to be around 24. If the queue is full, then CPU execution will not resume until a kernel finishes and there is room in the queue for the current launch to be stored.

After the kernel is queued, the CPU will continue executing your program, and you are free to do additional computation while the GPU runs in the background. CPU execution will block and wait for the GPU execution to finish if you run a CUDA operation with an implicit synchronization point. The most common one is cudaMemcpy(), which has to wait for all previously launched kernels to complete before performing a memory copy. Another is cudaThreadSynchronize(), which does nothing but wait for the previous kernels to complete.

Kernel launches also become synchronous if you have the CUDA Profiler active.
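The launch-then-block behaviour described above can be sketched roughly as follows. `fill_kernel` and `launch_and_overlap` are illustrative names of my own, not part of any library; the point is only that the CPU loop runs right after the launch is queued, and that `cudaMemcpy()` is where the host actually waits.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Illustrative kernel: writes `value` into every element of `out`.
__global__ void fill_kernel(float *out, int n, float value)
{
    int idx = threadIdx.x + blockDim.x * blockIdx.x;
    if (idx < n) out[idx] = value;
}

// Hypothetical helper: launches fill_kernel, does some CPU work while the
// kernel is queued or running, then copies the result back.
// Returns true if every element equals `value`.
bool launch_and_overlap(int n, float value, long long *cpu_sum)
{
    float *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) return false;

    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    // The launch returns as soon as the kernel is queued.
    fill_kernel<<<blocks, threads>>>(d_out, n, value);

    // CPU work that does not depend on the GPU result.
    long long sum = 0;
    for (int i = 0; i < n; ++i) sum += i;
    *cpu_sum = sum;

    // Implicit synchronization point: the copy waits for the kernel.
    std::vector<float> h_out(n);
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (float v : h_out) {
        if (v != value) return false;
    }
    return true;
}
```

If the kernel never finishes, it is the `cudaMemcpy()` call that appears to hang, not the launch itself.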

Also if the GPU is executing a kernel that’s hung on an infinite loop, it cannot update the display anymore so the machine appears hung though the CPU may still be running, and shelling in might work. On Linux and WinXP, pressing ctrl-c will terminate the running kernel.
