I am using CUDA 7 on CentOS 6.7 for image processing.
I implemented it as described below,
and I suspect that an error is occurring in the CUDA kernel.
The process itself does not stop, but
the CUDA kernel appears to stall and never make progress.
--------loop 1~3--------------------------------------------
1. In the main process, create 3 threads.
2. Each CPU thread processes 1 image (that is, 3 images are processed at the same time).
2-1. Execute cudaMalloc
2-2. Execute cudaMemcpy (cudaMemcpyHostToDevice)
2-3. CUDA dim3 blocks(1,1,1)
2-4. CUDA dim3 threads(8,8,1)
2-5. Launch the CUDA kernel.
2-6. Wait for the CUDA kernel to finish (cudaThreadSynchronize)
2-7. Execute cudaMemcpy (cudaMemcpyDeviceToHost)
3. Wait for the 3 CPU threads to finish.
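To make the steps above concrete, here is a minimal sketch of what each CPU thread does. It is not my real code: the kernel body, the 8x8 image size, and the names CHECK and processImage are placeholders I made up for illustration. I also added error checks after every call, and I used cudaDeviceSynchronize because cudaThreadSynchronize is deprecated in favor of it.

```cuda
#include <cstdio>
#include <cstdlib>
#include <pthread.h>
#include <cuda_runtime.h>

// Hypothetical helper: print the error of a failing CUDA call and abort.
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,    \
                    cudaGetErrorString(err_));                    \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

const int W = 8, H = 8;  // assumed tiny image so one 8x8 block covers it

// Placeholder kernel: adds 1 to every pixel.
__global__ void kernelFunction(int* img, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) img[y * width + x] += 1;
}

// Steps 2-1 to 2-7 for one image; each CPU thread runs this once.
void* processImage(void* arg)
{
    int* host = static_cast<int*>(arg);
    size_t bytes = W * H * sizeof(int);
    int* dev = NULL;

    CHECK(cudaMalloc((void**)&dev, bytes));                       // 2-1
    CHECK(cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice));  // 2-2
    dim3 blocks(1, 1, 1);   // 2-3: ordinary host-side value, local to this thread
    dim3 threads(8, 8, 1);  // 2-4
    kernelFunction<<<blocks, threads>>>(dev, W, H);               // 2-5
    CHECK(cudaGetLastError());       // catches an invalid launch configuration
    CHECK(cudaDeviceSynchronize());  // 2-6: catches errors raised while running
    CHECK(cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost));  // 2-7
    CHECK(cudaFree(dev));
    return NULL;
}

int main()
{
    static int images[3][W * H];  // zero-initialized dummy images
    pthread_t tid[3];
    for (int i = 0; i < 3; ++i)   // step 1: create 3 CPU threads
        pthread_create(&tid[i], NULL, processImage, images[i]);
    for (int i = 0; i < 3; ++i)   // step 3: wait for the 3 CPU threads
        pthread_join(tid[i], NULL);
    for (int i = 0; i < 3; ++i)
        printf("image %d, pixel 0 = %d\n", i, images[i][0]);
    return 0;
}
```

I would expect this to build with something like nvcc -Xcompiler -pthread. If the real kernel fails at launch or while running, the checks after step 2-5 should report it instead of leaving the program silently stuck.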
I also tried changing the 3 CPU threads to 1 CPU thread (in that version, the kernel did not stall).
I think the problem is caused by each thread executing "dim3 blocks" and "dim3 threads".
To resolve the above problem,
I am thinking of executing "dim3 blocks" and "dim3 threads" once, before starting the CPU threads.
I would also like to know how to use CUDA blocks as follows:
CPU thread 1 ⇒ GPU block 1
CPU thread 2 ⇒ GPU block 2
CPU thread 3 ⇒ GPU block 3
In other words, I want to select the block ID on the host side and then run the kernel on the device in that block.
Please tell me how to do this in CUDA.
__global__ void
kernelFunction(int* inA, int* inB, int* inC)
{
    // Thread coordinates inside the block
    int x = threadIdx.x;
    int y = threadIdx.y;
    int z = threadIdx.z;
    :
    :
}
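void A_CPU_Thread ()
{
    dim3 grid(1,1,1);
    dim3 block(3, 1, 1);   // number of blocks
    dim3 thread(8,8,1);    // threads per block
    // A, B, C are device buffers allocated earlier in this thread (omitted here);
    // C is added so the call matches the three kernel parameters.
    kernelFunction<<<block, thread>>>(A, B, C); // ← I want to select the block ID on the host side
    // A_CPU_Thread 1 ⇒ block ID 1
    // A_CPU_Thread 2 ⇒ block ID 2
    // A_CPU_Thread 3 ⇒ block ID 3
    :
    :
    :
}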
So "A_CPU_Thread" runs 3 times concurrently, started from the main function.
I want to select the block ID on the host side:
A_CPU_Thread 1 ⇒ block ID 1
A_CPU_Thread 2 ⇒ block ID 2
A_CPU_Thread 3 ⇒ block ID 3
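As far as I understand, blockIdx is assigned per launch from the grid dimensions, so the host cannot choose which block a kernel runs in. One workaround I am considering is to pass my own ID as a kernel argument and give each CPU thread its own stream. The sketch below is only an assumption about how that could look: the logicalBlock parameter, the stream argument, and the output layout are hypothetical, and the three calls are made one after another from main just to keep the example short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// logicalBlock is a hypothetical ID chosen on the host (1, 2, or 3).
// It plays the role I wanted blockIdx to play.
__global__ void kernelFunction(int* out, int logicalBlock)
{
    int x = threadIdx.x;
    int y = threadIdx.y;
    int perBlock = blockDim.x * blockDim.y;
    // Each logical block writes only into its own slice of the output.
    out[(logicalBlock - 1) * perBlock + y * blockDim.x + x] = logicalBlock;
}

// What one CPU thread would call; cpuThreadId is 1, 2, or 3.
void A_CPU_Thread(int cpuThreadId, int* devOut, cudaStream_t stream)
{
    dim3 blocks(1, 1, 1);
    dim3 threads(8, 8, 1);
    // The fourth launch parameter selects the stream, so launches from
    // different CPU threads do not have to queue behind each other.
    kernelFunction<<<blocks, threads, 0, stream>>>(devOut, cpuThreadId);
}

int main()
{
    const int perBlock = 8 * 8;
    const int n = 3 * perBlock;
    int* devOut = NULL;
    if (cudaMalloc((void**)&devOut, n * sizeof(int)) != cudaSuccess) return 1;

    cudaStream_t streams[3];
    for (int i = 0; i < 3; ++i) cudaStreamCreate(&streams[i]);

    // In the real program each call would come from its own CPU thread.
    for (int id = 1; id <= 3; ++id) A_CPU_Thread(id, devOut, streams[id - 1]);

    // Wait for all three streams, then report any launch or runtime error.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    int host[n];
    cudaMemcpy(host, devOut, n * sizeof(int), cudaMemcpyDeviceToHost);
    for (int id = 1; id <= 3; ++id)
        printf("slice %d starts with %d\n", id, host[(id - 1) * perBlock]);

    for (int i = 0; i < 3; ++i) cudaStreamDestroy(streams[i]);
    cudaFree(devOut);
    return 0;
}
```

With this layout each CPU thread's work is identified by the argument it passes rather than by a block index, so the mapping "CPU thread N ⇒ ID N" is decided entirely on the host. Is this a reasonable approach, or is there a better way?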