Hey,
thanks for your response.
To avoid any confusion, I didn’t mean that CUDA will be slower than OpenCL. If you look at the performance tests and examples around the web, CUDA comes out as the faster of the two in most cases. But in most examples I found, OpenCL still reaches at least 80% of the CUDA performance and is nowhere near 4-5 times slower. So, by that logic, writing the same kernel in CUDA would still leave me roughly 4-5 times slower than the Cg fragment code. That is, if the two example kernels are really equivalent and if everything I initialised and coded on the host side is optimal and correct. And I’m sure the latter is not the case, as I’m not an expert at all when it comes to parallel programming and boosting performance.
I did try to run the Cg shaders on several AMD/ATI cards, but I ran into a lot of issues concerning the use of for loops, control statements, profiles that were not supported and so on. Because I apparently didn’t have the skills to solve these issues, I tried to use GLSL directly instead, because, as you mentioned, GLSL shaders are created from the Cg shaders when used on ATI cards. But for more elaborate algorithms, the GLSL versions didn’t perform as well as the original Cg shaders.
But then again, there are probably some basic considerations or small details that I’m overlooking in my OpenCL code (and back then, in my GLSL code), so I have put some host code below as you asked (I left out the error handling code). I hope this is what you meant.
The member variable mOCLObject is an instance of a class that sets up all the necessary OpenCL initialization: platforms, devices, contexts, command queues, programs, and so on. The variables Images[0] and Images[1] were successfully created with clCreateFromGLTexture2D(…). Maybe it’s also worthwhile to mention that I tried all possible combinations for the local size of the work groups. The maximum I could use on my Nvidia card was 512 work items in total, so I tried (512, 1), (256, 2), … (1, 512) and smaller sizes.
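To double-check which of those local sizes are actually legal for a 2048 x 1920 global size, here is a small sketch in plain C. It assumes the OpenCL 1.x rule that each local dimension must divide the matching global dimension evenly, and that the work-group limit (512 here) is a power of two. The helper names local_size_ok and count_valid_full_groups are made up for this example, and per-dimension device limits (CL_DEVICE_MAX_WORK_ITEM_SIZES) are not checked.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if (lx, ly) is usable as a local size for a (gx, gy) NDRange:
   the work-group may hold at most max_wg items, and each local dimension
   must divide the matching global dimension evenly (OpenCL 1.x rule). */
static int local_size_ok(size_t gx, size_t gy, size_t lx, size_t ly, size_t max_wg)
{
    assert(gx > 0 && gy > 0);
    if (lx == 0 || ly == 0) return 0;
    if (lx * ly > max_wg) return 0;
    return gx % lx == 0 && gy % ly == 0;
}

/* Walks the pairs (max_wg, 1), (max_wg/2, 2), ..., (1, max_wg) and counts
   the ones that pass local_size_ok. Assumes max_wg is a power of two. */
static int count_valid_full_groups(size_t gx, size_t gy, size_t max_wg)
{
    int count = 0;
    for (size_t lx = max_wg; lx >= 1; lx /= 2) {
        size_t ly = max_wg / lx;
        if (local_size_ok(gx, gy, lx, ly, max_wg)) count++;
    }
    return count;
}
```

If that divisibility rule applies, (2, 256) and (1, 512) would be rejected for this image size, because 1920 is not a multiple of 256 or 512, so a failing clEnqueueNDRangeKernel call could go unnoticed when the error code is not checked.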
OPENCL
[i]cl_kernel kernel;
cl_int errorcode;
kernel = clCreateKernel(*mOCLObject.mProgram, "InvertGrayScaleImage", &errorcode);
//Set the arguments of the kernel function, which are just the input and output OpenCL images.
//clSetKernelArg expects a pointer to the cl_mem handle, not the handle itself.
errorcode = clSetKernelArg(kernel, 0, sizeof(cl_mem), &Images[0]);
errorcode = clSetKernelArg(kernel, 1, sizeof(cl_mem), &Images[1]);
size_t offset[2] = {0,0};
size_t global_size[2] = {2048, 1920};
size_t local_size[2] = {16, 16};
cl_event kernel_event;
//End all running OpenGL commands
glFinish();
//give OpenCL exclusive access to the device’s memory
errorcode = clEnqueueAcquireGLObjects(mOCLObject.mComQueues[0], NumOfTextures, Images, 0, NULL, NULL);
//execute the "InvertGrayScaleImage" OpenCL kernel
errorcode = clEnqueueNDRangeKernel(mOCLObject.mComQueues[0], kernel, 2, offset, global_size, local_size, 0, NULL, &kernel_event);
//release the exclusive OpenCL access
errorcode = clEnqueueReleaseGLObjects(mOCLObject.mComQueues[0], 2, Images, 0, NULL, NULL);
//finish all OpenCL commands, because we need to use OpenGL again to draw the results.
clFinish(mOCLObject.mComQueues[0]);
//profile the timings of the kernel to calculate the Frames per second
cl_ulong start, end;
clWaitForEvents(1, &kernel_event);
clGetEventProfilingInfo(kernel_event, CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &start, NULL);
clGetEventProfilingInfo(kernel_event, CL_PROFILING_COMMAND_END, sizeof(cl_ulong), &end, NULL);
//end and start are expressed in ns, so we have to multiply the difference by 1.0e-9 to get seconds
float FramesPerSec = 1.f/((end-start)*1.0e-9);[/i]
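For reference, here is the same frames-per-second calculation as a small standalone helper, written as a sketch. The name rate_from_ns is made up for this example, and uint64_t stands in for cl_ulong so that it compiles without the OpenCL headers. Note that the profiling queries only return valid timestamps if the command queue was created with CL_QUEUE_PROFILING_ENABLE, and that this number covers the kernel execution only, not glFinish() or the acquire/release calls, so it is not necessarily comparable to a frame rate measured around the whole Cg render pass.

```c
#include <assert.h>
#include <stdint.h>

/* Converts two profiling timestamps in nanoseconds into a rate per second.
   uint64_t is used in place of cl_ulong (assumption: both are 64-bit unsigned). */
static double rate_from_ns(uint64_t start_ns, uint64_t end_ns)
{
    assert(end_ns >= start_ns);
    uint64_t elapsed_ns = end_ns - start_ns;
    if (elapsed_ns == 0) {
        return 0.0;  /* interval below timer resolution: no meaningful rate */
    }
    return 1.0e9 / (double)elapsed_ns;
}
```

With the timestamps from the snippet above, this would be called as rate_from_ns(start, end). Doing the division in double also avoids the precision loss of storing the result of a 64-bit subtraction in a float.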
Below you can see some Cg shader host code. mCGob is an object of a class that successfully performed all the initialisations and compilations of the vertex and fragment programs.
This class also determined the optimal vertex and fragment profiles.
Cg host program:
[i]glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glOrtho(0.0, 1.0, 0.0, 1.0, 0.01, 1000.0);
//activate the FBO where we have to render the output to.
glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, mFBO[19]);
//The output texture is coupled to color attachment zero of the frame buffer object 19
glDrawBuffer(GL_COLOR_ATTACHMENT0_EXT);
//Set the fragment profile for this shader, which is the 37th shader of an array of fragment shaders!
cgGLEnableProfile(mCGob->mFragmentProfile);
//Set the input texture as argument to the shader. mTexID[0] contains the input image
cgGLSetTextureParameter(mCGob->mFragments[37].mParameters[0], mTexID[0]);
cgGLEnableTextureParameter(mCGob->mFragments[37].mParameters[0]);
//update the parameters of fragment shader 37
cgUpdateProgramParameters(mCGob->mFragments[37].mProg);
//Load and bind the shader to the following OpenGL glBegin-glEnd instruction set
cgGLLoadProgram(mCGob->mFragments[37].mProg);
cgGLBindProgram(mCGob->mFragments[37].mProg);
//Set the viewport to match the resolution of the image
glViewport(0, 0, 2048, 1920);
glEnable(GL_TEXTURE_2D);
glBegin(GL_QUADS);
glTexCoord2f(0.0, 1.0);
glVertex2f(0.0, 1.0);
glTexCoord2f(1.0, 1.0);
glVertex2f(1.0, 1.0);
glTexCoord2f(1.0, 0.0);
glVertex2f(1.0, 0.0);
glTexCoord2f(0.0, 0.0);
glVertex2f(0.0, 0.0);
glEnd();
glDisable(GL_TEXTURE_2D);
//Disable the texture parameters
cgGLDisableTextureParameter(mCGob->mFragments[37].mParameters[0]);
//Disable the fragment profile.
cgGLDisableProfile(mCGob->mFragmentProfile);
glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0); //Activate the regular window frame buffer again to display results on the screen.[/i]