How to get the kernel binary file from OpenCL Nvidia GPU toolkit

Dear Nvidia developers,

I’m trying to get the kernel binary file from a simple OpenCL program using offline compilation. My first program loads the file containing the kernel source and compiles it with:

hprogram = clCreateProgramWithSource(hContext, 1, (const char **)&source_str,
                                     (const size_t *)&source_size, &ret);
ret = clBuildProgram(hprogram, 1, &devices[0], NULL, NULL, NULL);

// How many devices was the program built for?
cl_uint program_num_devices;
clGetProgramInfo(hprogram, CL_PROGRAM_NUM_DEVICES, sizeof(cl_uint), &program_num_devices, NULL);

// Size in bytes of each device's program "binary"
std::vector<size_t> binaries_sizes(program_num_devices);
clGetProgramInfo(hprogram, CL_PROGRAM_BINARY_SIZES, program_num_devices * sizeof(size_t), binaries_sizes.data(), NULL);

// One buffer per device, +1 byte for a terminating '\0' so it can be printed
char **binaries = new char*[program_num_devices];
for (size_t i = 0; i < program_num_devices; i++)
    binaries[i] = new char[binaries_sizes[i] + 1];

// CL_PROGRAM_BINARIES fills an array of char* pointers, so the size is counted in pointers
clGetProgramInfo(hprogram, CL_PROGRAM_BINARIES, program_num_devices * sizeof(char *), binaries, NULL);

std::ofstream out_binary_file;
for (size_t i = 0; i < program_num_devices; i++)
{
    binaries[i][binaries_sizes[i]] = '\0';
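    std::cout << "Program " << i << ":" << std::endl;
    std::cout << binaries[i];
}

// Write the first device's binary; binary mode avoids newline translation on some platforms
out_binary_file.open("kernel.bin", std::ios::binary);
out_binary_file.write(binaries[0], binaries_sizes[0]);
out_binary_file.close();

for (size_t i = 0; i < program_num_devices; i++)
    delete[] binaries[i];
delete[] binaries;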

A second program loads that file and launches the kernel:

hProgram = clCreateProgramWithBinary(hContext, 1, &devices[0], (const size_t *)&binary_size,
                                     (const unsigned char **)&binary_buf, &binary_status, &ret);

cl_kernel hKernel;
hKernel = clCreateKernel(hProgram, "vectorAdd", 0);

.....
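The second snippet does not show how binary_buf and binary_size get filled. A minimal sketch, assuming the whole file written by the first program is simply read back into memory; read_binary_file is a hypothetical helper name, and the OpenCL calls appear only in comments so the block compiles without an OpenCL SDK.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: read a whole file (e.g. "kernel.bin") into memory.
// Binary mode is used so that no newline translation happens.
std::vector<unsigned char> read_binary_file(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path);
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}

// Usage with the second snippet (OpenCL calls shown as comments only):
//   std::vector<unsigned char> buf = read_binary_file("kernel.bin");
//   size_t binary_size = buf.size();
//   const unsigned char* binary_buf = buf.data();
//   hProgram = clCreateProgramWithBinary(hContext, 1, &devices[0], &binary_size,
//                                        &binary_buf, &binary_status, &ret);
//   // A program created from a binary must still be built before use:
//   clBuildProgram(hProgram, 1, &devices[0], NULL, NULL, NULL);
```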

Unfortunately, it doesn’t work :( The kernel file written by the first program is in ASCII format, not an executable format. But I don’t get any error message: compiling and loading the kernel both appear to work fine.

Could someone help me?

Thanks in advance.

The “Binary” in “clCreateProgramWithBinary” is misleading in this case. Depending on the OpenCL platform, it might actually be a binary / executable format (for the AMD and Intel OpenCL platforms it IMHO is), but for NVIDIA the “binary” is in fact PTX assembly source code. IIRC, the OpenCL specification also clarifies that the “binary” might just be some intermediate / reusable format, not necessarily a true binary.
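Since the NVIDIA “binary” is PTX text, one quick way to see what a dumped file contains is to look for PTX directives. A rough heuristic sketch, assuming PTX output is plain ASCII and always contains the .version and .target directives; looks_like_ptx is a hypothetical helper, not part of any OpenCL API.

```cpp
#include <cctype>
#include <string>

// Heuristic (an assumption, not an official check): PTX is printable ASCII
// text containing the ".version" and ".target" directives, while real
// machine code contains non-printable bytes.
bool looks_like_ptx(const std::string& blob)
{
    std::string::size_type n = blob.size();
    if (n > 0 && blob[n - 1] == '\0')
        --n; // ignore a single terminating NUL, as added by the dump code

    for (std::string::size_type i = 0; i < n; ++i) {
        unsigned char c = static_cast<unsigned char>(blob[i]);
        if (!std::isprint(c) && !std::isspace(c))
            return false; // non-text byte: probably a genuine binary
    }
    return blob.find(".version") != std::string::npos &&
           blob.find(".target") != std::string::npos;
}
```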

Of course! After adding another

clBuildProgram(hProgram, 1, &devices[0], NULL, NULL, NULL);

to the second code snippet, it works well. I have now read in the “OpenCL Programming Guide” that “whether a program is created from source or binary, it must always be built before it can be used.”

So the most we can do is generate the PTX code once and rebuild from it each time: this skips the source code → PTX step, but the PTX → executable step is still performed every time.

But I suspect that the second phase takes more time, and I also suspect that any flags passed to the compiler must be the same in both phases. Is that right?

Thanks.
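One way to apply this “generate PTX once, rebuild each time” approach is to name the cached PTX file after everything that could change it. A minimal sketch, assuming a hypothetical ptx_cache_name helper; in a real program the device name and driver version could come from clGetDeviceInfo with CL_DEVICE_NAME and CL_DRIVER_VERSION. The build options are included conservatively, since whether they matter for the second phase is exactly what is uncertain here.

```cpp
#include <cstddef>
#include <functional>
#include <sstream>
#include <string>

// Append a field as "<length>:<bytes>" so different field splits
// can never produce the same key string.
static void append_field(std::string& key, const std::string& field)
{
    key += std::to_string(field.size());
    key += ':';
    key += field;
}

// Hypothetical helper: cache file name for this source/options/device/driver.
// Caveat: std::hash is only guaranteed consistent within one run of a
// program, so a persistent cache should use a real digest (e.g. SHA-256);
// std::hash just keeps this sketch self-contained.
std::string ptx_cache_name(const std::string& source,
                           const std::string& build_options,
                           const std::string& device_name,
                           const std::string& driver_version)
{
    std::string key;
    append_field(key, source);
    append_field(key, build_options);  // included conservatively
    append_field(key, device_name);    // generated PTX names a target sm_XX
    append_field(key, driver_version); // another driver may emit different PTX

    const std::size_t h = std::hash<std::string>{}(key);
    std::ostringstream name;
    name << "kernel_" << std::hex << h << ".ptx";
    return name.str();
}
```

If the named file exists, it would be loaded and passed to clCreateProgramWithBinary followed by clBuildProgram; otherwise the program is built from source and its PTX saved under that name.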

I don’t know for sure, but I’d say that the front-end OpenCL → PTX step is the more expensive one, and that compiler flags only affect this step (so it’s irrelevant what flags you pass when building from binary). I tend to think of it like this: building from source includes compiling and linking, building from binary includes only linking, and the flags passed are only compiler flags. Of course that’s not strictly correct here, as the PTX source code needs to be compiled too, but I think it’s a good way to think about it in general.