1) __cudaRegisterFatBinary(__cudaFatCudaBinary *) in 4.0 RC2

In the CUDA runtime 4.0 RC2, NVIDIA seems to have changed the structure passed to __cudaRegisterFatBinary(void *fatCubin). This isn’t unexpected, but it’s very different from 3.x, so I’m having difficulty finding the information I need to extract ELF, PTX, etc. for emulation, disassembly, and so on. The parameter no longer seems to be a pointer to a __cudaFatCudaBinary structure. Casting it to (__cudaFatCudaBinary *) yields a structure that is mostly empty: the magic number is 0x466243b1, the version is 0x00000001, and gpuInfoVersion now seems to be a pointer to the beginning of another structure that does seem to contain the goodies. Does anyone know what the new structure is?
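To make the observation above concrete, here is a minimal sketch in C of how one might test for the new wrapper and pull out the inner pointer. The struct name, field names, and helper function are hypothetical; the layout is only a guess based on the values reported above (magic, version, then a pointer where gpuInfoVersion used to be) and is not taken from any NVIDIA header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout inferred from the values observed in this thread.
 * This is a guess, not an NVIDIA definition. */
typedef struct {
    uint32_t    magic;    /* observed: 0x466243b1 */
    uint32_t    version;  /* observed: 0x00000001 */
    const void *data;     /* observed: points at the structure with the real content */
} FatBinWrapperGuess;

#define FATBIN_WRAPPER_MAGIC_GUESS 0x466243b1u

/* Returns the inner pointer if the handle looks like the new wrapper,
 * or NULL if the magic number or version does not match. */
static const void *fatbin_payload(const void *fatCubin)
{
    assert(fatCubin != NULL);
    const FatBinWrapperGuess *w = (const FatBinWrapperGuess *)fatCubin;
    if (w->magic != FATBIN_WRAPPER_MAGIC_GUESS || w->version != 1u)
        return NULL;
    return w->data;
}
```

A NULL result would suggest falling back to the 3.x interpretation of the parameter; anything else would be the starting point for dumping and inspecting the inner structure.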

You may find this thread from the Ocelot mailing list helpful:

NVIDIA Announces CUDA 4.0

Unfortunately, it seems the structures now differ between Windows (which I work on) and Ubuntu (where Ocelot lives), so the code in Ocelot for interpreting the new format (http://gpuocelot.googlecode.com/svn/trunk/ocelot/ocelot/cuda/implementation/FatBinaryContext.cpp ) does not work on Windows. Sigh.

On Linux, the fat binary format seems to be a list of binary objects, each with this header:

typedef struct __cudaFatCudaBinary2EntryRec {
    unsigned int           type;
    unsigned int           binary;
    unsigned int           binarySize;
    unsigned int           unknown2;
    unsigned int           kindOffset;
    unsigned int           unknown3;
    unsigned int           unknown4;
    unsigned int           unknown5;
    unsigned int           name;
    unsigned int           nameSize;
    unsigned long long int unknown6;
    unsigned long long int unknown7;
} __cudaFatCudaBinary2Entry;
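As a rough sketch of how this header might be used, the C snippet below locates the embedded binary for one entry, taking 'binary' to be a byte offset from the start of the header and 'binarySize' to be its length in bytes. The type name FatBinEntryGuess and the helper entry_binary are hypothetical, and the meaning of 'binarySize' is an assumption; the field list simply mirrors the header above.

```c
#include <assert.h>
#include <stddef.h>

/* Mirror of the reverse-engineered header above (Linux only, unofficial). */
typedef struct {
    unsigned int           type;
    unsigned int           binary;      /* offset from the start of this header */
    unsigned int           binarySize;  /* assumed: length of the binary in bytes */
    unsigned int           unknown2;
    unsigned int           kindOffset;
    unsigned int           unknown3;
    unsigned int           unknown4;
    unsigned int           unknown5;
    unsigned int           name;
    unsigned int           nameSize;
    unsigned long long int unknown6;
    unsigned long long int unknown7;
} FatBinEntryGuess;

/* Returns a pointer to the entry's embedded binary and stores its size. */
static const unsigned char *entry_binary(const FatBinEntryGuess *e, size_t *size)
{
    assert(e != NULL && size != NULL);
    *size = e->binarySize;
    return (const unsigned char *)e + e->binary;
}
```

The returned bytes could then be written to a file and examined with objdump if they turn out to be ELF, or read directly if they are PTX text.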

‘binary’ is an offset from the base of the header to the actual binary. The binaries I have seen are cubins stored in ELF format (you can copy them out and dump them with objdump) and PTX assembly files. On Windows the format may be different, but the way I figured this out was to find a binary in a format that I understood, ELF (search for the magic word), then try to find the header by comparing multiple applications to see where the binary starts, and finally fill in the header fields in the context of the binary that I understood. Strings are also easy to pick out, especially when they correspond to the name of the program you are running.
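The "search for the magic word" step can be sketched in C as below. The function name find_elf_magic is hypothetical; the only format fact it relies on is that every ELF file begins with the four bytes 0x7f 'E' 'L' 'F'. How you obtain the memory region to scan (for example, starting from the pointer handed to the registration call) is up to you and is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Scans buf[0..len) for the 4-byte ELF magic and returns a pointer to the
 * first match, or NULL if the magic does not occur. */
static const unsigned char *find_elf_magic(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };
    assert(buf != NULL);
    for (size_t i = 0; i + sizeof magic <= len; ++i) {
        if (memcmp(buf + i, magic, sizeof magic) == 0)
            return buf + i;
    }
    return NULL;
}
```

Subtracting the start of a suspected header from the returned pointer gives a candidate value for the 'binary' offset, which can then be compared across several applications as described above.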

I really wish there would be some documentation for this…