What is CUT_BANK_CHECKER?

I see this macro in a lot of code but am not sure what it does. I can’t find the documentation on it either.

This is a simple macro that works with some host code when running in emulation mode to help check for bank conflicts in shared memory.

You wrap each shared memory access in the CUT_BANK_CHECKER macro, the host code tracks these accesses, and you then call cutCheckBankAccess, which dumps the bank conflict information.

The code is in “common/inc/bank_checker.h”.

Where are these macros documented? I don’t see them in the programming guide.

They’re not currently documented. They’re not really part of CUDA; they’re part of the CUT utility library, whose code ships with the SDK.

Could you give more information about how to call this cutCheckBankAccess? I don’t understand how the handles work for the classes in bank_checker.h.

More specifically, I’m concerned that my current code is being limited by bank conflicts, and I’m not sure I understand how a store to shared memory maps to a specific memory bank. I would therefore like to use something like cutBankChecker to determine whether my code is causing bank conflicts. Bank_Checker.cpp lists the input parameters of BankChecker::access(), but these include “file: name of the source file where the access takes place” and “line: line in the source file where the access takes place”. What file? Something like “test_kernel.cu”?

My kernel is assigning data from global memory to shared memory via calls like:

// Only threads and blocks in the x-direction in this example.
for (i = 1; i < 16; i++)
{
    As[threadindex][i] = data[INDEX(threadindex, blockindex, i)];
}

Apart from lots of benchmarking (which I’m attempting to do at the moment), is there a way to tell whether this sort of assignment is going to cause bank conflicts?

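If your shared array has 16 columns, then yes, you are getting 16-way bank conflicts. With 16 banks of 32-bit words, consecutive threads access words that are 16 apart, so every thread in a half-warp hits the same bank.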
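To make the mapping from an index to a bank concrete, here is a minimal host-side sketch. It assumes 16 banks, 32-bit words, and 16 threads per half-warp, and it assumes the array is stored row-major; the names bankOf and conflictDegree are made up for this illustration and are not part of CUT or CUDA.

```cpp
#include <algorithm>
#include <cassert>

// Assumptions for this sketch: 16 banks of 32-bit words, 16 threads per half-warp.
const int kNumBanks = 16;
const int kHalfWarp = 16;

// Bank that holds the 32-bit word at the given word offset in shared memory.
int bankOf(int wordIndex) { return wordIndex % kNumBanks; }

// Largest number of threads in one half-warp that land in the same bank
// when thread t accesses As[t][col] in a row-major array whose rows are
// rowStride words long. A result of 1 means no conflict.
int conflictDegree(int rowStride, int col) {
    assert(rowStride > 0 && col >= 0 && col < rowStride);
    int hits[kNumBanks] = {0};
    for (int t = 0; t < kHalfWarp; ++t) {
        ++hits[bankOf(t * rowStride + col)];
    }
    return *std::max_element(hits, hits + kNumBanks);
}
```

Under these assumptions, conflictDegree(16, i) is 16 for every column i, which is the 16-way conflict described above. For comparison, a row length of 17 words gives conflictDegree(17, i) == 1, because each thread then starts one bank further along than the previous one.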

To use the bank checker you’ll have to compile and run in device emulation mode. Rather than using array addressing (such as As[threadindex][i]), just use the macros to do the indexing.

Paulius

Could you please provide some example code how to use CUT_BANK_CHECKER exactly?

thanks and best regards,
christoph

Without bank checker:

__global__ void test(int *gpA)
{
    __shared__ int sa[16];
    sa[0] = 3;      // bank conflict if blocksize > 1
    gpA[0] = sa[0]; // bank conflict again
}

With bank checker:

__global__ void test(int *gpA)
{
    __shared__ int sa[16];
    CUT_BANK_CHECKER(sa, 0) = 3;      // bank conflict if blocksize > 1
    gpA[0] = CUT_BANK_CHECKER(sa, 0); // bank conflict again
}

That’s how I remember using it. Let us know if it works for you. If I remember correctly, the checker was written for 32-bit words (floats). If your shared array type is not 32 bits in size, you’ll have to adjust the macro. Also, make sure to compile for device emulation.
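On the earlier question about the “file” and “line” parameters: a macro of this kind can fill those in by itself, so you would not normally pass them by hand. The following is only a sketch of the general technique, not the actual CUT code. SKETCH_BANK_CHECKER, logAccess, AccessRecord and g_accessLog are all hypothetical names, and the use of __FILE__ and __LINE__ is an assumption about how such a macro is typically written; check bank_checker.h for the real definition.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record of one shared-memory access (not the CUT data structure).
struct AccessRecord {
    std::string file;   // source file of the access, from __FILE__
    int line;           // source line of the access, from __LINE__
    std::string array;  // array name, from the stringized macro argument
    int index;          // element index that was accessed
};

// Hypothetical host-side log that a reporting function could scan later.
static std::vector<AccessRecord> g_accessLog;

// Records the access and returns the index unchanged, so the call can sit
// inside the subscript of the real array access.
inline int logAccess(const char* file, int line, const char* array, int index) {
    assert(index >= 0);
    AccessRecord r;
    r.file = file;
    r.line = line;
    r.array = array;
    r.index = index;
    g_accessLog.push_back(r);
    return index;
}

// Expands to an lvalue, so it works on either side of an assignment.
#define SKETCH_BANK_CHECKER(array, index) \
    ((array)[logAccess(__FILE__, __LINE__, #array, (index))])
```

With this sketch, SKETCH_BANK_CHECKER(sa, 0) = 3; stores into sa[0] and logs the file, line, array name and index of the access. This only makes sense where the kernel runs as host code, which is consistent with the need to compile for device emulation.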

Paulius

Also, in my experience the bank checker is not completely reliable. It has not been heavily tested, and may miss bank conflicts or report false ones.

YMMV and all that.

In the future we plan to have a hardware profiler that can provide information like this.

Mark