How to declare an external CUDA function in C++

Hey guys,
I have a small problem with some CUDA functions.
I want to separate the .cu file from the .cpp file, and one of the NVIDIA samples contains a strange function.
The definition looks like this:

template <int BLOCK_SIZE> __global__ void
matrixMulCUDA(float *C, float *A, float *B, int wA, int wB)

And I have declared it in C++ like this:

template <int BLOCK_SIZE> extern void 
matrixMulCUDA(float *C, float *A, float *B, int wA, int wB);

The declaration itself doesn’t give me any error, but when the function is used like this:

if (block_size == 16)
{
    matrixMulCUDA<16><<<grid, threads>>>(d_C, d_A, d_B, dimsA.x, dimsB.x);
}
else
{
    matrixMulCUDA<32><<<grid, threads>>>(d_C, d_A, d_B, dimsA.x, dimsB.x);
}

the compiler tells me “syntax error ‘<’”.
Can someone tell me how I should declare the function in C++? (If everything is put in the .cu file, it runs OK.)
Thanks :)
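
For what it's worth, the error is probably not about the declaration at all. The <<< >>> launch syntax is a CUDA extension that only nvcc understands, so a .cpp file handed to the regular host compiler cannot parse it. One arrangement that might work is to keep the kernel and its launch in the .cu file and expose only an ordinary C++ function to the .cpp side. Below is a minimal sketch under several assumptions: the names runMatrixMul and launch are made up for illustration, the kernel is a naive stand-in (not the tiled kernel from the NVIDIA sample) with an extra rowsA parameter, the wrapper takes host pointers and does the copies itself, and error checking is left out. The #else branch is only a CPU stand-in so the same text still builds without nvcc.

```cpp
#include <cstddef>

// ---- .cpp side (e.g. a hypothetical matrixMul.h) ----
// Plain, non-template C++ declaration: no __global__, no <<< >>>.
void runMatrixMul(int block_size, float *C, const float *A, const float *B,
                  int rowsA, int wA, int wB);

// ---- .cu side: everything below is meant to be compiled by nvcc ----
#ifdef __CUDACC__
#include <cuda_runtime.h>

// Naive stand-in kernel: one thread computes one element of C.
template <int BLOCK_SIZE>
__global__ void matrixMulCUDA(float *C, const float *A, const float *B,
                              int rowsA, int wA, int wB)
{
    int row = blockIdx.y * BLOCK_SIZE + threadIdx.y;
    int col = blockIdx.x * BLOCK_SIZE + threadIdx.x;
    if (row >= rowsA || col >= wB) return;  // guard partial blocks
    float sum = 0.0f;
    for (int k = 0; k < wA; ++k) sum += A[row * wA + k] * B[k * wB + col];
    C[row * wB + col] = sum;
}

// Hypothetical helper: copies inputs to the device, launches, copies C back.
template <int BLOCK_SIZE>
static void launch(float *C, const float *A, const float *B,
                   int rowsA, int wA, int wB)
{
    size_t bytesA = sizeof(float) * static_cast<size_t>(rowsA) * wA;
    size_t bytesB = sizeof(float) * static_cast<size_t>(wA) * wB;
    size_t bytesC = sizeof(float) * static_cast<size_t>(rowsA) * wB;
    float *d_A = 0, *d_B = 0, *d_C = 0;
    cudaMalloc((void **)&d_A, bytesA);  // error checks omitted for brevity
    cudaMalloc((void **)&d_B, bytesB);
    cudaMalloc((void **)&d_C, bytesC);
    cudaMemcpy(d_A, A, bytesA, cudaMemcpyHostToDevice);
    cudaMemcpy(d_B, B, bytesB, cudaMemcpyHostToDevice);
    dim3 threads(BLOCK_SIZE, BLOCK_SIZE);
    dim3 grid((wB + BLOCK_SIZE - 1) / BLOCK_SIZE,
              (rowsA + BLOCK_SIZE - 1) / BLOCK_SIZE);
    matrixMulCUDA<BLOCK_SIZE><<<grid, threads>>>(d_C, d_A, d_B, rowsA, wA, wB);
    cudaMemcpy(C, d_C, bytesC, cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(d_A);
    cudaFree(d_B);
    cudaFree(d_C);
}
#else
// CPU stand-in with the same signature, used only when nvcc is not the compiler.
template <int BLOCK_SIZE>
static void launch(float *C, const float *A, const float *B,
                   int rowsA, int wA, int wB)
{
    for (int row = 0; row < rowsA; ++row)
        for (int col = 0; col < wB; ++col) {
            float sum = 0.0f;
            for (int k = 0; k < wA; ++k) sum += A[row * wA + k] * B[k * wB + col];
            C[row * wB + col] = sum;
        }
}
#endif

// The template argument must be a compile-time constant, so the runtime
// block_size is mapped to one of the two instantiations here.
void runMatrixMul(int block_size, float *C, const float *A, const float *B,
                  int rowsA, int wA, int wB)
{
    if (block_size == 16) launch<16>(C, A, B, rowsA, wA, wB);
    else                  launch<32>(C, A, B, rowsA, wA, wB);
}
```

With something like this, the .cpp file would only include the declaration and call runMatrixMul(block_size, ...) with host buffers, so the host compiler never sees the kernel template or the launch brackets. If the device pointers really have to be allocated in the .cpp file as in the sample, the wrapper could take d_C, d_A and d_B instead and skip the copies.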