function parameter vs constant memory

I have some kernels that all take the same parameters (width, height, pitch, addr), which makes the parameter lists very long.
To reduce the kernel parameter count, can I put them in constant memory? How do I do that?
Thanks for any advice.

You can copy into constant memory from the host using cudaMemcpyToSymbol.
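
For example, a minimal sketch (the struct name and fields are just illustrative): bundle the shared parameters into one __constant__ struct, copy it once from the host, and every kernel can read it without taking any arguments.

    #include <cuda_runtime.h>

    // Parameters shared by several kernels, kept in constant memory.
    struct KernelParams {
        int    width;
        int    height;
        size_t pitch;    // row pitch in bytes
        float* addr;     // base address of the image buffer
    };

    __constant__ KernelParams d_params;

    __global__ void scaleImage()
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x < d_params.width && y < d_params.height) {
            // Index a pitched row using the parameters from constant memory.
            float* row = (float*)((char*)d_params.addr + y * d_params.pitch);
            row[x] *= 2.0f;
        }
    }

    // Host side: one copy, then launch as many kernels as you like.
    void setParams(const KernelParams& h_params)
    {
        cudaMemcpyToSymbol(d_params, &h_params, sizeof(h_params));
    }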

Is there any performance gain from doing this?

I would guess not. “Constant” memory has its own issues.

Anyone tried it?

Bill

I am. And I am deeply troubled.

With NSight, I can see that variables inside the kernel do not load data from constant memory properly.

See, I have

__constant__ double sf;
__constant__ double a[16];

and the value of "sf" is loaded properly, while the loaded data of a[0]~a[15] are always 0, although I am sure that the values saved in a are correct.

Dunno why. Very, very annoying.

Hmm, I’ve not come across that particular problem.

I was wondering if anyone had tried using constant memory in place of kernel arguments and if it really did give any benefit.

Your problem does seem similar to the fact that shared data actually all start at the same place, so that if you declare multiple shared variables they all act as a single one. I did not think that constant memory also had this problem. (Plenty of others; see "Debugging CUDA", W. B. Langdon, CIGPU 2011, Sect. 4.4, p. 421.)
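
(For the shared-memory case the usual workaround, if it helps, is to declare a single extern __shared__ buffer and carve it up by hand; a rough sketch:)

    __global__ void kernel(int n)
    {
        // All extern __shared__ declarations alias the same address,
        // so declare ONE dynamic buffer and partition it manually.
        extern __shared__ float smem[];
        float* a = smem;      // first n floats
        float* b = smem + n;  // next n floats
        if (threadIdx.x < n) {
            a[threadIdx.x] = 0.0f;
            b[threadIdx.x] = 1.0f;
        }
    }

    // launched with: kernel<<<grid, block, 2 * n * sizeof(float)>>>(n);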

Bill

ps: how do you know a[0 … 15] are not 0.0 ?

Thanks.

About the values in a[0…15], I can check them with NSight.

I guess CUDA still has a lot of problems...

Hmm, I would be tempted to see if it's related to having multiple variables by getting rid of your "__constant__ double sf;".
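
Something like this minimal test (just a sketch) would show whether the array behaves on its own:

    #include <cstdio>
    #include <cuda_runtime.h>

    __constant__ double a[16];   // "sf" removed for the test

    __global__ void check()
    {
        // One thread prints elements straight from constant memory.
        printf("a[0]=%f a[15]=%f\n", a[0], a[15]);
    }

    int main()
    {
        double h_a[16];
        for (int i = 0; i < 16; ++i) h_a[i] = i + 1.0;
        cudaMemcpyToSymbol(a, h_a, sizeof(h_a));
        check<<<1, 1>>>();
        cudaDeviceSynchronize();
        return 0;
    }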

I have written several programs that created a “constant vector”, an
array of floats (or sometimes float4s) that contained data or parameters
that did not change very often. It is not clear that it had any effect
on performance, but it can be hard to tell. It did greatly complicate
debugging however.

For a program coming up, I am thinking of putting the ‘constant vector’
in its own texture map and see if that makes a difference.
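
(Something along these lines with the texture object API; just a sketch, error checking omitted:)

    #include <cuda_runtime.h>

    // Wrap a linear device buffer of floats in a texture object so
    // kernels can fetch it through the texture cache.
    cudaTextureObject_t makeTex(float* d_buf, size_t n)
    {
        cudaResourceDesc res = {};
        res.resType = cudaResourceTypeLinear;
        res.res.linear.devPtr = d_buf;
        res.res.linear.desc = cudaCreateChannelDesc<float>();
        res.res.linear.sizeInBytes = n * sizeof(float);

        cudaTextureDesc tex = {};
        tex.readMode = cudaReadModeElementType;

        cudaTextureObject_t obj = 0;
        cudaCreateTextureObject(&obj, &res, &tex, nullptr);
        return obj;
    }

    __global__ void useTex(cudaTextureObject_t t, float* out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = tex1Dfetch<float>(t, i);  // read via texture cache
    }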

I was hoping that someone on this thread would be able to tell me that
they did rigorous tests and that it did help, or that it didn't. Guess not.

MW

There are definite problems with constant memory. For example, it really does matter how many threads in a warp try to read different words of constant memory simultaneously. Essentially, if it's more than one, the reads are forced to run sequentially, whereas if they all read the same word they all do it in parallel. This and other issues are described in "Debugging CUDA", CIGPU 2011, Section 4.4, p. 421:

http://www.cs.ucl.ac.uk/staff/W.Langdon/ftp/papers/debug_cuda.pdf
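
A tiny sketch of the two access patterns:

    __constant__ float table[32];

    __global__ void broadcastRead(float* out)
    {
        // Every thread in the warp reads the SAME word: one broadcast.
        out[threadIdx.x] = table[0];
    }

    __global__ void serializedRead(float* out)
    {
        // Each thread reads a DIFFERENT word: the accesses are
        // replayed one word at a time, so the warp is serialized.
        out[threadIdx.x] = table[threadIdx.x];
    }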

Bill