It’s impossible to make recommendations based on two isolated lines of code. If code disappears in release builds, that is usually due to dead code elimination. This is a widely used optimization technique in compilers that eliminates code that does not contribute to externally visible program state.
In the context of CUDA, that typically means the code does not contribute to modifications of global memory that are observable at kernel termination time. The CUDA compiler is highly optimizing, and may find “dead” code even in places where it is not obvious to a human at first glance. I have never encountered a situation where the CUDA compiler incorrectly marked code as “dead”. That does not mean such a bug is impossible, but the answer to your question is most likely to be found in your code, which you haven’t shown.
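To illustrate the point, here is a minimal sketch, not the OP's code. The kernel names `no_store` and `with_store` and the host helper `run_with_store` are made up for this example. In an optimized build the loop in the first kernel can legitimately vanish, because nothing it computes ever reaches global memory. In a debug build (`-G`) device optimizations are disabled, so the loop would normally still be there.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: the loop result never reaches global memory, so an
// optimizing (release) build is free to remove the loop entirely.
__global__ void no_store(int iters)
{
    float acc = 0.0f;
    for (int i = 0; i < iters; ++i)
        acc += 1.0f;
    // acc is dropped here: nothing observable depends on it
}

// Same loop, but the result is written to global memory, so the value must
// be produced (the compiler may still simplify how it gets there).
__global__ void with_store(int iters, float *out)
{
    float acc = 0.0f;
    for (int i = 0; i < iters; ++i)
        acc += 1.0f;
    out[blockIdx.x * blockDim.x + threadIdx.x] = acc;
}

// Host helper (hypothetical): launch one warp of each kernel and return the
// value thread 0 stored. Returns -1.0f if a CUDA call fails.
float run_with_store(int iters)
{
    const int threads = 32;
    float *d_out = nullptr;
    if (cudaMalloc(&d_out, threads * sizeof(float)) != cudaSuccess)
        return -1.0f;

    no_store<<<1, threads>>>(iters);            // candidate for elimination
    with_store<<<1, threads>>>(iters, d_out);   // result is observable

    float h_out = -1.0f;
    cudaError_t err = cudaMemcpy(&h_out, d_out, sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess ? h_out : -1.0f;
}
```

If timing code "disappears" in a release build, checking whether its result is ever stored somewhere observable is a reasonable first step.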
Note that violations of the CUDA programming model, as well as violations of the underlying C++ programming model, can lead to code elimination. As soon as undefined behavior is invoked, anything can happen (back in the day, the C community referred to “nasal demons” as one possible outcome of undefined behavior).
It is usually not a good idea to paper over problematic code by tossing in a few ‘volatile’ modifiers. It would be better to understand what is going on with the OP’s code and to determine what might be an appropriate way to achieve the desired functionality in CUDA.
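As a rough sketch of what "an appropriate way" can look like (an assumption about the kind of code involved, since the OP's code has not been shown): a shared-memory block reduction that relies on explicit `__syncthreads()` barriers instead of `volatile`. The names `block_sum`, `run_block_sum`, and `kThreads` are hypothetical.

```cuda
#include <cuda_runtime.h>

constexpr int kThreads = 64;  // hypothetical block size, must be a power of two

// Sums kThreads integers within one block using shared memory.
// Correctness comes from the barriers, not from 'volatile'.
__global__ void block_sum(const int *in, int *out)
{
    __shared__ int buf[kThreads];
    const int tid = threadIdx.x;

    buf[tid] = in[tid];
    __syncthreads();  // all loads into shared memory are visible past here

    for (int stride = kThreads / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            buf[tid] += buf[tid + stride];
        __syncthreads();  // outside the 'if' so every thread reaches it
    }

    if (tid == 0)
        *out = buf[0];  // the store that makes the work observable
}

// Host helper (hypothetical): sums 1..kThreads on the device.
// Returns -1 if a CUDA call fails.
int run_block_sum()
{
    int h_in[kThreads];
    for (int i = 0; i < kThreads; ++i)
        h_in[i] = i + 1;

    int *d_in = nullptr;
    int *d_out = nullptr;
    if (cudaMalloc(&d_in, sizeof(h_in)) != cudaSuccess)
        return -1;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) {
        cudaFree(d_in);
        return -1;
    }

    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
    block_sum<<<1, kThreads>>>(d_in, d_out);

    int h_out = -1;
    cudaError_t err = cudaMemcpy(&h_out, d_out, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess ? h_out : -1;
}
```

Whether this is faster or slower than a `volatile`-based variant depends on the kernel and the hardware, but the barrier version does not depend on how the compiler happens to schedule shared-memory accesses.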
When I wrote several MTF implementations in CUDA, I found that some of them ran faster with volatile, while others were faster with __syncthreads(). So it depends.
Maybe less than 200 calculations are just not enough…
The major performance hit, though, is that the compiler misinterpreted my logic and created a huge memory dependency stall. Guess I will have to try pure PTX now.
Out of curiosity, what makes you think it’s just the compiler that’s breaking your code? Debug vs Release might actually be revealing undefined behavior present in your code. Or was this determined using an assembly dump and inspecting to make sure the instructions were actually removed?
I only ask because it seems unlikely that a decently-written compiler like this would remove such critical instructions.