CUDA sample using __ldg()?

Hello,

Do any of the CUDA samples use __ldg() anywhere in their code?

I did a quick search on Windows, and it doesn't look like any of the CUDA 5.5 or 6.0 samples use it.
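Since the samples don't seem to include one, here is a minimal sketch of what a direct __ldg() call looks like. It assumes a GPU of compute capability 3.5 or higher (compile with e.g. `-arch=sm_35` or newer), and it assumes the input `x` is never written while the kernel runs, which is what makes a read-only cache load safe. The kernel `saxpy_ldg` and the host helper `run_saxpy_ldg` are hypothetical names made up for this example.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// y[i] = a * x[i] + y[i]. x is only ever read, so its elements are fetched
// with __ldg(), i.e. through the read-only data cache (requires sm_35+).
__global__ void saxpy_ldg(int n, float a, const float* x, float* y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        y[i] = a * __ldg(&x[i]) + y[i];
    }
}

// Hypothetical host helper: copies x and y to the device, runs the kernel,
// copies y back. Returns false if any CUDA call or the launch fails.
bool run_saxpy_ldg(float a, const std::vector<float>& x, std::vector<float>& y)
{
    if (x.size() != y.size()) return false;
    if (x.empty()) return true;  // avoid a zero-block launch
    int n = static_cast<int>(x.size());
    size_t bytes = x.size() * sizeof(float);
    float* d_x = nullptr;
    float* d_y = nullptr;
    bool ok = cudaMalloc(&d_x, bytes) == cudaSuccess &&
              cudaMalloc(&d_y, bytes) == cudaSuccess &&
              cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(d_y, y.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        saxpy_ldg<<<blocks, threads>>>(n, a, d_x, d_y);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(y.data(), d_y, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_x);  // cudaFree(nullptr) is a no-op
    cudaFree(d_y);
    return ok;
}
```

Under these assumptions, calling `run_saxpy_ldg(2.0f, x, y)` with x = {1, 2, 3} and y = {10, 20, 30} leaves y = {12, 24, 36}.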

Here are some past threads that talk about that instruction, in case they answer your question:
Why L1 cache hit ratio become zero on K20? - CUDA Programming and Performance - NVIDIA Developer Forums
const __restrict__ read faster than __constant__ ? - CUDA Programming and Performance - NVIDIA Developer Forums
ldg versus textures - CUDA Programming and Performance - NVIDIA Developer Forums
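As background for the const __restrict__ thread: on compute capability 3.5+ you don't always have to call __ldg() yourself. If a kernel's input pointer is qualified `const ... __restrict__`, the compiler knows the data is neither written nor aliased during the kernel and may choose the read-only cache load on its own. This is a hint, not a guarantee, so it's worth inspecting the SASS (e.g. with `cuobjdump -sass`) to see which load was actually emitted. The sketch below assumes an sm_35+ target; `scale_restrict` and `run_scale_restrict` are hypothetical names.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// const + __restrict__ promise the compiler that x is read-only and not
// aliased by y for the lifetime of the kernel, so the plain load of x[i]
// is eligible to go through the read-only cache (compiler's choice).
__global__ void scale_restrict(int n, float a,
                               const float* __restrict__ x,
                               float* __restrict__ y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        y[i] = a * x[i];  // no explicit __ldg(); the compiler may emit one
    }
}

// Hypothetical host helper: resizes y to match x, runs the kernel on a copy
// of x, and copies the result back. Returns false on any CUDA error.
bool run_scale_restrict(float a, const std::vector<float>& x, std::vector<float>& y)
{
    y.assign(x.size(), 0.0f);
    if (x.empty()) return true;
    int n = static_cast<int>(x.size());
    size_t bytes = x.size() * sizeof(float);
    float* d_x = nullptr;
    float* d_y = nullptr;
    bool ok = cudaMalloc(&d_x, bytes) == cudaSuccess &&
              cudaMalloc(&d_y, bytes) == cudaSuccess &&
              cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const int threads = 256;
        scale_restrict<<<(n + threads - 1) / threads, threads>>>(n, a, d_x, d_y);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(y.data(), d_y, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_x);
    cudaFree(d_y);
    return ok;
}
```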

Here’s a slideshow with a small example:
[url]http://on-demand.gputechconf.com/gtc/2013/presentations/S3011-CUDA-Optimization-With-Nsight-VSE.pdf[/url]
Pretty much the same presentation as above, maybe on different hardware? It looked the same at first glance:
[url]http://calcul.math.cnrs.fr/IMG/pdf/CUDA-Optimization-Julien-Demouth.pdf[/url]

A library that provides templated __ldg() loads for other, non-native types:
[url]https://github.com/BryanCatanzaro/generics[/url]
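To illustrate the idea (this is not the generics library's code or API, just one hypothetical way to do it): a trivially copyable type can be loaded through the read-only cache by reading it as a sequence of 4-byte words with the built-in `__ldg(const int*)` overload and reassembling the words. The sketch assumes sm_35+, that `sizeof(T)` is a multiple of 4, and that the pointer is at least 4-byte aligned. `ldg_any`, `Particle`, `sum_coords`, and `run_sum_coords` are all made-up names.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical 16-byte user type that has no native __ldg() overload.
struct Particle { float x, y, z; int id; };

// Load any trivially copyable T via the read-only cache by splitting it into
// 4-byte words. Assumes ptr is at least 4-byte aligned.
template <typename T>
__device__ T ldg_any(const T* ptr)
{
    static_assert(sizeof(T) % sizeof(int) == 0, "sizeof(T) must be a multiple of 4");
    constexpr int kWords = sizeof(T) / sizeof(int);
    const int* src = reinterpret_cast<const int*>(ptr);
    int words[kWords];
#pragma unroll
    for (int k = 0; k < kWords; ++k) {
        words[k] = __ldg(src + k);  // native int overload of __ldg()
    }
    T result;
    memcpy(&result, words, sizeof(T));  // reassemble without pointer punning
    return result;
}

// out[i] = x + y + z of particle i, reading each particle with ldg_any().
__global__ void sum_coords(int n, const Particle* in, float* out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        Particle p = ldg_any(&in[i]);
        out[i] = p.x + p.y + p.z;
    }
}

// Hypothetical host helper; returns false on any CUDA error.
bool run_sum_coords(const std::vector<Particle>& in, std::vector<float>& out)
{
    out.assign(in.size(), 0.0f);
    if (in.empty()) return true;
    int n = static_cast<int>(in.size());
    Particle* d_in = nullptr;
    float* d_out = nullptr;
    bool ok = cudaMalloc(&d_in, in.size() * sizeof(Particle)) == cudaSuccess &&
              cudaMalloc(&d_out, in.size() * sizeof(float)) == cudaSuccess &&
              cudaMemcpy(d_in, in.data(), in.size() * sizeof(Particle),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const int threads = 128;
        sum_coords<<<(n + threads - 1) / threads, threads>>>(n, d_in, d_out);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(out.data(), d_out, in.size() * sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in);
    cudaFree(d_out);
    return ok;
}
```

Splitting into `int` words keeps the sketch simple; a real implementation would probably prefer wider native types (such as `int4`) when the size and alignment allow it, to issue fewer loads.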