[Jetson-TK1] Can't compile in NEON instructions using NVCC

I have a problem compiling NEON intrinsics into a file that also contains CUDA code. In short, this works:

nvcc --compile --compiler-options -mfpu=neon,-flax-vector-conversions dsp.c -o dsp.o

But this doesn’t (the input file is now a .cu file containing kernel code):

nvcc --compile --compiler-options -mfpu=neon,-flax-vector-conversions rotation.cu -o rotation.o

The output looks like this:

/usr/lib/gcc/arm-linux-gnueabihf/4.8/include/arm_neon.h(41): error: identifier "__builtin_neon_qi" is undefined

/usr/lib/gcc/arm-linux-gnueabihf/4.8/include/arm_neon.h(42): error: identifier "__builtin_neon_hi" is undefined

/usr/lib/gcc/arm-linux-gnueabihf/4.8/include/arm_neon.h(43): error: identifier "__builtin_neon_si" is undefined

/usr/lib/gcc/arm-linux-gnueabihf/4.8/include/arm_neon.h(44): error: identifier "__builtin_neon_di" is undefined

/usr/lib/gcc/arm-linux-gnueabihf/4.8/include/arm_neon.h(45): error: identifier "__builtin_neon_hf" is undefined

/usr/lib/gcc/arm-linux-gnueabihf/4.8/include/arm_neon.h(46): error: identifier "__builtin_neon_sf" is undefined

/usr/lib/gcc/arm-linux-gnueabihf/4.8/include/arm_neon.h(47): error: identifier "__builtin_neon_poly8" is undefined

I include the standard NEON header <arm_neon.h> in both cases, but when the input is a .cu file the identifiers above appear to be undefined. Any thoughts?

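Actually, I guess this is because the GCC builtins that <arm_neon.h> relies on (the __builtin_neon_* types) are not available in the CUDA compilation trajectory, whereas a plain .c file is handed straight to the host compiler.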
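If that guess is right, one common workaround is to keep every NEON intrinsic in a separate .c file that goes through the host-compiler path (the one that already works for dsp.c), and let the .cu file see only a declaration that uses plain C types. Below is a minimal sketch under that assumption; the file name neon_rotate.c and the function rotate_points() are hypothetical illustrations, not taken from the original code. It rotates interleaved (x, y) float pairs, with the caller supplying cos and sin of the angle, and falls back to scalar code when NEON is not enabled.

```c
/* neon_rotate.c: hypothetical host-only helper, meant to be compiled as C
 * (not as .cu), so <arm_neon.h> never goes through the CUDA front end. */
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#endif

/* This declaration would normally live in a small header (e.g. a
 * hypothetical neon_rotate.h) included by rotation.cu. extern "C" keeps
 * the symbol unmangled when the header is included from C++/CUDA code,
 * and only plain C types cross the interface. */
#ifdef __cplusplus
extern "C" {
#endif
void rotate_points(float *xy, size_t n, float cos_a, float sin_a);
#ifdef __cplusplus
}
#endif

/* Rotate n interleaved (x, y) points in place:
 *   x' = cos_a * x - sin_a * y
 *   y' = sin_a * x + cos_a * y */
void rotate_points(float *xy, size_t n, float cos_a, float sin_a)
{
    size_t i = 0;

#ifdef HAVE_NEON
    /* Process 4 points (8 floats) per iteration. vld2q_f32 deinterleaves
     * the pairs so that val[0] holds 4 x values and val[1] holds 4 y values. */
    for (; i + 4 <= n; i += 4) {
        float32x4x2_t p = vld2q_f32(xy + 2 * i);
        float32x4_t xr = vmulq_n_f32(p.val[0], cos_a);  /* cos*x         */
        xr = vmlsq_n_f32(xr, p.val[1], sin_a);          /* cos*x - sin*y */
        float32x4_t yr = vmulq_n_f32(p.val[0], sin_a);  /* sin*x         */
        yr = vmlaq_n_f32(yr, p.val[1], cos_a);          /* sin*x + cos*y */
        p.val[0] = xr;
        p.val[1] = yr;
        vst2q_f32(xy + 2 * i, p);                        /* re-interleave */
    }
#endif

    /* Scalar tail, or the whole job when NEON is not enabled. */
    for (; i < n; ++i) {
        const float x = xy[2 * i];
        const float y = xy[2 * i + 1];
        xy[2 * i]     = cos_a * x - sin_a * y;
        xy[2 * i + 1] = sin_a * x + cos_a * y;
    }
}
```

rotation.cu would then include only the declaration (no <arm_neon.h>) and call rotate_points() from host code. Assuming the file names above, the two halves could be built separately with the same flags as before and linked by nvcc:

nvcc --compile --compiler-options -mfpu=neon,-flax-vector-conversions neon_rotate.c -o neon_rotate.o
nvcc --compile rotation.cu -o rotation.o
nvcc rotation.o neon_rotate.o -o rotation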