I'm having trouble compiling NEON intrinsics together with CUDA code using nvcc. In short, this works:
nvcc --compile --compiler-options -mfpu=neon,-flax-vector-conversions dsp.c -o dsp.o
But this doesn’t (the input is now a .cu file containing kernel code):
nvcc --compile --compiler-options -mfpu=neon,-flax-vector-conversions rotation.cu -o rotation.o
The output looks like this:
/usr/lib/gcc/arm-linux-gnueabihf/4.8/include/arm_neon.h(41): error: identifier "__builtin_neon_qi" is undefined
/usr/lib/gcc/arm-linux-gnueabihf/4.8/include/arm_neon.h(42): error: identifier "__builtin_neon_hi" is undefined
/usr/lib/gcc/arm-linux-gnueabihf/4.8/include/arm_neon.h(43): error: identifier "__builtin_neon_si" is undefined
/usr/lib/gcc/arm-linux-gnueabihf/4.8/include/arm_neon.h(44): error: identifier "__builtin_neon_di" is undefined
/usr/lib/gcc/arm-linux-gnueabihf/4.8/include/arm_neon.h(45): error: identifier "__builtin_neon_hf" is undefined
/usr/lib/gcc/arm-linux-gnueabihf/4.8/include/arm_neon.h(46): error: identifier "__builtin_neon_sf" is undefined
/usr/lib/gcc/arm-linux-gnueabihf/4.8/include/arm_neon.h(47): error: identifier "__builtin_neon_poly8" is undefined
I include the standard NEON header <arm_neon.h> in both files, but when the input is a .cu file the identifiers above are reported as undefined. Any thoughts?