Hi all,
I ran into a problem while trying to suppress a warning about calling a host function from a device function. I have already filed a bug report, but I thought I'd post here as well, since I spent a lot of time tracking this down.
I'm writing some code that uses the Eigen library. Eigen relies heavily on templates, and this triggers spurious warnings about calling a __host__ function from a __host__ __device__ function. Trying to suppress this warning with
#pragma hd_warning_disable
causes nvcc to generate incorrect code.
A minimal example is available in the GitHub repository konstantin-azarov/nvcc-pragma-bug ("Demonstration of nvcc bug with pragmas"). The pragma is inserted in nvcc-pragma-bug/Transform.h (at commit eed4d35545a0662265dc1f2aa664cf84620c190c).
When this code is compiled without the pragma (as checked into the repository, using build.sh), the SASS looks as expected:
Fatbin elf code:
================
arch = sm_30
code version = [1,7]
producer = cuda
host = linux
compile_size = 64bit
code for sm_30
Function : _Z9TransformN5Eigen9TransformIfLi3ELi2ELi0EEENS_6MatrixIfLi3ELi1ELi0ELi3ELi1EEE
.headerflags @"EF_CUDA_SM30 EF_CUDA_PTX_SM(EF_CUDA_SM30)"
/* 0x22823232028042b7 */
/*0008*/ MOV R1, c[0x0][0x44]; /* 0x2800400110005de4 */
/*0010*/ IADD32I R1, R1, -0x18; /* 0x0bffffffa0105c02 */
/*0018*/ F2F.F64.F32 R2, c[0x0] [0x180]; /* 0x1000400601309c04 */
/*0020*/ LOP.OR R6, R1, c[0x0][0x24]; /* 0x6800400090119c43 */
/*0028*/ F2F.F64.F32 R8, c[0x0] [0x184]; /* 0x1000400611321c04 */
/*0030*/ F2F.F64.F32 R10, c[0x0] [0x188]; /* 0x1000400621329c04 */
/*0038*/ LOP32I.AND R12, R6, 0xffffff; /* 0x3803fffffc631c02 */
/* 0x22b2b042e042e047 */
/*0048*/ STL.64 [R12], R2; /* 0xc800000000c09ca5 */
/*0050*/ MOV32I R4, 0x0; /* 0x1800000000011de2 */
/*0058*/ STL.64 [R12+0x8], R8; /* 0xc800000020c21ca5 */
/*0060*/ MOV32I R5, 0x0; /* 0x1800000000015de2 */
/*0068*/ STL.64 [R12+0x10], R10; /* 0xc800000040c29ca5 */
/*0070*/ MOV R7, RZ; /* 0x28000000fc01dde4 */
/*0078*/ JCAL 0x0; /* 0x1000000000011c07 */
/* 0x20000000000002e7 */
/*0088*/ EXIT; /* 0x8000000000001de7 */
/*0090*/ BRA 0x90; /* 0x4003ffffe0001de7 */
/*0098*/ NOP; /* 0x4000000000001de4 */
/*00a0*/ NOP; /* 0x4000000000001de4 */
/*00a8*/ NOP; /* 0x4000000000001de4 */
/*00b0*/ NOP; /* 0x4000000000001de4 */
/*00b8*/ NOP; /* 0x4000000000001de4 */
..........................................................................................
If, however, the pragma is enabled by uncommenting the definition in test.cu, the generated code is wrong: the kernel body is reduced to a trap (BPT.TRAP) followed by EXIT:
Fatbin elf code:
================
arch = sm_30
code version = [1,7]
producer = cuda
host = linux
compile_size = 64bit
code for sm_30
Function : _Z9TransformN5Eigen9TransformIfLi3ELi2ELi0EEENS_6MatrixIfLi3ELi1ELi0ELi3ELi1EEE
.headerflags @"EF_CUDA_SM30 EF_CUDA_PTX_SM(EF_CUDA_SM30)"
/* 0x2000000002f2f307 */
/*0008*/ MOV R1, c[0x0][0x44]; /* 0x2800400110005de4 */
/*0010*/ BPT.TRAP 0x1; /* 0xd00000000400c007 */
/*0018*/ EXIT; /* 0x8000000000001de7 */
/*0020*/ BRA 0x20; /* 0x4003ffffe0001de7 */
/*0028*/ NOP; /* 0x4000000000001de4 */
/*0030*/ NOP; /* 0x4000000000001de4 */
/*0038*/ NOP; /* 0x4000000000001de4 */
..........................................................................................