CUDA program stuck at an LEA instruction

Hi,
I am facing a very tough problem, and I hope this forum can give me some ideas.
In my project, I generate NVVM IR code, use libnvvm to compile the IR into PTX, and then launch the kernel from that PTX.
Using CUDA-GDB, I found that my kernel is stuck at this instruction (marked with =>):
0x00000000024c3160 <+1440>:
0x00000000024c3168 <+1448>: @!P0 BRA 0x618
=> 0x00000000024c3170 <+1456>: LEA R4.CC, R2.reuse, 0x8
0x00000000024c3178 <+1464>: LEA.HI.X P1, R5, R2, RZ, R3
0x00000000024c3180 <+1472>:
0x00000000024c3188 <+1480>: PSETP.AND.AND P0, PT, PT, PT, PT
0x00000000024c3190 <+1488>: SHR R7, R0, 0x1f
0x00000000024c3198 <+1496>: MOV R6, R0
0x00000000024c31a0 <+1504>:
0x00000000024c31a8 <+1512>: MOV32I R2, 0x2
0x00000000024c31b0 <+1520>: { MOV32I R3, 0x0
0x00000000024c31b8 <+1528>: ST.E.64 [R4], R6, P1
0x00000000024c31c0 <+1536>:
0x00000000024c31c8 <+1544>: ST.E.64 [R4+-0x8], R2, P1
0x00000000024c31d0 <+1552>: SYNC
0x00000000024c31d8 <+1560>: { LEA R4.CC, R2.reuse, RZ
0x00000000024c31e0 <+1568>: Cannot disassemble instruction
0x00000000024c31e4 <+1572>: Cannot disassemble instruction
0x00000000024c31e8 <+1576>: SSY 0x670
The values of registers R4 and R2 are:
(cuda-gdb) info registers $R4 $R2
R4 0x0 0
R2 0x63dbf0 6544368
My GPU is: Device 0: “GeForce GTX 1050 Ti”
On the website (CUDA Binary Utilities :: CUDA Toolkit Documentation), I found that LEA means “Compute Effective Address”,
but I still don’t know what LEA R4.CC, R2.reuse, 0x8 means or why my program is stuck there.
Looking forward to any reply.
Thanks.