Errors when using the memory debugger with thrust::sort_by_key or CUB radix sort

Hello

I get the following error when using the memory debugger to test thrust::sort_by_key or CUB radix sort. DeviceRadixSortDownsweepKernel is used by both CUB and Thrust.
I have tested with CUDA 8.0.44 and 8.0.61 on a Pascal Titan X and can reproduce the error below.
However, with CUDA 7.5 on a Maxwell Titan X there is no error.
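For anyone trying to reproduce this, a minimal test case might look like the sketch that follows. It is not code from the original post: the helper name `repro_thrust_sort_by_key`, the descending input, and the choice of `unsigned int` keys and values (which appear to match the `jj` template arguments in the mangled kernel name in the log) are illustrative assumptions.

```cuda
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sort.h>

// Hypothetical reproducer: sort n (key, value) pairs with thrust::sort_by_key.
// unsigned int keys/values are chosen to match the <unsigned, unsigned, int>
// template arguments that appear to be encoded in the mangled kernel name.
// Returns true if the keys come back ascending with their values still attached.
bool repro_thrust_sort_by_key(int n)
{
    thrust::host_vector<unsigned int> h_keys(n), h_vals(n);
    for (int i = 0; i < n; ++i) {
        h_keys[i] = static_cast<unsigned int>(n - 1 - i);  // descending keys
        h_vals[i] = static_cast<unsigned int>(i);          // value records the original slot
    }

    thrust::device_vector<unsigned int> d_keys = h_keys;
    thrust::device_vector<unsigned int> d_vals = h_vals;

    // With primitive keys and the default comparator, Thrust's CUDA backend
    // uses a CUB-based radix sort (the source of DeviceRadixSortDownsweepKernel).
    thrust::sort_by_key(d_keys.begin(), d_keys.end(), d_vals.begin());

    h_keys = d_keys;  // copy results back to the host
    h_vals = d_vals;
    for (int i = 0; i < n; ++i) {
        if (h_keys[i] != static_cast<unsigned int>(i)) return false;
        // key i started at slot n - 1 - i, so its value must be n - 1 - i
        if (h_vals[i] != static_cast<unsigned int>(n - 1 - i)) return false;
    }
    return true;
}
```

Calling it from a small `main` with a large input (for example `1 << 20` elements) and launching that under the memory checker should help show whether the failure depends on the rest of the application or only on the sort itself.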

CUDA context created : 13be844c3e0
CUDA module loaded:   13be85c1b00 radixSortThrust.cu
Internal debugger error occurred while attempting to launch _ZN6thrust6system4cuda6detail4cub_30DeviceRadixSortDownsweepKernelINS3_23DeviceRadixSortDispatchILb0EjjiE21PtxAltDownsweepPolicyELb0EjjiEEvPT1_S9_PT2_SB_PT3_SC_iibbNS3_13GridEvenShareISC_EE in CUcontext 0x13be844c3e0, CUmodule 0x13be85c1b00:
code patching failed for unknown reason.
All breakpoints for function _ZN6thrust6system4cuda6detail4cub_30DeviceRadixSortDownsweepKernelINS3_23DeviceRadixSortDispatchILb0EjjiE21PtxAltDownsweepPolicyELb0EjjiEEvPT1_S9_PT2_SB_PT3_SC_iibbNS3_13GridEvenShareISC_EE have been removed.
See Output View for additional messages of this type.
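The kernel named in this log lives in Thrust's internal copy of CUB (the `thrust::system::cuda::detail::cub_` namespace). Calling standalone CUB directly is one way to check whether the failure follows the radix-sort kernel itself rather than Thrust. This is a sketch assuming CUB's usual two-phase temporary-storage convention; the helper name is hypothetical and error handling is reduced to a single check after the sort.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical reproducer: radix-sort n (key, value) pairs with standalone CUB.
// Returns true if the keys come back ascending with their values attached.
bool repro_cub_sort_pairs(int n)
{
    std::vector<unsigned int> h_keys(n), h_vals(n);
    for (int i = 0; i < n; ++i) {
        h_keys[i] = static_cast<unsigned int>(n - 1 - i);  // descending keys
        h_vals[i] = static_cast<unsigned int>(i);          // original slot
    }

    const size_t bytes = static_cast<size_t>(n) * sizeof(unsigned int);
    unsigned int *d_keys_in, *d_keys_out, *d_vals_in, *d_vals_out;
    cudaMalloc(&d_keys_in, bytes);
    cudaMalloc(&d_keys_out, bytes);
    cudaMalloc(&d_vals_in, bytes);
    cudaMalloc(&d_vals_out, bytes);
    cudaMemcpy(d_keys_in, h_keys.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_vals_in, h_vals.data(), bytes, cudaMemcpyHostToDevice);

    // Phase 1: with d_temp == nullptr, CUB only reports the scratch size it needs.
    void *d_temp = nullptr;
    size_t temp_bytes = 0;
    cub::DeviceRadixSort::SortPairs(d_temp, temp_bytes, d_keys_in, d_keys_out,
                                    d_vals_in, d_vals_out, n);
    cudaMalloc(&d_temp, temp_bytes);

    // Phase 2: the actual sort. A large n is the safer choice for exercising the
    // multi-pass kernels (recent CUB versions use a single-tile kernel for tiny inputs).
    cub::DeviceRadixSort::SortPairs(d_temp, temp_bytes, d_keys_in, d_keys_out,
                                    d_vals_in, d_vals_out, n);
    cudaError_t err = cudaDeviceSynchronize();

    cudaMemcpy(h_keys.data(), d_keys_out, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(h_vals.data(), d_vals_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_temp);
    cudaFree(d_keys_in);
    cudaFree(d_keys_out);
    cudaFree(d_vals_in);
    cudaFree(d_vals_out);

    if (err != cudaSuccess) return false;
    for (int i = 0; i < n; ++i) {
        if (h_keys[i] != static_cast<unsigned int>(i)) return false;
        if (h_vals[i] != static_cast<unsigned int>(n - 1 - i)) return false;
    }
    return true;
}
```

If this version fails in the same way, the problem is presumably in how the debugger handles the CUB kernel rather than anything Thrust adds on top.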

I’m not sure what the error means, and the message itself doesn't seem sure either ("code patching failed for unknown reason"). As a test I tried increasing the code patching setting up to 32x, and nothing changed.

My larger code base is currently a bit unstable, and because of the error above it is harder to test it with the memory debugging feature (my main system has only a Pascal GPU). At the moment my best option seems to be finding or writing a different GPU sort.

Any help or suggestions would be appreciated. If anyone knows a good key-value GPU sort, that could also be a solution.
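As a stand-in while running the memory checker, one option is a simple bitonic sort by key. This is a swapped-in technique, not something from this thread: it does O(n log² n) work, so it is much slower than radix sort for large inputs, and this sketch assumes n is a power of two (padding with maximum-value keys would handle other sizes). The kernel, helper, and check function names are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <vector>

// One compare-exchange stage of a bitonic sorting network.
// k is the size of the bitonic sequences being merged, j the compare distance.
__global__ void bitonic_step_kernel(unsigned int *keys, unsigned int *vals,
                                    int j, int k, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int partner = i ^ j;
    if (partner <= i) return;            // each pair is handled by its lower index
    bool ascending = (i & k) == 0;       // direction alternates per k-sized block
    unsigned int ki = keys[i], kp = keys[partner];
    if (ascending ? (ki > kp) : (ki < kp)) {
        keys[i] = kp;
        keys[partner] = ki;
        unsigned int v = vals[i];        // move the value along with its key
        vals[i] = vals[partner];
        vals[partner] = v;
    }
}

// Hypothetical helper: sorts device arrays by key in place. n must be a power of two.
// Launches on the same stream run in order, so each stage sees the previous one's result.
void bitonic_sort_by_key(unsigned int *d_keys, unsigned int *d_vals, int n)
{
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    for (int k = 2; k <= n; k <<= 1)
        for (int j = k >> 1; j > 0; j >>= 1)
            bitonic_step_kernel<<<blocks, threads>>>(d_keys, d_vals, j, k, n);
}

// Hypothetical check: keys are a scrambled permutation of 0..n-1
// (multiplying by an odd constant mod 2^m is a bijection), values are original indices.
bool check_bitonic_sort_by_key(int n)
{
    const unsigned int mult = 2654435761u;  // odd constant
    std::vector<unsigned int> h_keys(n), h_vals(n);
    for (int i = 0; i < n; ++i) {
        h_keys[i] = (static_cast<unsigned int>(i) * mult) & static_cast<unsigned int>(n - 1);
        h_vals[i] = static_cast<unsigned int>(i);
    }
    const size_t bytes = static_cast<size_t>(n) * sizeof(unsigned int);
    unsigned int *d_keys, *d_vals;
    cudaMalloc(&d_keys, bytes);
    cudaMalloc(&d_vals, bytes);
    cudaMemcpy(d_keys, h_keys.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_vals, h_vals.data(), bytes, cudaMemcpyHostToDevice);

    bitonic_sort_by_key(d_keys, d_vals, n);
    cudaError_t err = cudaDeviceSynchronize();

    cudaMemcpy(h_keys.data(), d_keys, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(h_vals.data(), d_vals, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_keys);
    cudaFree(d_vals);

    if (err != cudaSuccess) return false;
    for (int p = 0; p < n; ++p) {
        if (h_keys[p] != static_cast<unsigned int>(p)) return false;
        // the value must be the original index whose scrambled key was p
        if (((h_vals[p] * mult) & static_cast<unsigned int>(n - 1)) != static_cast<unsigned int>(p))
            return false;
    }
    return true;
}
```

Because it uses only plain global-memory loads and stores in a single small kernel, this kind of sort is easy to reason about under a memory checker, at the cost of speed.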

Thanks

The very same error happens to me when using CUB InclusiveSum with the memory checker enabled, on both a GTX 1070 in a desktop computer and a GTX 960M in a laptop:

CUDA module loaded:   1ca9e7ba940 kernel.cu.obj 
Internal debugger error occurred while attempting to launch _ZN3cub16DeviceScanKernelINS_12DispatchScanIPiS2_NS_3SumENS_8NullTypeEiE18PtxAgentScanPolicyES2_S2_NS_13ScanTileStateIiLb1EEES3_S4_iEEvT0_T1_T2_iT3_T4_T5_ in CUcontext 0x1ca8f1433c0, CUmodule 0x1ca9e7ba940:
code patching failed for unknown reason.
All breakpoints for function _ZN3cub16DeviceScanKernelINS_12DispatchScanIPiS2_NS_3SumENS_8NullTypeEiE18PtxAgentScanPolicyES2_S2_NS_13ScanTileStateIiLb1EEES3_S4_iEEvT0_T1_T2_iT3_T4_T5_ have been removed.
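A comparably small reproducer for the scan case might look like the following. It assumes `int` input and output (the mangled name in this log appears to encode `int*` iterators and `cub::Sum`) and uses the same two-phase temporary-storage pattern as other CUB device-wide calls; the helper name is hypothetical.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical reproducer: inclusive prefix sum of n ints with CUB.
// Returns true if the device result matches a running sum computed on the host.
bool repro_cub_inclusive_sum(int n)
{
    std::vector<int> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i) h_in[i] = i % 7;  // small values keep the total within int for moderate n

    const size_t bytes = static_cast<size_t>(n) * sizeof(int);
    int *d_in, *d_out;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    // Phase 1: query the scratch size; phase 2: run the scan (DeviceScanKernel).
    void *d_temp = nullptr;
    size_t temp_bytes = 0;
    cub::DeviceScan::InclusiveSum(d_temp, temp_bytes, d_in, d_out, n);
    cudaMalloc(&d_temp, temp_bytes);
    cub::DeviceScan::InclusiveSum(d_temp, temp_bytes, d_in, d_out, n);
    cudaError_t err = cudaDeviceSynchronize();

    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_temp);
    cudaFree(d_in);
    cudaFree(d_out);

    if (err != cudaSuccess) return false;
    long long running = 0;
    for (int i = 0; i < n; ++i) {
        running += h_in[i];
        if (h_out[i] != running) return false;
    }
    return true;
}
```

Attaching a self-contained case like this to a bug report would likely make the issue easier for NVIDIA to reproduce than a larger application.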

Any insights? Does anyone know how to fix this?

Best regards and thanks in advance to all for your help :)

Internal errors in NVIDIA tools should most likely be reported as bugs at developer.nvidia.com.