[RESOLVED] Profiling error 4168:999

ruddyscent · December 30, 2017, 10:28am

I was able to work around the problem by running nvprof as root using sudo. Maybe nvprof requires more permissions than membership in the nvidia-persistenced group provides. I’m using Ubuntu 16.04.3 and CUDA 9.1 on a GeForce GTX Titan (Kepler).

stanislaw.warych · February 27, 2018, 6:53am

Dear NVIDIA community, this is my first post here, so let me say hello to all of you!

I have the same issue: Internal profiling error 4168:999
Windows 10, driver 388.19, CUDA Toolkit 9.1.85.

However, profiling of my app was running fine (same system, driver and toolkit) until I started using cudaMemcpy2DToArrayAsync (it doesn’t matter whether I use the sync or async version). If I comment out just this one function, profiling goes back to normal. Adding or removing it doesn’t affect the correctness of the app when it runs standalone. Any suggestions from your side?

veraj · February 27, 2018, 8:23am

Hi, stanislaw.warych

Please run cuda-memcheck XXX.exe to check whether your app has a memory issue.

Thanks!

If possible, can you share a minimal program that reproduces the problem?

stanislaw.warych · February 27, 2018, 6:57pm

Hi,

Thanks for your answer! I ran “cuda-memcheck” and it found nothing except a cudaFree / cudaFreeHost mismatch at application end.

A truly minimal program is hard to produce, because this is CUDA <-> D3D interoperability and I only have problems with this functionality. However, I made sample code that creates a D3D texture, maps it to a CUDA array and copies into it the content of some GPU memory filled with memset. Memcheck is clean, but the profiler returns: “Internal profiling error 4047:999”. The code is different, but again, if I comment out cudaMemcpy2DToArray it runs well. A quick Google check didn’t tell me anything about this error code. Shall I post this sample here (almost 150 lines)?
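For readers following along, the copy path described above can be sketched roughly as below. This is not the sample from this thread: the D3D texture is replaced by a plain `cudaMallocArray` allocation (an assumption made so the code runs without any graphics API), and the helper name `copyToArrayRoundTrip` and the `CHECK` macro are hypothetical. Whether this plain-array variant triggers the profiler error is not established here.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Fills a pitched device buffer with `value`, copies it into a CUDA array
// with cudaMemcpy2DToArray, reads the array back, and checks every byte.
bool copyToArrayRoundTrip(size_t width, size_t height, unsigned char value) {
    // Assumption: a plain CUDA array stands in for the array that the
    // original program obtains from a mapped D3D (or OpenGL) texture.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<uchar4>();
    cudaArray_t array = nullptr;
    CHECK(cudaMallocArray(&array, &desc, width, height));

    // Source: pitched linear device memory filled with memset.
    const size_t rowBytes = width * sizeof(uchar4);
    void* src = nullptr;
    size_t pitch = 0;
    CHECK(cudaMallocPitch(&src, &pitch, rowBytes, height));
    CHECK(cudaMemset2D(src, pitch, value, rowBytes, height));

    // The call that, per this thread, correlates with the profiler error.
    CHECK(cudaMemcpy2DToArray(array, 0, 0, src, pitch, rowBytes, height,
                              cudaMemcpyDeviceToDevice));

    // Read the array back to the host to confirm the copy itself is correct.
    std::vector<unsigned char> host(rowBytes * height);
    CHECK(cudaMemcpy2DFromArray(host.data(), rowBytes, array, 0, 0,
                                rowBytes, height, cudaMemcpyDeviceToHost));

    bool ok = true;
    for (unsigned char b : host) ok = ok && (b == value);

    CHECK(cudaFree(src));
    CHECK(cudaFreeArray(array));
    return ok;
}

int main() {
    bool ok = copyToArrayRoundTrip(64, 32, 0x5A);
    printf("copy %s\n", ok ? "verified" : "mismatch");
    return ok ? 0 : 1;
}
```

Assuming the file is saved as repro.cu (a hypothetical name), it could be built with `nvcc repro.cu -o repro` and run once standalone and once under `nvprof ./repro` to compare behaviour.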

veraj · February 28, 2018, 2:48am

Hi,

It would be great if you could provide the sample code.

I will send you a private message about how to upload it.

Thanks!

veraj · March 1, 2018, 3:51am

Hi, stanislaw.warych

I can reproduce your issue and will submit an internal bug to the dev team.

Thanks for your help.

I will get back to you as soon as there is any response.

bernhardh · April 4, 2018, 8:47am

Hi,

Is there anything new on this issue (internal profiling error 4168:999)?

I still run into this problem.

Bernhard

veraj · April 6, 2018, 3:03am

Hi, bernhardh

Are you also hitting this error because cudaMemcpy2DToArray is used in your code?

We have reproduced this issue, and the dev team will try to fix it in a later release.

bernhardh · April 6, 2018, 6:43am

Hi,

Yes, I am using cudaMemcpy2DToArray in my code.

Is there a workaround besides dropping back to an older version?

It seems that profiling still works with 8.0.

I cannot drop cudaMemcpy2DToArray, as I need interop between CUDA and OpenGL.

veraj · June 19, 2018, 3:33am

Hi,

We have verified that this issue is already fixed in the latest version.
But I’m afraid you will still need to wait some time to get it.

roastam · June 28, 2018, 4:23pm

Has this been fixed in the 9.2 release? If not, is there any ETA for the fix?

Thanks

veraj · July 2, 2018, 2:40am

Hi, roastam

I’m afraid the issue can still be reproduced in 9.2.

It cannot be reproduced in the next release, but I am not sure of the release date.

george.menhorn · January 29, 2019, 5:23pm

I am still seeing this.

02:00.0 VGA compatible controller: NVIDIA Corporation GM200 [GeForce GTX 980 Ti] (rev a1)

Linux 3.10.0-862.2.3.el7.x86_64

nvprof: NVIDIA (R) Cuda command line profiler
Copyright (c) 2012 - 2018 NVIDIA Corporation
Release version 10.0.130 (21)

==12451== Some kernel(s) will be replayed on device 0 in order to collect all events/metrics.
Replaying kernel “xxxxxxx(unsigned int*, unsigned int*, unsigned int*, unsigned int, unsigned int*, unsigned int*, int, unsigned int*, unsigned int*)” (19 of 52)…
2 internal events
==12451== Error: Internal profiling error 4183:999.
Kernel launch failed: unknown error
Total runtime seconds : 17.698425

--
George

toddwz · June 5, 2019, 10:56am

This helps. Thanks. I’m running Windows 10, driver 425.25, CUDA 10.1. I have to run as administrator to get the timeline, even for a CUDA sample.

rnakhate · September 19, 2020, 5:36am

This solution works for my setup too.
Version: 11.0 - Quadro M1200 - Windows 10