Slow CUDA program startup

Hello All,

I’m experiencing a problem running CUDA programs on a Linux system (x64, Tesla T10). Every program (even the SDK samples) takes about 2-4 seconds to run its first CUDA command (initialization, sometimes memory allocation, etc.).
I guessed that the CUDA runtime was compiling PTX code for the T10 architecture, but adding -arch and -code options to my nvcc command line didn’t help (googling for an answer didn’t help either).
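For reference, the flags I tried look roughly like this (a sketch; the file names are placeholders):

```shell
# Compile native machine code for compute capability 1.3 (Tesla T10),
# plus PTX as a fallback, so the driver should not need to JIT-compile
# PTX at program startup.
nvcc -arch=compute_13 -code=sm_13,compute_13 my_program.cu -o my_program
```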

The problem gets really annoying when I try to use 4 GPUs, because it takes about 12 seconds to initialize all of them.
What’s more interesting: initializing one GPU slows down the others (memory allocation takes about 1.5 seconds on each GPU, and I assumed it would be done in parallel — or shouldn’t it be?).
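What I’m currently doing is a rough sketch like the following (one host thread per device; function and variable names are my own, not from any sample):

```cuda
// Sketch: create one host thread per GPU so the ~1.5 s per-device
// initialization can overlap instead of running one after another.
#include <pthread.h>
#include <stdio.h>
#include <cuda_runtime.h>

static void *init_device(void *arg) {
    int dev = *(int *)arg;
    cudaSetDevice(dev);        // bind this host thread to one GPU
    cudaFree(0);               // no-op call that forces context creation
    void *p = NULL;
    cudaMalloc(&p, 1 << 20);   // first real allocation on this device
    cudaFree(p);
    return NULL;
}

int main(void) {
    int n = 0;
    cudaGetDeviceCount(&n);
    pthread_t threads[16];
    int ids[16];
    for (int i = 0; i < n && i < 16; ++i) {
        ids[i] = i;
        pthread_create(&threads[i], NULL, init_device, &ids[i]);
    }
    for (int i = 0; i < n && i < 16; ++i)
        pthread_join(threads[i], NULL);
    printf("initialized %d device(s)\n", n);
    return 0;
}
```

Even with this, the per-device initializations seem to serialize somewhere inside the driver.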

Thanks for any help/hints!

This elusive problem was discussed in only one place, somewhere, some time ago.

I have this problem only on a GTX 295 (3-card system). As superuser I have to run

nvidia-smi -l --interval=59 -f /var/log/nvidia-smi.log &

after every reboot to remove the annoying delay. I forget whether the log file in that command needs to exist beforehand;

if it does, create one the first time that can be overwritten afterwards. 59 is my choice of interval in seconds for re-running the smi utility (it ships with the NVIDIA driver).

hope it works for you

Thanks for the reminder about this trick! After updating the kernel on my Ubuntu 10.04 system last week, I also started seeing these very slow CUDA initialization times. Running deviceQuery took 4 seconds, but with nvidia-smi running in the background it takes only 0.03 seconds.

Yes it does work for me.

Thank you very much!

I also had a problem with the initialization of the SDK’s radixSort module, but I’m replacing it with a faster sort algorithm…

I have the same problem with 2 C2050 cards. A program that runs on my Q600 in 500 ms needs 5 seconds on the Tesla cards. I tried:

[code]

nvidia-smi -l --interval=59 -f /var/log/nvidia-smi.log &

[/code]

but --interval doesn’t seem to be a valid option (toolkit 4.0). Are there any updates to this method?

Use the new persistence mode:

nvidia-smi -pm 1
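To make this survive reboots, one common approach (assuming a distro that still executes /etc/rc.local as root at boot) is:

```shell
# In /etc/rc.local (runs as root at boot on many distros of that era):
# enable persistence mode so the driver stays loaded between CUDA runs.
nvidia-smi -pm 1
```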

At first I thought this worked, but I still have a delay of a few seconds at the beginning. Persistence mode is enabled; the command shows that the cards are still in persistence mode.

It is independent of which commands are used; the first CUDA command in the program has this delay. Is there anything I can do, like deinitializing the cards at the end of my program or something like that? Any other ideas?

System consists of 2x Tesla C2050

EDIT: The problem appears even if I launch the program twice (the second run immediately after the first).

I have the same problem with 2x Tesla C2075, driver version 290.10 and the CUDA 4.1 toolkit.

Setting persistence mode improves the deviceQuery time, but not any other test that uses a CUDA kernel. I still have a delay of 2 or 3 seconds when I launch my test kernel.

Any ideas ?

Thank you

Romain