Time stamping kernel activity - To analyze the kernel operation
Rough workflow of my code:
- Each thread runs a continuous while loop (persistent threads).
- There is a global worklist. Threads read work units from this worklist, process them, and may or may not add new work back to the worklist.
- This producer-consumer cycle goes on until there is no work in the list AND all threads are idle (this is the convergence/exit condition).

Now, there are times when threads/thread blocks are waiting for other threads to produce some work (the other threads are busy working).
I wish to time-stamp this wait, or in some way find out how long a thread had to wait before new work was added to the list and it was able to read that new work.

Are there any suggestions for this?

Basically, I want my own timing function with which I can time such waits inside the kernel.

Thanks
Sid.

#1
Posted 04/19/2012 07:29 AM   
So, what's the problem? Use clock()?

#2
Posted 04/19/2012 10:33 AM   
[quote name='vvolkov' date='19 April 2012 - 03:33 AM' timestamp='1334831592' post='1398250']
So, what's the problem? Use clock()?
[/quote]

I believe clock() has a 32-bit return value. If you must be sure, you can use inline PTX to get the 64-bit counter too.
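A minimal sketch of the inline PTX route, assuming the %clock64 special register (compute capability 2.0 or later). The names read_clock64_ptx, sample_twice and ptx_counter_delta are made up for illustration:

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: reads the 64-bit cycle counter through inline PTX.
__device__ __forceinline__ unsigned long long read_clock64_ptx()
{
    unsigned long long t;
    asm volatile("mov.u64 %0, %%clock64;" : "=l"(t));
    return t;
}

// Takes two back-to-back readings in a single thread.
__global__ void sample_twice(unsigned long long *out)
{
    out[0] = read_clock64_ptx();
    out[1] = read_clock64_ptx();
}

// Host helper: returns the cycles elapsed between the two readings,
// or -1 if a CUDA call failed.
long long ptx_counter_delta()
{
    unsigned long long *d_out = nullptr, h_out[2] = {0, 0};
    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) return -1;
    sample_twice<<<1, 1>>>(d_out);
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess ? (long long)(h_out[1] - h_out[0]) : -1;
}
```

The counter ticks in clock cycles, not seconds, and it is only meaningful to subtract two readings taken by the same thread.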

#3
Posted 04/19/2012 11:56 AM   
[quote name='RezaRob3' date='19 April 2012 - 12:56 PM' timestamp='1334836596' post='1398272']
I believe clock() has a 32 bit return value. If you must be sure, you can use inline ptx to get the 64 bit counter too.
[/quote]
Or use clock64()?
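A rough sketch of how the wait could be timed with clock64(), assuming a single global flag stands in for "new work was added to the list". The names work_ready, timed_wait, producer_consumer and run_wait_demo are hypothetical, and the spin is bounded so the demo cannot hang if the two blocks do not run concurrently:

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for the worklist state: nonzero means work is available.
__device__ volatile int work_ready = 0;

// Polls work_ready until it is set or max_spins polls have elapsed.
// Returns the number of clock64() cycles this thread spent waiting.
__device__ long long timed_wait(int max_spins)
{
    long long start = clock64();
    for (int i = 0; i < max_spins && work_ready == 0; ++i) { }
    return clock64() - start;
}

// Block 0 plays the producer, every other block is a waiting consumer.
__global__ void producer_consumer(long long *wait_cycles, int max_spins)
{
    if (blockIdx.x == 0) {
        long long t0 = clock64();
        while (clock64() - t0 < 100000) { }   // pretend to be busy working
        work_ready = 1;                       // publish the "new work"
        __threadfence();
    } else {
        wait_cycles[blockIdx.x - 1] = timed_wait(max_spins);
    }
}

// Host helper: launches one producer and one consumer block and returns the
// consumer's wait in cycles, or -1 if a CUDA call failed.
// Note: work_ready stays set afterwards, so a second call waits almost nothing.
long long run_wait_demo()
{
    long long *d_wait = nullptr, h_wait = -1;
    if (cudaMalloc(&d_wait, sizeof(long long)) != cudaSuccess) return -1;
    producer_consumer<<<2, 1>>>(d_wait, 1 << 24);
    cudaError_t err = cudaMemcpy(&h_wait, d_wait, sizeof(long long),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_wait);
    return err == cudaSuccess ? h_wait : -1;
}
```

In a real persistent-thread kernel the same start/stop pair would go around the loop that polls the worklist, with each thread accumulating its own total. The result is in clock cycles, and only differences taken within one thread should be compared.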

Always check return codes of CUDA calls for errors. Do not use __syncthreads() in conditional code unless the condition is guaranteed to evaluate identically for all threads of each block. Run your program under cuda-memcheck to detect stray memory accesses. If your kernel dies for larger problem sizes, it might exceed the runtime limit and trigger the watchdog timer.

#4
Posted 04/19/2012 04:12 PM   
Thanks guys. I will look into it.

#5
Posted 04/19/2012 05:41 PM   
[quote name='tera' date='19 April 2012 - 09:12 AM' timestamp='1334851936' post='1398359']
Or use clock64()?
[/quote]

Ah! Beautiful! Thanks tera. I could swear I didn't see that in one of the previous versions of the guide, but it's there in 4.2. :)

#6
Posted 04/19/2012 07:12 PM   