CUPTI callback version

I’m using CUDA Toolkit version 6.5.14. I have a simple tracer using CUPTI that intercepts calls to cudaMemcpyAsync. The problem is that the CUpti_CallbackId I receive is always CUPTI_RUNTIME_TRACE_CBID_cudaMemcpyAsync_v3020. Both the target CUDA code and the CUPTI tracer are compiled with the same CUDA Toolkit 6.5.14. I would like to get the v4000 version so that I can identify the source and destination devices of the intercepted cudaMemcpyAsync. Should I use a different CUDA toolkit?

Thanks!