Is it possible to use IBM Rational PurifyPlus's "Quantify" on a CUDA program?

I have used "Quantify" on non-CUDA C++ code, where it reports detailed line-by-line run times, which is quite helpful.
Has anyone tried this on code that involves CUDA? I tried, but it does not run properly.