using cdpAdvancedQuicksort with streams

We’ve been trying to use cdpAdvancedQuicksort with streams. In the nvvp timeline, I can see that the streams are not running in parallel. Is it possible to overlap data transfers with the qsort_warp kernel?