I’m currently working on expanding and improving FLANN’s CUDA support.
I have a line such as this:
thrust::transform(thrust::system::cuda::par.on(stream), id, id + knn*queries.rows, id, map_indices(thrust::raw_pointer_cast(&((*gpu_helper_->gpu_vind_))[0])));
I’ve similarly tried this:
thrust::transform(thrust::cuda::par.on(stream), id, id + knn*queries.rows, id, map_indices(thrust::raw_pointer_cast(&((*gpu_helper_->gpu_vind_))[0])));
Neither version seems to synchronize on the stream I pass in.
If I break my execution in thrust::system::detail::cuda::bulk_::detail::synchronize, and then examine my stack, I find that async_in_stream is called from:
(async.inl: 88)
template<typename ExecutionGroup, typename Closure>
__host__ __device__
future async(ExecutionGroup g, Closure c)
{
  return bulk::detail::async_in_stream(g, c, 0, 0);
} // end async()
instead of:
(async.inl: 97)
template<typename ExecutionGroup, typename Closure>
__host__ __device__
future async(async_launch launch, Closure c)
{
  return launch.is_stream_valid() ?
    bulk::detail::async_in_stream(launch.exec(), c, launch.stream(), launch.before_event()) :
    bulk::detail::async(launch.exec(), c, launch.before_event());
} // end async()
Thus everything is being launched on the null stream, and synchronized on it as well.
The intention is clearly to execute on the given stream, but that does not appear to be happening.
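To make the difference between the two overloads concrete, here is a stripped-down mock in plain C++ (no CUDA required). It is only a sketch of the dispatch as I understand it from the stack trace: `stream_t`, `group`, the simplified `async_launch`, and the return values are stand-ins I made up for illustration, not the real bulk types or signatures.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; NOT the real bulk/Thrust types.
using stream_t = int;   // plays the role of cudaStream_t; 0 plays the null stream

struct group {};        // plays the role of an execution group

// Plays the role of bulk's async_launch: an execution group bundled with a stream.
struct async_launch {
  group g;
  stream_t s;
  bool is_stream_valid() const { return s != 0; }
  group exec() const { return g; }
  stream_t stream() const { return s; }
};

// Stand-in for async_in_stream: just reports the stream the work would run on.
inline stream_t async_in_stream(group, stream_t s) { return s; }

// Mirrors the overload at async.inl:88: the stream argument is hard-coded to 0.
template <typename ExecutionGroup>
stream_t async(ExecutionGroup g) { return async_in_stream(g, 0); }

// Mirrors the overload at async.inl:97: the stream travels inside the launch object.
inline stream_t async(async_launch launch) {
  return launch.is_stream_valid()
             ? async_in_stream(launch.exec(), launch.stream())
             : async_in_stream(launch.exec(), 0);
}

// Checks which stream each call shape ends up on in this mock.
inline void check_dispatch() {
  assert(async(group{}) == 0);                   // bare group: null stream
  assert(async(async_launch{group{}, 7}) == 7);  // launch object: stream 7 is kept
}
```

In this mock, a stream only survives if the caller wraps the group in an `async_launch`. Passing a bare group always lands on stream 0, which matches what I see in the debugger.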