cudaMemcpy consuming a lot of execution time

I am using cudaMemcpy to copy an image into device memory so that a CUDA kernel can process it. According to nvprof, this call accounts for around 78% of the time spent in CUDA API calls. Is there a way to reduce this overhead, or is there another API I can use to make my application run faster?
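Below is a minimal sketch of the setup described above that isolates the copy so it can be measured on its own. It assumes a single-channel 8-bit image of 4096 x 4096 pixels. `invertPixels` and `timeH2D` are hypothetical names used only for illustration. The sketch times the host-to-device cudaMemcpy with CUDA events twice: once from an ordinary pageable buffer and once from a pinned buffer allocated with cudaMallocHost. It also calls cudaFree(0) first. CUDA creates its context lazily, and that setup cost is usually charged to whichever runtime call comes first, so the warm-up call keeps it out of the copy measurement.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                         cudaGetErrorString(err_));                     \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Hypothetical kernel: inverts an 8-bit single-channel image in place.
__global__ void invertPixels(unsigned char* img, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) img[i] = 255 - img[i];
}

// Hypothetical helper: times one host-to-device copy of `bytes`, in milliseconds.
float timeH2D(unsigned char* dev, const unsigned char* host, size_t bytes) {
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));
    CHECK(cudaEventRecord(start));
    CHECK(cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));
    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    return ms;
}

int main() {
    const size_t width = 4096, height = 4096;  // assumed image size
    const size_t bytes = width * height;       // assumed 1 byte per pixel

    // Warm-up: force context creation so it is not billed to the first copy.
    CHECK(cudaFree(0));

    unsigned char* dev = nullptr;
    CHECK(cudaMalloc(&dev, bytes));

    // Pageable host buffer, as returned by plain malloc/new.
    unsigned char* pageable = static_cast<unsigned char*>(std::malloc(bytes));
    if (pageable == nullptr) {
        std::fprintf(stderr, "host allocation failed\n");
        return EXIT_FAILURE;
    }
    // Pinned (page-locked) host buffer.
    unsigned char* pinned = nullptr;
    CHECK(cudaMallocHost(&pinned, bytes));

    // Fill both buffers with the same synthetic "image" data.
    for (size_t i = 0; i < bytes; ++i)
        pageable[i] = pinned[i] = (unsigned char)(i % 256);

    std::printf("pageable H2D: %.3f ms\n", timeH2D(dev, pageable, bytes));
    std::printf("pinned   H2D: %.3f ms\n", timeH2D(dev, pinned, bytes));

    // Run the hypothetical kernel on the copied image and copy the result back.
    const int threads = 256;
    const int blocks = (int)((bytes + threads - 1) / threads);
    invertPixels<<<blocks, threads>>>(dev, bytes);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpy(pinned, dev, bytes, cudaMemcpyDeviceToHost));
    std::printf("first pixel after invert: %d\n", pinned[0]);  // 255 - 0 = 255

    CHECK(cudaFreeHost(pinned));
    std::free(pageable);
    CHECK(cudaFree(dev));
    return 0;
}
```

Transfers from pinned memory are typically faster than transfers from pageable memory because the driver can DMA directly from page-locked pages without an intermediate staging copy. Pinning large amounts of host memory can degrade overall system performance, though, so measure the actual gain on the target machine.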