Is it possible to use more than 4 GB of VRAM using OpenCL?

We recently purchased a Tesla K40c card because we need lots of VRAM for our problem domain. Unfortunately, when we try to use more than 4 GB of VRAM in a single process, we get CL_MEM_OBJECT_ALLOCATION_FAILURE.

The problem is likely due to 32-bit addressing (CL_DEVICE_ADDRESS_BITS = 32), since a 32-bit pointer can address at most 2^32 bytes, i.e. 4 GiB. On AMD cards you can set an environment variable to switch to 64-bit addressing. Does anybody know how to change this value on an NVIDIA card?
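To make the arithmetic behind that suspicion concrete, here is a minimal Python sketch. It does not talk to the driver at all. The helper names are made up for illustration, and the value 32 is simply the CL_DEVICE_ADDRESS_BITS figure quoted above (in a real program it would come from clGetDeviceInfo). Whether the driver actually hands out the full 4 GiB is a separate question this sketch does not answer.

```python
GIB = 1024 ** 3  # bytes in one GiB


def addressable_bytes(address_bits: int) -> int:
    """Upper bound on the bytes reachable with pointers of this width (hypothetical helper)."""
    return 2 ** address_bits


def fits_in_address_space(requested_bytes: int, address_bits: int) -> bool:
    """True if the request does not exceed the address space (hypothetical helper)."""
    return requested_bytes <= addressable_bytes(address_bits)


if __name__ == "__main__":
    # Assumption: 32 is the CL_DEVICE_ADDRESS_BITS value reported in the post.
    print(addressable_bytes(32) // GIB)           # 4
    print(fits_in_address_space(5 * GIB, 32))     # False
    print(fits_in_address_space(5 * GIB, 64))     # True
```

If the device really is limited to 32-bit addressing, anything beyond 4 GiB in one context is out of reach no matter how much physical memory the card carries.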

Has anybody been able to allocate more than 4 GB of memory in a single process using OpenCL?

This problem was discussed in another thread with a less explicit title. That thread seemed to conclude that NVIDIA will not allow OpenCL users to use more than 4 GB of memory:
https://devtalk.nvidia.com/default/topic/498004/cuda-programming-and-performance/opencl-6gb-memory-problem-get-error-message-at-4-2gb-of-memory/2/

I don’t think it’s possible to use more than 4 GB with OpenCL on current NVIDIA drivers.