In a UVA setup, two pointers that belong to separate allocations in the same (CPU/OS) process cannot have the same numerical value, whether both are CPU pointers, both are GPU pointers, or one is of each kind.
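One practical consequence of that uniqueness is that the runtime can tell from a raw pointer value which side of the bus the memory lives on, so a copy does not have to state its direction. The following is a minimal sketch, assuming a 64-bit build on a system where UVA is active; `uvaRoundTrip` is a hypothetical helper name, not part of any API.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Hypothetical helper: copy n ints to the GPU and back without ever naming a
// direction. cudaMemcpyDefault asks the runtime to infer, from the pointer
// values alone, where each buffer lives; that only works because UVA gives
// every allocation in the process its own distinct address range.
static bool uvaRoundTrip(size_t n) {
    std::vector<int> src(n), dst(n, -1);
    for (size_t i = 0; i < n; ++i) src[i] = static_cast<int>(i);

    int* dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(int)) != cudaSuccess) return false;

    bool ok =
        cudaMemcpy(dev, src.data(), n * sizeof(int), cudaMemcpyDefault) == cudaSuccess &&  // H->D inferred
        cudaMemcpy(dst.data(), dev, n * sizeof(int), cudaMemcpyDefault) == cudaSuccess;    // D->H inferred

    cudaFree(dev);
    return ok && dst == src;
}
```

Without UVA, a host address and a device address could coincide numerically, and the caller would have to pass `cudaMemcpyHostToDevice` or `cudaMemcpyDeviceToHost` explicitly.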
I agree, it would be nice to get away from the manual memory management of old and stick with the unified approach; it is much simpler. For the time being, however, we are stuck on a Windows platform. With unified memory, kernel performance is brutal, specifically because of the initial host-to-device (H->D) transfer. Since we cannot prefetch data before launching a kernel, we are seeing unified memory transfer speeds 6x-10x slower than with an old-style explicit H->D transfer. The behaviour is consistent across a variety of systems and cards (M5000, M6000, GTX 1080, GTX 1080 Ti). Right now I don’t see a better solution than the old-style memory allocation and transfer. If you have suggestions on how to improve unified memory performance, by all means pass them along.
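A minimal sketch of choosing between the two paths at run time, so the same code uses prefetched Unified Memory where the platform supports it and falls back to the old-style explicit copy elsewhere. It assumes the CUDA 11/12 signature `cudaMemPrefetchAsync(ptr, bytes, device, stream)` (newer toolkits may differ) and assumes, consistent with the post above, that the Windows setup reports `cudaDevAttrConcurrentManagedAccess` as 0, which is the condition under which prefetching managed memory to a GPU is not supported. `canPrefetchManaged` and `stageOnDevice` are hypothetical names.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: true if managed memory can be prefetched to this device.
static bool canPrefetchManaged(int device) {
    int value = 0;
    if (cudaDeviceGetAttribute(&value, cudaDevAttrConcurrentManagedAccess, device) != cudaSuccess)
        return false;
    return value != 0;
}

// Hypothetical helper: place a copy of `host` on `device` before any kernel runs.
// Returns a pointer usable in kernels (release it with cudaFree), or nullptr on
// error. *usedManaged records which path was taken.
static float* stageOnDevice(const std::vector<float>& host, int device,
                            cudaStream_t stream, bool* usedManaged) {
    const size_t bytes = host.size() * sizeof(float);
    float* p = nullptr;
    if (cudaSetDevice(device) != cudaSuccess) return nullptr;

    if (canPrefetchManaged(device)) {
        // Unified Memory path: fill on the host, then migrate ahead of the launch
        // so the first kernel does not pay for on-demand page faults.
        if (cudaMallocManaged(&p, bytes) != cudaSuccess) return nullptr;
        std::copy(host.begin(), host.end(), p);
        if (cudaMemPrefetchAsync(p, bytes, device, stream) != cudaSuccess) {
            cudaFree(p);
            return nullptr;
        }
        *usedManaged = true;
    } else {
        // "Old style" path: explicit device allocation plus one bulk H->D copy.
        // From pageable memory this returns once the data has been staged, so
        // `host` may be destroyed afterwards.
        if (cudaMalloc(&p, bytes) != cudaSuccess) return nullptr;
        if (cudaMemcpyAsync(p, host.data(), bytes, cudaMemcpyHostToDevice, stream) != cudaSuccess) {
            cudaFree(p);
            return nullptr;
        }
        *usedManaged = false;
    }
    return p;
}
```

Either way the caller frees the buffer with `cudaFree`; the flag only records which path was used, which can be handy when comparing timings across machines.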
If you are going for maximum performance, that may well be the correct decision for quite some time to come. It is the same trade-off one gets with all kinds of “convenience features” in the computer world, whether caching, virtual memory, branch prediction, etc.
For maximum performance, a programmer can exploit detailed knowledge of control flow and data movement patterns, while an automated mechanism can at best make intelligent guesses (using much less information). An automated mechanism may work well for 80% of cases and misbehave spectacularly for a small percentage of cases. I am hoping that deep learning techniques can, over the next decade, substantially reduce the impact of worst-case behavior (e.g. thrashing) in automated mechanisms.
You’re confusing Unified Memory (UM) with Unified Virtual Addressing (UVA or UA). They are not the same thing. Please actually read the link I provided; it has approximately nothing to do with Unified Memory.
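To make the distinction concrete, here is a minimal sketch, assuming CUDA 10 or later (where `cudaPointerAttributes` has a `type` field); `kindOf` and `contrastUmAndUva` are hypothetical names. A plain `cudaMalloc` allocation and a `cudaMallocManaged` allocation both receive addresses in the single UVA address space, but only the managed one is Unified Memory that the host may dereference directly.

```cuda
#include <cuda_runtime.h>
#include <cassert>

// Hypothetical helper: the memory type the runtime reports for pointer p.
static cudaMemoryType kindOf(const void* p) {
    cudaPointerAttributes attr{};
    if (cudaPointerGetAttributes(&attr, p) != cudaSuccess) {
        cudaGetLastError();  // clear the error so later runtime calls are unaffected
        return cudaMemoryTypeUnregistered;
    }
    return attr.type;
}

// Hypothetical helper: allocate one buffer each way and report both kinds.
static bool contrastUmAndUva(cudaMemoryType* plainKind, cudaMemoryType* managedKind) {
    float* plain = nullptr;    // UVA address, but GPU-only: the host must not dereference it
    float* managed = nullptr;  // UVA address that is also Unified Memory
    if (cudaMalloc(&plain, sizeof(float)) != cudaSuccess) return false;
    if (cudaMallocManaged(&managed, sizeof(float)) != cudaSuccess) {
        cudaFree(plain);
        return false;
    }
    *plainKind = kindOf(plain);
    *managedKind = kindOf(managed);

    managed[0] = 42.0f;  // legal: managed memory is host-accessible while no kernel runs
    float check = 0.0f;
    // The plain device buffer can only be reached through explicit copies.
    bool ok = cudaMemcpy(plain, managed, sizeof(float), cudaMemcpyDefault) == cudaSuccess &&
              cudaMemcpy(&check, plain, sizeof(float), cudaMemcpyDefault) == cudaSuccess &&
              check == 42.0f;

    cudaFree(plain);
    cudaFree(managed);
    return ok;
}
```

UVA is what lets `cudaPointerGetAttributes` and `cudaMemcpyDefault` work from a bare pointer in both cases; whether the host can touch the memory directly is a separate Unified Memory property.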