Opinions on OpenCL on nVidia/AMD GPUs Is it worth supporting both vendors so I can always use the be

I may be starting a project soon that will use GPUs to do image processing on a large amount of data. It’s going to be a long project, so I was thinking of using OpenCL with the intention of using whichever manufacturer, nVidia or AMD, has the best GPUs at the time, and I envision that 1 or 2 changes are quite likely through the life of the project. While I’ve been interested in OpenCL/OpenGL/DirectX/CUDA for quite a while, I haven’t had the opportunity to work on a decent project, so I don’t have any experience to speak of, hence the questions.

I’d like to get your general opinions on the following points:

[list=1]

[*]Do you find that the effort put into making OpenCL code work on both nVidia and AMD GPUs is worth the performance extracted from the AMD GPUs? I’d imagine this has a lot to do with which architecture better suits your particular problem.

[*]Which vendor has more bugs in their implementation?

[*]Do nVidia and AMD both properly support the basic requirements for OpenCL 1.0/1.1?

Any other opinions are also appreciated, I’m trying to gauge the current state of affairs.
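On the third question, one rough way to see what a given driver claims is to read the version string it reports. A minimal sketch, assuming the string comes from `clGetDeviceInfo` with `CL_DEVICE_VERSION` (the spec fixes the format as `OpenCL <major>.<minor> <vendor-specific text>`). The helper names and the sample strings below are hypothetical, for illustration only, and a reported version does not prove the implementation is bug-free.

```python
import re


def parse_cl_version(version_string):
    """Return (major, minor) from a CL_DEVICE_VERSION-style string."""
    match = re.match(r"OpenCL (\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError("not an OpenCL version string: %r" % version_string)
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version_string, minimum=(1, 1)):
    """True if the reported version is at least `minimum` (tuple compare)."""
    return parse_cl_version(version_string) >= minimum


if __name__ == "__main__":
    # Illustrative strings only; real drivers report their own vendor text.
    samples = [
        "OpenCL 1.0 ExampleVendor build 1",
        "OpenCL 1.1 ExampleVendor build 2",
    ]
    for s in samples:
        print(s, "->", parse_cl_version(s), "meets 1.1:", meets_minimum(s))
```

Comparing tuples rather than strings avoids mistakes such as treating "1.10" as older than "1.2".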

Code you write in OpenCL may not be performance portable across different vendors. Also, if you are planning to use libraries for basic math like NVIDIA cuBLAS and ArrayFire (a free GPU library), CUDA would be better.

I don’t expect to use CUBLAS and for this work I expect I’ll have to code just about every algorithm from scratch.

Yes, imo: if you have enough time or manpower, it is possible to create algorithms that run efficiently on different GPUs and even CPUs.

AMD has, imo, the better hardware but the worse software environment. You can find bugs from both vendors when they release a new SDK or driver version.

Yes. AMD is sometimes a step ahead in OpenCL.

Watch out if you need double-precision (DP) support; there are some restrictions on both NV and AMD devices.
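One way to check for DP support before relying on it is to look at the device's extension list. A minimal sketch, assuming the space-separated string returned by `clGetDeviceInfo` with `CL_DEVICE_EXTENSIONS`; `cl_khr_fp64` is the Khronos double-precision extension and `cl_amd_fp64` is AMD's vendor variant. The function names and the sample strings are hypothetical.

```python
# Extension names that indicate some form of double-precision support.
FP64_EXTENSIONS = ("cl_khr_fp64", "cl_amd_fp64")


def fp64_extensions(extensions_string):
    """Return the fp64-related extensions present in a CL_DEVICE_EXTENSIONS string."""
    reported = set(extensions_string.split())
    return [name for name in FP64_EXTENSIONS if name in reported]


def supports_double(extensions_string):
    """True if the device reports any known double-precision extension."""
    return bool(fp64_extensions(extensions_string))


if __name__ == "__main__":
    # Illustrative extension strings, not taken from any particular device.
    with_dp = "cl_khr_byte_addressable_store cl_khr_fp64 cl_khr_icd"
    without_dp = "cl_khr_byte_addressable_store cl_khr_icd"
    print(supports_double(with_dp), fp64_extensions(with_dp))
    print(supports_double(without_dp), fp64_extensions(without_dp))
```

Splitting on whitespace and matching whole tokens avoids false positives from substring matches. Kernels that use `double` would still need the matching `#pragma OPENCL EXTENSION ... : enable` line on OpenCL 1.0/1.1.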

Srdja

Thanks Srdja for the feedback.

I have liked Nvidia’s products for a long time. They have slightly better performance and much stronger drivers, especially on Linux.
But for OpenCL I have to voice my complaints again and again. The performance has fallen behind. See these benchmark results: LuxMark Results. I cannot believe the 7970 hardware has nearly double the power of the 580. It must be a driver issue. Many people on this forum have reported this problem and waited for a solution. However, there has been no response from Nvidia. It seems they have stopped developing OpenCL and no longer care about this forum. See this benchmark for the GTX 680: the LuxMark 2.0 result of the GTX 680 is ridiculous. Maybe it is fake news, but it is hard to say it is impossible.

I hope Nvidia can be more positive towards OpenCL. Don’t kill the open standard for the good of CUDA. After all, the president of Khronos is from Nvidia.
Otherwise, it is not impossible that I will choose a red card next time.

Thanks for the benchmarks, Zhao. AMD cards have always had a significantly higher theoretical GFLOPS count; it’s nice to see that they are finally able to expose that performance. As for the poor LuxMark results for the GTX680, I’ll wait for the official benchmarks before judging.

Are there any benchmarks that compare a CUDA implementation of some algorithm (I don’t care too much which, I’m just interested in performance numbers) to an OpenCL implementation on both GeForce and Radeon cards? That would at least show how much performance you lose going from CUDA to OpenCL and also provide a way of comparing CUDA running on an Nvidia card to OpenCL running on an AMD card.

I’ve started trying to answer my question of a CUDA to OpenCL comparison. This blog post from Accelereyes ([url=“http://blog.accelereyes.com/blog/2010/05/10/nvidia-fermi-cuda-and-opencl/”]http://blog.accelereyes.com/blog/2010/05/10/nvidia-fermi-cuda-and-opencl/[/url]) is somewhat dated (May 2010) but shows that at the time OpenCL and CUDA had similar levels of performance, with OpenCL even beating CUDA on large problems.

The abstract of the following IEEE article also seems to suggest that OpenCL and CUDA have similar levels of performance: “A Comprehensive Performance Comparison of CUDA and OpenCL” (IEEE Conference Publication, IEEE Xplore). I still need to see if I can download the article, though.

I’m not the only one interested in this: there is a thread on the official NVIDIA forums. That thread links to a recent Accelereyes blog post comparing CUDA and OpenCL ([url=“http://blog.accelereyes.com/blog/2012/02/17/opencl_vs_cuda_webinar_recap/”]http://blog.accelereyes.com/blog/2012/02/17/opencl_vs_cuda_webinar_recap/[/url]). It seems the only big disadvantage I’m going to suffer from if I use OpenCL is a lack of supporting libraries like BLAS.

As for difficulty of writing OpenCL programs, I’ve been playing with JavaCL, which has made it much easier to get started.

These two performance comparisons are a little old. I guess they used Nvidia’s OpenCL 1.0 drivers, which did have great performance. However, the latest OpenCL 1.1 drivers have a big performance regression. See this post: OpenCL 1.1 drivers.

Well, I didn’t have to wait for very long: http://www.tomshardware.com/reviews/geforce-gtx-680-review-benchmark,3161.html. That pretty much settles it. Whatever the difficulties of developing for both AMD and Nvidia cards, it definitely looks like it will be worth the effort if this is going to be the performance gap for the next 2 years.

Thanks everyone for the comments.

I am not sure what you mean by “That pretty much settles it”. If you read the reviews of the GTX680 carefully, you will find it is a very impressive gaming card that gives AMD a difficult problem. However, it is a very unimpressive GPGPU card. Its OpenCL benchmark results are even worse than the GTX580’s. Its double-precision rate is set to 1/24 of the single-precision rate. See here. So for OpenCL computing it is on the same level as the GTX560. For now the 7970 is the big boss of OpenCL performance. Maybe we can expect driver optimizations and the GK110 in the near future.
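To put the 1/24 figure in perspective, here is a back-of-the-envelope sketch of theoretical peak throughput. The core counts, clocks, and the DP ratios for the GTX 580 and HD 7970 are assumptions taken from the vendors' published spec sheets as I recall them, not measurements, and counting one FMA per core per cycle as 2 FLOPs is the usual convention. Real OpenCL performance depends heavily on drivers and the workload.

```python
def peak_gflops(cores, clock_mhz, flops_per_core_per_cycle=2):
    """Theoretical single-precision peak in GFLOPS (one FMA = 2 FLOPs)."""
    return cores * clock_mhz * flops_per_core_per_cycle / 1000.0


# (cores, shader clock in MHz, DP:SP ratio) -- assumed spec-sheet values.
CARDS = {
    "GeForce GTX 680": (1536, 1006, 1.0 / 24),
    "GeForce GTX 580": (512, 1544, 1.0 / 8),
    "Radeon HD 7970": (2048, 925, 1.0 / 4),
}


def peak_table(cards):
    """Return {name: (sp_gflops, dp_gflops)} for each card."""
    table = {}
    for name, (cores, clock_mhz, dp_ratio) in cards.items():
        sp = peak_gflops(cores, clock_mhz)
        table[name] = (sp, sp * dp_ratio)
    return table


if __name__ == "__main__":
    for name, (sp, dp) in peak_table(CARDS).items():
        print("%-16s SP %7.1f GFLOPS   DP %6.1f GFLOPS" % (name, sp, dp))
```

Under these assumptions the GTX 680 has a higher single-precision peak than the GTX 580 but a lower double-precision peak, which is consistent with the complaint above.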

If Nvidia wants to force GPGPU developers to buy Quadro or Tesla cards by cutting the performance of GeForce cards, that would be terrible.

To explain what I meant: I started this topic because I was not sure if I should go with CUDA or OpenCL and wanted to hear other developers’ experiences with OpenCL portability.

The GTX 680 is definitely a good gaming card, but that doesn’t mean it will be a good GPU for image processing. If my opinion is wrong, feel free to tell me so. At the price that the GTX 680 is selling for, an extra $50 to get the Radeon HD7970 is not difficult to justify, since it is faster than the GTX 680 in OpenCL, especially in double-precision maths. I feel that AMD will still be in the lead in OpenCL performance on desktop GPUs for this generation and the next one, based on their past performance potential. So for that reason, OpenCL looks like the best choice, as I have a good chance of using AMD GPUs in 2 years’ time.

As for a GK110 or something better than the GTX 680 GPU, why would Nvidia bother? It has beaten the 7970 in games, which is what the desktop market cares about. As for professional GPUs, I certainly hope they produce a better Tesla card but that will be far more expensive than my project will be able to afford for the foreseeable future, forcing me back to the desktop GPUs.

If someone disagrees with my opinions, please post why you disagree as that will teach me a different point of view. Ultimately, that’s what I wanted to learn from this thread.

My own opinion: with the GeForce GTX 680 delivering OpenCL GPGPU performance in the range of the Radeon HD7950 and HD7870 on the most general-purpose-oriented benchmarks (say LuxMark and Sandra 2012, excluding any sort of peak FMA benchmark), and the Radeon HD7750 delivering strong results on the same benchmarks, well ahead of the GTX560, there is a clear current winner in the GPGPU war. And OpenCL is the way to go to get access to AMD Radeon GPUs and APUs.

But that leads me to different conclusions about which brand to buy:

Naturally, if you want top-notch OpenCL performance, go AMD with an HD7950 for performance, or an HD7750 for performance per penny (and per watt!).
But if you want your code to run at its best on any platform, you’d better go nVidia, because at this point, something that runs fast (hum) on the new Kepler will probably run faster on AMD’s GCN architecture. Because the GCN architecture is much more advanced and less prone to divergence than Kepler on GK104, GCN-optimized code may run very slowly on Kepler!

So my guess is you’d better buy a GK104 (maybe a GTX 670 or GTX 660 when they become available) if you plan to have your code running on all architectures, or a Radeon 7950/7970 if you are just looking for the best performance on your own computer.

Currently I have a GTX 560 Ti at home and a GTX 260 at university, and I plan on doing my development on those two cards for the moment. Any upgrades will only happen later in the project, next year if I’m lucky. Thanks for an excellent blog post, parallelis; it’s a pity that the available number of registers on the GTX 680 has not increased along with the number of CUDA cores. Thanks also for the information about GCN, it’s much appreciated.