There are a few more that you should add:
- Full support for .Net (full CUDA driver API access and more), with C# and Visual Basic examples
- Full support for Perl (full CUDA driver API access and more; see below)
- Full support for Python (full CUDA driver API access and more; see below)
- Full access for Ruby to run CUDA via the CUDA driver API
- Full access for Lua to run CUDA via the CUDA driver API
- Source code for all Kappa library language bindings and keywords is available through the Kappa library installers.
Performance is usually comparable to C++ because the bindings are a high-level interface: most CUDA API operations, such as memory management and transfer, are performed by the Kappa C++ library. (Performance can be better than any single CUDA C/C++ SDK example, since CUDA best practices, memory mapping, and concurrent kernel execution are the defaults where the GPU hardware supports them.) Full multi-GPU support and CUDA JIT are available for all language bindings.
Since the Kappa library uses a producer/consumer data flow scheduler, defaults to asynchronous CUDA kernel launches, and supports asynchronous CPU kernel and SQL operations, it can achieve full occupancy of CPU and GPU. Kernels are launched in such a way that, on GF100 GPUs, concurrent kernel execution is automatic and the usual mode, provided the GPU has occupancy available for that mixture of kernels. Whether particular CUDA kernels actually execute concurrently is therefore a (potentially nondeterministic) result of the run-time dynamics of the host and GPU code, but the resulting performance should always meet or exceed what is otherwise available.
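To make the producer/consumer data flow idea concrete, here is a minimal host-side sketch in plain Python. It is not the Kappa API: the class `DataflowScheduler`, its methods, and the value names are all hypothetical, and ordinary worker threads stand in for asynchronous kernel launches. It only illustrates the scheduling rule, namely that a task is dispatched as soon as every value it consumes has been produced, so independent tasks can overlap.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class DataflowScheduler:
    """Toy scheduler (hypothetical, not Kappa): run a task once its inputs exist."""

    def __init__(self, workers=4):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._idle = threading.Condition()  # also guards the state below
        self._values = {}    # name -> produced value
        self._waiting = []   # (func, consumes, produces) not yet runnable
        self._pending = 0    # tasks added but not yet finished

    def add_task(self, func, consumes, produces):
        """Register a task that consumes named values and produces one."""
        with self._idle:
            self._pending += 1
            self._waiting.append((func, tuple(consumes), produces))
        self._dispatch_ready()

    def _dispatch_ready(self):
        # Split waiting tasks into runnable and blocked while holding the lock,
        # then submit the runnable ones to the pool outside the lock.
        jobs = []
        with self._idle:
            blocked = []
            for func, consumes, produces in self._waiting:
                if all(name in self._values for name in consumes):
                    args = [self._values[name] for name in consumes]
                    jobs.append((func, args, produces))
                else:
                    blocked.append((func, consumes, produces))
            self._waiting = blocked
        for func, args, produces in jobs:
            self._pool.submit(self._run, func, args, produces)

    def _run(self, func, args, produces):
        # Sketch limitation: a task that raises, or an input that is never
        # produced, leaves wait() blocked forever.
        result = func(*args)
        with self._idle:
            self._values[produces] = result
            self._pending -= 1
            self._idle.notify_all()
        self._dispatch_ready()  # a new value may unblock consumers

    def wait(self):
        """Block until every added task has finished; return all values."""
        with self._idle:
            self._idle.wait_for(lambda: self._pending == 0)
            return dict(self._values)


sched = DataflowScheduler()
# The consumer is registered first; it stays blocked until "a" and "b" exist.
sched.add_task(lambda a, b: a + b, consumes=["a", "b"], produces="sum")
sched.add_task(lambda: 2, consumes=[], produces="a")
sched.add_task(lambda: 3, consumes=[], produces="b")
results = sched.wait()
print(results["sum"])
```

The two producers have no dependency on each other, so the pool is free to run them at the same time. Which one finishes first is not determined, but the consumer's result is the same either way.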
For .Net, you can create .Net subclass instances to tie into the Kappa IO keyword and to receive exception notifications. These subclasses execute on the host thread associated with the GPU context, so the full CUDA API is accessible for that GPU context.
For the Perl and Python bindings mentioned above, developers can combine CUDA C++ running on the GPU with C++ (including OpenMP), Perl, or Python running on the host as a single integrated processing task.
Additional language bindings (untested, no examples) are available for invoking CUDA via the Kappa library from: Java, R, PHP, Octave/Matlab, TCL, allegrocl, chicken, guile, mzscheme, ocaml, and pike.
The Kappa library is commercial, but the .Net, Perl, Python, Lua, Ruby, etc. modules/packages, examples, and keyword source code are available under the MIT License.