Hi,
I’ve been experimenting with NVBLAS, cublasXt, and MAGMA setups on an Ubuntu 14.04 machine with four Tesla K40s and CUDA 7, and I’ve been having some problems.
I’m working with a large company Fortran code as well as a small personal Fortran code that I’ve written. For both programs I have CPU, cuBLAS (both imported and built on the host), cublasXt, MAGMA, and NVBLAS implementations, where matrix multiplication is offloaded to the GPU. My personal code runs fine with every implementation and properly uses all GPUs. With the company code, however, I get errors:
A cuBLAS build that was compiled on a different computer, tarred up, and installed on the 4-GPU machine produces correct results (the same as the CPU version). Building the cuBLAS version directly on the 4-GPU machine causes problems: it runs to completion in the same amount of time as the imported cuBLAS build, but the resulting data is wrong. This bad data is very similar to the output from my NVBLAS version:
For NVBLAS, nvidia-smi outputs:
+------------------------------------------------------+
| NVIDIA-SMI 346.46 Driver Version: 346.46 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K40m Off | 0000:02:00.0 Off | 0 |
| N/A 32C P0 61W / 235W | 235MiB / 11519MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K40m Off | 0000:03:00.0 Off | 0 |
| N/A 30C P0 61W / 235W | 235MiB / 11519MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K40m Off | 0000:83:00.0 Off | 0 |
| N/A 31C P0 61W / 235W | 235MiB / 11519MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K40m Off | 0000:84:00.0 Off | 0 |
| N/A 31C P0 61W / 235W | 235MiB / 11519MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 23853 C ******************************************** 87MiB |
| 0 24484 C Unknown Error 87MiB |
| 1 23853 C ******************************************** 87MiB |
| 1 24484 C Unknown Error 87MiB |
| 2 23853 C ******************************************** 87MiB |
| 2 24484 C Unknown Error 87MiB |
| 3 23853 C ******************************************** 87MiB |
| 3 24484 C Unknown Error 87MiB |
+-----------------------------------------------------------------------------+
where the asterisks are the path to the program. The program runs to completion in the same time as a CPU run of the same problem size, but the result data is incorrect.
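For reference, NVBLAS is configured through an nvblas.conf file (found via the NVBLAS_CONFIG_FILE environment variable), and a misconfigured NVBLAS_CPU_BLAS_LIB is one common cause of silent fallback or wrong results. A minimal config along these lines is what I'd expect to be needed (the library path here is a placeholder, not my actual setup):

```
# CPU BLAS fallback library (required by NVBLAS)
NVBLAS_CPU_BLAS_LIB /path/to/libopenblas.so
# Use all visible GPUs for offloaded routines
NVBLAS_GPU_LIST ALL
# Log file for NVBLAS diagnostics
NVBLAS_LOGFILE nvblas.log
```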
With my cublasXt implementation, processes spawn on the GPU(s) - depending on how many I specify - but utilization stays at 0% for the whole run. The program completes faster than a single-GPU run (and much faster than a CPU run), but it also produces incorrect data.
Finally, my MAGMA version spawns a process on one GPU (so far - I think I can tell it to use all of them), but then it prints a flood of errors like the following:
CUBLAS error: memory mapping error (11) in d_magma_mpytrn_ at d_magma_mpytrn.c:21
CUBLAS error: memory mapping error (11) in d_magma_mpytrn_ at d_magma_mpytrn.c:22
CUBLAS error: memory mapping error (11) in d_magma_mpytrn_ at d_magma_mpytrn.c:41
CUBLAS error: memory mapping error (11) in d_magma_mpyacc_ at d_magma_mpyacc.c:44
CUBLAS error: memory mapping error (11) in d_magma_mpyacc_ at d_magma_mpyacc.c:45
** On entry to ZGEMM parameter number 13 had an illegal value
CUBLAS error: memory mapping error (11) in d_magma_mpyacc_ at d_magma_mpyacc.c:64
CUBLAS error: memory mapping error (11) in d_magma_mpyacc_ at d_magma_mpyacc.c:44
CUBLAS error: memory mapping error (11) in d_magma_mpyacc_ at d_magma_mpyacc.c:45
** On entry to ZGEMM parameter number 13 had an illegal value
CUBLAS error: memory mapping error (11) in d_magma_mpyacc_ at d_magma_mpyacc.c:64
CUBLAS error: memory mapping error (11) in d_magma_mpyacc_ at d_magma_mpyacc.c:44
CUBLAS error: memory mapping error (11) in d_magma_mpyacc_ at d_magma_mpyacc.c:45
** On entry to ZGEMM parameter number 13 had an illegal value
The *trn and *acc files are the two C files I wrote to call MAGMA (the cublasXt implementation is similar).
Despite these errors, the program still runs to completion, in a similar time to the cublasXt version and with similarly bad data (though the pattern differs from the NVBLAS and host-built cuBLAS output).
Side note: building the MAGMA version on a different computer produced similar errors, but with a different illegal parameter, and invalid-value errors in place of the memory-mapping errors.
By the way, the memory-mapping errors come from the magma_zgetmatrix and magma_zsetmatrix calls.
So, I can’t really share source code, and the problem is hard to reproduce because my personal code runs fine (which also rules out the GPUs themselves). But if anyone has insight into what could be causing this - especially the NVBLAS "Unknown Error", since I suspect the failures are all related - it would help me tremendously.
Thanks!