NVBLAS W/ 4 GPUs Unknown Error

Hi,

I’ve been playing around with NVBLAS, cublasXt, and MAGMA setups on an Ubuntu 14.04 machine with 4 Tesla K40s and CUDA 7, and I’ve run into some problems.

I’m working with a large company Fortran code as well as a small personal Fortran code that I wrote. For both programs I have implementations that offload matrix multiplication to the GPU: CPU-only, cuBLAS (one build imported from another machine and one built on the host), cublasXt, MAGMA, and NVBLAS. My personal code runs fine with each implementation and properly uses all four GPUs. With the company code, however, I get errors:

The cuBLAS version that was built on a different computer, tarred up, and installed on the 4-GPU machine produces correct results (same as the CPU run). Building the cuBLAS version directly on the 4-GPU machine causes problems: it runs to completion in the same amount of time as the imported build, but the resulting data is wrong. That bad data is very similar to the output from my NVBLAS version.
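For reference, the host cuBLAS path follows roughly the pattern below. This is a simplified sketch rather than my actual wrapper (the names and the fixed NN/alpha/beta choices are illustrative); I include it because wrong results with otherwise normal run times is what I'd expect if one of these calls failed and its status was never checked:

#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

/* Simplified sketch of a host-side dgemm offload with status checks
   (illustrative, not my actual wrapper; the real code is called from Fortran). */
static void check_cuda(cudaError_t s, const char *msg)
{
    if (s != cudaSuccess) { fprintf(stderr, "%s: %s\n", msg, cudaGetErrorString(s)); exit(1); }
}
static void check_cublas(cublasStatus_t s, const char *msg)
{
    if (s != CUBLAS_STATUS_SUCCESS) { fprintf(stderr, "%s: status %d\n", msg, (int)s); exit(1); }
}

void gpu_dgemm(int m, int n, int k, const double *A, const double *B, double *C)
{
    cublasHandle_t handle;
    double *dA, *dB, *dC;
    const double alpha = 1.0, beta = 0.0;

    check_cublas(cublasCreate(&handle), "cublasCreate");
    check_cuda(cudaMalloc((void**)&dA, (size_t)m * k * sizeof(double)), "cudaMalloc A");
    check_cuda(cudaMalloc((void**)&dB, (size_t)k * n * sizeof(double)), "cudaMalloc B");
    check_cuda(cudaMalloc((void**)&dC, (size_t)m * n * sizeof(double)), "cudaMalloc C");

    /* Column-major copies, matching the Fortran caller's layout. */
    check_cublas(cublasSetMatrix(m, k, sizeof(double), A, m, dA, m), "set A");
    check_cublas(cublasSetMatrix(k, n, sizeof(double), B, k, dB, k), "set B");

    check_cublas(cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                             m, n, k, &alpha, dA, m, dB, k, &beta, dC, m), "dgemm");

    check_cublas(cublasGetMatrix(m, n, sizeof(double), dC, m, C, m), "get C");

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    cublasDestroy(handle);
}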

For NVBLAS, nvidia-smi outputs:

+------------------------------------------------------+                       
| NVIDIA-SMI 346.46     Driver Version: 346.46         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K40m          Off  | 0000:02:00.0     Off |                    0 |
| N/A   32C    P0    61W / 235W |    235MiB / 11519MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K40m          Off  | 0000:03:00.0     Off |                    0 |
| N/A   30C    P0    61W / 235W |    235MiB / 11519MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K40m          Off  | 0000:83:00.0     Off |                    0 |
| N/A   31C    P0    61W / 235W |    235MiB / 11519MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K40m          Off  | 0000:84:00.0     Off |                    0 |
| N/A   31C    P0    61W / 235W |    235MiB / 11519MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     23853    C   ********************************************    87MiB |
|    0     24484    C   Unknown Error                                   87MiB |
|    1     23853    C   ********************************************    87MiB |
|    1     24484    C   Unknown Error                                   87MiB |
|    2     23853    C   ********************************************    87MiB |
|    2     24484    C   Unknown Error                                   87MiB |
|    3     23853    C   ********************************************    87MiB |
|    3     24484    C   Unknown Error                                   87MiB |
+-----------------------------------------------------------------------------+

where the asterisks are the path to the program. The program runs to completion in the same time as a CPU run of the same problem size, but the result data is incorrect.

With my cublasXt implementation, processes spawn on the GPU(s), depending on how many I specify to use, but utilization stays at 0% for the whole run. The program completes faster than a single-GPU run (and much faster than a CPU run), but the data is also incorrect. A simplified sketch of the call pattern I'm using is below.
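Roughly, the cublasXt wrapper does the following (a simplified sketch, not my exact code; the device list here is illustrative). cublasXt takes host pointers and tiles the GEMM across the selected devices itself, so with this pattern I would expect to see some GPU utilization:

#include <stdio.h>
#include <cublas_v2.h>
#include <cublasXt.h>

/* Simplified sketch of the cublasXt call pattern (illustrative, not my exact wrapper). */
int xt_dgemm(int m, int n, int k, const double *A, const double *B, double *C)
{
    cublasXtHandle_t handle;
    int devices[4] = {0, 1, 2, 3};   /* which GPUs to use */
    const double alpha = 1.0, beta = 0.0;

    if (cublasXtCreate(&handle) != CUBLAS_STATUS_SUCCESS) return -1;
    if (cublasXtDeviceSelect(handle, 4, devices) != CUBLAS_STATUS_SUCCESS) return -1;

    /* A, B, C stay in host memory; cublasXt handles the device transfers and tiling. */
    cublasStatus_t s = cublasXtDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                                     m, n, k, &alpha, A, m, B, k, &beta, C, m);
    cublasXtDestroy(handle);
    return (s == CUBLAS_STATUS_SUCCESS) ? 0 : -1;
}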

Finally, my MAGMA version spawns a process on one GPU (so far; I think I can tell it to use all of them) but then outputs a ton of errors like the following:

CUBLAS error: memory mapping error (11) in d_magma_mpytrn_ at d_magma_mpytrn.c:21
CUBLAS error: memory mapping error (11) in d_magma_mpytrn_ at d_magma_mpytrn.c:22
CUBLAS error: memory mapping error (11) in d_magma_mpytrn_ at d_magma_mpytrn.c:41
CUBLAS error: memory mapping error (11) in d_magma_mpyacc_ at d_magma_mpyacc.c:44
CUBLAS error: memory mapping error (11) in d_magma_mpyacc_ at d_magma_mpyacc.c:45
 ** On entry to ZGEMM  parameter number 13 had an illegal value
CUBLAS error: memory mapping error (11) in d_magma_mpyacc_ at d_magma_mpyacc.c:64
CUBLAS error: memory mapping error (11) in d_magma_mpyacc_ at d_magma_mpyacc.c:44
CUBLAS error: memory mapping error (11) in d_magma_mpyacc_ at d_magma_mpyacc.c:45
 ** On entry to ZGEMM  parameter number 13 had an illegal value
CUBLAS error: memory mapping error (11) in d_magma_mpyacc_ at d_magma_mpyacc.c:64
CUBLAS error: memory mapping error (11) in d_magma_mpyacc_ at d_magma_mpyacc.c:44
CUBLAS error: memory mapping error (11) in d_magma_mpyacc_ at d_magma_mpyacc.c:45
 ** On entry to ZGEMM  parameter number 13 had an illegal value

The *trn and *acc files are the two C files I wrote to call MAGMA (the cublasXt implementation is similar).
Despite these errors, the program still runs to completion in a time similar to the cublasXt version and with similarly bad data (though the pattern differs from the NVBLAS and host-built cuBLAS output).

Side note: building the MAGMA version on a different computer resulted in similar errors, but with a different parameter reported as illegal and "invalid value" errors instead of memory mapping errors.
BTW, the memory mapping errors come from the magma_zsetmatrix and magma_zgetmatrix calls. A simplified sketch of what these wrappers do is below.
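For what it's worth, if the arguments follow the standard ZGEMM ordering, parameter 13 is LDC, the leading dimension of C, so the "illegal value" message suggests the leading dimension passed for C is being rejected (e.g. smaller than M). The MAGMA wrappers do roughly the following (heavily simplified sketch with illustrative names; the signatures shown are the queue-less MAGMA 1.x style, and newer MAGMA releases add a magma_queue_t argument to the set/get/gemm calls):

#include <stdio.h>
#include "magma.h"

/* Heavily simplified sketch of the multiply wrapper (illustrative names, not the real code).
   The device leading dimensions must be >= the row counts, or the zsetmatrix/zgetmatrix
   transfers fail and ZGEMM rejects its leading-dimension argument. */
void magma_zgemm_offload(magma_int_t m, magma_int_t n, magma_int_t k,
                         const magmaDoubleComplex *A, magma_int_t lda,
                         const magmaDoubleComplex *B, magma_int_t ldb,
                         magmaDoubleComplex *C, magma_int_t ldc)
{
    magmaDoubleComplex_ptr dA, dB, dC;
    magmaDoubleComplex alpha = MAGMA_Z_MAKE(1.0, 0.0);
    magmaDoubleComplex beta  = MAGMA_Z_MAKE(0.0, 0.0);
    magma_int_t ldda = m, lddb = k, lddc = m;   /* device leading dimensions */

    if (magma_zmalloc(&dA, (size_t)ldda * k) != MAGMA_SUCCESS ||
        magma_zmalloc(&dB, (size_t)lddb * n) != MAGMA_SUCCESS ||
        magma_zmalloc(&dC, (size_t)lddc * n) != MAGMA_SUCCESS) {
        fprintf(stderr, "magma_zmalloc failed\n");
        return;
    }

    magma_zsetmatrix(m, k, A, lda, dA, ldda);   /* host -> device */
    magma_zsetmatrix(k, n, B, ldb, dB, lddb);

    magma_zgemm(MagmaNoTrans, MagmaNoTrans, m, n, k,
                alpha, dA, ldda, dB, lddb, beta, dC, lddc);

    magma_zgetmatrix(m, n, dC, lddc, C, ldc);   /* device -> host */

    magma_free(dA); magma_free(dB); magma_free(dC);
}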

So: I can’t really share the source code, and the problem is hard to reproduce because my personal code runs fine (which also suggests the GPUs themselves are not at fault). But if anyone has insight into what could be causing this, particularly the NVBLAS "Unknown Error", since I feel like these issues are all related, it would help me tremendously.

Thanks!