MATLAB not seeing nvcc

Hi, I’m trying to run my first CUDA code in MATLAB, but it’s not working. I get the following result:

system('nvcc')
/bin/bash: nvcc: command not found

ans =

127

I followed the CUDA installation instructions and added the two PATH lines to my .bash_profile, but MATLAB still doesn’t seem to find nvcc. I read somewhere that Mavericks uses launchd.conf for such things? Any help would be much appreciated.
I’m using MATLAB R2013b, OS X 10.9 (Mavericks), and CUDA 6.0.

What happens if you type:

nvcc --version

from a command line?

In order to invoke the compiler from the MATLAB prompt, you need to load the proper bash environment variables by issuing the command:
setenv('BASH_ENV','~/.bash_profile');

Thanks for the replies. I used the setenv command in MATLAB and this is what I get now.

setenv('BASH_ENV','~/.bash_profile');
system('nvcc')
nvcc warning : The 'compute_10' and 'sm_10' architectures are deprecated, and may be removed in a future release.
nvcc fatal : No input files specified; use option --help for more information
nvcc: Signal 127

ans =

255

So that’s all good, but how do I make this a permanent change to MATLAB’s startup routine? Sorry for the newbie questions.

Type:
edit startup

It will open or create the file startup.m. Add the setenv line to that file; the next time you restart MATLAB, the command will be executed automatically.

Thanks, works perfectly!

setenv('BASH_ENV','~/.bash_profile');
system('nvcc')
nvcc : fatal error : No input files specified; use option --help for more information

ans =

-1

system('nvcc -c AddVectors.cu')
nvcc : warning : The 'compute_10' and 'sm_10' architectures are deprecated, and may be removed in a future release.
nvcc : fatal error : Cannot find compiler 'cl.exe' in PATH

ans =

-1

nvcc --version
Undefined function 'nvcc' for input arguments of type 'char'.
HELP!!!

My environment variables are shown below:
CUDA path: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.0
Visual Studio 10.0 path: D:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\IDE
VC path: D:\Program Files (x86)\Microsoft Visual Studio 10.0

Any help with the above error? Have I installed nvcc successfully, and how do I compile a .cu file?

It looks like your nvcc (CUDA toolkit) is installed properly. Try the following:

system(‘nvcc --version’)

If you are in MATLAB, something like:

nvcc --version

is not the correct way to call an outside program from MATLAB… you need to escape to the shell, by doing either:
system('nvcc --version')
or
!nvcc --version

As txbob mentioned. Unlike the original poster, you are on Windows, not Linux, so nvcc tries to look for cl.exe, which is the Visual Studio compiler executable. As the error says, you need to add it to your PATH. I believe the correct location for you would be:

C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin

Add the above to your PATH variable and restart matlab, and you should be all set.

Finally, there is no need for you to issue
setenv('BASH_ENV','~/.bash_profile')
on Windows… that’s only needed on a Unix-like operating system with a bash shell, where the paths happen to be set in the .bash_profile file.

Thank you all! After adding the Visual C++ path (the directory containing cl.exe) to the PATH environment variable, the problem was solved. An example of adding two vectors was used to verify the result separately on the GPU and on the CPU.
But there are some warnings shown below during file compiling:

system('nvcc -c AddVectors.cu')
nvcc : warning : The 'compute_10' and 'sm_10' architectures are deprecated, and may be removed in a future release.
c:\program files\nvidia gpu computing toolkit\cuda\v6.0\include\math_functions.h : warning C4819: The file contains a character that cannot be represented in the current code page (936). Save the file in Unicode format to prevent data loss.
c:\program files\nvidia gpu computing toolkit\cuda\v6.0\include\device_functions.h(783) : warning C4819: The file contains a character that cannot be represented in the current code page (936). Save the file in Unicode format to prevent data loss.

How can I remove these warnings?

The warning about compute_10 and sm_10 can be eliminated by specifying a different target architecture that is compatible with the GPU you are using, such as:

system('nvcc -c -arch=sm_11 AddVectors.cu')

You need to specify the architecture(s) you want to compile against (for the first warning). So, for example:

system('nvcc -c -gencode arch=compute_13,code=sm_13 AddVectors.cu')

You can use multiple -gencode arguments to compile for other architectures, e.g.:

system('nvcc -c -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 AddVectors.cu')

For the other two warnings (C4819): they come from non-ASCII characters in the CUDA headers when compiling under a non-Unicode code page (936). They are harmless, and can be suppressed by passing -Xcompiler "/wd4819" to nvcc.

Thanks! All the problems have been solved!

system('nvcc -c -gencode arch=compute_13,code=sm_13 AddVectors.cu')
AddVectors.cu

ans =

 0

disp('nvcc compiling done !');
disp('2. C/C++ compiling for AddVectors.cpp with AddVectors.obj…');
mex AddVectors.cpp AddVectors.obj -lcudart -L"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.0\lib\x64"
nvcc compiling done !

  2. C/C++ compiling for AddVectors.cpp with AddVectors.obj…

disp('C/C++ compiling done !');
disp('3. Test AddVectors()…')
disp('Two input arrays:')
A=single([1234567891000000])
B=single([1098765432100000])
disp('Result:')
tic
C=AddVectors(A, B)
toc
C/C++ compiling done !

  3. Test AddVectors()…
    Two input arrays:

A =

1.2346e+15

B =

1.0988e+15

Result:

C =

1.6435e-28

Elapsed time is 0.804580 seconds.

tic
C=A+B
toc

C =

2.3333e+15

Elapsed time is 0.025995 seconds.

You do realize that the CUDA results are incorrect, yes? ;) Further, your test case makes no sense: you should be adding thousands, even millions of elements to see any time benefit. That is why the CUDA time is so high compared to the MATLAB one (GPU overhead of initialization, memory copies, launching the kernel, etc.… although in this case it’s probably dominated by initialization).

I also suspect your mex file is incorrectly handling the inputs you’re feeding it… because 1.2346e+15 + 1.0988e+15 is certainly not 1.6435e-28 :)

A=single([1234567891000000])

A =

1.2346e+15

B=single([1098765432100000])

B =

1.0988e+15

C=AddVectors(A, B)

C =

1.3560e+13

Why is the result different on every run? And how do I debug and fix the error?

disp('1. nvcc AddVectors.cu compiling…');
% system('nvcc -ccbin "D:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin\amd64"')
system('nvcc -c -gencode arch=compute_13,code=sm_13 AddVectors.cu')

  1. nvcc AddVectors.cu compiling…
    AddVectors.cu

ans =

 0

disp('nvcc compiling done !');
disp('2. C/C++ compiling for AddVectors.cpp with AddVectors.obj…');
mex AddVectors.cpp AddVectors.obj -lcudart -L"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.0\lib\x64"
disp('C/C++ compiling done !');
nvcc compiling done !

  2. C/C++ compiling for AddVectors.cpp with AddVectors.obj…
    C/C++ compiling done !

disp('3. Test AddVectors()…')
disp('Two input arrays:')
A=single([1 2 3 4 5 6 7 8 9 10])
B=single([10 9 8 7 6 5 4 3 2 1])
disp('Result:')
tic
C=AddVectors(A, B)
toc

  3. Test AddVectors()…
    Two input arrays:

A =

 1     2     3     4     5     6     7     8     9    10

B =

10     9     8     7     6     5     4     3     2     1

Result:

C =

1.0e+13 *

Columns 1 through 9

1.4660         0         0         0         0         0         0         0         0

Column 10

     0

Elapsed time is 0.268946 seconds.

Any help with correcting this fault? Or any example code or books you could recommend?

runAddVectors

  1. nvcc AddVectors.cu compiling…
    AddVectors.cu

ans =

 0

nvcc compiling done !
2. C/C++ compiling for AddVectors.cpp with AddVectors.obj…
C/C++ compiling done !
3. Test AddVectors()…
Two input arrays:

A =

 1     2     3     4     5     6     7     8     9    10

B =

10     9     8     7     6     5     4     3     2     1

Result:

C =

1.0e+13 *

Columns 1 through 9

1.4660    1.4660    1.4660    1.4660    1.4660    1.4660    1.4660    1.4660    1.4660

Column 10

1.4660

Elapsed time is 0.008161 seconds.

I’d suggest the book CUDA by Example; it explains various vector-add examples.

In a nutshell: it’s because of initialization on the first kernel run, which could be for a few reasons:

Either this:
https://devtalk.nvidia.com/default/topic/370286/cuda-programming-and-performance/kernel-size-and-caching/

Or it could also be related to this if you’re compiling for an architecture that’s not the native one on your card:
http://devblogs.nvidia.com/parallelforall/cuda-pro-tip-understand-fat-binaries-jit-caching/

Some variation in runtime is normal in between successive/same kernel runs.