Windows Command Line Interface and CUDA

Hello,

I’m trying to run all my programs through the Windows command prompt instead of using Visual Studio. I’m going through the installation guide and I have a few questions.

I’ve reached the “Verify the Installation” section and tested the samples.
I’ve been able to compile and run deviceQuery through Visual Studio using the deviceQuery.cpp file. I tried to reproduce this result from the command prompt, but I was not sure what the syntax is supposed to be.

Is there a guide or reference I can use to determine how to run programs for CUDA through cmd?

The installation guide was not helpful for this purpose. So far I’ve tried typing the following inside the deviceQuery folder:
“nvcc -E deviceQuery.cpp”, which gave the following result:

nvcc warning: the ‘compute_20’, ‘sm_20’, and ‘sm_21’ architectures are deprecated, and may be removed in a future release…
nvcc fatal : cannot find compiler ‘cl.exe’ in PATH

Any assistance will be greatly appreciated.

The installation guide is indeed not helpful for this purpose.

One way to figure this out is to study how the VS projects translate into compile commands (look at the VS output console when you build a project). You can work out the rest with some trial and error.

You may need to add some paths to your environment variables (e.g. %PATH%).

I use the Windows command line exclusively when working with CUDA. For one-time setup of the environment variables needed by MSVC prior to use of the CUDA toolchain, I created a batch file, compilervars.bat, with the following content:

call "c:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\vcvarsall.bat" amd64

Obviously the location of the vcvarsall.bat file will differ based on your MSVS version. The argument amd64 specifies the 64-bit toolchain. Other compilers may come with similar environment setup batch files. For example I use the following to set up the environment for the Intel compilers:

call "c:\Program Files (x86)\Intel\Composer XE 2013\bin\iclvars.bat" intel64 vs2010
call "c:\Program Files (x86)\Intel\Composer XE 2013\bin\ifortvars.bat" intel64 vs2010
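
As a quick check that the setup worked, something like the following could be run in the same command window. This is a minimal sketch: it assumes compilervars.bat (the batch file described above) is in the current directory or on the PATH, and that the CUDA installer has already put nvcc on the PATH.

```bat
@echo off
rem Sketch: set up the MSVC environment, then confirm the tools are visible.
rem compilervars.bat is the batch file described above.
call compilervars.bat

rem "where" prints the full path of each match, or an error if none is found.
where cl.exe
where nvcc.exe
```

If `where cl.exe` reports that nothing was found, nvcc would be expected to fail with the same "cannot find compiler 'cl.exe' in PATH" message quoted in the original question.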

@txbob
I will look into it and see what I can come up with. I will post here if I find something that can help someone else.

@njuffa
I’ll try that later today. I use MSVS 15; are there any discrepancies or issues with using a newer version instead of VS10? Also, do you have any references I can use to find out which commands I need to run my programs?

Sorry, I don’t understand the question. Once you have a binary executable, simply invoke it by its name from the Windows command line, adding appropriate command line arguments if the application requires them.

As far as command line usage of compilers goes, nvcc is no different from other toolchains. Obviously all compilers have their own set of command line switches. As for the CUDA examples, they have been set up for use with the MSVS IDE, which is what most programmers on Windows use.

Depending on the complexity of the example programs, it may take a bit of work to create makefiles equivalent to the solution files. I don’t normally have a need to build the example programs; I have built maybe two or three of the simpler ones from the command line in the past. Mostly a question of linking all required libraries as I recall.

I don’t have MSVS 15, so I cannot tell you what specific locations it uses, but my experience is that the overall directory structure of MSVS changes little from version to version, so it shouldn’t be too hard to find where the various components are located. Just explore the MSVS directory tree(s) to get some idea where things are.
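
One way to locate the environment setup file without browsing by hand is a recursive directory search. A sketch, assuming Visual Studio was installed under the default C:\Program Files (x86) location (an assumption; adjust the starting directory if it was installed elsewhere):

```bat
rem Sketch: search for vcvarsall.bat below the assumed install root.
rem /s searches all subdirectories, /b prints bare full paths only.
dir /s /b "C:\Program Files (x86)\vcvarsall.bat"
```

Whatever path this prints can be substituted into the call line of compilervars.bat.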

@njuffa
Basically, I’m having trouble running programs through the command line, since I don’t really know what commands to use.

For example, if I have a simple HelloWorld program in C++ that works perfectly fine in an IDE, I can’t manage to make it work through the command line. I’ve been typing “gcc helloworld.cpp” and I run into issues.

For a CUDA example, I mentioned earlier that I got deviceQuery to work in an IDE but not from the command prompt. I’m not sure what to type into cmd. I’ve read that I have to compile and then run the file; is that what you mean by the makefile? I downloaded cmake because I thought that’s what I needed to compile the files, but please correct me if I am mistaken somewhere.

I am not sure how gcc comes into the picture. When using CUDA on Windows, MSVC is the only supported host compiler. It seems to me you would want to familiarize yourself in general terms with the basic command-line operation of compilers and the make utility (I would suggest installing Cygwin so you can use gmake on Windows).
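
To illustrate the basic compile-then-run cycle with MSVC in place of gcc, here is a minimal sketch. It assumes the source file is named helloworld.cpp (as in the question above), that it sits in the current directory, and that the MSVC environment has already been set up in this command window.

```bat
rem Sketch: build and run a plain C++ program with the MSVC compiler.
rem /EHsc enables standard C++ exception handling.
rem With no output name given, cl names the executable after the source file.
cl /EHsc helloworld.cpp

rem Run the resulting program.
helloworld.exe
```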

I don’t have a relevant tutorial handy to point you at, as it has been many years since I learned this myself. Back then toolchain and operating system documentation was a good place to learn how to build code. That might still work today, for all I know.

Here is how you can build and run deviceQuery directly from the command line, after first setting up the MSVC environment as described earlier. Note: You would want to set up a proper makefile instead of building straight from the command line. In the makefile you can specify the location of header files and libraries.

C:\Users\All Users\NVIDIA Corporation\CUDA Samples\v8.0\1_Utilities\deviceQuery>nvcc -o deviceQuery.exe -I ../../common/inc deviceQuery.cpp
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
nvcc warning : nvcc support for Microsoft Visual Studio 2010 and earlier has been deprecated and is no longer being maintained
deviceQuery.cpp
   Creating library deviceQuery.lib and object deviceQuery.exp

C:\Users\All Users\NVIDIA Corporation\CUDA Samples\v8.0\1_Utilities\deviceQuery>deviceQuery
deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Quadro K2200"
  CUDA Driver Version / Runtime Version          8.0 / 8.0
  CUDA Capability Major/Minor version number:    5.0
  Total amount of global memory:                 4096 MBytes (4294967296 bytes)
  ( 5) Multiprocessors, (128) CUDA Cores/MP:     640 CUDA Cores
  GPU Max Clock rate:                            1124 MHz (1.12 GHz)
  Memory Clock rate:                             2505 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 2097152 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  CUDA Device Driver Mode (TCC or WDDM):         WDDM (Windows Display Driver Model)
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = Quadro K2200
Result = PASS
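
Short of a full makefile, the two steps in the session above could be collected into a small batch file. This is only a sketch, and a batch file rather than the makefile recommended above; the file name build.bat and the variable SAMPLE_INC are made up for illustration. It reuses the nvcc command line from the session and adds -Wno-deprecated-gpu-targets, the switch that the nvcc warning itself suggests for suppressing the deprecation message.

```bat
@echo off
rem build.bat (hypothetical name): build deviceQuery, then run it.
rem Assumes it is run from the deviceQuery sample folder in a command
rem window where the MSVC environment has already been set up.
setlocal

rem Location of the samples' shared headers, relative to this folder.
set SAMPLE_INC=../../common/inc

nvcc -Wno-deprecated-gpu-targets -o deviceQuery.exe -I %SAMPLE_INC% deviceQuery.cpp
if errorlevel 1 (
    echo Build failed.
    exit /b 1
)

deviceQuery.exe
```

Stopping on a non-zero exit code from nvcc keeps a stale deviceQuery.exe from being run after a failed build.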