Mars: MapReduce Framework & Cuda ToolKit 7.0

Hey guys, I’m trying to run some samples from the Mars library.

http://www.cse.ust.hk/gpuqp/Mars.html

Particularly the WordCount example. The readme tells me to execute make.sh (which I do) and the associated Makefile spits out:

make: *** No rule to make target `../../common/common.mk'.  Stop.

From what I’m gathering, ../../ is not the correct starting directory for the CUDA SDK folder. Apparently the folder is supposed to be under ~/NVIDIA_GPU_Computing_SDK but I have no such folder.

When I installed CUDA I followed the Linux Getting Started guide for CUDA Toolkit 7.0 (specifically section 3.6 for Ubuntu). Then I completed the post-install tasks, and verified that ./deviceQuery compiled and ran (which it did with no issues).

So my question to all of you is: Where is the SDK folder?? Have they moved it since V7.0?

Any insight would be GREATLY appreciated!

Thanks,
M

During the linux install, you are prompted to provide a folder path to install the samples. If you accepted the default, and you were installing the toolkit as root, then the samples would be installed in:

/root/NVIDIA_CUDA-7.0_Samples/…

Regardless of what you enter (unless you decline to install the samples) the samples should also be installed in:

/usr/local/cuda/samples

unless you selected something different from the default for your CUDA install path.

Unfortunately I don’t think any of the above will help you much.

I don’t know exactly how the make system for Mars works, but there is no “common.mk” in any of the current CUDA 7 samples folders. NVIDIA_GPU_Computing_SDK refers to a path that corresponded to the old SDK system, prior to CUDA 5.0. That system is no longer used or included in any current CUDA download.
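One quick way to confirm which installed tree, if any, actually contains a common.mk is to search the candidate directories. This is only a sketch: find_common_mk is a made-up helper name, and the three paths are the locations mentioned in this thread, which may differ on your machine.

```shell
# Hypothetical helper: search each existing candidate directory for common.mk
find_common_mk() {
    for root in "$@"; do
        # Skip locations that do not exist on this machine
        [ -d "$root" ] && find "$root" -name common.mk
    done
    return 0
}

# Candidate locations taken from this thread; adjust to your install
find_common_mk /usr/local/cuda/samples \
               "$HOME/NVIDIA_CUDA-7.0_Samples" \
               "$HOME/NVIDIA_GPU_Computing_SDK"
```

If nothing is printed, none of those trees has the file and the Mars makefile's include cannot resolve against them.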

You can still download one of these old SDKs and install it if you wish, from the CUDA Legacy page:

https://developer.nvidia.com/cuda-toolkit-archive

You would need to choose a CUDA version of 4.2 or prior, and download the SDK for linux. Install it into a directory of your choosing, specifying the path to your current CUDA install. At that point it may be possible just to launch the make there in the top level directory, and it may just work. I used to be able to do this a while ago in the CUDA 5.x era, but haven’t tried it recently.

Hey there txbob, thanks a lot for your reply! I ended up installing toolkit 3.2 and its associated SDK last night. Then I changed the path in the Makefile to reflect the directory of the SDK, and changed the environment variables to match those of version 3.2. I thought it was smooth sailing from there, but sure enough, it was not a go. Now I’m getting this error:

/usr/include/c++/4.8/cstdlib(178): error: identifier "__int128" is undefined

/usr/include/c++/4.8/cstdlib(179): error: identifier "__int128" is undefined

Do you know why this would be happening?? How could some identifier in some G++ proprietary header file be undefined??

Thanks for your help!
M

I don’t think you really followed my instructions. I didn’t suggest installing an old toolkit. Just the SDK. And the make that I was referring to was the Makefile at the top level of the SDK.

If you installed the CUDA 3.2 toolkit, I’m not surprised it doesn’t work on a recent distro, and I can’t sort out the reason for you here. I don’t really even know what file you are compiling.

If you really want to use an old toolkit, I suggest using a linux distro that was compatible with that toolkit version. For example, RHEL 5.5 should work with CUDA toolkit 3.2.

My suggestion was to use whatever toolkit you were using (presumably something recent, like 7.0) and just download the SDK. Then you can try building just the SDK using the newer CUDA 7 toolchain. It may work. If it does, you have the SDK “built” and the directory structure and any needed dependencies should now be in place for your Mars makefile.

Ohh okay…

is this what you’re saying I should do:

  1. I changed Environment Variables back to Cuda 7 paths
  2. I removed the SDK, and then reinstalled and told it that my cuda path was /usr/local/cuda-7.0

After doing this I’m now getting the error:

nvcc fatal : unsupported gpu architecture 'compute_10'

Maybe I botched up your instructions again?

Thanks so much
M

You’re on the right track. compute_10/sm_10 is no longer supported in cuda 7. For me, the easiest thing to do would be just to go through the makefile and remove references to that. But if you’re unfamiliar with what a nvcc compile command should look like (or makefiles in general), this might be difficult. For cuda 7, you would have to remove all references to build targets that were any compute_1x or sm_1x version.

Another alternative would be to back up to a CUDA version that did support cc1.0 (compute_10). The last version that did would be CUDA 6.0. (If you use CUDA 6 you’ll get some deprecation messages during the make/build, but these can be safely ignored.)
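If you go the route of editing the makefiles, a recursive grep is one way to list every line that would need attention. A minimal sketch: demo_dir and the common.mk written here are a made-up stand-in holding two gencode lines in the style of the old SDK; on a real system you would point grep at the SDK and Mars directories instead.

```shell
# Throwaway stand-in for an old SDK makefile fragment (hypothetical contents)
demo_dir=$(mktemp -d)
cat > "$demo_dir/common.mk" <<'EOF'
GENCODE_SM10 := -gencode=arch=compute_10,code=\"sm_10,compute_10\"
GENCODE_SM20 := -gencode=arch=compute_20,code=\"sm_20,compute_20\"
EOF

# Print file, line number, and text for every compute_1x / sm_1x reference
matches=$(grep -rnE '(compute|sm)_1[0-9]' "$demo_dir")
echo "$matches"

# Count how many lines would need to be removed or commented out
match_count=$(printf '%s\n' "$matches" | wc -l | tr -d ' ')
echo "lines to fix: $match_count"
```

Only the compute_10 line is reported here; the compute_20 line does not match the pattern and is left alone.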

Aha! I see, we’re getting closer!

I went into the common.mk file and commented out the only line referencing compute_10

# Compiler-specific flags (by default, we always use sm_10 and sm_20), unless we use the SMVERSION template
#GENCODE_SM10 := -gencode=arch=compute_10,code=\"sm_10,compute_10\"
GENCODE_SM20 := -gencode=arch=compute_20,code=\"sm_20,compute_20\"

and reran the make.sh from the WordCount project. It spit out a bunch of warnings about:

MarsLib.cu(448): warning: conversion from a string literal to "char *" is deprecated

but I’m assuming that’s nothing to worry about at this moment.

At the end it exits with an issue saying:

/usr/bin/ld: cannot open output file ../../bin/linux/release/WordCount/WordCount: No such file or directory
collect2: error: ld returned 1 exit status
make: *** [../../bin/linux/release/WordCount/WordCount] Error 1

I looked around in the project’s Makefile and the make.sh, but I don’t see any reference to ../../bin/linux/release/

In the Makefile there is the line:

# Add source files here
EXECUTABLE	:= WordCount/WordCount

Any ideas why it’s trying to output the binary in ../../bin/linux/release/?

Thanks again,
M

Oops, just found the folder it’s referencing. It’s at the top of the Mars folder, Mars/bin/linux/release, but there was no WordCount folder in there, nor a WordCount file. Is ld looking for .o files? Because in the Mars/sample_apps/WordCount/obj folder there is a list of .o files that the compiler has spit out.

I tried creating a WordCount folder in Mars/bin/linux/release folder and then copy/pasting the .o files into it, but the compiler didn’t like that either:

/usr/bin/ld: cannot find -lcutil_x86_64
/usr/bin/ld: cannot find -lshrutil_x86_64
collect2: error: ld returned 1 exit status
make: *** [../../bin/linux/release/WordCount/WordCount] Error 1

so maybe that’s not what I was supposed to do.

The first problem (cannot open output file) was due to not having the desired folder or not having permissions to access the folder. The makefile should have been constructed to create the output folder if it was needed.

You shouldn’t need to copy/paste any files into it after creating.
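Creating the missing output directory by hand before rerunning the build could look like the sketch below. The temporary mars_root is a made-up stand-in for the real Mars checkout; the relative path is the one from the ld error above.

```shell
# Stand-in for the Mars checkout (hypothetical location)
mars_root=$(mktemp -d)
mkdir -p "$mars_root/sample_apps/WordCount"
cd "$mars_root/sample_apps/WordCount"

# The linker wants to write ../../bin/linux/release/WordCount/WordCount,
# so the directory part of that path has to exist and be writable
out_dir=../../bin/linux/release/WordCount
mkdir -p "$out_dir"

if [ -w "$out_dir" ]; then
    echo "output directory ready"
else
    echo "output directory is not writable"
fi
```

mkdir -p creates any missing parent directories and does nothing if the directory is already there, so it is safe to run repeatedly.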

the remaining issues:

/usr/bin/ld: cannot find -lcutil_x86_64
/usr/bin/ld: cannot find -lshrutil_x86_64

are because either:

  1. You have not run the makefile associated with the SDK (NOT the Mars makefile) like I instructed
  2. The Mars makefile can’t find the referenced SDK folders that contain those libraries.
  3. The libraries were built but have some other name besides the correct ones of libcutil_x86_64.so and libshrutil_x86_64.so

Note that when you run the makefile for the SDK, it may be necessary to do:

make x86_64=1

instead of just

make

I think the missing library references are for helper libraries that shipped with CUDA example codes long long ago. These libraries were not part of CUDA proper, but only for the purpose of simplifying the example codes. CUDA programmers were warned repeatedly in these forums not to use these in their own projects. The utility libraries disappeared somewhere around CUDA 4.0, if I recall correctly.

It seems what you would want to do is to contact the creators of the Mars project to update their code base, or if it is an open source project, request or submit fixes. If this is an orphaned open source project, I would suggest finding a better one that is in active development. Or grab the core bits and pieces of the code that look useful and integrate them into your own build environment (while being mindful of licensing issues).

The string-literal warning is easy to fix. String literals are arrays of “const char”, which convert to “const char *” rather than “char *”. The compiler is complaining about the implicit casting away of “const”. Make the left-hand side of the assignment “const char *”.

txbob: Are you referring to the Makefile located in ~/NVIDIA_GPU_Computing_SDK/C/Makefile? There’s also one in C/common/Makefile but I assumed that the Makefile in C would run the one in common.

njuffa: Thanks for the input, I’ll give that a look-see.

thanks,
M

when I run the Makefile in the C/ folder I get these errors:

make[1]: Entering directory `/home/taylor/NVIDIA_GPU_Computing_SDK/C/src/scalarProd'
/usr/bin/ld: cannot open output file ../../bin/linux/release/scalarProd: Permission denied
collect2: error: ld returned 1 exit status
make[1]: *** [../../bin/linux/release/scalarProd] Error 1
make[1]: Leaving directory `/home/taylor/NVIDIA_GPU_Computing_SDK/C/src/scalarProd'
make: *** [src/scalarProd/Makefile.ph_build] Error 2

So I figured I’d run it with sudo

and I get this:

make[1]: Entering directory `/home/taylor/NVIDIA_GPU_Computing_SDK/C/src/bilateralFilter'
../../lib/librendercheckgl_x86_64.a(rendercheck_gl.cpp.o): In function `CheckBackBuffer::checkStatus(char const*, int, bool)':
rendercheck_gl.cpp:(.text+0xe0b): undefined reference to `gluErrorString'
collect2: error: ld returned 1 exit status
make[1]: *** [../../bin/linux/release/bilateralFilter] Error 1
make[1]: Leaving directory `/home/taylor/NVIDIA_GPU_Computing_SDK/C/src/bilateralFilter'
make: *** [src/bilateralFilter/Makefile.ph_build] Error 2

run it with sudo, and do:

sudo make -k x86_64=1

the -k switch will tell make to keep on going building targets even if there are errors building previous targets. This is OK because all we really want to do is get the libraries built. The graphical SDK targets that will fail to build due to missing graphical libraries like glut don’t matter.
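Once make -k has finished, one way to check whether the helper libraries were actually produced is to search the SDK tree for them by name, without assuming a directory or a .a/.so suffix. A minimal sketch: sdk_root is a made-up stand-in populated with empty files in the C/lib and shared/lib locations reported later in this thread.

```shell
# Stand-in for ~/NVIDIA_GPU_Computing_SDK with placeholder library files
sdk_root=$(mktemp -d)
mkdir -p "$sdk_root/C/lib" "$sdk_root/shared/lib"
touch "$sdk_root/C/lib/libcutil_x86_64.a" \
      "$sdk_root/shared/lib/libshrutil_x86_64.a"

# Match on the library name prefix so both static and shared builds show up
found=$(find "$sdk_root" \( -name 'libcutil*' -o -name 'libshrutil*' \) | sort)
echo "$found"
```

On a real install, replace the stand-in with the SDK directory itself; the directories printed are the ones the link step needs to be told about.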

Hmm… Well, it finishes, but still no lcutil_x86_64 or lshrutil_x86_64 located in ~/NVIDIA_GPU_Computing_SDK/C/common/lib, which I assume is where they’re supposed to be?

Also it’s always ending on an ld error involving src/randomFog

make[1]: Entering directory `/home/taylor/NVIDIA_GPU_Computing_SDK/C/src/randomFog'
../../../shared//lib/libshrutil_x86_64.a(rendercheckGL.cpp.o): In function `CheckBackBuffer::checkStatus(char const*, int, bool)':
rendercheckGL.cpp:(.text+0xb6b): undefined reference to `gluErrorString'
collect2: error: ld returned 1 exit status
make[1]: *** [../../bin/linux/release/randomFog] Error 1
make[1]: Leaving directory `/home/taylor/NVIDIA_GPU_Computing_SDK/C/src/randomFog'
make: *** [src/randomFog/Makefile.ph_build] Error 2
make: Target `all' not remade because of errors.

Do you think this is actually the end of the make? or is -k not functioning as expected?

thanks,
M

P.S.: Just to clarify, I tried both sudo make -k x86_64=1 and sudo make -k

Okay so I found a file called cutil.cpp located in the SDK/C/common/src folder. I tried running g++ cutil.cpp on the file to see if I could get it to compile, and it throws lots of errors about header files it can’t find. The first one being cutil.h.

But it also complains about:

builtin_types.h
cmd_arg_reader.h
error_checker.h
stopwatch.h
bank_checker.h

I found and appended the paths to all these files to the #include statements inside cutil.cpp. But then it complained about headers that it couldn’t find inside these header files.

Could this be the problem?

also, I found a
libcutil_x86_64.a located in SDK/C/lib folder
and
libshrutil_x86_64.a located in SDK/shared/lib folder

Could I possibly do something with these?

Yes, the libcutil_x86_64.a and libshrutil_x86_64.a are the libraries that you want. (So they did get built.) I forgot that the SDK builds these as static libraries (.a) rather than dynamic (.so).

With those libraries in the linker search path, you should not see these messages:

/usr/bin/ld: cannot find -lcutil_x86_64
/usr/bin/ld: cannot find -lshrutil_x86_64

So I assume that by linker search path, you’re referring to LD_LIBRARY_PATH?

In which case I did the following:

export LD_LIBRARY_PATH=~/NVIDIA_GPU_Computing_SDK/shared/lib:~/NVIDIA_GPU_Computing_SDK/C/lib:/usr/local/cuda-7.0/lib64

when I echo $LD_LIBRARY_PATH I get:

echo $LD_LIBRARY_PATH
/home/taylor/NVIDIA_GPU_Computing_SDK/shared/lib:/home/taylor/NVIDIA_GPU_Computing_SDK/C/lib:/usr/local/cuda-7.0/lib64

but when I sh make.sh in the WordCount folder I still get the errors:

/usr/bin/ld: cannot find -lcutil_x86_64
/usr/bin/ld: cannot find -lshrutil_x86_64

Am I doing something wrong?

Hmmm. Quite a struggle. Hope it’s worth it.

LD_LIBRARY_PATH is not the linker search path. It is the path that the Linux OS uses to search for dynamic libraries to load, as required by any applications you launch. That’s not relevant for compiling (and also because the libraries in question are not dynamic .so libraries).

The linker search path includes whatever you specify after a -L switch.

So if you want the linker to look for libraries in the directory:

/foo/bar

then add a switch like this to the link command:

-L/foo/bar

before the library that you want it to find there.

So if the linker error says that it cannot find -lcutil_x86_64, and libcutil_x86_64.a is in directory /foo/bar, then you should modify the link command to add this -L switch before the -l switch that specifies the library:

-L/foo/bar -lcutil_x86_64

Obviously this may mean modification of whatever makefile you are using.

This description here is all just basic linux stuff. None of it is specific to CUDA or nvcc. You have to follow the same pattern when using gcc/g++. So I don’t know if your linker command is actually using nvcc or gcc or g++ to do the link step when these errors get reported, but it does not matter. You have to modify the link command in an identical fashion.

Yeah, I apologize for that. I’m not the most comfortable with linux. I’ve used it in the past, off and on, but it’s been about 5 years since I’ve actually had to do anything with it. I originally was trying to run Mars under VS2013 on the Windows side, but I quickly realized the project was meant to be run under linux.

I really appreciate your help, and it will be worth it. To me at least…

I see what you’re saying, so I need to find where they run gcc and/or g++ and/or nvcc in the makefiles and add -L/~/NVIDIA_GPU_Computing_SDK/shared/lib -lcutil_x86_64

I assume the majority of these issues stem from the project being several years outdated?

NVIDIA produced an SDK to help with understanding of CUDA. It was supposed to be a bunch of “samples” that would demonstrate various concepts. But it looked somewhat like a real SDK framework with a build system, ancillary libraries, etc. and some developers assumed it would be properly maintained and supported, so they built their own applications within the SDK structure - employing the directory structure, the helper utilities and libraries, and the build system. NVIDIA didn’t intend to provide the level of support needed to maintain this as a production build environment, so the SDK was yanked from the CUDA distribution and replaced with the perhaps more aptly named “samples”.

You’re not alone in this. Many others have walked this path, unfortunately. I would say the issue stems largely from a too-tight integration with the SDK framework (expecting peculiar libraries to exist and be in a particular place in a directory structure.)

Yes, find where these libraries are linked in the makefile, and add/modify the preceding -L switches to reflect the path to where they actually exist on your machine.
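As a rough sketch of what the corrected switches might look like: the variable names SDK_ROOT, LIB_DIRS, and LIBS are made up for illustration (the Mars makefile may use different ones), and the SDK location is assumed from the paths quoted in this thread. Two details matter. The shell does not expand ~ in the middle of a word such as -L~/..., so $HOME or an absolute path is safer. And each library needs its own directory: libcutil was found under C/lib and libshrutil under shared/lib.

```shell
# Assumed SDK location, taken from the paths quoted in this thread
SDK_ROOT="$HOME/NVIDIA_GPU_Computing_SDK"

# One -L per directory, placed before the -l switches that depend on them
LIB_DIRS="-L$SDK_ROOT/C/lib -L$SDK_ROOT/shared/lib"
LIBS="-lcutil_x86_64 -lshrutil_x86_64"

# Print the switches so they can be compared with the makefile's link line
echo "$LIB_DIRS $LIBS"
```

In a makefile the same idea applies: use an absolute path or $(HOME) in the -L switches.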