cublas for 10.1 is missing

As you can see in https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ , pretty much every package has both a 10.0 and a 10.1 version except cublas, and this is causing libraries like OpenCV to fail to compile. Can we get a 10.1 version of cublas as well?


CUBLAS packaging changed in CUDA 10.1 to be outside of the toolkit installation path. On the RPM/Deb side of things, this means a departure from the traditional cuda-cublas-X-Y and cuda-cublas-dev-X-Y package names to more standard libcublas10 and libcublas-dev package names.

Installing via the usual meta-packages (cuda, cuda-10-1, cuda-libraries-10-1, etc) should still pull in these packages as a dependency, or you can install just them specifically using yum/dnf/zypper/apt-get.

You can find some more information at https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#cublas-new-features


But I cannot find cublas in either /usr/lib or /usr/lib32 after installing cuda.
During the apt-get installation, libcublas10 and libcublas-dev seem to have been set up, since I saw the following messages.

Setting up libcublas10 (10.1.0.105-1) …
Setting up libcublas-dev (10.1.0.105-1) …

On Ubuntu 18.04, I see the libs in /usr/lib/x86_64-linux-gnu (this is the multiarch lib directory) and headers in /usr/include.
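
To check where the libraries landed on a given machine, something along these lines may help. This is only a sketch: find_cublas is a hypothetical helper name, and the candidate directories are simply the ones mentioned in this thread, so adjust them to your install.

```shell
#!/bin/sh
# Print every libcublas shared object found directly in the given directories.
# find_cublas is a hypothetical helper, not part of any CUDA tooling.
find_cublas() {
    for dir in "$@"; do
        # Skip directories that do not exist on this system.
        [ -d "$dir" ] || continue
        # -maxdepth 1 (GNU/BSD find) keeps the search to the directory itself.
        find "$dir" -maxdepth 1 -name 'libcublas.so*' 2>/dev/null
    done
}

# Candidate locations taken from this thread (assumed, adjust as needed).
find_cublas /usr/lib/x86_64-linux-gnu /usr/lib64 /usr/local/cuda-10.1/lib64
```

If nothing is printed, the libraries are in none of those directories and a wider search (for example over all of /usr) is the next step.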


How do I compile current TensorFlow with CUDA 10.1, then?

➜  tensorflow git:(master) bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
Starting local Bazel server and connecting to it...
WARNING: The following configs were expanded more than once: [cuda]. For repeatable flags, repeats are counted twice and may lead to unexpected behavior.
DEBUG: Rule 'build_bazel_rules_swift' modified arguments {"commit": "001736d056d7eae20f1f4da41bc9e6f036857296", "shallow_since": "1547844730 -0800"} and dropped ["tag"]
DEBUG: ~/.cache/bazel/_bazel_user/15086820fc7a6f1383d8c38c62220208/external/build_bazel_rules_apple/apple/repositories.bzl:35:5: 
WARNING: `build_bazel_rules_apple` depends on `bazel_skylib` loaded from https://github.com/bazelbuild/bazel-skylib.git (tag 0.6.0), but we have detected it already loaded into your workspace from None (tag None). You may run into compatibility issues. To silence this warning, pass `ignore_version_differences = True` to `apple_rules_dependencies()`.

ERROR: Skipping '//tensorflow/tools/pip_package:build_pip_package': error loading package 'tensorflow/tools/pip_package': in ....../tensorflow/tensorflow.bzl: Encountered error while reading extension file 'cuda/build_defs.bzl': no such package '@local_config_cuda//cuda': Traceback (most recent call last):
        File "....../third_party/gpus/cuda_configure.bzl", line 1501
                _create_local_cuda_repository(repository_ctx)
        File "....../third_party/gpus/cuda_configure.bzl", line 1266, in _create_local_cuda_repository
                _find_libs(repository_ctx, cuda_config)
        File "....../third_party/gpus/cuda_configure.bzl", line 859, in _find_libs
                _find_cuda_lib("cublas", repository_ctx, cpu_value, c..., ...)
        File "....../third_party/gpus/cuda_configure.bzl", line 773, in _find_cuda_lib
                find_lib(repository_ctx, [("%s/%s%s" % (bas...], ...)))
        File "....../third_party/gpus/cuda_configure.bzl", line 750, in find_lib
                auto_configure_fail(("No library found under: " + ",...)))
        File "....../third_party/gpus/cuda_configure.bzl", line 341, in auto_configure_fail
                fail(("\n%sCuda Configuration Error:%...)))

Cuda Configuration Error: No library found under: /usr/local/cuda-10.1/lib64/libcublas.so.10.1, /usr/local/cuda-10.1/lib64/stubs/libcublas.so.10.1, /usr/local/cuda-10.1/lib/powerpc64le-linux-gnu/libcublas.so.10.1, /usr/local/cuda-10.1/lib/x86_64-linux-gnu/libcublas.so.10.1, /usr/local/cuda-10.1/lib/x64/libcublas.so.10.1, /usr/local/cuda-10.1/lib/libcublas.so.10.1, /usr/local/cuda-10.1/libcublas.so.10.1
WARNING: Target pattern parsing failed.
ERROR: error loading package 'tensorflow/tools/pip_package': in ....../tensorflow/tensorflow.bzl: Encountered error while reading extension file 'cuda/build_defs.bzl': no such package '@local_config_cuda//cuda': Traceback (most recent call last):
        File "....../third_party/gpus/cuda_configure.bzl", line 1501
                _create_local_cuda_repository(repository_ctx)
        File "....../third_party/gpus/cuda_configure.bzl", line 1266, in _create_local_cuda_repository
                _find_libs(repository_ctx, cuda_config)
        File "....../third_party/gpus/cuda_configure.bzl", line 859, in _find_libs
                _find_cuda_lib("cublas", repository_ctx, cpu_value, c..., ...)
        File "....../third_party/gpus/cuda_configure.bzl", line 773, in _find_cuda_lib
                find_lib(repository_ctx, [("%s/%s%s" % (bas...], ...)))
        File "....../third_party/gpus/cuda_configure.bzl", line 750, in find_lib
                auto_configure_fail(("No library found under: " + ",...)))
        File "....../third_party/gpus/cuda_configure.bzl", line 341, in auto_configure_fail
                fail(("\n%sCuda Configuration Error:%...)))

Cuda Configuration Error: No library found under: /usr/local/cuda-10.1/lib64/libcublas.so.10.1, /usr/local/cuda-10.1/lib64/stubs/libcublas.so.10.1, /usr/local/cuda-10.1/lib/powerpc64le-linux-gnu/libcublas.so.10.1, /usr/local/cuda-10.1/lib/x86_64-linux-gnu/libcublas.so.10.1, /usr/local/cuda-10.1/lib/x64/libcublas.so.10.1, /usr/local/cuda-10.1/lib/libcublas.so.10.1, /usr/local/cuda-10.1/libcublas.so.10.1
INFO: Elapsed time: 10.828s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
    currently loading: tensorflow/tools/pip_package
    Fetching @local_config_cuda; fetching

Cheers
Pei

Any updates here? I have the same problem as well.

From your output

Cuda Configuration Error: No library found under: /usr/local/cuda-10.1/lib64/libcublas.so.10.1 ...

So TensorFlow is looking for libcublas in /usr/local/cuda-10.1, but it appears that since cuda-10.1 libcublas has been moved outside of /usr/local/cuda, which has resulted in a lot of confusion and broken systems. See the threads here (I have no idea why Nvidia did that).

From AndyDick’s post it is now in /usr/lib/x86_64-linux-gnu

So maybe just make a symlink

sudo ln -s /usr/lib/x86_64-linux-gnu/libcublas.so.10.1 /usr/local/cuda-10.1/lib64/libcublas.so.10.1

I’m also trying to compile TF with CUDA 10.1. After creating that symlink, I got this error :

Cuda Configuration Error: None of the libraries match their SONAME: /usr/local/cuda-10.1/lib64/libcublas.so.10.1

TF apparently doesn’t support cuda-10.1 (so either wait for TF12, go back to cuda-10.0, or install a local instance of cuda-10.0). Your error is kind of intriguing, though. Check your file system to see whether you actually created the symlink and, if you did, whether it is broken (in a terminal, cd /usr/local/cuda-10.1/lib64 and then ls; if the link is broken it will be shown in red).
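
If the colours in ls are hard to read, the same check can be scripted. A minimal sketch, assuming the link path from the earlier post; check_link is a hypothetical helper name:

```shell
#!/bin/sh
# Report whether a path is a working symlink, a broken symlink, or neither.
# check_link is a hypothetical helper name.
check_link() {
    if [ -L "$1" ] && [ ! -e "$1" ]; then
        echo "broken"        # the link exists but its target does not
    elif [ -L "$1" ]; then
        echo "ok"            # the link exists and resolves to something
    else
        echo "not-a-link"    # regular file, directory, or missing path
    fi
}

check_link /usr/local/cuda-10.1/lib64/libcublas.so.10.1

# If the link is fine, the SONAME recorded inside the library can be compared
# against the file name with readelf (from binutils):
# readelf -d /usr/local/cuda-10.1/lib64/libcublas.so.10.1 | grep SONAME
```

A working link whose name differs from the SONAME inside the library may well be what triggers the "None of the libraries match their SONAME" message, but that is a guess from the error text, not something confirmed in this thread.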

I have installed cuda-10.1 with the .run file in my $HOME just to check it out. cuda-10.1 has made some changes to the places where libs are installed and the way they are named thus making a mess of people’s systems left and right.

First, the libcublas files are no longer in cuda_root/lib64 by default but have been moved to /usr/lib/x86_64-linux-gnu (if installed system wide), so to install them back into cuda_root I had to override the default in the .run file’s installation.

Secondly, libcublas.so.10.1 is missing; the installer created only these: libcublas.so, libcublas.so.10, libcublas.so.10.1, libcublas.so.10.1.0.105

So you have to symlink it to libcublas.so.10.1.0.105 (but there is a symlink called libcublas.10.0 for cuda10.0)

But this happens not just with cublas but with all the other libs as well: TensorFlow looks for libcu***.so.10.1, but all of these are missing. So it would be a pain to make symlinks for all these files.

I don’t know what the file structure is if you install with the .deb system wide (since I am sticking to cuda-10.0 system wide, seeing all the problems reported here), but it seems that the deb installation is suffering from the same problems: files are being moved and named differently. Hence the many threads complaining about breakage on updating from 10.0 to 10.1.

It is really annoying that Nvidia deliberately breaks backward compatibility for no reason (post #2 by AndyDick says as much). What does it achieve?

P.S. Also, since version 10.0 the cuda-driver package on Linux installed through Nvidia’s cuda repo has removed 32-bit support, thereby breaking Steam among other things. The support is still there if you install the driver via the .run file or the .deb in Ubuntu’s repository (driver ppa).
https://devtalk.nvidia.com/default/topic/1045595/cuda-setup-and-installation/nvidia-cuda-10-ubuntu-installer-missing-32-bit-support/

Moreover, for 32-bit support the documentation points to a need to export the path as:

export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
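
For anyone puzzled by the expansion in that export, here is a small sketch of what it does. prepend_path is a hypothetical helper, and /opt/other/lib is a made-up directory used only for illustration:

```shell
#!/bin/sh
# Prepend a directory to a colon-separated path value.
# prepend_path is a hypothetical helper; $1 is the new directory and
# $2 is the current value (possibly empty).
prepend_path() {
    # ${2:+:$2} expands to ":$2" only when $2 is set and non-empty,
    # so an empty value does not leave a trailing colon behind.
    printf '%s\n' "$1${2:+:$2}"
}

prepend_path /usr/local/cuda-10.1/lib ""
prepend_path /usr/local/cuda-10.1/lib /opt/other/lib
```

The first call prints /usr/local/cuda-10.1/lib and the second prints /usr/local/cuda-10.1/lib:/opt/other/lib. Avoiding the trailing colon matters because the dynamic loader treats an empty entry in LD_LIBRARY_PATH as the current directory.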

No, you have to do that on either a 64-bit or a 32-bit OS for the system to find cuda.

I am talking about lib32 support on a 64-bit system. It has nothing to do with where your cuda toolkit is; it has to do with the driver installed from Nvidia’s cuda-driver package.

Same problem. I installed cuda 10.1 from the runfile but libcublas.10.1 was not installed:

sudo find /usr/local/cuda/ -name '*blas*'
/usr/local/cuda/targets/x86_64-linux/include/cublas.h
/usr/local/cuda/targets/x86_64-linux/include/cublas_v2.h
/usr/local/cuda/targets/x86_64-linux/include/cublas_api.h
/usr/local/cuda/targets/x86_64-linux/include/nvblas.h
/usr/local/cuda/targets/x86_64-linux/include/cublasXt.h
/usr/local/cuda/targets/x86_64-linux/include/cublasLt.h
/usr/local/cuda/doc/html/nvblas
/usr/local/cuda/doc/html/cublas
/usr/local/cuda/doc/html/cublas/graphics/cublasmg_gemm.jpg
/usr/local/cuda/doc/man/man7/libcublas.7
/usr/local/cuda/doc/man/man7/libcublas.so.7

I also ran into the issue when building TensorFlow.


The second posting in this thread (above) explains what has happened.

On my system, using a CUDA 10.1 runfile install, the libcublas libraries are in /usr/lib64

try

sudo find /usr -name "libcublas*"


Thanks! The symlink fixed the problem.

Can somebody please clarify what is happening when I attempt to install the latest 10.1 cuda toolkit?

It looks like installing the cuda toolkit (and not the nvidia driver) installs files in both /path/to/toolkit and /usr/lib64. On top of that, some of the files are 10.2 (see cublas) and some of the files are labelled with three numbers (which tensorflow <= 1.13 does not like).

I use the binary file cuda_10.1.168_418.67_linux.run

When I install cuda 10.1 I do not get any libcublas.10.1 files, I checked both the toolkit root (/usr/local/cuda-10.1), and /usr/lib64.

In /usr/lib64 I have other cuda files for example, libcublas.so.10.2.0.168.

I don’t understand why cublas gets installed to a system location, /usr/lib64. Previously I could set the install location from the binary runfile, which I set to /usr/local/cuda-X.Y for each version. Then all of the cuda runtime was there, and I could change it.

As a workaround, I downloaded the run file and extracted it using

./cuda_10.1.168_418.67_linux.run --extract=extracted

Then in the extracted folder there is a cuda-toolkit directory. I copied that to /usr/local/cuda-10.1. After that, I needed to create symbolic links for all of the .so files that didn’t have a .so.10.1 version (including the blas files that are 10.2.0.168 and 10.1.168).
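
Creating those links by hand gets tedious, so here is a rough sketch of how the step could be scripted. link_short_versions is a hypothetical helper, and the glob assumes the fully versioned files have at least three version components, as in the names quoted above:

```shell
#!/bin/sh
# For every lib*.so.<full version> in a directory, create a
# lib*.so.<short version> symlink next to it if none exists yet.
# link_short_versions is a hypothetical helper; $1 is the library
# directory and $2 is the short version suffix (for example 10.1).
link_short_versions() {
    dir=$1
    short=$2
    for lib in "$dir"/lib*.so.*.*.*; do
        # The glob stays literal when nothing matches; skip that case.
        [ -e "$lib" ] || continue
        name=$(basename "$lib")
        base=${name%%.so.*}            # libcublas.so.10.2.0.168 -> libcublas
        link="$dir/$base.so.$short"
        # Never overwrite an existing file or link.
        if [ ! -e "$link" ] && [ ! -L "$link" ]; then
            # Relative target, so the link survives moving the directory.
            ln -s "$name" "$link"
            echo "linked $link -> $name"
        fi
    done
}

# Example call (system directories would need root):
# link_short_versions /usr/local/cuda-10.1/lib64 10.1
```

Running it against a scratch copy of the directory first is a cheap way to confirm which links it would create before touching the real install.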

That is probably TensorFlow specific, as I could compile r1.14 using the new cuda drivers setup, but r1.13 (current stable version) needed the 10.1 symlinks.

Thank you


Same exact problem.

sudo find /usr -name "libcublas*"
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libcublas.so.10.1
/usr/local/cuda-10.1/doc/man/man7/libcublas.7
/usr/local/cuda-10.1/doc/man/man7/libcublas.so.7
/usr/share/lintian/overrides/libcublas9.1
/usr/share/doc/libcublas9.1
/usr/share/doc/libcublas10
/usr/share/man/man7/libcublas.so.7.gz
/usr/share/man/man7/libcublas.7
/usr/share/man/man7/libcublas.7.gz
/usr/share/man/man7/libcublas.so.7
/usr/lib/x86_64-linux-gnu/libcublas.so.9.1.85
/usr/lib/x86_64-linux-gnu/libcublasLt.so.10
/usr/lib/x86_64-linux-gnu/libcublasLt_static.a
/usr/lib/x86_64-linux-gnu/libcublasLt.so.10.2.0.168
/usr/lib/x86_64-linux-gnu/libcublas_device.a
/usr/lib/x86_64-linux-gnu/libcublas.so.10.2.0.168
/usr/lib/x86_64-linux-gnu/libcublas.so
/usr/lib/x86_64-linux-gnu/libcublasLt.so
/usr/lib/x86_64-linux-gnu/libcublas_static.a
/usr/lib/x86_64-linux-gnu/libcublas.so.9.1
/usr/lib/x86_64-linux-gnu/stubs/libcublas.so
/usr/lib/x86_64-linux-gnu/stubs/libcublasLt.so
/usr/lib/x86_64-linux-gnu/libcublas.so.10

I’m getting the same missing cublas error despite following tensorflow’s instructions exactly.

Finally managed to solve this thanks to matthew.smith3’s comment.

I personally didn’t have any of the cublas files in /usr/lib/x86_64-linux-gnu/. So after extracting the .run file I just dumped the contents of the cuda-toolkit folder into /usr/local/cuda-10.1/lib64.

Then I created a symbolic link for each of the cublas files (including cublasLt) to /usr/lib/x86_64-linux-gnu/.

sudo ln -s /usr/local/cuda-10.1/lib64/libcublas.so /usr/lib/x86_64-linux-gnu/libcublas.so
sudo ln -s /usr/local/cuda-10.1/lib64/libcublas.so.10 /usr/lib/x86_64-linux-gnu/libcublas.so.10
sudo ln -s /usr/local/cuda-10.1/lib64/libcublas.so.10.1 /usr/lib/x86_64-linux-gnu/libcublas.so.10.1
etc.


Had the same issue, and I wouldn’t have figured it out without @phillip3m’s and @matthew.smith3’s replies.

While trying to install/update CUDA 10.1 via the previously-linked Tensorflow instructions page today I received the all-too-familiar Could not load dynamic library 'libcublas.so.10' error. Upon checking all of the lib directories it was clear that none of the cublas libraries were installed at all. This was especially evident when, upon downloading and extracting the runfile as above, the libcublas.so files were in cuda-toolkit/lib64 but not in the newly-installed /usr/local/cuda-10.1/lib64 directory or any of its aforementioned links.

I was able to super-user copy the cublas files from the extracted runfile directly into /usr/local/cuda-10.1/targets/x86_64-linux/lib and now Tensorflow is working appropriately. I did not have to create symlinks in /usr/lib/x86_64-linux-gnu, and it appears that only old (v7) CUDA libraries are present there. I assume this is because of Tensorflow updates, and cannot comment on the efficacy of this solution for other packages (PyTorch, OpenCV, etc).

I’m going to open an issue with Tensorflow, as they should obviously have working installation instructions, but this clearly needs to be addressed by Nvidia as well, since it appears the main issue is that apt-get install --no-install-recommends cuda-10-1 does not install cublas at all anywhere on the machine.


Life saver!!