340.106 nvidia-uvm.ko fails to build under kernel 4.14.y

Hi,
I’m unable to build the 340.106 nvidia-uvm.ko kernel module with kernel 4.14.13. The patch for 340.104 fails. Is there a way to modify the patch for 340.106?
Slackware-current 64-bit (gcc 7.3.0)

--- kernel/uvm/nvidia_uvm_lite.c
+++ kernel/uvm/nvidia_uvm_lite.c
@@ -818,8 +818,15 @@ done:
 }

 #if defined(NV_VM_OPERATIONS_STRUCT_HAS_FAULT)
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 11, 0)
 int _fault(struct vm_area_struct *vma, struct vm_fault *vmf)
+#else
+int _fault(struct vm_fault *vmf)
+#endif
 {
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 11, 0)
+    struct vm_area_struct *vma = vmf->vma;
+#endif
 #if defined(NV_VM_FAULT_HAS_ADDRESS)
     unsigned long vaddr = vmf->address;
 #else
@@ -866,7 +873,11 @@ static struct vm_operations_struct uvmlite_vma_ops =
 // it's dealing with anonymous mapping (see handle_pte_fault).
 //
 #if defined(NV_VM_OPERATIONS_STRUCT_HAS_FAULT)
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 11, 0)
 int _sigbus_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
+#else
+int _sigbus_fault(struct vm_fault *vmf)
+#endif
 {
     vmf->page = NULL;
     return VM_FAULT_SIGBUS;
--- kernel/nv-drm.c	2017-09-21 12:58:23.901972670 +0200
+++ kernel/nv-drm.c	2017-09-21 13:07:32.418269409 +0200
@@ -173,7 +173,7 @@
 {
     int ret = 0;
 #if defined(NV_DRM_AVAILABLE)
-    ret = drm_pci_init(&nv_drm_driver, pci_driver);
+    ret = drm_legacy_pci_init(&nv_drm_driver, pci_driver);
 #endif
     return ret;
 }
@@ -183,7 +183,7 @@
 )
 {
 #if defined(NV_DRM_AVAILABLE)
-    drm_pci_exit(&nv_drm_driver, pci_driver);
+    drm_legacy_pci_exit(&nv_drm_driver, pci_driver);
 #endif
 }
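For readers unfamiliar with the version guard used in this patch: LINUX_VERSION_CODE and KERNEL_VERSION(a, b, c) are plain integers, classically packed as (a << 16) + (b << 8) + c, so the comparison is simple arithmetic. The sketch below reproduces that arithmetic in shell to show which branch a given kernel takes. The function names are made up for illustration and are not part of the driver or the kernel.

```shell
#!/bin/sh
# Pack a kernel version the way the classic KERNEL_VERSION(a, b, c) macro does:
# (a << 16) + (b << 8) + c
kernel_version_code() {
    echo $(( ($1 << 16) + ($2 << 8) + $3 ))
}

# Succeeds when version a.b.c is at least 4.11.0, i.e. when the patch selects
# the fault handler signature that takes only "struct vm_fault *".
uses_single_arg_fault() {
    [ "$(kernel_version_code "$1" "$2" "$3")" -ge "$(kernel_version_code 4 11 0)" ]
}

if uses_single_arg_fault 4 14 13; then
    echo "4.14.13: fault handler takes only struct vm_fault *"
else
    echo "4.14.13: fault handler takes vma and vmf"
fi
```

Kernel 4.14.13 packs to a larger value than 4.11.0, so the #else branches of the patch are the ones that get compiled.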

I’ve solved it by using the following patch:

--- kernel/uvm/nvidia_uvm_lite.c	2017-09-27 13:50:46.334075042 +0200
+++ kernel/uvm/nvidia_uvm_lite.c	2017-09-27 13:56:06.358041280 +0200
@@ -818,7 +818,11 @@
 }
 
 #if defined(NV_VM_OPERATIONS_STRUCT_HAS_FAULT)
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 11, 0)
 int _fault(struct vm_area_struct *vma, struct vm_fault *vmf)
+#else 	
+int _fault(struct vm_fault *vmf) 	
+#endif
 {
 #if defined(NV_VM_FAULT_HAS_ADDRESS)
     unsigned long vaddr = vmf->address;
@@ -828,7 +832,11 @@
     struct page *page = NULL;
     int retval;
 
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 11, 0)
     retval = _fault_common(vma, vaddr, &page, vmf->flags);
+#else
+    retval = _fault_common(NULL, vaddr, &page, vmf->flags);
+#endif
 
     vmf->page = page;
 
@@ -866,7 +874,11 @@
 // it's dealing with anonymous mapping (see handle_pte_fault).
 //
 #if defined(NV_VM_OPERATIONS_STRUCT_HAS_FAULT)
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 11, 0)
 int _sigbus_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
+#else
+int _sigbus_fault(struct vm_fault *vmf)
+#endif
 {
     vmf->page = NULL;
     return VM_FAULT_SIGBUS;

Run the installer with

sh NVIDIA-Linux-x86_64-340.106.run --apply-patch kernel-4.11.patch

kernel-4.11.patch.gz (517 Bytes)
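A note on the command above: as far as I understand the installer options, --apply-patch does not install anything by itself. It repackages the .run file with the patch applied and writes a new file with "-custom" appended to its name into the current directory, and that new file is the one to run. The sketch below only prints the sequence; the helper name is hypothetical and the "-custom" naming is an assumption taken from the installer's documented behavior.

```shell
#!/bin/sh
# Name of the repackaged installer that --apply-patch is expected to write
# (assumption: "-custom" is inserted before ".run" unless already present).
custom_installer_name() {
    case "$1" in
        *-custom.run) echo "$1" ;;
        *.run)        echo "${1%.run}-custom.run" ;;
        *)            return 1 ;;
    esac
}

installer=NVIDIA-Linux-x86_64-340.106.run
custom=$(custom_installer_name "$installer")

# Print the steps instead of running them, so nothing is installed by accident.
echo "1. unpack the patch:  gunzip kernel-4.11.patch.gz"
echo "2. repackage:         sh $installer --apply-patch kernel-4.11.patch"
echo "3. install (as root): sh $custom"
```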

Unbelievable! Thank you!
It worked for me: I installed the 340.106 driver on my Ubuntu 16.04 system with an 8800 GT card!

Before, I had a problem: it would only work for me with the --no-unified-memory option.

Now I have installed it without that option.

Would you also be so kind as to help me understand my next step for installing CUDA? (I do not fully understand whether I need to install CUDA 1.1 or whether I can use CUDA 6.5 with my card.)

CUDA 6.5 and OpenCL 1.1 should work, but you will need to make sure that the following nvidia device nodes are present after a reboot:

bash-4.4$ ls -lha /dev |grep nvidia
crw-rw-rw-   1 root root    243,   0 jan 27 10:42 nvidia-uvm
crw-rw-rw-   1 root root    195,   0 jan 27 10:50 nvidia0
crw-rw-rw-   1 root root    195, 255 jan 27 10:50 nvidiactl

Make sure that nvidia-modprobe is installed and add the following lines to your /etc/rc.local:

# Create missing nvidia device nodes after reboot
/usr/bin/nvidia-modprobe -c 0 -u

The CUDA Toolkit version 6.5 is available here:
https://developer.nvidia.com/cuda-toolkit-65
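A quick way to check for the three device nodes listed above is sketched below. The function name is made up for illustration, and it only tests that the paths exist (it does not verify that they are character devices or check their major/minor numbers).

```shell
#!/bin/sh
# Print every expected NVIDIA device node that is missing under directory $1
# (defaults to /dev). Returns non-zero if anything is missing.
missing_nvidia_nodes() {
    dev_dir=${1:-/dev}
    missing=0
    for node in nvidia0 nvidiactl nvidia-uvm; do
        if [ ! -e "$dev_dir/$node" ]; then
            echo "$node"
            missing=1
        fi
    done
    return "$missing"
}

if missing_nvidia_nodes /dev >/dev/null; then
    echo "all NVIDIA device nodes are present"
else
    echo "some nodes are missing; try: /usr/bin/nvidia-modprobe -c 0 -u"
fi
```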

Thank you for the reply!
I do not have the /etc/rc.d/rc.local in my Ubuntu 16.04 system, but I do have the /etc/rc.local file. Will it work if I put it there?

The other question is: do I have to add this line AFTER I install CUDA 6.5? For now, I see this:
$ /usr/bin/nvidia-modprobe -c 0 -u
modprobe: ERROR: could not insert ‘nvidia_384_uvm’: Unknown symbol in module, or unknown parameter (see dmesg)
And I see a lot of similar lines in dmesg, like these:
nvidia_uvm: Unknown symbol nvUvmInterfaceDisableAccessCntr (err 0)
nvidia_uvm: Unknown symbol nvUvmInterfaceUnsetPageDirectory (err 0)
And so on…

Thank you for the advice!
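When dmesg is full of "Unknown symbol" lines like the ones quoted above, it can help to reduce them to a list of distinct symbol names. Note also that the module modprobe tried to insert is nvidia_384_uvm, which does not match the 340.106 driver; that may point to leftovers from a distribution nvidia-384 package, but that is a guess and not something the log proves. The helper below is a hypothetical sketch that only parses text.

```shell
#!/bin/sh
# Read kernel log text on stdin and print each distinct unresolved symbol,
# one per line, sorted.
unknown_symbols() {
    sed -n 's/.*Unknown symbol \([A-Za-z0-9_]*\).*/\1/p' | sort -u
}

# Usage (reading the kernel log may require root on some systems):
#   dmesg | unknown_symbols
```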

Yes, you should add the lines to your rc.local before you install the CUDA toolkit:

# Create missing nvidia device nodes after reboot
/usr/bin/nvidia-modprobe -c 0 -u

That’s odd. It looks like you have installed a kernel module that is not supported by your card. What is the output of the following command?

ls -lha /dev |grep nvidia

Sorry, I did not post it:

ls -lha /dev |grep nvidia
crw-rw-rw-   1 root root      195,   0 янв 28 17:03 nvidia0
crw-rw-rw-   1 root root      195, 255 янв 28 17:03 nvidiactl

And this is the output of nvidia-smi:

$ nvidia-smi
Sun Jan 28 19:13:11 2018       
+------------------------------------------------------+                       
| NVIDIA-SMI 340.106    Driver Version: 340.106        |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce 8800 GTS    Off  | 0000:01:00.0     N/A |                  N/A |
| 60%   58C    P0    N/A /  N/A |    259MiB /   319MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0            Not Supported                                               |
+-----------------------------------------------------------------------------+

I mean that the driver is installed and works.
I have not tried to install CUDA 6.5 yet. Should I do it? Or do you mean that nvidia-uvm should appear in /dev before I attempt the CUDA installation?

Yes, you will need to create the

/dev/nvidia-uvm

device node. As I said before I had to add the following line to my rc.local:

/usr/bin/nvidia-modprobe -c 0 -u

Otherwise CUDA/OpenCL won’t work. Is the nvidia-uvm.ko kernel module present in the following directory?

/lib/modules/<kernel-version>/kernel/drivers/video
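One way to answer that question is to search the module tree of the running kernel. This is a minimal sketch: the function name and the optional root argument are assumptions added for illustration, and the trailing wildcard is only there in case the distribution compresses module files.

```shell
#!/bin/sh
# List nvidia-uvm module files under <root>/lib/modules/<kernel-version>.
# $1: kernel version (default: the running kernel)
# $2: filesystem root prefix (default: empty, i.e. the real /lib/modules)
find_uvm_module() {
    kver=${1:-$(uname -r)}
    root=${2:-}
    find "$root/lib/modules/$kver" -name 'nvidia-uvm.ko*' 2>/dev/null
}

# Usage:
#   find_uvm_module            # search the running kernel's module tree
#   find_uvm_module 4.14.13    # search a specific kernel version
```

An empty result means the module was not installed for that kernel, in which case nvidia-modprobe -u has nothing to load.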

Do you see any OpenCL device information if you compile and run clinfo?
I get:

Number of platforms                               1
  Platform Name                                   NVIDIA CUDA
  Platform Vendor                                 NVIDIA Corporation
  Platform Version                                OpenCL 1.1 CUDA 6.5.51
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts 
  Platform Extensions function suffix             NV

  Platform Name                                   NVIDIA CUDA
Number of devices                                 1
  Device Name                                     GeForce 9300M GS
  Device Vendor                                   NVIDIA Corporation
  Device Vendor ID                                0x10de
  Device Version                                  OpenCL 1.0 CUDA
  Driver Version                                  340.106
  Device OpenCL C Version                         OpenCL C 1.0 
  Device Type                                     GPU
  Device Available                                Yes
  Device Profile                                  FULL_PROFILE
  Device Topology (NV)                            PCI-E, 01:00.0
  Max compute units                               1
  Max clock frequency                             1450MHz
  Compute Capability (NV)                         1.1
  Max work item dimensions                        3
  Max work item sizes                             512x512x64
  Max work group size                             512
  Compiler Available                              Yes
  Preferred work group size multiple              32
  Warp size (NV)                                  32
  Preferred / native vector sizes                 
    char                                                 1 / 1       
    short                                                1 / 1       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 0 / 0        (n/a)
    float                                                1 / 1       
    double                                               0 / 0        (n/a)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (n/a)
  Address bits                                    32, Little-Endian
  Global memory size                              536150016 (511.3MiB)
  Error Correction support                        No
  Max memory allocation                           134217728 (128MiB)
  Unified memory for Host and Device              No
  Integrated memory (NV)                          No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       2048 bits (256 bytes)
  Global Memory cache type                        None
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max 2D image size                             4096x16383 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                8
  Local memory type                               Local
  Local memory size                               16384 (16KiB)
  Registers per block (NV)                        8192
  Max constant buffer size                        65536 (64KiB)
  Max number of constant args                     9
  Max size of kernel argument                     4352 (4.25KiB)
  Queue properties                                
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Profiling timer resolution                      1000ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Kernel execution timeout (NV)                 Yes
  Concurrent copy and kernel execution (NV)       No
    Number of async copy engines                  0
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts  cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics

clinfo-master.zip (36.8 KB)

Hi matstegner!
Thank you for the zip archive. I ran the CUDA 6.5 installer and then ran
./clinfo

Number of platforms                               1
  Platform Name                                   NVIDIA CUDA
  Platform Vendor                                 NVIDIA Corporation
  Platform Version                                OpenCL 1.1 CUDA 6.5.51
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts 
  Platform Extensions function suffix             NV

  Platform Name                                   NVIDIA CUDA
Number of devices                                 1
  Device Name                                     GeForce 8800 GTS
  Device Vendor                                   NVIDIA Corporation
  Device Vendor ID                                0x10de
  Device Version                                  OpenCL 1.0 CUDA
  Driver Version                                  340.106
  Device OpenCL C Version                         OpenCL C 1.0 
  Device Type                                     GPU
  Device Available                                Yes
  Device Profile                                  FULL_PROFILE
  Device Topology (NV)                            PCI-E, 01:00.0
  Max compute units                               12
  Max clock frequency                             1188MHz
  Compute Capability (NV)                         1.0
  Max work item dimensions                        3
  Max work item sizes                             512x512x64
  Max work group size                             512
  Compiler Available                              Yes
  Preferred work group size multiple              <getWGsizes:512: create context : error -6>
  Warp size (NV)                                  32
  Preferred / native vector sizes                 
    char                                                 1 / 1       
    short                                                1 / 1       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 0 / 0        (n/a)
    float                                                1 / 1       
    double                                               0 / 0        (n/a)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (n/a)
  Address bits                                    32, Little-Endian
  Global memory size                              335216640 (319.7MiB)
  Error Correction support                        No
  Max memory allocation                           134217728 (128MiB)
  Unified memory for Host and Device              No
  Integrated memory (NV)                          No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       2048 bits (256 bytes)
  Global Memory cache type                        None
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max 2D image size                             4096x16383 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                8
  Local memory type                               Local
  Local memory size                               16384 (16KiB)
  Registers per block (NV)                        8192
  Max constant buffer size                        65536 (64KiB)
  Max number of constant args                     9
  Max size of kernel argument                     4352 (4.25KiB)
  Queue properties                                
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Profiling timer resolution                      1000ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Kernel execution timeout (NV)                 Yes
  Concurrent copy and kernel execution (NV)       No
    Number of async copy engines                  0
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts  

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  NVIDIA CUDA
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [NV]
  clCreateContext(NULL, ...) [default]            <checkNullCtx:2394: create context with device from default platform : error -6>
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No platform

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.8
  ICD loader Profile                              OpenCL 1.2
	NOTE:	your OpenCL library declares to support OpenCL 1.2,
		but it seems to support up to OpenCL 2.1 too.

Please let me know whether it is a problem that I get this error on lines like this one:

Preferred work group size multiple              <getWGsizes:512: create context : error -6>
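For reference, the numeric codes that clinfo prints are the status values defined in the standard OpenCL header (CL/cl.h), where -6 is CL_OUT_OF_HOST_MEMORY. The sketch below maps a few of the lowest codes to their names; the function name is made up, and the mapping says nothing about why context creation fails on this particular card.

```shell
#!/bin/sh
# Translate a few OpenCL status codes (as defined in CL/cl.h) into their names.
cl_error_name() {
    case "$1" in
        0)  echo CL_SUCCESS ;;
        -1) echo CL_DEVICE_NOT_FOUND ;;
        -2) echo CL_DEVICE_NOT_AVAILABLE ;;
        -3) echo CL_COMPILER_NOT_AVAILABLE ;;
        -4) echo CL_MEM_OBJECT_ALLOCATION_FAILURE ;;
        -5) echo CL_OUT_OF_RESOURCES ;;
        -6) echo CL_OUT_OF_HOST_MEMORY ;;
        *)  echo "unknown ($1)" ;;
    esac
}

# The code reported by clinfo in the output above:
cl_error_name -6
```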

Yes, now I do have it.

One more thing: when I installed CUDA 6.5, I got this message:

Missing recommended library: libGLU.so
Missing recommended library: libXi.so
Missing recommended library: libXmu.so

Installing the CUDA Samples in /home/artem ...

===========
= Summary =
===========

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-6.5
Samples:  Installed in /home/artem, but missing recommended libraries

Please make sure that
 -   PATH includes /usr/local/cuda-6.5/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-6.5/lib64, or, add /usr/local/cuda-6.5/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-6.5/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall

Please see CUDA_Getting_Started_Guide_For_Linux.pdf in /usr/local/cuda-6.5/doc/pdf for detailed information on setting up CUDA.

***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 340.00 is required for CUDA 6.5 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run -silent -driver
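The PATH and LD_LIBRARY_PATH advice in the installer summary can be applied with a few lines in a shell startup file. This is a sketch only: the helper name is hypothetical, and the directories are the ones the installer printed above.

```shell
#!/bin/sh
# Prepend directory $2 to the colon-separated list $1 unless it is already there.
prepend_path() {
    case ":$1:" in
        *":$2:"*) echo "$1" ;;
        *)        echo "$2${1:+:$1}" ;;
    esac
}

PATH=$(prepend_path "$PATH" /usr/local/cuda-6.5/bin)
LD_LIBRARY_PATH=$(prepend_path "${LD_LIBRARY_PATH:-}" /usr/local/cuda-6.5/lib64)
export PATH LD_LIBRARY_PATH
```

Checking for an existing entry first keeps the variables from growing every time the startup file is sourced.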

Does it mean that CUDA is not installed properly? I can build and run deviceQuery, and it runs anyway:

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce 8800 GTS"
  CUDA Driver Version / Runtime Version          6.5 / 6.5
  CUDA Capability Major/Minor version number:    1.0
  Total amount of global memory:                 320 MBytes (335216640 bytes)
  (12) Multiprocessors, (  8) CUDA Cores/MP:     96 CUDA Cores
  GPU Clock rate:                                1188 MHz (1.19 GHz)
  Memory Clock rate:                             792 Mhz
  Memory Bus Width:                              320-bit
  Maximum Texture Dimension Size (x,y,z)         1D=(8192), 2D=(65536, 32768), 3D=(2048, 2048, 2048)
  Maximum Layered 1D Texture Size, (num) layers  1D=(8192), 512 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(8192, 8192), 512 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 8192
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  768
  Maximum number of threads per block:           512
  Max dimension size of a thread block (x,y,z): (512, 512, 64)
  Max dimension size of a grid size    (x,y,z): (65535, 65535, 1)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             256 bytes
  Concurrent copy and kernel execution:          No with 0 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       No
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      No
  Device PCI Bus ID / PCI location ID:           1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.5, CUDA Runtime Version = 6.5, NumDevs = 1, Device0 = GeForce 8800 GTS
Result = PASS

Thank you if you have any comments,

P.S. Please also help me with one last question. My system has been lagging badly since I installed the NVIDIA drivers. When I maximize any window, everything freezes and I get this in dmesg:

[ 1203.575211] NVRM: GPU at PCI:0000:01:00: GPU-31bdc9ca-2c3f-51cf-3c50-73c722c1cb90
[ 1203.575229] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ChID 0001, Class 0000502d, Offset 00000860, Data ff300a24
[ 1203.639213] NVRM: Xid (PCI:0000:01:00): 3, C 00000001 SC 00000000 M 00000188 Data 0100cb41
[ 1205.638977] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
[ 1207.639421] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
[ 1207.639458] NVRM: Xid (PCI:0000:01:00): 1, Channel 00000001 Method 00000000 Data beef50b0
[ 1212.891308] NVRM: Xid (PCI:0000:01:00): 3, C 00000001 SC 00000000 M 00000188 Data 0100cb41
[ 1214.891103] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
[ 1216.891245] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

When I switch to the nouveau driver, I have no problems even when the computer runs for hours; I can even play a video game on it without freezes.

Thank you very much for your support!
With kind regards,
Artem

Hmm, I don’t have the CUDA Toolkit installed, so I can’t help you further, but it looks like CUDA is detected. I have blacklisted nouveau, and I don’t experience any slowdowns on my system.

I have blacklisted nouveau and everything in the framebuffer.
But I still cannot maximize windows, and sometimes I cannot even open a new Chrome browser window. Here is what I get in dmesg, shown in red:

[  748.944757] NVRM: GPU at PCI:0000:01:00: GPU-31bdc9ca-2c3f-51cf-3c50-73c722c1cb90
[  748.944769] NVRM: Xid (PCI:0000:01:00): 69, Class Error: ChId 0004, Class 0000502d, Offset 00000250, Data ffffffff, ErrorCode 0000000c
[  749.720011] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ChID 0004, Class 00005097, Offset 00001530, Data 00000001
[  750.439070] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ChID 0004, Class 00005097, Offset 00000104, Data 00000000
[  750.449114] NVRM: Xid (PCI:0000:01:00): 6, PE0001 
[  751.142383] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ChID 0004, Class 00005097, Offset 00000f10, Data 41800000
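When reporting freezes like this, a per-code count of the Xid events is often easier to read than the raw log. The sketch below only tallies the Xid numbers that appear in NVRM lines; the function name is made up, and the output does not explain what each code means or what causes it.

```shell
#!/bin/sh
# Read kernel log text on stdin and print "<count> <xid>" for each Xid number
# found in NVRM lines, ordered by Xid number.
xid_summary() {
    sed -n 's/.*NVRM: Xid ([^)]*): \([0-9][0-9]*\),.*/\1/p' \
        | sort -n | uniq -c | awk '{ print $1, $2 }'
}

# Usage (reading the kernel log may require root on some systems):
#   dmesg | xid_summary
```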

I don’t know what the problem could be. I only see the following in dmesg:

[   14.235132] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  340.106  Tue Jan  9 15:10:23 PST 2018

Can someone tell me where I can get the source code for nvidia-uvm.ko?

Extract the .run installer using the -x option; the sources are in the kernel and kernel/uvm directories.
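After extracting with -x, a small check like the one below confirms that the source directories named in the reply above are actually present in the extracted tree. The function name is hypothetical, and the kernel and kernel/uvm layout is taken from that reply; other driver series may lay out their sources differently.

```shell
#!/bin/sh
# Print the kernel source directories found inside an extracted installer
# tree ($1), using the layout described in the reply above.
list_kernel_source_dirs() {
    for d in kernel kernel/uvm; do
        [ -d "$1/$d" ] && echo "$1/$d"
    done
    return 0
}

# Usage, after running: sh NVIDIA-Linux-x86_64-340.106.run -x
#   list_kernel_source_dirs NVIDIA-Linux-x86_64-340.106
```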

Thanks.

One more thing. I see that on my system (Ubuntu 16.04), the nvidia-384 drivers are already installed. So I would like to know the steps for replacing these drivers with my own modified nvidia-390 drivers (the source code of which I extracted from the .run installer).

Will the following steps work? (I ask because I share this machine with other people and I don’t want to cause inconvenience if these steps cause some issues)

  1. sudo apt remove nvidia-384 # remove existing drivers
  2. cd NVIDIA-Linux-x86_64-384.90/kernel
  3. make # compile drivers from source - This step works
  4. sudo make install # What will this step do exactly?