"Kernel timeout" on Tesla

Hi,

I have a problem with a “kernel timeout” under Fedora 16.

I know that it can be caused by the X server's watchdog timeout, but this machine has an ordinary graphics card in addition to the NVIDIA Tesla C2075.

I think the Tesla is not even capable of driving a graphical interface, so why do I get a kernel timeout?
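For what it's worth, the timeout flag can also be checked programmatically rather than reading deviceQuery output. A minimal sketch using the CUDA runtime API (the `kernelExecTimeoutEnabled` field of `cudaDeviceProp` is what deviceQuery reports as “Run time limit on kernels”); this assumes a working CUDA 4.x toolkit and at least one CUDA device:

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed\n");
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        /* kernelExecTimeoutEnabled is non-zero when the display
           watchdog can kill long-running kernels on this device */
        printf("Device %d (%s): run time limit on kernels: %s\n",
               dev, prop.name,
               prop.kernelExecTimeoutEnabled ? "Yes" : "No");
    }
    return 0;
}
```

If this prints “Yes” for the Tesla, the driver considers that GPU attached to a display, which is consistent with the deviceQuery output below.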

My device query:

./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Found 1 CUDA Capable device(s)

Device 0: "Tesla C2075"

  CUDA Driver Version / Runtime Version          4.2 / 4.2

  CUDA Capability Major/Minor version number:    2.0

  Total amount of global memory:                 5375 MBytes (5636292608 bytes)

  (14) Multiprocessors x ( 32) CUDA Cores/MP:    448 CUDA Cores

  GPU Clock rate:                                1147 MHz (1.15 GHz)

  Memory Clock rate:                             1566 Mhz

  Memory Bus Width:                              384-bit

  L2 Cache Size:                                 786432 bytes

  Max Texture Dimension Size (x,y,z)             1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)

  Max Layered Texture Size (dim) x layers        1D=(16384) x 2048, 2D=(16384,16384) x 2048

  Total amount of constant memory:               65536 bytes

  Total amount of shared memory per block:       49152 bytes

  Total number of registers available per block: 32768

  Warp size:                                     32

  Maximum number of threads per multiprocessor:  1536

  Maximum number of threads per block:           1024

  Maximum sizes of each dimension of a block:    1024 x 1024 x 64

  Maximum sizes of each dimension of a grid:     65535 x 65535 x 65535

  Maximum memory pitch:                          2147483647 bytes

  Texture alignment:                             512 bytes

  Concurrent copy and execution:                 Yes with 2 copy engine(s)

  Run time limit on kernels:                     Yes

  Integrated GPU sharing Host Memory:            No

  Support host page-locked memory mapping:       Yes

  Concurrent kernel execution:                   Yes

  Alignment requirement for Surfaces:            Yes

  Device has ECC support enabled:                Yes

  Device is using TCC driver mode:               No

  Device supports Unified Addressing (UVA):      Yes

  Device PCI Bus ID / PCI location ID:           5 / 0

  Compute Mode:

     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 4.2, CUDA Runtime Version = 4.2, NumDevs = 1, Device = Tesla C2075

[deviceQuery] test results...

PASSED

and the automatically generated xorg.conf:

# nvidia-xconfig: X configuration file generated by nvidia-xconfig

# nvidia-xconfig:  version 295.40  (mockbuild@)  Thu Apr 12 13:28:25 CEST 2012

Section "ServerLayout"

    Identifier     "Default Layout"

    Screen         "Default Screen" 0 0

    InputDevice    "Keyboard0" "CoreKeyboard"

    InputDevice    "Mouse0" "CorePointer"

EndSection

Section "InputDevice"

    # generated from data in "/etc/sysconfig/keyboard"

    Identifier     "Keyboard0"

    Driver         "keyboard"

    Option         "XkbLayout" "us"

    Option         "XkbModel" "pc105"

EndSection

Section "InputDevice"

    # generated from default

    Identifier     "Mouse0"

    Driver         "mouse"

    Option         "Protocol" "auto"

    Option         "Device" "/dev/input/mice"

    Option         "Emulate3Buttons" "no"

    Option         "ZAxisMapping" "4 5"

EndSection

Section "Device"

    Identifier     "Videocard0"

    Driver         "nvidia"

EndSection

Section "Screen"

    Identifier     "Default Screen"

    Device         "Videocard0"

    SubSection     "Display"

        Modes      "nvidia-auto-select"

    EndSubSection

EndSection

Is the problem caused by the fact that I specified the “nvidia” driver in xorg.conf?

With my old xorg.conf, the X server refused to start (“no screens detected”):

Section "Device"

	Identifier "Videocard0"

	Driver "vesa"

EndSection

I think the Tesla card is driving the display in this case, as I see “Run time limit on kernels: Yes”. Exiting X will allow you to run CUDA applications for as long as you want.

Yes, but I can have a few users connected to this server simultaneously, and some of them want an X server running. The best combination for me is the X server running on the non-CUDA GPU, leaving the Tesla card exclusively for CUDA computations.
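One common way to get that combination is to pin the X server to the display card by its PCI BusID, so the driver never attaches X to the Tesla. A sketch of the Device section, assuming (hypothetically) that the display card sits at PCI bus 1 — check the actual address with lspci, since deviceQuery above reports the Tesla at bus 5:

Section "Device"
    Identifier     "Videocard0"
    Driver         "nvidia"
    # BusID of the display card, NOT the Tesla; verify with lspci
    BusID          "PCI:1:0:0"
EndSection

With X bound to the display card only, the Tesla should then report “Run time limit on kernels: No” in deviceQuery.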