I’m trying to learn OpenCL for my master’s project, but I haven’t been able to get anything running yet.
The problem is that clGetPlatformIDs returns -1001. (err = clGetPlatformIDs(1, &cpPlatform, &cpNPlatform); afterwards both cpPlatform and cpNPlatform are 0.)
I have one of those Dell XPS 14 laptops with Optimus, running Ubuntu 12.04.
lspci shows:
00:02.0 VGA compatible controller: Intel Corporation Core Processor Integrated Graphics Controller (rev 18) (prog-if 00 [VGA controller])
Subsystem: Dell Device 0468
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 46
Region 0: Memory at fa400000 (64-bit, non-prefetchable)
Region 2: Memory at b0000000 (64-bit, prefetchable)
Region 4: I/O ports at f080
Expansion ROM at <unassigned> [disabled]
Capabilities: <access denied>
Kernel driver in use: i915
Kernel modules: i915
01:00.0 VGA compatible controller: NVIDIA Corporation Device 0df1 (rev a1) (prog-if 00 [VGA controller])
Subsystem: Dell Device 0468
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 16
Region 0: Memory at f9000000 (32-bit, non-prefetchable)
Region 1: Memory at c0000000 (64-bit, prefetchable)
Region 3: Memory at d0000000 (64-bit, prefetchable)
Region 5: I/O ports at e000
[virtual] Expansion ROM at fa000000 [disabled]
Capabilities: <access denied>
Kernel driver in use: nvidia
Kernel modules: nvidia_current_updates, nvidia_current, nouveau, nvidiafb
Since Optimus doesn’t work on Linux, I installed Bumblebee, and I can now use both graphics cards to run applications, but I still can’t run my OpenCL code.
I’ve already tried some “solutions” I found on the Internet, such as running my code as root, giving my user permission to use /dev/nvidia0 and /dev/nvidiactl, changing my drivers, etc., but nothing worked.
I can’t even run the OpenCL examples that came with the CUDA toolkit.
Does anyone know what the problem could be? Is there a solution or workaround for it?
After installing some missing libraries I could run the example you mentioned (at least I hope ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release/deviceQuery is the same as bin/linux/release/cudaDeviceQuery).
On my first try I got this:
caio@xps14:~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release$ ./deviceQuery
[deviceQuery] starting...
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 38
-> no CUDA-capable device is detected
[deviceQuery] test results...
FAILED
> exiting in 3 seconds: 3...2...1...done!
But after turning my NVIDIA GPU on (using optirun, or sudo tee /proc/acpi/bbswitch <<< ON), everything worked:
caio@xps14:~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release$ ./deviceQuery
[deviceQuery] starting...
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Found 1 CUDA Capable device(s)
Device 0: "GeForce GT 420M"
CUDA Driver Version / Runtime Version 4.2 / 4.2
CUDA Capability Major/Minor version number: 2.1
Total amount of global memory: 1024 MBytes (1073414144 bytes)
( 2) Multiprocessors x ( 48) CUDA Cores/MP: 96 CUDA Cores
GPU Clock rate: 1000 MHz (1.00 GHz)
Memory Clock rate: 800 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 131072 bytes
Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)
Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support enabled: No
Device is using TCC driver mode: No
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 4.2, CUDA Runtime Version = 4.2, NumDevs = 1, Device = GeForce GT 420M
[deviceQuery] test results...
PASSED
> exiting in 3 seconds: 3...2...1...done!
Even with my NVIDIA GPU on (sudo tee /proc/acpi/bbswitch <<< ON), nvidia-detector returns none, glxspheres uses Mesa DRI Intel(R) Ironlake Mobile (instead of GeForce GT 420M/PCIe/SSE2, which is used when I run it with optirun), and I can’t always see /dev/nvidia0 and /dev/nvidiactl.
I’m not sure whether this is expected, but it might help.
I posted that after I realized I had forgotten to check for the ICD, but for some reason the loader only picks up the NVIDIA devices, not the Intel devices, despite both ICDs being installed.
Also, if you are trying to use the Intel OpenCL ICD, you need to add its library path to the dynamic loader configuration: create a file with any name, e.g. intelcl.conf, in the /etc/ld.so.conf.d directory, and use a text editor to put only this line in it:
/usr/lib64/OpenCL/vendors/intel/
then run
sudo ldconfig
I just add those steps to the scripts I use to set Intel OpenCL up.