So what's new about Maxwell?
Awesome! Hopefully we see GM10x follow-on cards very soon.

#16
Posted 02/18/2014 08:51 PM   
Does anyone know if the GTX 750 Ti has dynamic parallelism? At https://developer.nvidia.com/cuda-gpus it appears with compute capability 3.0...
I want to try the new architecture, but I need this feature.

#17
Posted 02/18/2014 11:35 PM   
Not directly related to Maxwell, but I'm pleased to see improved code generation in CUDA 6.0. After recompiling my image processing code, the instruction count dropped by 12% and kernel time by 22%!

One thing I've always been bothered by is the very inefficient array indexing code. Unlike x86, which can compute

index * scale + offset + constOffset with a single load/store instruction, CUDA uses separate multiply and add instructions to do it (you can turn the array index into an induction variable, but that increases register use). 64-bit addressing makes it worse by doubling the number of instructions.

It took me a while to realize why my simple code had 2 multiplies for each memory load:

addressLow32 = IMAD(index, scale, pointerBaseLow32) // compute lower 32 bits of address
addressHigh32 = IMAD.hi(index, scale, pointerBaseHigh32) // compute upper 32 bits

With CUDA 6, the address generation is improved to:

addressLow32 = IMAD(index, scale, pointerBaseLow32)
sign = index < 0 ? 0xffffffff : 0
addressHigh32 = IADD.X(sign, pointerBaseUpper32)

which could be better for throughput, but makes inspecting assembly code even harder by littering it with more address calculation.

#18
Posted 02/18/2014 11:46 PM   
hastursan said:Anyone knows if the GTX 750 Ti has Dynamic parallelism? At https://developer.nvidia.com/cuda-gpus appears with compute capability 3.0....
I want to try the new architecture but I need this feature.


We should know as soon as someone gets one and prints the device caps. It's likely sm_35 or the (new) sm_32. It is not the sm_37 buried in the CUDA 6.0 headers (which provides more shared memory than the 64K GM108 is known to have).

Even GK208 is sm_35.

One (small) clue is from the GM107 white paper, which says "our first-generation Maxwell GPUs offer the same API functionality as Kepler GPUs". That doesn't tell us anything really except it's sm_3x.

#19
Posted 02/19/2014 12:07 AM   
Maxwell is Compute 5.0, I know that much from cudaminer screenshots that were sent to me.

#20
Posted 02/19/2014 09:38 AM   
Here's the output of AIDA64 GPGPU / CUDA page on a MSI GeForce GTX 750 "GM107" Maxwell card:

Device Properties
Device Name: GeForce GTX 750
GPU Code Name: GM107
PCI Domain / Bus / Device: 0 / 1 / 0
Clock Rate: 1137 MHz
Asynchronous Engines: 1
Multiprocessors / Cores: 4 / 512
L2 Cache: 2048 KB
Max Threads Per Multiprocessor: 2048
Max Threads Per Block: 1024
Max Registers Per Block: 65536
Max 32-bit Registers Per Multiprocessor: 65536
Max Instructions Per Kernel: 512 million
Warp Size: 32 threads
Max Block Size: 1024 x 1024 x 64
Max Grid Size: 2147483647 x 65535 x 65535
Max 1D Texture Width: 65536
Max 2D Texture Size: 65536 x 65536
Max 3D Texture Size: 4096 x 4096 x 4096
Max 1D Linear Texture Width: 134217728
Max 2D Linear Texture Size: 65000 x 65000
Max 2D Linear Texture Pitch: 1048544 bytes
Max 1D Layered Texture Width: 16384
Max 1D Layered Texture Layers: 2048
Max Mipmapped 1D Texture Width: 16384
Max Mipmapped 2D Texture Size: 16384 x 16384
Max Cubemap Texture Size: 16384 x 16384
Max Cubemap Layered Texture Size: 16384 x 16384
Max Cubemap Layered Texture Layers: 2046
Max Texture Array Size: 16384 x 16384
Max Texture Array Slices: 2048
Max 1D Surface Width: 65536
Max 2D Surface Size: 65536 x 32768
Max 3D Surface Size: 65536 x 32768 x 2048
Max 1D Layered Surface Width: 65536
Max 1D Layered Surface Layers: 2048
Max 2D Layered Surface Size: 65536 x 32768
Max 2D Layered Surface Layers: 2048
Compute Mode: Default: Multiple contexts allowed per device
Compute Capability: 5.0
CUDA DLL: nvcuda.dll (8.17.13.3489 - nVIDIA ForceWare 334.89)

Memory Properties
Memory Clock: 2505 MHz
Global Memory Bus Width: 128-bit
Total Memory: 1 GB
Total Constant Memory: 64 KB
Max Shared Memory Per Block: 48 KB
Max Shared Memory Per Multiprocessor: 64 KB
Max Memory Pitch: 2147483647 bytes
Texture Alignment: 512 bytes
Texture Pitch Alignment: 32 bytes
Surface Alignment: 512 bytes

Device Features
32-bit Floating-Point Atomic Addition: Supported
32-bit Integer Atomic Operations: Supported
64-bit Integer Atomic Operations: Supported
Caching Globals in L1 Cache: Not Supported
Caching Locals in L1 Cache: Not Supported
Concurrent Kernel Execution: Supported
Concurrent Memory Copy & Execute: Supported
Double-Precision Floating-Point: Supported
ECC: Disabled
Funnel Shift: Supported
Host Memory Mapping: Supported
Integrated Device: No
Managed Memory: Not Supported
Multi-GPU Board: No
Stream Priorities: Not Supported
Surface Functions: Supported
TCC Driver: No
Unified Addressing: No
Warp Vote Functions: Supported
__ballot(): Supported
__syncthreads_and(): Supported
__syncthreads_count(): Supported
__syncthreads_or(): Supported
__threadfence_system(): Supported

#21
Posted 02/19/2014 09:43 AM   
So it's not a deviceQuery, but still good information. AIDA64 is a shareware benchmark tool.

Max Shared Memory Per Block: 48 KB
Max Shared Memory Per Multiprocessor: 64 KB


Oh, this per-block shared memory limit is disappointing.

Caching Globals in L1 Cache: Not Supported
Caching Locals in L1 Cache: Not Supported


so local memory spills are no longer covered by the L1 cache? So what IS covered by L1 then?


Funnel Shift: Supported
Multi-GPU Board: No
Integrated Device: No
TCC Driver: No
Unified Addressing: No


Nice that all of this is now a device capability. I want a multi-GPU board with Maxwell. But why is unified addressing not available? Maybe it requires a 64-bit system, which Fiery may not have had?


__ballot(): Supported
__syncthreads_and(): Supported
__syncthreads_count(): Supported
__syncthreads_or(): Supported
__threadfence_system(): Supported


oh, what's all this?

#22
Posted 02/19/2014 09:58 AM   
I know it's not DeviceQuery, but AIDA64 will provide more information on CUDA devices than DeviceQuery ;) Here's the results of DeviceQuery:

deviceQuery.exe Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 750"
CUDA Driver Version / Runtime Version 6.0 / 6.0
CUDA Capability Major/Minor version number: 5.0
Total amount of global memory: 1024 MBytes (1073741824 bytes)
( 4) Multiprocessors, (128) CUDA Cores/MP: 512 CUDA Cores
GPU Clock rate: 1137 MHz (1.14 GHz)
Memory Clock rate: 2505 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 2097152 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
CUDA Device Driver Mode (TCC or WDDM): WDDM (Windows Display Driver Model)
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.0, CUDA Runtime Version = 6.0, NumDevs = 1, Device0 = GeForce GTX 750
Result = PASS

--------------

BTW, I'm using Windows 7 64-bit SP1 with ForceWare 334.89 WHQL.

#23
Posted 02/19/2014 09:59 AM   
AIDA64 OpenCL GPGPU benchmark results for GTX750:

Single-Precision FLOPS (FP32): 1190 GFLOPS
Double-Precision FLOPS (FP64): 37.72 GFLOPS
24-bit Integer IOPS: 399.3 GIOPS
32-bit Integer IOPS: 399.3 GIOPS
64-bit Integer IOPS: 82.76 GIOPS

The GPU core clock was ca. 1175 MHz, but it fluctuated due to GPU Boost. The benchmarks confirmed the 1/32 FP64 rate, as well as the 1/3 32-bit integer rate. The latter is quite an improvement over Kepler.

#24
Posted 02/19/2014 10:04 AM   
Fiery said:I know it's not DeviceQuery, but AIDA64 will provide more information on CUDA devices than DeviceQuery ;) Here's the results of DeviceQuery:

Device supports Unified Addressing (UVA): Yes

BTW, I'm using Windows 7 64-bit SP1 with ForceWare 334.89 WHQL.


So AIDA64 must be lying with respect to unified memory, because it stated "Unified Addressing: No"

#25
Posted 02/19/2014 10:16 AM   
AIDA64 reports Unified Addressing differently because it uses a 32-bit main binary, while I used the native 64-bit binary for DeviceQuery.

#26
Posted 02/19/2014 10:10 AM   
Output of 32-bit DeviceQuery:

deviceQuery.exe Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 750"
CUDA Driver Version / Runtime Version 6.0 / 6.0
CUDA Capability Major/Minor version number: 5.0
Total amount of global memory: 1024 MBytes (1073741824 bytes)
( 4) Multiprocessors, (128) CUDA Cores/MP: 512 CUDA Cores
GPU Clock rate: 1137 MHz (1.14 GHz)
Memory Clock rate: 2505 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 2097152 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
CUDA Device Driver Mode (TCC or WDDM): WDDM (Windows Display Driver Model)
Device supports Unified Addressing (UVA): No
Device PCI Bus ID / PCI location ID: 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.0, CUDA Runtime Version = 6.0, NumDevs = 1, Device0 = GeForce GTX 750
Result = PASS

#27
Posted 02/19/2014 10:12 AM   
okay, thanks for posting this!

Why did you not get the Ti model? That's 1 GB more memory and one more multiprocessor at little extra cost...

#28
Posted 02/19/2014 10:25 AM   
cbuchner1 said:
okay, thanks for posting this!

Why did you not get the Ti model? that's 1 GB more memory and 1 more multiprocessor at little extra cost...


It's for development purposes, so all that mattered was the GM107 chip.

#29
Posted 02/19/2014 10:21 AM   
cbuchner1 said:
__ballot(): Supported
__syncthreads_and(): Supported
__syncthreads_count(): Supported
__syncthreads_or(): Supported
__threadfence_system(): Supported


oh, what's all this?


Those features are old ones, supported by all GPUs with a compute capability of at least 2.0 (Fermi and later).

#30
Posted 02/19/2014 10:22 AM   