Thar she blows... CUDA 8 RC available

https://developer.nvidia.com/cuda-release-candidate-download

The CUDA 8.0 RC version is “8.0.27” while the EA release was “8.0.21”.

Thanks @allanmac, that picture made my day.

You’re probably one of the people in the boat manning the bug report harpoons! ^_^

New arch types listed by ptxas: ‘compute_60’,‘compute_61’,‘compute_62’,‘sm_60’,‘sm_61’,‘sm_62’

New PTX instructions in sm_61: dp4a (“Four-way byte dot product-accumulate”) and dp2a (“Two-way dot product-accumulate”)
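
For anyone wondering what dp4a actually computes, here is a rough host-side C++ sketch of the signed variant as I read the PTX description: both 32-bit inputs are treated as four packed signed bytes, multiplied lane by lane, and the four products are added to a 32-bit accumulator. The names `pack4`, `dp4a_s32_ref` and `dp4a_demo` are made up for illustration; this is a CPU reference to check results against, not the instruction or the CUDA intrinsic itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: pack four signed bytes into one 32-bit word,
// with x0 in the least significant byte.
inline uint32_t pack4(int8_t x0, int8_t x1, int8_t x2, int8_t x3) {
    return  static_cast<uint32_t>(static_cast<uint8_t>(x0))
         | (static_cast<uint32_t>(static_cast<uint8_t>(x1)) << 8)
         | (static_cast<uint32_t>(static_cast<uint8_t>(x2)) << 16)
         | (static_cast<uint32_t>(static_cast<uint8_t>(x3)) << 24);
}

// Hypothetical CPU reference for the signed x signed form of dp4a:
// d = c + sum over i of a.byte[i] * b.byte[i], for i = 0..3.
// The accumulation is done in unsigned arithmetic so that it wraps
// instead of hitting signed-overflow undefined behaviour.
inline int32_t dp4a_s32_ref(uint32_t a, uint32_t b, int32_t c) {
    uint32_t acc = static_cast<uint32_t>(c);
    for (int i = 0; i < 4; ++i) {
        const int8_t ai = static_cast<int8_t>((a >> (8 * i)) & 0xFFu);
        const int8_t bi = static_cast<int8_t>((b >> (8 * i)) & 0xFFu);
        const int32_t prod = static_cast<int32_t>(ai) * static_cast<int32_t>(bi);
        acc += static_cast<uint32_t>(prod);
    }
    return static_cast<int32_t>(acc);
}

// Small usage check: 1*5 + 2*6 + 3*7 + 4*8 = 70, plus an accumulator of 10.
inline void dp4a_demo() {
    assert(dp4a_s32_ref(pack4(1, 2, 3, 4), pack4(5, 6, 7, 8), 10) == 80);
}
```

The point of the instruction is that this whole loop is a single operation, so an int8 dot product gets four multiply-adds per instruction.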

Global atom and red PTX instructions also get scope modifiers: { .cta, .gpu, .sys }.

Release Notes here.

For me, the big thing is that it finally supports MSVC 2015 (hey, C++14) and moderngpu 2.0.

sm_62 is probably Tegra. I believe that all desktop GPUs from GP102 to GP106 will be sm_61.

The Release Notes say: “Windows Server 2008 R2 Support. Support for Windows Server 2008 R2 is now deprecated and will be removed in a future version of the CUDA toolkit.”

Does that apply to the compilation environment or the runtime environment?

It pretty much means both.

Future toolkits may not provide the support for compiling with this as a target environment.
Future toolkits may not provide the necessary support (e.g. libraries) to run code with this as a target environment.

Built some of my main benchmark projects against CUDA 8 and find that it seems to be modestly faster than CUDA 7.5 for my small sample.

The verbose compilation output looks about the same as CUDA 7.5 in terms of registers used etc.

I was not able to get my hands on a GTX 1080, so those tests will wait until that GPU is available.

So far so good. Look forward to using the graph library.

If the 1080 is indeed an sm_61 part and has the dp4a and dp2a instructions I’ll be pretty happy. I was actually more interested in those than fp16 support. And I didn’t even know about the dp2a instruction. Could come in handy if 8 bits isn’t quite enough for one of the operands.
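
To make the "8 bits isn't quite enough" case concrete, here is a rough host-side C++ sketch of the signed dp2a variants as I understand the PTX description: one operand holds two 16-bit values, the other holds four bytes, and the lo/hi mode picks which pair of bytes is used. The names `pack2x16`, `pack4x8` and `dp2a_s32_ref` are invented for this example; it is a CPU reference under those assumptions, not the hardware instruction.

```cpp
#include <cstdint>

// Hypothetical helper: pack two signed 16-bit values, lo in the low half.
inline uint32_t pack2x16(int16_t lo, int16_t hi) {
    return  static_cast<uint32_t>(static_cast<uint16_t>(lo))
         | (static_cast<uint32_t>(static_cast<uint16_t>(hi)) << 16);
}

// Hypothetical helper: pack four signed bytes, x0 in the least significant byte.
inline uint32_t pack4x8(int8_t x0, int8_t x1, int8_t x2, int8_t x3) {
    return  static_cast<uint32_t>(static_cast<uint8_t>(x0))
         | (static_cast<uint32_t>(static_cast<uint8_t>(x1)) << 8)
         | (static_cast<uint32_t>(static_cast<uint8_t>(x2)) << 16)
         | (static_cast<uint32_t>(static_cast<uint8_t>(x3)) << 24);
}

// Hypothetical CPU reference for the signed x signed form of dp2a.
//   lo mode (use_hi == false): d = c + a.half[0]*b.byte[0] + a.half[1]*b.byte[1]
//   hi mode (use_hi == true):  d = c + a.half[0]*b.byte[2] + a.half[1]*b.byte[3]
// Accumulation is unsigned so it wraps rather than overflowing a signed int.
inline int32_t dp2a_s32_ref(uint32_t a, uint32_t b, int32_t c, bool use_hi) {
    uint32_t acc = static_cast<uint32_t>(c);
    const int b_select = use_hi ? 2 : 0;
    for (int i = 0; i < 2; ++i) {
        const int16_t ai = static_cast<int16_t>((a >> (16 * i)) & 0xFFFFu);
        const int8_t  bi = static_cast<int8_t>((b >> (8 * (b_select + i))) & 0xFFu);
        const int32_t prod = static_cast<int32_t>(ai) * static_cast<int32_t>(bi);
        acc += static_cast<uint32_t>(prod);
    }
    return static_cast<int32_t>(acc);
}
```

So with 16-bit values on one side and 8-bit weights on the other, a lo call followed by a hi call consumes all four bytes of the second operand against two different pairs of 16-bit values.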

Yummy ^_^. So many dev tool updates out this month… Going to be a fun weekend :-D

The 1080 is definitely Compute Capability 6.1; it was mentioned on the NVIDIA site.

@scottgray

Hi Scott,
Got any plans for making a Pascal assembler?

@Lewei

maxas requires no structural changes to support Pascal; I just need to add the new opcodes. Many of the fp16 opcodes were already added for sm_53.

I looked at the “Programming Guide” in the 8.0 documentation and was disappointed not to see Pascal documented in Tables 2 and 12:
Table 2, in section 5.4.1 “Arithmetic Instructions”, only covers the Fermi, Kepler and Maxwell compute capabilities.
Table 12, “Feature Support per Compute Capability”, in section G “Compute Capabilities”, likewise only covers Fermi, Kepler and Maxwell.
I would also like to see the first two figures updated to reflect progress since August 2013 (Maxwell 5.0).