Release 1.1 of SIMD-in-a-word functions posted
Release 1.1 of the SIMD-in-a-word functions was posted March 19, 4 pm PDT. The new release is currently available only from the new registered developer website at https://developer.nvidia.com/user/register. Log in, click the green link "CUDA/GPU Computing Registered Developer Program", then click the green "Download" link following "CUDA SIMD-within-a-word functions". The downloaded file is named simd_functions_v1_1.tar.

Those currently at GTC may be interested to learn how to speed up life sciences applications with the Kepler SIMD video instructions, which these functions provide convenient access to. Check out the life sciences track here: http://registration.gputechconf.com/quicklink/eptj7dX

Release notes [technical difficulties prevent me from posting the entire list of functions, sorry]

/* 
Release 1.1

(1) Use of incorrect symbol in multiple-inclusion guard has been corrected.
(2) 44 additional functions were added to the initial set of 38 functions.
(3) The emulation paths for many existing functions were optimized for sm_2x.

This header file contains inline functions that implement intra-word SIMD
operations that are hardware-accelerated on sm_3x (Kepler) GPUs. Efficient
emulation code paths are provided for earlier architectures (sm_1x, sm_2x)
to make the code portable across all GPUs supported by CUDA. The following
functions are currently implemented:

vabs2(a) per-halfword absolute value, with wrap-around: |a|
vabsdiffs2(a,b) per-halfword absolute difference of signed integer: |a - b|
vabsdiffu2(a,b) per-halfword absolute difference of unsigned integer: |a - b|
vabsss2(a) per-halfword abs. value, with signed saturation: sat.s16(|a|)
vadd2(a,b) per-halfword (un)signed addition, with wrap-around: a + b
vaddss2(a,b) per-halfword addition with signed saturation: sat.s16 (a + b)
vaddus2(a,b) per-halfword addition with unsigned saturation: sat.u16 (a+b)
vavgs2(a,b) per-halfword signed rounded average: (a+b+((a+b)>=0)) >> 1
vavgu2(a,b) per-halfword unsigned rounded average: (a + b + 1) / 2
vcmpeq2(a,b) per-halfword (un)signed comparison: a == b ? 0xffff : 0
vcmpges2(a,b) per-halfword signed comparison: a >= b ? 0xffff : 0
vcmpgeu2(a,b) per-halfword unsigned comparison: a >= b ? 0xffff : 0
vcmpgts2(a,b) per-halfword signed comparison: a > b ? 0xffff : 0
vcmpgtu2(a,b) per-halfword unsigned comparison: a > b ? 0xffff : 0
vcmples2(a,b) per-halfword signed comparison: a <= b ? 0xffff : 0
vcmpleu2(a,b) per-halfword unsigned comparison: a <= b ? 0xffff : 0
vcmplts2(a,b) per-halfword signed comparison: a < b ? 0xffff : 0
vcmpltu2(a,b) per-halfword unsigned comparison: a < b ? 0xffff : 0
vcmpne2(a,b) per-halfword (un)signed comparison: a != b ? 0xffff : 0
vhaddu2(a,b) per-halfword unsigned average: (a + b) / 2
vmaxs2(a,b) per-halfword signed maximum: max(a, b)
vmaxu2(a,b) per-halfword unsigned maximum: max(a, b)
vmins2(a,b) per-halfword signed minimum: min(a, b)
vminu2(a,b) per-halfword unsigned minimum: min(a, b)
vneg2(a) per-halfword negation, with wrap-around: -a
vnegss2(a) per-halfword negation, with signed saturation: sat.s16(-a)
vsads2(a,b) per-halfword sum of abs diff of signed: sum{0,1}(|a-b|)
vsadu2(a,b) per-halfword sum of abs diff of unsigned: sum{0,1}(|a-b|)
vseteq2(a,b) per-halfword (un)signed comparison: a == b ? 1 : 0
vsetges2(a,b) per-halfword signed comparison: a >= b ? 1 : 0

[...]
*/
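
For readers who want to see what the per-byte semantics listed above mean in practice, here is a minimal host-side sketch in plain C. It is not the code from the header: the names `ref_vadd4` and `ref_vcmpeq4` are hypothetical, and the bit tricks shown are just one common way to emulate such operations inside a 32-bit word without hardware support.

```c
#include <stdint.h>

/* Hypothetical reference for the vadd4 semantics: per-byte addition with
   wrap-around. Not taken from the header. */
static uint32_t ref_vadd4(uint32_t a, uint32_t b)
{
    /* Add only the low 7 bits of each byte, so no carry can cross a
       byte boundary. */
    uint32_t low = (a & 0x7f7f7f7fu) + (b & 0x7f7f7f7fu);
    /* Fold the top bit of each byte back in with XOR (add without carry-out). */
    return low ^ ((a ^ b) & 0x80808080u);
}

/* Hypothetical reference for the vcmpeq4 semantics: each byte of the result
   is 0xff where the corresponding bytes are equal, 0 otherwise. */
static uint32_t ref_vcmpeq4(uint32_t a, uint32_t b)
{
    uint32_t x = a ^ b; /* a byte of x is zero iff the two lanes are equal */
    /* Set the top bit of every byte of x that is nonzero. */
    uint32_t nz = (((x & 0x7f7f7f7fu) + 0x7f7f7f7fu) | x) & 0x80808080u;
    /* Flip so the top bit marks equal lanes, then widen each flag to 0xff. */
    uint32_t eq = nz ^ 0x80808080u;
    return (eq >> 7) * 0xffu;
}
```

With these definitions, `ref_vadd4(0xff01807f, 0x01ff8001)` gives `0x00000080` because every byte lane wraps around independently, and `ref_vcmpeq4(0x11223344, 0x11003344)` gives `0xff00ffff`.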

#1
Posted 03/19/2013 11:29 PM   
Function list continued.
/*
vsetgeu2(a,b) per-halfword unsigned comparison: a >= b ? 1 : 0
vsetgts2(a,b) per-halfword signed comparison: a > b ? 1 : 0
vsetgtu2(a,b) per-halfword unsigned comparison: a > b ? 1 : 0
vsetles2(a,b) per-halfword signed comparison: a <= b ? 1 : 0
vsetleu2(a,b) per-halfword unsigned comparison: a <= b ? 1 : 0
vsetlts2(a,b) per-halfword signed comparison: a < b ? 1 : 0
vsetltu2(a,b) per-halfword unsigned comparison: a < b ? 1 : 0
vsetne2(a,b) per-halfword (un)signed comparison: a != b ? 1 : 0
vsub2(a,b) per-halfword (un)signed subtraction, with wrap-around: a - b
vsubss2(a,b) per-halfword subtraction with signed saturation: sat.s16(a-b)
vsubus2(a,b) per-halfword subtraction w/ unsigned saturation: sat.u16(a-b)

vabs4(a) per-byte absolute value, with wrap-around: |a|
vabsdiffs4(a,b) per-byte absolute difference of signed integer: |a - b|
vabsdiffu4(a,b) per-byte absolute difference of unsigned integer: |a - b|
vabsss4(a) per-byte absolute value, with signed saturation: sat.s8(|a|)
vadd4(a,b) per-byte (un)signed addition, with wrap-around: a + b
vaddss4(a,b) per-byte addition with signed saturation: sat.s8 (a + b)
vaddus4(a,b) per-byte addition with unsigned saturation: sat.u8 (a + b)
vavgs4(a,b) per-byte signed rounded average: (a + b + ((a+b) >= 0)) >> 1
vavgu4(a,b) per-byte unsigned rounded average: (a + b + 1) / 2
vcmpeq4(a,b) per-byte (un)signed comparison: a == b ? 0xff : 0
vcmpges4(a,b) per-byte signed comparison: a >= b ? 0xff : 0
vcmpgeu4(a,b) per-byte unsigned comparison: a >= b ? 0xff : 0
vcmpgts4(a,b) per-byte signed comparison: a > b ? 0xff : 0
vcmpgtu4(a,b) per-byte unsigned comparison: a > b ? 0xff : 0
vcmples4(a,b) per-byte signed comparison: a <= b ? 0xff : 0
vcmpleu4(a,b) per-byte unsigned comparison: a <= b ? 0xff : 0
vcmplts4(a,b) per-byte signed comparison: a < b ? 0xff : 0
vcmpltu4(a,b) per-byte unsigned comparison: a < b ? 0xff : 0
vcmpne4(a,b) per-byte (un)signed comparison: a != b ? 0xff : 0
vhaddu4(a,b) per-byte unsigned average: (a + b) / 2
vmaxs4(a,b) per-byte signed maximum: max(a, b)
vmaxu4(a,b) per-byte unsigned maximum: max(a, b)
vmins4(a,b) per-byte signed minimum: min(a, b)
vminu4(a,b) per-byte unsigned minimum: min(a, b)
vneg4(a) per-byte negation, with wrap-around: -a
vnegss4(a) per-byte negation, with signed saturation: sat.s8(-a)
vsads4(a,b) per-byte sum of abs difference of signed: sum{0,3}(|a-b|)
vsadu4(a,b) per-byte sum of abs difference of unsigned: sum{0,3}(|a-b|)
vseteq4(a,b) per-byte (un)signed comparison: a == b ? 1 : 0
vsetges4(a,b) per-byte signed comparison: a >= b ? 1 : 0
vsetgeu4(a,b) per-byte unsigned comparison: a >= b ? 1 : 0
vsetgts4(a,b) per-byte signed comparison: a > b ? 1 : 0
vsetgtu4(a,b) per-byte unsigned comparison: a > b ? 1 : 0
vsetles4(a,b) per-byte signed comparison: a <= b ? 1 : 0
vsetleu4(a,b) per-byte unsigned comparison: a <= b ? 1 : 0
vsetlts4(a,b) per-byte signed comparison: a < b ? 1 : 0
vsetltu4(a,b) per-byte unsigned comparison: a < b ? 1 : 0
vsetne4(a,b) per-byte (un)signed comparison: a != b ? 1 : 0
vsub4(a,b) per-byte (un)signed subtraction, with wrap-around: a - b
vsubss4(a,b) per-byte subtraction with signed saturation: sat.s8 (a - b)
vsubus4(a,b) per-byte subtraction with unsigned saturation: sat.u8 (a - b)
*/
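
To make the difference between a few of the per-byte variants concrete, here is a small host-side sketch in plain C. The `ref_` names are hypothetical and none of this is the header's actual implementation. It only spells out the formulas given in the list: saturating add (`sat.u8 (a + b)`), rounded average (`(a + b + 1) / 2`), truncating average (`(a + b) / 2`), and the sum of absolute differences over the four byte lanes.

```c
#include <stdint.h>

/* Hypothetical reference for vaddus4: per-byte add, clamped to 0xff. */
static uint32_t ref_vaddus4(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t s = ((a >> (8 * i)) & 0xffu) + ((b >> (8 * i)) & 0xffu);
        if (s > 0xffu) s = 0xffu;      /* unsigned saturation */
        r |= s << (8 * i);
    }
    return r;
}

/* Hypothetical reference for vavgu4: per-byte (a + b + 1) / 2, computed
   without widening via the identity (a | b) - ((a ^ b) >> 1). */
static uint32_t ref_vavgu4(uint32_t a, uint32_t b)
{
    return (a | b) - (((a ^ b) >> 1) & 0x7f7f7f7fu);
}

/* Hypothetical reference for vhaddu4: per-byte (a + b) / 2, computed
   via the identity (a & b) + ((a ^ b) >> 1). */
static uint32_t ref_vhaddu4(uint32_t a, uint32_t b)
{
    return (a & b) + (((a ^ b) >> 1) & 0x7f7f7f7fu);
}

/* Hypothetical reference for vsadu4: sum over the four byte lanes
   of |a - b|, returned as a single integer. */
static uint32_t ref_vsadu4(uint32_t a, uint32_t b)
{
    uint32_t sum = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t x = (a >> (8 * i)) & 0xffu;
        uint32_t y = (b >> (8 * i)) & 0xffu;
        sum += (x > y) ? (x - y) : (y - x);
    }
    return sum;
}
```

For instance, with `a = 0x01ff0003` and `b = 0x02ff0004`, `ref_vavgu4` returns `0x02ff0004` while `ref_vhaddu4` returns `0x01ff0003`: the two differ only in lanes where `a + b` is odd.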

#2
Posted 03/19/2013 11:56 PM   
This recent paper provides an interesting example of significant performance improvements achieved with the help of Kepler's SIMD instructions:

Yongchao Liu, Adrianto Wirawan, and Bertil Schmidt
CUDASW++ 3.0: accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions
BMC Bioinformatics 2013, 14:117
http://www.biomedcentral.com/content/pdf/1471-2105-14-117.pdf

#3
Posted 04/06/2013 10:40 PM   
The download page seems to be unavailable. I keep being redirected to the license page after clicking 'Agree'.

#4
Posted 04/30/2013 05:34 AM   
I am unable to reproduce this download issue using my registered developer account. The problem may have been transient.

#5
Posted 05/11/2013 11:12 PM   
I want this, I get

Access Denied
You don't have permission to access "http://developer.nvidia.com/user/register" on this server.
Reference #18.44240ac3.1369092315.fda50f

when I try to register!

#6
Posted 05/20/2013 11:26 PM   
There was a hardware failure. Please try again; it should be fixed now.

#7
Posted 05/21/2013 12:21 AM   
I think not:

Access Denied
You don't have permission to access "http://developer.nvidia.com/user/register" on this server.
Reference #18.44240ac3.1369095616.102081e

#8
Posted 05/21/2013 12:21 AM   
There seems to be a technical issue with the site. I cannot reach the CUDA registered developer website at this time. I will notify the relevant team.

#9
Posted 05/21/2013 01:12 AM   
The technical problems appear to be fixed. At this point I am able to log into the registered developer website, and I successfully downloaded the file. Please try again.

#10
Posted 05/21/2013 02:29 AM   
It's still broken:

Access Denied

You don't have permission to access "http://developer.nvidia.com/user/register" on this server.
Reference #18.44240ac3.1369137378.1662e45

#11
Posted 05/21/2013 11:57 AM   
Sorry to hear the site is still not accessible.
Can you try this location:

https://developer.nvidia.com/registered-developer-programs


Then follow the links to log in or register for the CUDA Registered Developer Program.
If you still experience an access problem, try a different browser and let me know the results. You are welcome to message me directly, since I may need some additional information.

Thanks again for your help getting to the bottom of this problem.

#12
Posted 05/21/2013 05:24 PM   
Tried it with Opera:

Access Denied
You don't have permission to access "http://developer.nvidia.com/user/register" on this server.

Reference #18.69ff4317.1369183190.436fdf3

#13
Posted 05/22/2013 12:42 AM   
Access Denied
You don't have permission to access "http://developer.nvidia.com/user/register" on this server.
Reference #18.17a9645f.1369721532.15525d0e

#14
Posted 05/28/2013 06:13 AM   
@birdwes: Have you tried contacting Nadeem through a PM, as he suggested above?

#15
Posted 05/28/2013 05:29 PM   