New download: SIMD-in-a-word functions
The following content was posted to the registered developer website on January 31st, 2013
(new: https://developer.nvidia.com/user/register; old: https://partners.nvidia.com):

SIMD-in-a-word functions

simd_functions.h contains a collection of inline functions for processing byte and half-word data packed into 32-bit words. The functions are hardware-accelerated on Kepler platforms. Efficient emulation code is provided for earlier platforms, so the functions are portable across all compute capabilities. The functionality provided should be useful for image processing tasks and other application areas.
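
To make "packed into 32-bit words" concrete, here is a minimal sketch in plain C of how four bytes or two half-words can share one word. The helper names (pack4, lane4, pack2, lane2) are hypothetical and are not part of simd_functions.h. Placing lane 0 in the least-significant bits is an assumption made for illustration; the per-lane functions should not care about lane order as long as both operands use the same one.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only (not from simd_functions.h).
   Assumption: lane 0 lives in the least-significant bits of the word. */

/* Pack four bytes into one 32-bit word. */
static uint32_t pack4(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
    return (uint32_t)b0 | ((uint32_t)b1 << 8) |
           ((uint32_t)b2 << 16) | ((uint32_t)b3 << 24);
}

/* Extract byte lane i (0..3) from a packed word. */
static uint8_t lane4(uint32_t w, int i)
{
    return (uint8_t)(w >> (8 * i));
}

/* Pack two 16-bit half-words into one 32-bit word. */
static uint32_t pack2(uint16_t h0, uint16_t h1)
{
    return (uint32_t)h0 | ((uint32_t)h1 << 16);
}

/* Extract half-word lane i (0..1) from a packed word. */
static uint16_t lane2(uint32_t w, int i)
{
    return (uint16_t)(w >> (16 * i));
}
```

In an image-processing setting, a word built this way could hold four 8-bit pixels or two 16-bit samples, so a single call to one of the listed functions processes all lanes at once.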

The list of supported functions is as follows:

vabsdiffu2(a,b) per-halfword unsigned absolute difference: |a - b|
vadd2(a,b) per-halfword (un)signed addition, with wrap-around: a + b
vavgu2(a,b) per-halfword unsigned rounded average: (a + b + 1) / 2
vcmpeq2(a,b) per-halfword (un)signed comparison: a == b ? 0xffff : 0
vcmpgeu2(a,b) per-halfword unsigned comparison: a >= b ? 0xffff : 0
vcmpgtu2(a,b) per-halfword unsigned comparison: a > b ? 0xffff : 0
vcmpleu2(a,b) per-halfword unsigned comparison: a <= b ? 0xffff : 0
vcmpltu2(a,b) per-halfword unsigned comparison: a < b ? 0xffff : 0
vcmpne2(a,b) per-halfword (un)signed comparison: a != b ? 0xffff : 0
vhaddu2(a,b) per-halfword unsigned average: (a + b) / 2
vmaxu2(a,b) per-halfword unsigned maximum: max(a, b)
vminu2(a,b) per-halfword unsigned minimum: min(a, b)
vseteq2(a,b) per-halfword (un)signed comparison: a == b ? 1 : 0
vsetgeu2(a,b) per-halfword unsigned comparison: a >= b ? 1 : 0
vsetgtu2(a,b) per-halfword unsigned comparison: a > b ? 1 : 0
vsetleu2(a,b) per-halfword unsigned comparison: a <= b ? 1 : 0
vsetltu2(a,b) per-halfword unsigned comparison: a < b ? 1 : 0
vsetne2(a,b) per-halfword (un)signed comparison: a != b ? 1 : 0
vsub2(a,b) per-halfword (un)signed subtraction, with wrap-around: a - b

vabsdiffu4(a,b) per-byte unsigned absolute difference: |a - b|
vadd4(a,b) per-byte (un)signed addition, with wrap-around: a + b
vavgu4(a,b) per-byte unsigned rounded average: (a + b + 1) / 2
vcmpeq4(a,b) per-byte (un)signed comparison: a == b ? 0xff : 0
vcmpgeu4(a,b) per-byte unsigned comparison: a >= b ? 0xff : 0
vcmpgtu4(a,b) per-byte unsigned comparison: a > b ? 0xff : 0
vcmpleu4(a,b) per-byte unsigned comparison: a <= b ? 0xff : 0
vcmpltu4(a,b) per-byte unsigned comparison: a < b ? 0xff : 0
vcmpne4(a,b) per-byte (un)signed comparison: a != b ? 0xff : 0
vhaddu4(a,b) per-byte unsigned average: (a + b) / 2
vmaxu4(a,b) per-byte unsigned maximum: max(a, b)
vminu4(a,b) per-byte unsigned minimum: min(a, b)
vseteq4(a,b) per-byte (un)signed comparison: a == b ? 1 : 0
vsetgeu4(a,b) per-byte unsigned comparison: a >= b ? 1 : 0
vsetgtu4(a,b) per-byte unsigned comparison: a > b ? 1 : 0
vsetleu4(a,b) per-byte unsigned comparison: a <= b ? 1 : 0
vsetltu4(a,b) per-byte unsigned comparison: a < b ? 1 : 0
vsetne4(a,b) per-byte (un)signed comparison: a != b ? 1 : 0
vsub4(a,b) per-byte (un)signed subtraction, with wrap-around: a - b
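
For readers who want to check the semantics above on the host, here is a rough sketch in plain C. The ref_ functions spell out a few of the per-byte definitions one lane at a time. The swar_ functions use well-known bit tricks to do the same work on the whole word. Both sets of names are hypothetical, and nothing here is taken from simd_functions.h; the emulation code in the header may well be written differently.

```c
#include <stdint.h>

/* Illustrative reference models, NOT the code from simd_functions.h.
   Each ref_ function applies the listed formula to one byte lane at a time. */

static uint32_t ref_vadd4(uint32_t a, uint32_t b)    /* a + b, wrap-around */
{
    uint32_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t x = (a >> (8 * i)) & 0xff, y = (b >> (8 * i)) & 0xff;
        r |= ((x + y) & 0xff) << (8 * i);
    }
    return r;
}

static uint32_t ref_vavgu4(uint32_t a, uint32_t b)   /* (a + b + 1) / 2 */
{
    uint32_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t x = (a >> (8 * i)) & 0xff, y = (b >> (8 * i)) & 0xff;
        r |= ((x + y + 1) / 2) << (8 * i);
    }
    return r;
}

static uint32_t ref_vhaddu4(uint32_t a, uint32_t b)  /* (a + b) / 2 */
{
    uint32_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t x = (a >> (8 * i)) & 0xff, y = (b >> (8 * i)) & 0xff;
        r |= ((x + y) / 2) << (8 * i);
    }
    return r;
}

static uint32_t ref_vcmpeq4(uint32_t a, uint32_t b)  /* a == b ? 0xff : 0 */
{
    uint32_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t x = (a >> (8 * i)) & 0xff, y = (b >> (8 * i)) & 0xff;
        r |= (x == y ? 0xffu : 0u) << (8 * i);
    }
    return r;
}

static uint32_t ref_vseteq4(uint32_t a, uint32_t b)  /* a == b ? 1 : 0 */
{
    return ref_vcmpeq4(a, b) & 0x01010101u;  /* keep bit 0 of each lane mask */
}

/* SIMD-in-a-word versions: all four lanes at once, no loop. */

static uint32_t swar_vadd4(uint32_t a, uint32_t b)
{
    /* Add the low 7 bits of each lane so carries cannot cross lanes,
       then fix up the top bit of each lane with XOR. */
    uint32_t low = (a & 0x7f7f7f7fu) + (b & 0x7f7f7f7fu);
    return low ^ ((a ^ b) & 0x80808080u);
}

static uint32_t swar_vhaddu4(uint32_t a, uint32_t b)
{
    /* floor((x + y) / 2) == (x & y) + ((x ^ y) >> 1), per lane */
    return (a & b) + (((a ^ b) >> 1) & 0x7f7f7f7fu);
}

static uint32_t swar_vavgu4(uint32_t a, uint32_t b)
{
    /* ceil((x + y) / 2) == (x | y) - ((x ^ y) >> 1), per lane */
    return (a | b) - (((a ^ b) >> 1) & 0x7f7f7f7fu);
}
```

The difference between the two families of comparisons shows up in ref_vcmpeq4 and ref_vseteq4: the vcmp variants give an all-ones lane that can be used directly as a bit mask, while the vset variants give a 0/1 value per lane. The half-word versions of the bit tricks follow the same pattern with the masks 0x7fff7fff and 0x80008000.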

#1
Posted 01/31/2013 01:56 AM   
Great stuff!

#2
Posted 02/04/2013 07:25 PM   
Where exactly in the developer website is this posted?
I am a registered developer but cannot find this under developer.nvidia.com.
Thanks

#3
Posted 02/07/2013 09:24 PM   
(1) Log in at https://developer.nvidia.com/user
(2) Click green link "CUDA/GPU Computing Registered Developer Program"
(3) Click green link "Download" after "CUDA SIMD-within-a-word functions"
(4) Click green link "simd_functions.tar"
(5) Confirm legal notice by clicking "Agree & Download" button at the bottom of the notice

The download should start automatically at that point.

#4
Posted 02/10/2013 05:48 AM   
Looks like the download page for this file is borked: it goes into an infinite loop when you agree to the license terms.
I tried with Chrome in both OSX and Linux.

#5
Posted 02/13/2013 12:35 AM   
Sorry for the inconvenience. I do not have Chrome here to repro (just IE and Firefox). I will notify the relevant team so they can look into this. Thanks for alerting us to this issue.

#6
Posted 02/13/2013 12:40 AM   
We have been unable to reproduce the problem with operating the download page in Chrome. Maybe the problem is tied to a specific version of Chrome or specific security settings? You might want to file a bug with details of the exact Chrome configuration. It would also be of interest to hear whether other Chrome users encountered issues with the download page.

#7
Posted 02/13/2013 11:05 PM   
Same here, but different browser: tried IE and Firefox. Has it been fixed?

#8
Posted 04/30/2013 05:35 AM   