Integer logarithm to the base 2 - how?
In my program, I need to get the position of the highest bit of integer value - i.e. something like integer log2.

The following code works perfectly, but too slow:
[code]
for(int i = 31; i >= 0; i--)
{
if(x >> i)
 return i;
}

return 0;
[/code]

The following code works [i]almost[/i] perfectly, but sometimes it just returns wrong value and program crashes:
[code]
return (uint)(__log2f((x | (x >> 1) | (x >> 2)) - (x >> 1)));
[/code]

Is there any fast way to calculate integer logarithm in CUDA?
Thanks in advance
In my program, I need to get the position of the highest bit of integer value - i.e. something like integer log2.



The following code works perfectly, but too slow:



for(int i = 31; i >= 0; i--)

{

if(x >> i)

 return i;

}



return 0;




The following code works almost perfectly, but sometimes it just returns wrong value and program crashes:



return (uint)(__log2f((x | (x >> 1) | (x >> 2)) - (x >> 1)));




Is there any fast way to calculate integer logarithm in CUDA?

Thanks in advance

#1
Posted 05/14/2008 07:11 PM   
[quote name='Manticore' date='May 14 2008, 12:11 PM']In my program, I need to get the position of the highest bit of integer value - i.e. something like integer log2.

The following code works perfectly, but too slow:
[code]
for(int i = 31; i >= 0; i--)
{
if(x >> i)
 return i;
}

return 0;
[/code]

The following code works [i]almost[/i] perfectly, but sometimes it just returns wrong value and program crashes:
[code]
return (uint)(__log2f((x | (x >> 1) | (x >> 2)) - (x >> 1)));
[/code]

Is there any fast way to calculate integer logarithm in CUDA?
Thanks in advance
[right][snapback]377102[/snapback][/right]
[/quote]

I think what you want is the function __clz(x)

See page 85 of beta 2.0 programming guide.

__clz(x) returns the number, between 0 and 32 inclusive, of consecutive zero bits
starting at the most significant bit (i.e. bit 31) of integer parameter x.
[quote name='Manticore' date='May 14 2008, 12:11 PM']In my program, I need to get the position of the highest bit of integer value - i.e. something like integer log2.



The following code works perfectly, but too slow:



for(int i = 31; i >= 0; i--)

{

if(x >> i)

 return i;

}



return 0;




The following code works almost perfectly, but sometimes it just returns wrong value and program crashes:



return (uint)(__log2f((x | (x >> 1) | (x >> 2)) - (x >> 1)));




Is there any fast way to calculate integer logarithm in CUDA?

Thanks in advance

[snapback]377102[/snapback]






I think what you want is the function __clz(x)



See page 85 of beta 2.0 programming guide.



__clz(x) returns the number, between 0 and 32 inclusive, of consecutive zero bits

starting at the most significant bit (i.e. bit 31) of integer parameter x.

#2
Posted 05/14/2008 07:22 PM   
[quote name='Ilhami Torunoglu' date='May 14 2008, 11:22 PM']I think what you want is the function __clz(x)

See page 85 of beta 2.0 programming guide.

__clz(x) returns the number, between 0 and 32 inclusive, of consecutive zero bits
starting at the most significant bit (i.e. bit 31) of integer parameter x.
[right][snapback]377113[/snapback][/right]
[/quote]

Great thanks! I must have missed it :">
[quote name='Ilhami Torunoglu' date='May 14 2008, 11:22 PM']I think what you want is the function __clz(x)



See page 85 of beta 2.0 programming guide.



__clz(x) returns the number, between 0 and 32 inclusive, of consecutive zero bits

starting at the most significant bit (i.e. bit 31) of integer parameter x.

[snapback]377113[/snapback]






Great thanks! I must have missed it :">

#3
Posted 05/14/2008 08:56 PM   
From /usr/local/cuda/include/device_functions.h:

(can someone explain how this works?)

[code]
__device_func__(int __clz(int a))
{
 return (a)?(158-(__float_as_int(__uint2float_rz((unsigned int)a))>>23)):32;
}
[/code]
From /usr/local/cuda/include/device_functions.h:



(can someone explain how this works?)





__device_func__(int __clz(int a))

{

 return (a)?(158-(__float_as_int(__uint2float_rz((unsigned int)a))>>23)):32;

}

#4
Posted 05/14/2008 09:17 PM   
[quote name='Anjul Patney' date='May 14 2008, 04:17 PM']From /usr/local/cuda/include/device_functions.h:

(can someone explain how this works?)

[code]
__device_func__(int __clz(int a))
{
 return (a)?(158-(__float_as_int(__uint2float_rz((unsigned int)a))>>23)):32;
}
[/code]
[right][snapback]377167[/snapback][/right]
[/quote]

Ah, this is what I was thinking you would do. Floats store numbers in a base-2 mantissa/exponent format. This line is converting the integer to a float, then pulling the exponent bits out of the float representation and manipulating them directly. Notice the minus sign flips it around. __clz() doesn't directly give you log base 2, but counts how many zeros are on the most significant bit side of the one. That's going to be 31-log2(a).
[quote name='Anjul Patney' date='May 14 2008, 04:17 PM']From /usr/local/cuda/include/device_functions.h:



(can someone explain how this works?)





__device_func__(int __clz(int a))

{

 return (a)?(158-(__float_as_int(__uint2float_rz((unsigned int)a))>>23)):32;

}


[snapback]377167[/snapback]






Ah, this is what I was thinking you would do. Floats store numbers in a base-2 mantissa/exponent format. This line is converting the integer to a float, then pulling the exponent bits out of the float representation and manipulating them directly. Notice the minus sign flips it around. __clz() doesn't directly give you log base 2, but counts how many zeros are on the most significant bit side of the one. That's going to be 31-log2(a).

#5
Posted 05/14/2008 09:51 PM   
[quote name='seibert' date='May 15 2008, 01:51 AM']Ah, this is what I was thinking you would do.  Floats store numbers in a base-2 mantissa/exponent format.  This line is converting the integer to a float, then pulling the exponent bits out of the float representation and manipulating them directly.  Notice the minus sign flips it around.  __clz() doesn't directly give you log base 2, but counts how many zeros are on the most significant bit side of the one.  That's going to be 31-log2(a).
[right][snapback]377187[/snapback][/right]
[/quote]

Unfortunately, this function is not 100% precise too (
For example, I have a __device__ function, that multiplies two binary polynomials a and b, that are stored in unsigned ints:

[code]
for(int i = 31; i >=0; i--)
if(a & (1 << i))
{
 lores ^= (b << i);
 hires ^= (b >> (32 - i));
 a ^= (1 << i);  
}
// modulo operations
[/code]

All works perfectly. But when I want to speed this up and use __clz:

[code]
while(a)
{
i = 31 - __clz(a);
lores ^= (b << i);
hires ^= (b >> (32 - i));
a ^= (1 << i);  
}
[/code]

All other code remains the same, and I get significant performance boost - but sometimes I get ULF (seems to be because of wrong __clz() return value and, therefore, infinite cycle). This occures on different data sets and on different places each run for one data set.
[quote name='seibert' date='May 15 2008, 01:51 AM']Ah, this is what I was thinking you would do.  Floats store numbers in a base-2 mantissa/exponent format.  This line is converting the integer to a float, then pulling the exponent bits out of the float representation and manipulating them directly.  Notice the minus sign flips it around.  __clz() doesn't directly give you log base 2, but counts how many zeros are on the most significant bit side of the one.  That's going to be 31-log2(a).

[snapback]377187[/snapback]






Unfortunately, this function is not 100% precise too (

For example, I have a __device__ function, that multiplies two binary polynomials a and b, that are stored in unsigned ints:





for(int i = 31; i >=0; i--)

if(a & (1 << i))

{

 lores ^= (b << i);

 hires ^= (b >> (32 - i));

 a ^= (1 << i);  

}

// modulo operations




All works perfectly. But when I want to speed this up and use __clz:





while(a)

{

i = 31 - __clz(a);

lores ^= (b << i);

hires ^= (b >> (32 - i));

a ^= (1 << i);  

}




All other code remains the same, and I get significant performance boost - but sometimes I get ULF (seems to be because of wrong __clz() return value and, therefore, infinite cycle). This occures on different data sets and on different places each run for one data set.

#6
Posted 05/17/2008 10:51 PM   
Scroll To Top