In my program, I need to get the position of the highest bit of integer value - i.e. something like integer log2.

The following code works perfectly, but too slow:
[code]
for(int i = 31; i >= 0; i--)
{
if(x >> i)
Â return i;
}

return 0;
[/code]

The following code works [i]almost[/i] perfectly, but sometimes it just returns wrong value and program crashes:
[code]
return (uint)(__log2f((x | (x >> 1) | (x >> 2)) - (x >> 1)));
[/code]

Is there any fast way to calculate integer logarithm in CUDA?
Thanks in advance

[quote name='Manticore' date='May 14 2008, 12:11 PM']In my program, I need to get the position of the highest bit of integer value - i.e. something like integer log2.

The following code works perfectly, but too slow:
[code]
for(int i = 31; i >= 0; i--)
{
if(x >> i)
return i;
}

return 0;
[/code]

The following code works [i]almost[/i] perfectly, but sometimes it just returns wrong value and program crashes:
[code]
return (uint)(__log2f((x | (x >> 1) | (x >> 2)) - (x >> 1)));
[/code]

Is there any fast way to calculate integer logarithm in CUDA?
Thanks in advance
[right][snapback]377102[/snapback][/right]
[/quote]

I think what you want is the function __clz(x)

See page 85 of beta 2.0 programming guide.

__clz(x) returns the number, between 0 and 32 inclusive, of consecutive zero bits
starting at the most significant bit (i.e. bit 31) of integer parameter x.

[quote name='Manticore' date='May 14 2008, 12:11 PM']In my program, I need to get the position of the highest bit of integer value - i.e. something like integer log2.

The following code works perfectly, but too slow:

for(int i = 31; i >= 0; i--)

{

if(x >> i)

return i;

}

return 0;

The following code works almost perfectly, but sometimes it just returns wrong value and program crashes:

[quote name='Ilhami Torunoglu' date='May 14 2008, 11:22 PM']I think what you want is the function __clz(x)

See page 85 of beta 2.0 programming guide.

__clz(x) returns the number, between 0 and 32 inclusive, of consecutive zero bits
starting at the most significant bit (i.e. bit 31) of integer parameter x.
[right][snapback]377113[/snapback][/right]
[/quote]

Ah, this is what I was thinking you would do. Floats store numbers in a base-2 mantissa/exponent format. This line is converting the integer to a float, then pulling the exponent bits out of the float representation and manipulating them directly. Notice the minus sign flips it around. __clz() doesn't directly give you log base 2, but counts how many zeros are on the most significant bit side of the one. That's going to be 31-log2(a).

Â return (a)?(158-(__float_as_int(__uint2float_rz((unsigned int)a))>>23)):32;

}

[snapback]377167[/snapback]

Ah, this is what I was thinking you would do. Floats store numbers in a base-2 mantissa/exponent format. This line is converting the integer to a float, then pulling the exponent bits out of the float representation and manipulating them directly. Notice the minus sign flips it around. __clz() doesn't directly give you log base 2, but counts how many zeros are on the most significant bit side of the one. That's going to be 31-log2(a).

[quote name='seibert' date='May 15 2008, 01:51 AM']Ah, this is what I was thinking you would do. Floats store numbers in a base-2 mantissa/exponent format. This line is converting the integer to a float, then pulling the exponent bits out of the float representation and manipulating them directly. Notice the minus sign flips it around. __clz() doesn't directly give you log base 2, but counts how many zeros are on the most significant bit side of the one. That's going to be 31-log2(a).
[right][snapback]377187[/snapback][/right]
[/quote]

Unfortunately, this function is not 100% precise too (
For example, I have a __device__ function, that multiplies two binary polynomials a and b, that are stored in unsigned ints:

All other code remains the same, and I get significant performance boost - but sometimes I get ULF (seems to be because of wrong __clz() return value and, therefore, infinite cycle). This occures on different data sets and on different places each run for one data set.

[quote name='seibert' date='May 15 2008, 01:51 AM']Ah, this is what I was thinking you would do. Floats store numbers in a base-2 mantissa/exponent format. This line is converting the integer to a float, then pulling the exponent bits out of the float representation and manipulating them directly. Notice the minus sign flips it around. __clz() doesn't directly give you log base 2, but counts how many zeros are on the most significant bit side of the one. That's going to be 31-log2(a).

[snapback]377187[/snapback]

Unfortunately, this function is not 100% precise too (

For example, I have a __device__ function, that multiplies two binary polynomials a and b, that are stored in unsigned ints:

for(int i = 31; i >=0; i--)

if(a & (1 << i))

{

lores ^= (b << i);

hires ^= (b >> (32 - i));

a ^= (1 << i);

}

// modulo operations

All works perfectly. But when I want to speed this up and use __clz:

while(a)

{

i = 31 - __clz(a);

lores ^= (b << i);

hires ^= (b >> (32 - i));

a ^= (1 << i);

}

All other code remains the same, and I get significant performance boost - but sometimes I get ULF (seems to be because of wrong __clz() return value and, therefore, infinite cycle). This occures on different data sets and on different places each run for one data set.

The following code works perfectly, but too slow:

[code]

for(int i = 31; i >= 0; i--)

{

if(x >> i)

Â return i;

}

return 0;

[/code]

The following code works [i]almost[/i] perfectly, but sometimes it just returns wrong value and program crashes:

[code]

return (uint)(__log2f((x | (x >> 1) | (x >> 2)) - (x >> 1)));

[/code]

Is there any fast way to calculate integer logarithm in CUDA?

Thanks in advance

The following code works perfectly, but too slow:

The following code works

almostperfectly, but sometimes it just returns wrong value and program crashes:Is there any fast way to calculate integer logarithm in CUDA?

Thanks in advance

The following code works perfectly, but too slow:

[code]

for(int i = 31; i >= 0; i--)

{

if(x >> i)

return i;

}

return 0;

[/code]

The following code works [i]almost[/i] perfectly, but sometimes it just returns wrong value and program crashes:

[code]

return (uint)(__log2f((x | (x >> 1) | (x >> 2)) - (x >> 1)));

[/code]

Is there any fast way to calculate integer logarithm in CUDA?

Thanks in advance

[right][snapback]377102[/snapback][/right]

[/quote]

I think what you want is the function __clz(x)

See page 85 of beta 2.0 programming guide.

__clz(x) returns the number, between 0 and 32 inclusive, of consecutive zero bits

starting at the most significant bit (i.e. bit 31) of integer parameter x.

The following code works perfectly, but too slow:

The following code works

almostperfectly, but sometimes it just returns wrong value and program crashes:Is there any fast way to calculate integer logarithm in CUDA?

Thanks in advance

I think what you want is the function __clz(x)

See page 85 of beta 2.0 programming guide.

__clz(x) returns the number, between 0 and 32 inclusive, of consecutive zero bits

starting at the most significant bit (i.e. bit 31) of integer parameter x.

See page 85 of beta 2.0 programming guide.

__clz(x) returns the number, between 0 and 32 inclusive, of consecutive zero bits

starting at the most significant bit (i.e. bit 31) of integer parameter x.

[right][snapback]377113[/snapback][/right]

[/quote]

Great thanks! I must have missed it :">

See page 85 of beta 2.0 programming guide.

__clz(x) returns the number, between 0 and 32 inclusive, of consecutive zero bits

starting at the most significant bit (i.e. bit 31) of integer parameter x.

Great thanks! I must have missed it :">

(can someone explain how this works?)

[code]

__device_func__(int __clz(int a))

{

return (a)?(158-(__float_as_int(__uint2float_rz((unsigned int)a))>>23)):32;

}

[/code]

(can someone explain how this works?)

(can someone explain how this works?)

[code]

__device_func__(int __clz(int a))

{

Â return (a)?(158-(__float_as_int(__uint2float_rz((unsigned int)a))>>23)):32;

}

[/code]

[right][snapback]377167[/snapback][/right]

[/quote]

Ah, this is what I was thinking you would do. Floats store numbers in a base-2 mantissa/exponent format. This line is converting the integer to a float, then pulling the exponent bits out of the float representation and manipulating them directly. Notice the minus sign flips it around. __clz() doesn't directly give you log base 2, but counts how many zeros are on the most significant bit side of the one. That's going to be 31-log2(a).

(can someone explain how this works?)

Ah, this is what I was thinking you would do. Floats store numbers in a base-2 mantissa/exponent format. This line is converting the integer to a float, then pulling the exponent bits out of the float representation and manipulating them directly. Notice the minus sign flips it around. __clz() doesn't directly give you log base 2, but counts how many zeros are on the most significant bit side of the one. That's going to be 31-log2(a).

[right][snapback]377187[/snapback][/right]

[/quote]

Unfortunately, this function is not 100% precise too (

For example, I have a __device__ function, that multiplies two binary polynomials a and b, that are stored in unsigned ints:

[code]

for(int i = 31; i >=0; i--)

if(a & (1 << i))

{

lores ^= (b << i);

hires ^= (b >> (32 - i));

a ^= (1 << i);

}

// modulo operations

[/code]

All works perfectly. But when I want to speed this up and use __clz:

[code]

while(a)

{

i = 31 - __clz(a);

lores ^= (b << i);

hires ^= (b >> (32 - i));

a ^= (1 << i);

}

[/code]

All other code remains the same, and I get significant performance boost - but sometimes I get ULF (seems to be because of wrong __clz() return value and, therefore, infinite cycle). This occures on different data sets and on different places each run for one data set.

Unfortunately, this function is not 100% precise too (

For example, I have a __device__ function, that multiplies two binary polynomials a and b, that are stored in unsigned ints:

All works perfectly. But when I want to speed this up and use __clz:

All other code remains the same, and I get significant performance boost - but sometimes I get ULF (seems to be because of wrong __clz() return value and, therefore, infinite cycle). This occures on different data sets and on different places each run for one data set.