Hello everyone. I need help with the FNV function. I'm tired of fighting with it.
In its simplest form, the fnv function looks like this:
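// 32-bit operands are assumed here; the multiplication wraps modulo 2^32
uint32_t fnv4(uint32_t x, uint32_t y)
{
    return (x * 0x01000193) ^ y;
}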
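For reference, here is a minimal host-side C sketch of the same step, useful for checking GPU output against known values. It assumes 32-bit unsigned operands; the names `fnv` and `FNV_PRIME` are chosen for illustration only.

```c
#include <stdint.h>

#define FNV_PRIME 0x01000193u

/* One FNV step on 32-bit words: multiply by the prime, then xor.
   Unsigned arithmetic wraps modulo 2^32, so no explicit mask is needed. */
static uint32_t fnv(uint32_t x, uint32_t y)
{
    return (x * FNV_PRIME) ^ y;
}
```

For example, `fnv(1, 0)` gives `0x01000193` and `fnv(2, 1)` gives `0x02000327`.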
I am writing it in PTX:
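mov.u32 round,0x00;                  // round counter
$LLBfnv1:
ld.global.u64 %rM,[mixzero];         // 8 bytes from mixzero
ld.global.u64 %rA,[mixzero+128];     // 8 bytes from mixzero+128
mul.hi.u64 %rt0,%rM,0x01000193;      // upper 64 bits of %rM * FNV prime
shl.b64 %rt1,%rM, 32;                // move the low 32-bit word into the upper half
mul.hi.u64 %rt1,%rt1,0x01000193;     // upper 64 bits of (%rM << 32) * FNV prime
shl.b64 %rt0,%rt0, 32;               // shift the first partial result into the upper half
xor.b64 %rt0,%rt0,%rt1;              // merge the two partial results
xor.b64 %rM,%rt0,%rA;                // xor with the second operand
add.u32 round,round,1;
setp.lt.u32 p,round,64;              // loop for 64 rounds
@p bra.uni $LLBfnv1;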
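As a point of comparison for the PTX above, the following C sketch models one way to apply the 32-bit FNV step to both halves of a 64-bit register. It assumes the intent is two independent 32-bit lanes per 64-bit word, which the post does not state explicitly; `fnv_2x32` is a hypothetical name. Note that `mul.hi.u64` returns the upper 64 bits of the 128-bit product, whereas the FNV step needs the low 32 bits of each 32-bit product.

```c
#include <stdint.h>

#define FNV_PRIME 0x01000193u

/* Apply the 32-bit FNV step to the low and high halves of m independently,
   then xor with a. Each product wraps modulo 2^32, so no carry leaks from
   the low lane into the high lane. */
static uint64_t fnv_2x32(uint64_t m, uint64_t a)
{
    uint32_t lo = (uint32_t)m * FNV_PRIME;
    uint32_t hi = (uint32_t)(m >> 32) * FNV_PRIME;
    uint64_t prod = ((uint64_t)hi << 32) | lo;
    return prod ^ a;
}
```

In PTX the same idea would correspond to unpacking the 64-bit register into two 32-bit registers, using `mul.lo.u32` on each, and packing them again; this is a sketch of the approach, not a tuned implementation.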
I need a way to process 128 bytes over 64 rounds. If I compute this with 16 threads in parallel, 2 bytes each, then the result has to be kept after each round, because %rM changes depending on the result of the previous round.
If I store it in shared memory, it turns out that I can run only 49152 / 128 = 384 threads simultaneously. That is very few.
At the moment I get 6,800,000 executions of the function on a GTX 660. The alternative is not to parallelize the function itself, and instead compute the 128 bytes sequentially in each thread.
Then we can get rid of the storing, since the thread will see the results for all 128 bytes anyway.
Here is an example in PureBasic to show why the results must be visible after each round:
For i = 0 To 63
  p=fnv(i ! ValueL(*s), ValueL(*mix+i % w) ) % (n /mixhashes) * mixhashes
  fnv64BI(*mix,*Fullarray+p*#HASH_BYTES,*mix)
  fnv64BI(*mix+#HASH_BYTES,*Fullarray+(p+1)*#HASH_BYTES,*mix+#HASH_BYTES)
Next i
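The dependency in the loop above can be sketched in C as follows. This is a simplified model, not a translation: it assumes a 128-byte mix held as 32 words, a table laid out as rows of 32 words, and a single row fetch per round. The names `mix_rounds`, `table`, `n_rows`, and `rounds` are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

#define FNV_PRIME 0x01000193u
#define MIX_WORDS 32u   /* 128 bytes as 32-bit words (assumed layout) */

static uint32_t fnv(uint32_t x, uint32_t y)
{
    return (x * FNV_PRIME) ^ y;
}

/* Run `rounds` mixing rounds over mix, reading rows from table. */
static void mix_rounds(uint32_t mix[MIX_WORDS], uint32_t seed,
                       const uint32_t *table, uint32_t n_rows,
                       uint32_t rounds)
{
    for (uint32_t i = 0; i < rounds; i++) {
        /* The row index uses a mix word written by the previous round,
           so round i + 1 cannot start before round i has finished. */
        uint32_t p = fnv(i ^ seed, mix[i % MIX_WORDS]) % n_rows;
        const uint32_t *row = table + (size_t)p * MIX_WORDS;

        /* The 32 word updates inside one round are independent. */
        for (uint32_t k = 0; k < MIX_WORDS; k++)
            mix[k] = fnv(mix[k], row[k]);
    }
}
```

Calling `mix_rounds(mix, seed, table, n_rows, 64)` matches the 64 rounds in the post. The inner loop over `k` is the part that could be spread across threads, while the outer loop is inherently sequential because `p` depends on the updated mix.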