CUDA built-in function for float to int8 conversion?

Hello,

Is there a function available in CUDA that can convert my fp32 device arguments into int8_t or signed char device arguments?
For example:

int8_t/signed char ret_val = fun(100.0); should return ret_val as 100

Thanks for the support!

Have you tried the regular old C++ way, with an appropriate cast?

I tried doing it with reinterpret_cast, but the answer I am getting is all zeros.

#include <stdio.h>
#include <stdint.h>
#define N 10

__global__ void add( float *a, float *b, float *c ) {
    int tid = blockIdx.x;    // handle the data at this index
    if (tid < N)
        c[tid] = a[tid] + b[tid];
}

__global__ void printKer( int8_t *input ) {
    int i = blockIdx.x;
    if (i < N)
    {
        printf("Value: %d\n", input[i]);
    }
}

int main( void ) {

    float a[N], b[N], c[N];
    int8_t c_c[N];
    float *dev_a, *dev_b, *dev_c;
    int8_t *dev_c_int;

    // allocate the memory on the GPU
    cudaMalloc( (void**)&dev_a, N * sizeof(float) );
    cudaMalloc( (void**)&dev_b, N * sizeof(float) );
    cudaMalloc( (void**)&dev_c, N * sizeof(float) );
    cudaMalloc( (void**)&dev_c_int, N * sizeof(int8_t) );

    // fill the arrays 'a' and 'b' on the CPU
    for (int i = 0; i < N; i++) {
        a[i] = -i;
        b[i] = i * i;
    }

    // copy the arrays 'a' and 'b' to the GPU
    cudaMemcpy( dev_a, a, N * sizeof(float), cudaMemcpyHostToDevice );
    cudaMemcpy( dev_b, b, N * sizeof(float), cudaMemcpyHostToDevice );

    add<<<N,1>>>( dev_a, dev_b, dev_c );

    // reinterpret the float result buffer as int8_t
    //dev_c_int = (int8_t*) dev_c;
    dev_c_int = reinterpret_cast<int8_t*>(dev_c);

    printKer<<<N,1>>>(dev_c_int);

    // copy the array 'c' back from the GPU to the CPU
    cudaMemcpy( c_c, dev_c_int, N * sizeof(int8_t), cudaMemcpyDeviceToHost );

    // display the results
    for (int i = 0; i < N; i++) {
        printf( "%f + %f = %d\n", a[i], b[i], c_c[i] );
    }

    // free the memory allocated on the GPU
    cudaFree( dev_a );
    cudaFree( dev_b );
    cudaFree( dev_c );
    cudaFree( dev_c_int );
    return 0;
}

Output:

Value: 0
Value: 0
Value: 0
Value: 0
Value: 0
Value: 0
Value: 0
Value: 0
Value: 0
Value: 0
0.000000 + 0.000000 = 0
-1.000000 + 1.000000 = 0
-2.000000 + 4.000000 = 0
-3.000000 + 9.000000 = 0
-4.000000 + 16.000000 = 0
-5.000000 + 25.000000 = 0
-6.000000 + 36.000000 = 0
-7.000000 + 49.000000 = 0
-8.000000 + 64.000000 = 0
-9.000000 + 81.000000 = 0

How about a plain old cast? C style:

int8_t foo = (int8_t)100.0f;
printf ("%d\n", foo);

or C++ style:

int8_t foo = int8_t (100.0f);
printf ("%d\n", foo);

That definitely works!
But my ‘dev_c’ and ‘dev_c_int’ are pointers, and I don’t want to write a kernel just to cast each and every element of the memory they point to individually.

That’s why I tried reinterpret_cast.

The way to convert a large array in bulk on the GPU is to assign a few elements (possibly just one) to each CUDA thread. From an efficiency standpoint, it would probably be best to represent your data as char8, i.e. a type where each element comprises eight byte-sized integers, and have each thread convert eight floats into one such element.
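For example, a bulk conversion kernel could look something like this (an untested sketch with one element per thread; the name floatToInt8 and the clamping to the int8_t range are my own additions, not a built-in function):

__global__ void floatToInt8( const float *in, int8_t *out, int n )
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // convert by value, clamping to the representable range of int8_t first
        float v = fminf( fmaxf( in[i], -128.0f ), 127.0f );
        out[i] = (int8_t)v;
    }
}

In your program you would launch this in place of the reinterpret_cast, e.g. floatToInt8<<<(N + 255)/256, 256>>>( dev_c, dev_c_int, N );, with dev_c_int pointing at its own cudaMalloc'ed buffer.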

You might want to study up on C++ in general; there seems to be some confusion as to what reinterpret_cast does.
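For instance, this host-side sketch shows the difference between a value conversion and a reinterpretation of the same bytes:

float f = 5.0f;
int8_t by_value = (int8_t)f;                         // value conversion: 5
int8_t low_byte = *reinterpret_cast<int8_t*>(&f);    // lowest byte of the bit pattern 0x40A00000
printf( "%d %d\n", by_value, low_byte );             // prints "5 0" on a little-endian machine

reinterpret_cast never touches the data; it merely tells the compiler to treat the same bits as a different type. That is why your printout consists of the raw bytes of the floats rather than their values.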