problem with tex1Dfetch on Fermi architecture
Hi all,

My problem should have a simple answer, but I am stuck finding it. The following simple code
stores some data in 1D texture memory and reads it back with tex1Dfetch. It fails (the code
returns zeros) on the Fermi architecture, on both a GTX480 and a Tesla 2050, but works
(returning 0..9) on a Tesla 1060, and I do not see why this should happen.

[code]
#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>
#include <cuda_runtime.h>
#include <cutil.h>

cudaArray* testArray;
texture<float, 1, cudaReadModeElementType> texTest;


__global__ void testKernel(float *_arr)
{
    _arr[threadIdx.x] = tex1Dfetch(texTest, threadIdx.x);
}


int main( int narg, char *args[])
{
    cudaSetDevice(0);

    int nData = 10;
    float *h_testData = (float*)malloc(nData*sizeof(float));
    for (int i=0; i<nData; i++)
        h_testData[i] = (float)i;

    float *d_testData;
    cudaMalloc((void **) &d_testData, nData*sizeof(float));

    cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc(32, 0, 0, 0, cudaChannelFormatKindFloat);
    CUDA_SAFE_CALL( cudaMallocArray( &testArray, &channelDesc, nData, 1 ));
    CUDA_SAFE_CALL( cudaMemcpyToArray( testArray, 0, 0, h_testData, nData*sizeof(float), cudaMemcpyHostToDevice));

    texTest.addressMode[0] = cudaAddressModeClamp;
    texTest.addressMode[1] = cudaAddressModeClamp;
    texTest.filterMode = cudaFilterModePoint;
    texTest.normalized = false;

    CUDA_SAFE_CALL( cudaBindTextureToArray( texTest, testArray, channelDesc));

    testKernel <<< 1, nData >>> (d_testData);
    CUDA_SAFE_CALL( cudaThreadSynchronize() );

    CUDA_SAFE_CALL( cudaMemcpy(h_testData, d_testData, nData*sizeof(float), cudaMemcpyDeviceToHost) );
    for (int i=0; i<nData; i++)
        printf(" testData[%i]=%f \n", i, h_testData[i]);

}
[/code]

I would really appreciate it if somebody had an idea why this happens...

#1
Posted 02/09/2011 12:59 PM   
Forgot to mention that if I use tex1D instead of tex1Dfetch the code works just fine.
However, I would really like to understand why tex1Dfetch malfunctions here as I would
expect tex1Dfetch to be faster than tex1D so that I would use it in my code if I fix the problem
with your help :)
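
For reference, the working tex1D variant mentioned here might look roughly like the sketch below. It is an illustration, not the poster's actual code: the names `texArr`, `readArrayKernel`, `readThroughArrayTexture` and the `CHECK` macro are hypothetical, cutil is left out, and it assumes a toolkit that still ships the legacy texture reference API (it was removed in CUDA 12). The coordinate `i + 0.5f` addresses the center of texel `i`; with point filtering and unnormalized coordinates that selects the same texel as passing `i` directly.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Return early with the CUDA error code if a runtime call fails.
// (Sketch only: resources allocated before the failure are not released.)
#define CHECK(call)                        \
    do {                                   \
        cudaError_t e_ = (call);           \
        if (e_ != cudaSuccess) return e_;  \
    } while (0)

// Legacy texture reference at file scope, bound to a cudaArray below.
texture<float, 1, cudaReadModeElementType> texArr;

__global__ void readArrayKernel(float *out, int n)
{
    int i = threadIdx.x;
    if (i < n)
        out[i] = tex1D(texArr, i + 0.5f);  // float coordinate, texel center
}

// Hypothetical helper: uploads h_in into a 1D cudaArray, reads it back
// through tex1D, and stores the result in h_out.
cudaError_t readThroughArrayTexture(const float *h_in, float *h_out, int n)
{
    size_t bytes = n * sizeof(float);
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray *arr = NULL;
    float *d_out = NULL;

    CHECK(cudaMallocArray(&arr, &desc, n));  // height defaults to 0: 1D array
    CHECK(cudaMemcpyToArray(arr, 0, 0, h_in, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaMalloc((void **)&d_out, bytes));

    texArr.addressMode[0] = cudaAddressModeClamp;
    texArr.filterMode = cudaFilterModePoint;
    texArr.normalized = false;
    CHECK(cudaBindTextureToArray(texArr, arr));

    readArrayKernel<<<1, n>>>(d_out, n);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost));
    CHECK(cudaUnbindTexture(texArr));
    CHECK(cudaFree(d_out));
    CHECK(cudaFreeArray(arr));
    return cudaSuccess;
}

int main()
{
    const int n = 10;
    float h_in[n], h_out[n];
    for (int i = 0; i < n; i++) {
        h_in[i] = (float)i;
        h_out[i] = -1.0f;  // sentinel so an untouched output is visible
    }

    cudaError_t err = readThroughArrayTexture(h_in, h_out, n);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < n; i++)
        printf(" testData[%i]=%f \n", i, h_out[i]);
    return 0;
}
```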

#2
Posted 02/09/2011 01:04 PM   
[quote name='Rubd' date='09 February 2011 - 02:04 PM' timestamp='1297256647' post='1191140']
Forgot to mention that if I use tex1D instead of tex1Dfetch the code works just fine.
However, I would really like to understand why tex1Dfetch malfunctions here as I would
expect tex1Dfetch to be faster than tex1D so that I would use it in my code if I fix the problem
with your help :)
[/quote]

I seem to remember that tex1Dfetch is for plain (linear) memory and tex1D is for CUDA arrays,
but I may be wrong.

#3
Posted 02/09/2011 01:27 PM   
This is expected: tex1Dfetch fetches from linear memory, while tex1D reads from CUDA arrays.
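
A minimal sketch of the linear-memory side of this, for anyone who wants to keep tex1Dfetch: allocate the data with cudaMalloc and bind the texture reference with cudaBindTexture instead of cudaBindTextureToArray. The names `texLinear`, `fetchKernel`, `fetchThroughLinearTexture` and the `CHECK` macro are made up for this illustration, cutil is left out, and the code assumes a toolkit that still provides the legacy texture reference API used in this thread (it was removed in CUDA 12).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Return early with the CUDA error code if a runtime call fails.
// (Sketch only: buffers allocated before the failure are not released.)
#define CHECK(call)                        \
    do {                                   \
        cudaError_t e_ = (call);           \
        if (e_ != cudaSuccess) return e_;  \
    } while (0)

// Legacy texture reference at file scope, like texTest in the question.
texture<float, 1, cudaReadModeElementType> texLinear;

__global__ void fetchKernel(float *out, int n)
{
    int i = threadIdx.x;
    if (i < n)
        out[i] = tex1Dfetch(texLinear, i);  // integer index into linear memory
}

// Hypothetical helper: copies h_in to plain device memory, reads it back
// through tex1Dfetch, and stores the result in h_out.
cudaError_t fetchThroughLinearTexture(const float *h_in, float *h_out, int n)
{
    size_t bytes = n * sizeof(float);
    float *d_in = NULL, *d_out = NULL;

    CHECK(cudaMalloc((void **)&d_in, bytes));
    CHECK(cudaMalloc((void **)&d_out, bytes));
    CHECK(cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice));

    // Bind to memory from cudaMalloc, not to a cudaArray.
    CHECK(cudaBindTexture(NULL, texLinear, d_in, bytes));

    fetchKernel<<<1, n>>>(d_out, n);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost));
    CHECK(cudaUnbindTexture(texLinear));
    CHECK(cudaFree(d_in));
    CHECK(cudaFree(d_out));
    return cudaSuccess;
}

int main()
{
    const int n = 10;
    float h_in[n], h_out[n];
    for (int i = 0; i < n; i++) {
        h_in[i] = (float)i;
        h_out[i] = -1.0f;  // sentinel so an untouched output is visible
    }

    cudaError_t err = fetchThroughLinearTexture(h_in, h_out, n);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < n; i++)
        printf(" testData[%i]=%f \n", i, h_out[i]);
    return 0;
}
```

Because the helper returns the first failing cudaError_t, a bad bind or launch shows up as an error message rather than as silent zeros in the output.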

#4
Posted 02/09/2011 01:28 PM   
Thanks, guys! This seems to be the right answer.

#5
Posted 02/09/2011 01:57 PM   
I have the same problem with my CUDA code.

I used tex1Dfetch to fetch the data and it worked fine on my old GPUs (GeForce 9800 GT).
But when I ran the code on a new GPU (Tesla C2070) under 64-bit Windows 7, it failed (all values were zero).
At first I guessed it might be caused by the 64-bit OS, the CUDA version, or the GPU driver.
I updated all of them, but it still failed.

I struggled for a whole day until I found this post.
Yes, it is caused by the difference between tex1Dfetch and tex1D.

#6
Posted 05/02/2012 09:17 AM   