cuSparse CSR row pointer overflow

I’m working with cuSparse to transpose the results of an edge-detection algorithm that uses in-place cuFFT. Just to get an idea, the input array (image) to my first dense2csr call has about 1 - 2 percent “on” pixels, so I imagine they’re a good candidate for a sparse representation. The issue I am having is with populating the csrRowPtr array. My images are quite large, (as is necessary for the application - approx 2000 x 3000 pixels) therefore the integer type csrRowPtr array cannot hold these values without overflowing past index 32767. This is the host code that populates my CSR row pointers and CSR column indices:

//ip.h = #rows, ip.w = #cols, nnzTotal = #non-zero pixels (255)
        
        //column indices of each non-zero element (0-based index)
        int* csrColInd = new int[nnzTotal];
	
	//value array index of first non-zero element in row
	int* csrRowPtr = new int[ip.h + 1];

	int n = 0;
	bool set;
	for (x = 0; x < ip.h; x++)
	{
		set = false;
		for (y = 0; y < ip.w; y++)
		{
			if (f_image[x * ip.w + y] == 255)
			{
				if (!set) //set just first non-zero pixel found in row
				{
					csrRowPtr[x] = n;
					set = true;
				}
				csrColInd[n++] = y;  //set column index of non-zero pixel
			}
		}
	}
	csrRowPtr[ip.h] = csrRowPtr[0] + nnzTotal;

This causes the function “cusparseSdense2csr()” to throw error 6 (bad args, basically due to negative values from overflow). For very small images it is OK (but this defeats the purpose of the application). I tried using an unsigned integral type but the function will not accept.

Is there an overloaded version of the function I am missing?

I just had a thought as well, I am not initializing the csrRowPtr array before populating it. (there are rows in the image with NO “on” pixels. I will attempt to address this issue before pressing on this question.

Am I mistaken to not crop all null space from around the images?

integers don’t overflow at 32767.

signed integers (i.e. “int”) overflow at around 2 billion.

If your image is 2000x3000, and you are deriving a CSR sparse representation of it, you should have at most about 3000 row pointers in the csrRowPtr array. If your nnz value were 2000x3000 (the most it could be), then none of those row pointers would be higher than 2000x3000 = 6 million, much smaller than the range of an int.

You also haven’t shown the data you are feeding to dense2csr, i.e. the actual call to dense2csr that demonstrates the problem. I mention this because your code appears to be attempting synthesize the csrRowPtr and csrColInd arrays, but the dense2csr function does that for you.

Your intent is not clear.

I see. Yes I’ve convinced myself that overflow isn’t the issue. Then I was mistaken with thinking that I had to calculate/populate the arrays beforehand. The examples weren’t so clear in that regard. Thank you for pointing that out. I was thinking that there might have been an issue with representing columns or rows that are completely empty, in that some csrRowPtr elements couldn’t be defined. Removing the definition code from above got things moving along for now.

At this point I have decent results but ONLY with square images, or rather images that only contain “on” pixels in the top-left-most square region of the image. I have no idea why this might be, but there are injected artifacts that I cannot explain. This is the function I am using to read a sparsely populated edge-detected image and generate its transpose using CSC->CSR.

void cuSPARSE_T(float* f_image)
{
	cusparseStatus_t cs_status;
	cusparseHandle_t cs_handle;
	cusparseMatDescr_t descr;
	cusparseAction_t copyValues = CUSPARSE_ACTION_NUMERIC;
	cusparseIndexBase_t idxBase = CUSPARSE_INDEX_BASE_ZERO;
	cusparseDirection_t dir = CUSPARSE_DIRECTION_COLUMN;
	cudaError_t err;

	int nnzTotal;
	int* nnzPerRowCol;
	float* csrVal;
	int* csrRowPtr;
	int* csrColInd;
	float* cscVal;
	int* cscRowInd;
	int* cscColPtr;

	cs_status = cusparseCreate(&cs_handle);
	if (cs_status != CUSPARSE_STATUS_SUCCESS)
	{
		printf("\ncuSparse initilization failed: %d\n", cs_status);
		exit(1);
	}

	cs_status = cusparseCreateMatDescr(&descr);
	if (cs_status != CUSPARSE_STATUS_SUCCESS)
	{
		printf("\ncuSparse Mat Descr failed: %d\n", cs_status);
		exit(1);
	}
	cusparseSetMatType(descr, CUSPARSE_MATRIX_TYPE_GENERAL);
	cusparseSetMatIndexBase(descr, CUSPARSE_INDEX_BASE_ZERO);

	float* d_f_image;
	err = cudaMalloc((void**)&d_f_image, sizeof(float) * ip.h * ip.w);
	if (err != CUDA_SUCCESS)
	{
		printf("\nd_f_image Alloc failed: %d", err);
		exit(1);
	}

	err = cudaMemcpy(d_f_image, f_image, sizeof(float) * ip.h * ip.w, cudaMemcpyHostToDevice);
	if (err != CUDA_SUCCESS)
	{
		printf("\nd_f_image memcopy (H2D) failed: %d", err);
		exit(1);
	}

	cudaMalloc((void**)&nnzPerRowCol, sizeof(int) * ip.w);

	cs_status = cusparseSnnz(cs_handle, dir, ip.w, ip.h, descr, d_f_image, ip.w, nnzPerRowCol, &nnzTotal);//whw
	if (cs_status != CUSPARSE_STATUS_SUCCESS)
	{
		printf("\ncuSparse NNZ failed: %d\n", cs_status);
		exit(1);
	}

	printf("\nnnzTotal       : %d", nnzTotal);
	printf("\nDensity        : %1.1f%%", 100 * (float)nnzTotal / (ip.w * ip.h));
	printf("\ncuSparse   NNZ ->       success");

	cudaMalloc((void**)&csrVal, sizeof(float) * nnzTotal);
	cudaMalloc((void**)&csrRowPtr, sizeof(int) * (ip.h + 1));
	cudaMalloc((void**)&csrColInd, sizeof(int) * nnzTotal);

	cudaMalloc((void**)&cscVal, sizeof(float) * nnzTotal);
	cudaMalloc((void**)&cscRowInd, sizeof(int) * nnzTotal);
	cudaMalloc((void**)&cscColPtr, sizeof(int) * (ip.w + 1));

	//convert to sparse CSC
	cs_status = cusparseSdense2csc(cs_handle, ip.w, ip.h, descr, d_f_image, ip.w, nnzPerRowCol, cscVal, cscRowInd, cscColPtr);//whw
	if (cs_status != CUSPARSE_STATUS_SUCCESS)
	{
		printf("\ncuSparse DENSE -> CSC failed: %d\n", cs_status);
		exit(1);
	}
	printf("\ncuSparse DENSE -> CSC   success");

	//implicit transpose by changing from CSR to CSC format
	cs_status = cusparseScsr2csc(cs_handle, ip.w, ip.h, nnzTotal, cscVal, cscColPtr, cscRowInd, csrVal, csrColInd, csrRowPtr, copyValues, idxBase);//wh
	if (cs_status != CUSPARSE_STATUS_SUCCESS)
	{
		printf("\ncuSparse   CSC -> CSR failed: %d\n", cs_status);
		exit(1);
	}
	printf("\ncuSparse   CSC -> CSR   success");

	//convert to dense matrix (image output)
	cs_status = cusparseScsr2dense(cs_handle, ip.h, ip.w, descr, cscVal, cscColPtr, cscRowInd, d_f_image, ip.h);//hwh
	if (cs_status != CUSPARSE_STATUS_SUCCESS)
	{
		printf("\ncuSparse   CSR -> DENSE failed: %d\n", cs_status);
		exit(1);
	}
	printf("\ncuSparse   CSR -> DENSE success");
	cudaDeviceSynchronize();
	cudaMemcpy(f_image, d_f_image, sizeof(float) * ip.h * ip.w, cudaMemcpyDeviceToHost);

	cudaFree(d_f_image);
	cudaFree(csrVal);
	cudaFree(csrRowPtr);
	cudaFree(csrColInd);
	cudaFree(cscVal);
	cudaFree(cscRowInd);
	cudaFree(cscColPtr);

	cusparseDestroyMatDescr(descr);
	cusparseDestroy(cs_handle);
}

…after looking at this one more time, I found it necessary to swap the lengths of csrRowPtr and cscColPtr on lines 65 and 70. This fixed the problem. Column-major indexing tripped me up many times through this.