From: gjw
Sent: Friday, September 23, 2016 3:12 PM
To: jbungo
Subject: Re: GPU Teaching Kit - Accelerated Computing Labs: compile issue
I should have inspected the code more carefully, but this one is so obvious that I feel like I must be the first person to actually run it. Both the documentation on the NVIDIA Developer website and the latest repository on Bitbucket define the wbCheck function with the test
if (err == cudaSuccess) …
which of course terminates the program when the CUDA operation actually succeeded. That is why bypassing the wbCheck function lets the program proceed.
However, I still get a result array of all 0s. The odd thing is that setting the cells to some random value in the sgemm kernel function, as in
C[row * numBColumns + col] = 4.5;
STILL results in all 0s!
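The effect of the inverted test can be reproduced without a GPU. The following is a minimal sketch in plain C, assuming stand-in names (fake_error_t, fake_call_ok, fake_call_fail, CHECK_INVERTED, CHECK_CORRECT, run_inverted, run_correct) that are hypothetical and not part of libwb or the CUDA runtime. It only mirrors the control flow of the check quoted above.

```c
#include <stdio.h>

/* Stand-in for cudaError_t so the sketch builds without CUDA.
   These are NOT the real CUDA declarations. */
typedef enum { fakeSuccess = 0, fakeErrorAllocation = 2 } fake_error_t;

/* Hypothetical call that always succeeds. */
fake_error_t fake_call_ok(void) { return fakeSuccess; }

/* Hypothetical call that always fails. */
fake_error_t fake_call_fail(void) { return fakeErrorAllocation; }

/* Inverted test: bails out when the call SUCCEEDED. */
#define CHECK_INVERTED(stmt)                                   \
  do {                                                         \
    fake_error_t err = (stmt);                                 \
    if (err == fakeSuccess) {                                  \
      fprintf(stderr, "Failed to run stmt %s\n", #stmt);       \
      return -1;                                               \
    }                                                          \
  } while (0)

/* Intended test: bails out only when the call FAILED. */
#define CHECK_CORRECT(stmt)                                    \
  do {                                                         \
    fake_error_t err = (stmt);                                 \
    if (err != fakeSuccess) {                                  \
      fprintf(stderr, "Failed to run stmt %s\n", #stmt);       \
      return -1;                                               \
    }                                                          \
  } while (0)

/* Returns -1 if the inverted check fires, 0 otherwise. */
int run_inverted(fake_error_t (*call)(void)) {
  CHECK_INVERTED(call());
  return 0;
}

/* Returns -1 if the correct check fires, 0 otherwise. */
int run_correct(fake_error_t (*call)(void)) {
  CHECK_CORRECT(call());
  return 0;
}
```

Under these assumptions, run_inverted(fake_call_ok) returns -1 even though the call succeeded, and run_inverted(fake_call_fail) returns 0, so a real failure would go unreported.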
From: jbungo
Date: Friday, September 23, 2016 at 1:55 PM
To: gjw
Subject: Re: GPU Teaching Kit - Accelerated Computing Labs: compile issue
I still think it’s an allocation issue. wbCheck is defined as
#define wbCheck(stmt)                                              \
  do {                                                             \
    cudaError_t err = stmt;                                        \
    if (err != cudaSuccess) {                                      \
      wbLog(ERROR, "Failed to run stmt ", #stmt);                  \
      wbLog(ERROR, "Got CUDA error … ", cudaGetErrorString(err));  \
      return -1;                                                   \
    }                                                              \
  } while (0)
Could you be checking error incorrectly?
From: gjw
Sent: Friday, September 23, 2016 10:55 AM
To: jbungo
Subject: Re: GPU Teaching Kit - Accelerated Computing Labs: compile issue
No, I tested this. As I said, when I bypass wbCheck, call cudaMalloc directly, and check the error code it returns, it reports “no error”. All the sample programs that come with the CUDA Toolkit also run correctly.
From: jbungo
Date: Friday, September 23, 2016 at 10:47 AM
To: gjw
Subject: Re: GPU Teaching Kit - Accelerated Computing Labs: compile issue
Since this is a runtime error, what hardware are you using? It may not have enough memory to allocate the matrix.
From: gjw
Sent: Thursday, September 22, 2016 3:06 PM
To: jbungo
Subject: Re: GPU Teaching Kit - Accelerated Computing Labs: compile issue
C:\Users\user\Documents\cuda\build\Debug\MatrixMultiplication\Dataset\0>…....\BasicMatrixMultiplication_Solution.exe -e output.raw -i input0.raw,input1.raw
{"data": {"elapsed_time": 3949848, "end_file": "C:/Users/user/Documents/cuda/gputeachingkit-labs/Module4/BasicMatrixMultiplication/solution.cu", "end_function": "main", "end_line": 52, "end_time": 4947245923438, "id": "a18658cd-0ff0-477b-8083-1fd61ef4f768", "idx": 0, "kind": "Generic", "message": "Importing data and creating memory on host", "mpi_rank": 0, "parent_id": -1, "session_id": "session_id_disabled", "start_file": "C:/Users/user/Documents/cuda/gputeachingkit-labs/Module4/BasicMatrixMultiplication/solution.cu", "start_function": "main", "start_line": 45, "start_time": 4947241973590, "stopped": true}, "id": "a18658cd-0ff0-477b-8083-1fd61ef4f768", "session_id": "session_id_disabled", "type": "timer"}
{"data": {"file": "C:/Users/user/Documents/cuda/gputeachingkit-labs/Module4/BasicMatrixMultiplication/solution.cu", "function": "main", "id": "aef41e70-a007-402b-89b6-bf4093a5cbfe", "level": "Trace", "line": 57, "message": "The dimensions of A are 16 x 16", "mpi_rank": 0, "session_id": "session_id_disabled", "time": 4947352581734}, "id": "aef41e70-a007-402b-89b6-bf4093a5cbfe", "session_id": "session_id_disabled", "type": "logger"}
{"data": {"file": "C:/Users/user/Documents/cuda/gputeachingkit-labs/Module4/BasicMatrixMultiplication/solution.cu", "function": "main", "id": "54a2c75a-1882-4525-ae7f-c8b097026116", "level": "Trace", "line": 58, "message": "The dimensions of B are 16 x 16", "mpi_rank": 0, "session_id": "session_id_disabled", "time": 4947391317118}, "id": "54a2c75a-1882-4525-ae7f-c8b097026116", "session_id": "session_id_disabled", "type": "logger"}
{"data": {"file": "C:/Users/user/Documents/cuda/gputeachingkit-labs/Module4/BasicMatrixMultiplication/solution.cu", "function": "main", "id": "312d37d1-7a63-4b83-9c7c-af804ca1d44c", "level": "Trace", "line": 59, "message": "The dimensions of C are 16 x 16", "mpi_rank": 0, "session_id": "session_id_disabled", "time": 4947425279781}, "id": "312d37d1-7a63-4b83-9c7c-af804ca1d44c", "session_id": "session_id_disabled", "type": "logger"}
{"data": {"file": "C:/Users/user/Documents/cuda/gputeachingkit-labs/Module4/BasicMatrixMultiplication/solution.cu", "function": "main", "id": "aefd05dd-f7c3-4647-984d-216d810935dc", "level": "Error", "line": 65, "message": "Failed to run stmt cudaMalloc((void **)&deviceA, numAColumns * sizeof(float))", "mpi_rank": 0, "session_id": "session_id_disabled", "time": 4947539451693}, "id": "aefd05dd-f7c3-4647-984d-216d810935dc", "session_id": "session_id_disabled", "type": "logger"}
The cudaMalloc call on line 65 is passed to wbCheck(). When I call it directly, as in
cudaError_t err = cudaMalloc((void **)&deviceA, numARows * numAColumns * sizeof(float));
cudaGetErrorString(err) reports “no error”. When I remove all the wbCheck references, the program runs to the end but fails the wbSolution call:
{"data": {"file": "C:\Users\user\Documents\cuda\gputeachingkit-labs\libwb\wbSolution.cpp", "function": "wbSolution", "id": "97610309-d69f-4031-906c-3ad84fb334f0", "level": "Error", "line": 145, "message": "Failed to grade solution", "mpi_rank": 0, "session_id": "session_id_disabled", "time": 4872522873799}, "id": "97610309-d69f-4031-906c-3ad84fb334f0", "session_id": "session_id_disabled", "type": "logger"}
{"data": {"correctq": false, "message": ""}, "type": "solution"}
and indeed the entire hostC array is nothing but zeroes.
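For scale, the allocation sizes involved can be worked out directly. This is a minimal sketch in plain C, assuming 4-byte floats and two hypothetical helper names (matrix_bytes, single_row_bytes) that do not come from the lab code. A 16 x 16 matrix of float needs 16 * 16 * 4 = 1024 bytes. The statement in the error log requests only numAColumns * sizeof(float), one row's worth, whereas the direct call above requests numARows * numAColumns * sizeof(float). Whether that difference contributes to the all-zero result is not established here.

```c
#include <stddef.h>

/* Bytes needed for a dense rows x cols matrix of float.
   Hypothetical helper, not part of the lab code. */
size_t matrix_bytes(size_t rows, size_t cols) {
  return rows * cols * sizeof(float);
}

/* Bytes requested by an expression of the form cols * sizeof(float),
   i.e. a single row. Hypothetical helper, not part of the lab code. */
size_t single_row_bytes(size_t cols) {
  return cols * sizeof(float);
}
```

With the logged dimensions, matrix_bytes(16, 16) is sixteen times single_row_bytes(16).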
From: jbungo
Date: Thursday, September 22, 2016 at 8:27 AM
To: gjw
Subject: Re: GPU Teaching Kit - Accelerated Computing Labs: compile issue
Can you provide the failing messages?
From: gjw
Sent: Wednesday, September 21, 2016 10:47 PM
To: jbungo
Subject: Re: GPU Teaching Kit - Accelerated Computing Labs: compile issues
I may have spoken too soon. The files compile now (with one exception), but they fail at runtime any time they use a kernel function or access device memory.