I have figured out what the problem is.
The problem is a loss of performance when indexing. Please see the code below: as soon as indexing is involved, GPU performance drops.
If I just multiply the matrices, the GPU is roughly 2.4 times faster (elapsed time is 0.027648 seconds on the CPU and 0.011477 seconds on the GPU).
But as soon as indexing is used, the GPU is about 50 times slower than the CPU (elapsed time is 0.002495 seconds on the CPU and 0.127313 seconds on the GPU).
The less indexing there is, the smaller the problem becomes. So the GPU doesn't like indexing. Why is this happening?
M=rand(1000,500,'double');
N=rand(1000,500,'double');
% CPU: indexed read and indexed column assignment
tic
for i=1:500
M(:,i)=N(:,501-i).*N(:,i);
end
toc
%--------------
gpu=gpuDevice();
V=rand(1000,500,'gpuArray');
Y=rand(1000,500,'gpuArray');
wait(gpu)
% GPU: indexed read and indexed column assignment
tic
for i=1:500
Y(:,i)=V(:,501-i).*V(:,i);
end
wait(gpu)
toc
wait(gpu)
% GPU: indexed read only, no indexed assignment
tic
for i=1:500
C=V(:,501-i).*V(:,i);
end
wait(gpu)
toc
A=V(:,1);
B=V(:,2);
wait(gpu)
% GPU: no indexing inside the loop
tic
for i=1:500
D=A.*B;
end
wait(gpu)
toc
%--------------
% CPU: matrix multiplication
tic
E=M'*N;
toc
wait(gpu)
% GPU: matrix multiplication
tic
F=V'*Y;
wait(gpu)
toc
Output:
Elapsed time is 0.002495 seconds.
Elapsed time is 0.127313 seconds.
Elapsed time is 0.068272 seconds.
Elapsed time is 0.009520 seconds.
Elapsed time is 0.027648 seconds.
Elapsed time is 0.011477 seconds.
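One possible workaround for the benchmark above, offered as a sketch rather than a confirmed fix: each indexed assignment inside the loop is issued as its own small GPU operation, so per-call overhead likely dominates the actual arithmetic. The loop can be replaced by a single whole-array expression that reverses the column order once and multiplies once. The names Yloop, Yvec and maxDiff are introduced here for illustration only and do not appear in the original code.

```matlab
% Sketch: replace 500 indexed column assignments with one array operation.
gpu = gpuDevice();
V = rand(1000,500,'gpuArray');

% Loop version from the benchmark, kept for reference.
Yloop = zeros(1000,500,'gpuArray');
wait(gpu)
tic
for i = 1:500
    Yloop(:,i) = V(:,501-i).*V(:,i);
end
wait(gpu)
toc

% Vectorized version: column i of V(:,end:-1:1) is V(:,501-i),
% so one elementwise product computes all 500 columns at once.
wait(gpu)
tic
Yvec = V(:,end:-1:1).*V;
wait(gpu)
toc

% Largest absolute difference between the two results (hypothetical check).
maxDiff = gather(max(abs(Yloop(:) - Yvec(:))));
```

Whether this is faster on a given card has to be measured; gputimeit can be used instead of tic/toc for more reliable GPU timings.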
How can I solve this problem? The code above is only an example. My actual code, where I see the problem, looks like this:
Y(:,26)=V1.*V2.*V12;
X=V1.*V3;
Y(:,27)=X.*V7;
Y(:,28)=X.*V8;
X=V1.*V6;
Y(:,29)=X.*V7;
Y(:,30)=X.*V8;
Y(:,31)=V1.*V7.*V11;
Y(:,32)=X8.*V1;
Y(:,33)=V1.*V11.*V12;
Y(:,34)=X2.*V7;
Y(:,35)=X2.*V9;
Y(:,36)=X2.*V11;
Y(:,37)=X2.*V12;
X=V2.*V3;
Y(:,38)=X.*V7;
Y(:,39)=X.*V12;
Y(:,40)=X.*V13;
X=V2.*V4;
Y(:,41)=X.*V7;
Y(:,42)=X.*V8;
X=V2.*V6;
Y(:,43)=X.*V8;
Y(:,44)=X.*V12;
X=V2.*V7;
Y(:,45)=X.*V7;
Y(:,46)=X.*V8;
Y(:,47)=X.*V9;
Y(:,48)=X.*V12;
X=V2.*V8;
Y(:,49)=X.*V8;
Y(:,50)=X.*V12;
Y(:,51)=X9.*V2;
X=V2.*V11;
Y(:,52)=X.*V11;
Y(:,53)=X.*V12;
X=V2.*V12;
Y(:,54)=X.*V12;
Y(:,55)=X.*V13;
Y(:,56)=X3.*V8;
Y(:,57)=X3.*V12;
Y(:,58)=X3.*V13;
X=V3.*V4;
Y(:,59)=X.*V8;
Y(:,60)=X.*V13;
Y(:,61)=V3.*V6.*V8;
X=V3.*V7;
Y(:,62)=X.*V7;
Y(:,63)=X.*V8;
Y(:,64)=X.*V9;
Y(:,65)=X.*V13;
X=V3.*V8;
Y(:,66)=X.*V8;
Y(:,67)=X.*V13;
Y(:,68)=Y(:,14).*V3;
X=V3.*V12;
Y(:,69)=X.*V12;
Y(:,70)=X.*V13;
X=V3.*V13;
Y(:,71)=X.*V13;
Y(:,72)=X.*V14;
Y(:,73)=V4.*V7.*V9;
Y(:,74)=V4.*V8.*V14;
X=V4.*V13;
Y(:,75)=X.*V13;
Y(:,76)=X.*V14;
Y(:,77)=X6.*V7;
Y(:,78)=X6.*V8;
Y(:,79)=X6.*V13;
X=V6.*V7;
Y(:,80)=X.*V7;
Y(:,81)=X.*V8;
Y(:,82)=X.*V12;
Y(:,83)=X.*V16;
Y(:,84)=X.*V17;
X=V6.*V8;
Y(:,85)=X.*V8;
Y(:,86)=X.*V11;
Y(:,87)=X.*V12;
Y(:,88)=X.*V13;
X=V6.*V11;
Y(:,89)=X.*V12;
Y(:,90)=X.*V13;
X=V6.*V12;
Y(:,91)=X.*V12;
Y(:,92)=X.*V16;
Y(:,93)=X13.*V6;
Y(:,94)=X7.*V8;
Y(:,95)=X7.*V9;
Y(:,96)=X7.*V11;
Y(:,97)=X7.*V12;
Y(:,98)=X7.*V14;
Y(:,99)=X7.*V16;
Y(:,100)=X7.*V17;
Y(:,101)=Y(:,1).*V8;
Y(:,102)=Y(:,1).*V9;
Y(:,103)=Y(:,1).*V17;
Y(:,104)=Y(:,1).*V18;
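A possible way to reduce the number of indexed assignments in code like this, sketched under assumptions: if V1 to V18 are column vectors of equal length, they could be stored as the columns of one matrix, and each block of products could then be written as a single assignment driven by index vectors. The matrix Vall and the index vectors ia, ib and ic are hypothetical names introduced here; only the factors for Y(:,26) to Y(:,31) are filled in, read off the listing above.

```matlab
% Assumption: V1..V18 are equal-length columns; random stand-ins are
% used here, stored as the columns of one hypothetical matrix Vall.
n = 1000;
Vall = rand(n,18,'gpuArray');
Y = zeros(n,104,'gpuArray');

% Factor indices for Y(:,26) ... Y(:,31):
%   Y(:,26)=V1.*V2.*V12   Y(:,27)=V1.*V3.*V7   Y(:,28)=V1.*V3.*V8
%   Y(:,29)=V1.*V6.*V7    Y(:,30)=V1.*V6.*V8   Y(:,31)=V1.*V7.*V11
ia = [ 1  1  1  1  1  1];
ib = [ 2  3  3  6  6  7];
ic = [12  7  8  7  8 11];

% Three indexed reads and one block assignment instead of six
% separate column assignments.
Y(:,26:31) = Vall(:,ia).*Vall(:,ib).*Vall(:,ic);
```

Extending ia, ib and ic to cover the remaining columns would keep the number of GPU calls constant as the list of products grows; columns that depend on earlier results, such as Y(:,101)=Y(:,1).*V8, would need a second pass after the columns they read have been filled.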