I’m finding that whenever I use compute shaders that operate on data structures larger than a vec4, I get huge performance issues. Re-ordering a buffer of structs after a sort, as in the shader below, is around 800 times slower than re-ordering vec4s. I measured this with OpenGL timer queries.
I wonder whether anybody else is using compute shaders with larger data structures without running into this. Am I doing something fundamentally wrong, should I expect this behaviour, or could it be a driver issue? I am using the latest Game Ready driver, 388.00, with a GTX 1060 6GB.
I’ve tried packing the struct with only vec4s, but without success; a slightly larger struct causes slightly slower execution (1.9ms-2.0ms). Writing the buffer without re-ordering, or writing the elements separately, makes no difference to execution speed. The only thing that does make a difference is the size of the struct written.
Here’s an example of a very slow shader.
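#version 430
precision mediump float;

// One element is 352 bytes (22 vec4s) under std430.
struct ConvexHull {
    vec3 position;
    uint enabled;      // fills the 4 bytes of padding after position
    vec3 half_ex;
    uint hash;         // fills the 4 bytes of padding after half_ex
    vec4 verts_0[8];
    vec4 planes_n[6];
    vec4 planes_d[6];
};

layout(local_size_x = 128) in;

layout(binding = 0, std430) readonly buffer In {
    ConvexHull hulls_in[];
};
layout(binding = 1, std430) writeonly buffer Out {
    ConvexHull hulls_out[];
};
layout(binding = 2, std430) readonly buffer SortData {
    uvec4 sort_buf[];
};

void main() {
    uint index = gl_GlobalInvocationID.x;
    // sort_buf[index].y holds the source index of the element that belongs at `index`.
    hulls_out[index] = hulls_in[sort_buf[index].y];
}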
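To put a number on "larger than vec4", here is a rough host-side sketch in C++ that mirrors the ConvexHull struct and checks its size at compile time. It assumes the usual std430 rules (a vec3 is 16-byte aligned but only 12 bytes wide, so the following uint lands in its padding; vec4 arrays have a 16-byte stride). The Vec4 type and the vec4s_per_hull helper are made up for illustration and are not part of my actual code.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical host-side stand-in for a GLSL vec4 (16 bytes).
struct Vec4 { float x, y, z, w; };

// Mirror of the shader's ConvexHull, assuming std430 layout rules.
struct alignas(16) ConvexHull {
    float         position[3];  // vec3 at offset 0, occupies 12 bytes
    std::uint32_t enabled;      // offset 12, fills the vec3's padding slot
    float         half_ex[3];   // vec3 at offset 16
    std::uint32_t hash;         // offset 28
    Vec4          verts_0[8];   // offset 32, 128 bytes
    Vec4          planes_n[6];  // offset 160, 96 bytes
    Vec4          planes_d[6];  // offset 256, 96 bytes
};

// Compile-time checks that the C++ layout matches the offsets assumed above.
static_assert(offsetof(ConvexHull, enabled) == 12, "enabled should pack after position");
static_assert(offsetof(ConvexHull, half_ex) == 16, "half_ex should start the second 16-byte slot");
static_assert(offsetof(ConvexHull, verts_0) == 32, "arrays should start after the two header slots");
static_assert(offsetof(ConvexHull, planes_d) == 256, "planes_d should follow planes_n");
static_assert(sizeof(ConvexHull) == 352, "one element should be 352 bytes");

// Number of vec4-sized slots each shader invocation reads and writes per element.
constexpr std::size_t vec4s_per_hull() {
    return sizeof(ConvexHull) / sizeof(Vec4);
}
```

So each invocation copies 352 bytes, or 22 vec4s, where the fast version copies a single vec4. That accounts for a factor of 22 in data moved, nowhere near the roughly 800 times slowdown I measured.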