Do you have 3104 linear systems (i.e. 3104 A matrices and 3104 b vectors), or just 3104 right-hand sides (i.e. one A and 3104 b vectors)?
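If you have 3104 linear systems, I would copy them all to the device at once, packed into large contiguous 1D arrays. Then loop over the systems and pass the solver routine the starting pointer of each matrix and vector. Be aware that this could be slow: each of the 3104 calls carries its own launch overhead, and that overhead can dominate when the individual systems are small.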
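Here is a minimal CPU sketch of the packing-and-offsets idea. It uses NumPy arrays as a stand-in for device memory (an assumption: on the GPU the flat arrays would live in device memory and the offsets would become pointer arithmetic). `pack_systems` and `solve_packed` are hypothetical helper names. GPU libraries such as cuSOLVER expect column-major matrices, whereas this sketch uses NumPy's default row-major layout; the offset arithmetic is the same either way.

```python
import numpy as np

def pack_systems(As, bs):
    """Hypothetical helper: flatten n-by-n matrices and length-n vectors
    into two contiguous 1D arrays (mimics one big host-to-device copy)."""
    n = As[0].shape[0]
    A_flat = np.concatenate([A.ravel() for A in As])  # row-major (C order)
    b_flat = np.concatenate(bs)
    return A_flat, b_flat, n

def solve_packed(A_flat, b_flat, n):
    """Hypothetical helper: loop over the systems, using offsets into the
    flat arrays as the 'starting pointers' of each matrix and vector."""
    count = b_flat.size // n
    x_flat = np.empty_like(b_flat)
    for i in range(count):
        a_off = i * n * n  # start of matrix i
        b_off = i * n      # start of vector i
        A_i = A_flat[a_off:a_off + n * n].reshape(n, n)  # a view, no copy
        # One solver call per system: this per-call cost is the overhead
        # that adds up over thousands of small systems.
        x_flat[b_off:b_off + n] = np.linalg.solve(A_i, b_flat[b_off:b_off + n])
    return x_flat

# Usage: 3104 small random systems (shifted by n*I to keep them well-conditioned).
rng = np.random.default_rng(0)
n, count = 4, 3104
As = [rng.standard_normal((n, n)) + n * np.eye(n) for _ in range(count)]
bs = [rng.standard_normal(n) for _ in range(count)]
A_flat, b_flat, n = pack_systems(As, bs)
x_flat = solve_packed(A_flat, b_flat, n)
```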
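If you have 3104 b vectors with a single A, then you can batch solve: put the b vectors side by side as the columns of one matrix B and solve AX = B in a single call, so A only has to be factorized once. If the routine you use requires a sparse matrix, you can convert A to a sparse format (e.g. CSR) on the host first, or do the conversion on the GPU with a routine such as dense2csr.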
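A minimal NumPy sketch of both ideas, assuming a single dense A shared by all right-hand sides. `batch_solve` and `dense_to_csr` are hypothetical helpers for illustration. `np.linalg.solve` accepts a 2D B and solves for all of its columns in one call. The CSR conversion is hand-rolled to show the three arrays (values, column indices, row pointers) that a conversion routine like dense2csr produces; it does not reproduce the GPU library's API or memory layout.

```python
import numpy as np

def batch_solve(A, bs):
    """Hypothetical helper: solve A x_j = b_j for every j in one call."""
    B = np.column_stack(bs)    # n x k, one column per right-hand side
    return np.linalg.solve(A, B)  # X is n x k; column j solves A x = b_j

def dense_to_csr(A):
    """Hypothetical helper: dense -> CSR (values, column indices, row pointers)."""
    rows, cols = np.nonzero(A)  # nonzeros in row-major order
    values = A[rows, cols]
    # row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i
    row_ptr = np.concatenate(([0], np.cumsum(np.count_nonzero(A, axis=1))))
    return values, cols, row_ptr

# Usage: one 5x5 matrix, 3104 right-hand sides.
rng = np.random.default_rng(1)
n, k = 5, 3104
A = rng.standard_normal((n, n)) + n * np.eye(n)
bs = [rng.standard_normal(n) for _ in range(k)]
X = batch_solve(A, bs)

# Zero out small entries so the CSR form actually has something to skip.
A_sparse = np.where(np.abs(A) > 1.0, A, 0.0)
values, col_idx, row_ptr = dense_to_csr(A_sparse)
```

Stacking the right-hand sides lets the solver reuse one factorization of A for every column instead of refactorizing per vector.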