How do I multiply 2D arrays [A][B] in CUDA?

Hi, I have been working on a CUDA program for fuzzy c-means,
and I am confused about how to convert my serial code to parallel code in CUDA, because my functions use two-dimensional arrays.

My functions contain element-wise operations on 2D arrays, like pow(A[i][j], variable), and matrix-multiply-style products like t[i][j] * data_point[i][k].
How can I implement this in simple CUDA?
Thanks in advance.

Function 1:

int calculate_centre_vectors() {
    int i, j, k;
    double numerator, denominator;

    /* t[i][j] = degree_of_memb[i][j] raised to the fuzziness exponent */
    for (i = 0; i < num_data_points; i++) {
        for (j = 0; j < num_clusters; j++) {
            t[i][j] = pow(degree_of_memb[i][j], fuzziness);
        }
    }
    /* cluster_centre[j][k] = weighted mean of dimension k, weighted by t[i][j] */
    for (j = 0; j < num_clusters; j++) {
        for (k = 0; k < num_dimensions; k++) {
            numerator = 0.0;
            denominator = 0.0;
            for (i = 0; i < num_data_points; i++) {
                numerator += t[i][j] * data_point[i][k];
                denominator += t[i][j];
            }
            cluster_centre[j][k] = numerator / denominator;
        }
    }
    return 0;
}

Function 2:

double update_degree_of_membership() {
    int i, j;
    double new_uij;
    double max_diff = 0.0, diff;
    for (j = 0; j < num_clusters; j++) {
        for (i = 0; i < num_data_points; i++) {
            new_uij = get_new_value(i, j);
            diff = new_uij - degree_of_memb[i][j];
            if (diff > max_diff)
                max_diff = diff;
            degree_of_memb[i][j] = new_uij;
        }
    }
    return max_diff;
}

Matrix multiplication examples for CUDA abound; the CUDA samples that ship with the toolkit include one (matrixMul).
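
For what it's worth, the second loop nest in calculate_centre_vectors has the same shape as a matrix product (a sum over i of t[i][j] * data_point[i][k]), so it can be sketched the same way as the simplest matrix multiply kernels: one thread per output element. The following is only a minimal sketch under some assumptions, not a tuned implementation: the arrays are assumed to be stored as flat row-major buffers without padding, and the names centre_kernel and centres_on_device are hypothetical, not from the original code.

```cuda
#include <cuda_runtime.h>
#include <stddef.h>

// One thread per output element centre[j][k]; each thread runs the loop over i.
// Assumed layout (row-major, unpadded): t is n x c, data is n x d, centre is c x d.
__global__ void centre_kernel(const double *t, const double *data,
                              double *centre, int n, int c, int d)
{
    int k = blockIdx.x * blockDim.x + threadIdx.x; // dimension index
    int j = blockIdx.y * blockDim.y + threadIdx.y; // cluster index
    if (j >= c || k >= d) return;                  // guard partial blocks

    double numerator = 0.0, denominator = 0.0;
    for (int i = 0; i < n; i++) {
        double w = t[i * c + j];                   // t[i][j]
        numerator += w * data[i * d + k];          // t[i][j] * data_point[i][k]
        denominator += w;
    }
    centre[j * d + k] = numerator / denominator;   // cluster_centre[j][k]
}

// Hypothetical host wrapper: copies inputs to the device, launches the kernel,
// copies the result back. Returns 0 on success, -1 on any CUDA error.
int centres_on_device(const double *h_t, const double *h_data,
                      double *h_centre, int n, int c, int d)
{
    size_t t_bytes = (size_t)n * c * sizeof(double);
    size_t x_bytes = (size_t)n * d * sizeof(double);
    size_t c_bytes = (size_t)c * d * sizeof(double);
    double *d_t = NULL, *d_x = NULL, *d_c = NULL;
    int ok = -1;

    if (cudaMalloc(&d_t, t_bytes) == cudaSuccess &&
        cudaMalloc(&d_x, x_bytes) == cudaSuccess &&
        cudaMalloc(&d_c, c_bytes) == cudaSuccess &&
        cudaMemcpy(d_t, h_t, t_bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
        cudaMemcpy(d_x, h_data, x_bytes, cudaMemcpyHostToDevice) == cudaSuccess) {
        dim3 block(16, 16);
        dim3 grid((d + block.x - 1) / block.x, (c + block.y - 1) / block.y);
        centre_kernel<<<grid, block>>>(d_t, d_x, d_c, n, c, d);
        // The blocking device-to-host copy also waits for the kernel to finish.
        if (cudaGetLastError() == cudaSuccess &&
            cudaMemcpy(h_centre, d_c, c_bytes, cudaMemcpyDeviceToHost) == cudaSuccess)
            ok = 0;
    }
    cudaFree(d_t);
    cudaFree(d_x);
    cudaFree(d_c);
    return ok;
}
```

This only launches num_clusters * num_dimensions threads, which is usually a small number, so it is a starting point rather than a fast solution. With many data points, a parallel reduction over i would likely expose more parallelism.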

Creating 2D device arrays is also well documented; see cudaMallocPitch and cudaMemcpy2D in the CUDA runtime API.

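The concept of treating a 2D array as a flat, padded 1D array should not be too difficult to grasp: element [i][j] lives at index i * pitch + j, where pitch is the (possibly padded) row width in elements.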
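
As a rough illustration of the flat-array idea, here is how the first loop nest of calculate_centre_vectors (the pow step) might look with one thread per element. This is a sketch under assumptions: the buffer is row-major with no padding (so the row width is just the number of columns; with cudaMallocPitch you would use the returned pitch instead), and idx2d, pow_kernel and pow_on_device are hypothetical names.

```cuda
#include <cuda_runtime.h>
#include <math.h>
#include <stddef.h>

// Index of element (i, j) in a row-major buffer with `cols` elements per row.
__host__ __device__ inline int idx2d(int i, int j, int cols)
{
    return i * cols + j;
}

// One thread per element: t[i][j] = pow(u[i][j], fuzziness).
__global__ void pow_kernel(const double *u, double *t,
                           int rows, int cols, double fuzziness)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x; // column (cluster)
    int i = blockIdx.y * blockDim.y + threadIdx.y; // row (data point)
    if (i < rows && j < cols)                      // guard partial blocks
        t[idx2d(i, j, cols)] = pow(u[idx2d(i, j, cols)], fuzziness);
}

// Hypothetical host wrapper: copies h_u to the device, runs the kernel,
// copies the result into h_t. Returns 0 on success, -1 on any CUDA error.
int pow_on_device(const double *h_u, double *h_t,
                  int rows, int cols, double fuzziness)
{
    size_t bytes = (size_t)rows * cols * sizeof(double);
    double *d_u = NULL, *d_t = NULL;
    int ok = -1;

    if (cudaMalloc(&d_u, bytes) == cudaSuccess &&
        cudaMalloc(&d_t, bytes) == cudaSuccess &&
        cudaMemcpy(d_u, h_u, bytes, cudaMemcpyHostToDevice) == cudaSuccess) {
        dim3 block(16, 16);
        dim3 grid((cols + block.x - 1) / block.x, (rows + block.y - 1) / block.y);
        pow_kernel<<<grid, block>>>(d_u, d_t, rows, cols, fuzziness);
        // The blocking device-to-host copy also waits for the kernel to finish.
        if (cudaGetLastError() == cudaSuccess &&
            cudaMemcpy(h_t, d_t, bytes, cudaMemcpyDeviceToHost) == cudaSuccess)
            ok = 0;
    }
    cudaFree(d_u);
    cudaFree(d_t);
    return ok;
}
```

On the host side, a 2D array declared as double a[ROWS][COLS] is already contiguous in row-major order, so &a[0][0] can be passed as the flat pointer. An array of separately allocated row pointers would need to be copied into one buffer first.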

Hence, you should be fine.