## Introduction

Long time ago I parallelized a program written in C++ (Well, not really C++: the only C++ thing in it was the usage of cout in stead of printf). As usual, I looked first at possibilities to optimize the serial version of the program.

Profiling learnt me that the program spent most of it's time in computing addresses in a 6-dimensional array using a hand-written alogorithm, so it seemed that using a normal C syntax for that array, like

```
A[i][j][k][l][m][n] = 10
```

based on pointers, would speed-up the program considerably.

I learned soon that hand-writing code to create pointers, pointers to pointers, pointers to pointers to pointers and so on, is not easy to do correctly. (Indeed, in another program I found a mistake in hand-written code to create a 2-dimensional array). So I decided to create a function that does the job. Originally the function I created was named 'matalloc', but since there are very many matalloc's around on the web, I changed the name in wv_matalloc.

BTW: indeed the program ran several factors faster with my modification.

## Efficiency

wv_matalloc is based on the usage of pointers to pointers to ... Sometimes I get the impression that real programmers do not like this for performance reasons.

Suppose, we have a matrix A with dimensions M and N. Using wv_matalloc we access element i,j with:

```
int M=100; int N=50;
double **A = wv_matalloc(sizeof(double),0,2,M,N)
int i,j;
for (i=0; i<M; i++)
for (j=0; j<N; j++)
A[i][j] = i*j;
```

Real programmers do it as follows:

```
int M=100; int N=50;
double *A = malloc(sizeof(double)*M*N);
int i,j;
for (i=0; i<M; i++)
for (j=0; j<N; j++)
A[i*N+j] = i*j;
```

It seems to me that in the 6-dimensional case it requires a REAL programmer to do this reliably.

In the tar.gz file under downloads, a program (matmulcompare.c) is included to measure the difference in performance using a naive implementation of a matrix multiplication. The program source contains some explanation, results and compile instructions.

In short: for gcc both methods have the same efficiency. For icc (the Intel compiler) the real-programmers approach is much more efficient, but then, for matrix multiplication you should use something from Lapack or so, and you can still use wv_matalloc to access your data in a convenient way.

## Example using wv_matalloc and Lapack

The tar.gz file under downloads contains an example (dgesvtest.c) how to use wv_matalloc and Lapack to solve a system of linear equations.