A long time ago I parallelized a program written in C++ (well, not really C++: the only C++ feature it used was cout instead of printf). As usual, I first looked for ways to optimize the serial version of the program.
Profiling showed me that the program spent most of its time computing addresses in a 6-dimensional array using a hand-written algorithm, so it seemed that using normal C syntax for that array, like
A[i][j][k][l][m][n] = 10
based on pointers, would speed up the program considerably.
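To make the contrast concrete, here is a minimal sketch of what a hand-written address computation for a 6-dimensional array typically looks like. The original program's algorithm is not shown here, so this is an assumption: the name flat_index6 and the row-major layout are hypothetical and chosen only for illustration.

```c
#include <stddef.h>

/* Hypothetical helper (not taken from the original program):
 * row-major offset of element (i,j,k,l,m,n) in a flat array whose
 * extents are dim[0]..dim[5]. dim[0] is not needed for the offset. */
size_t flat_index6(const size_t dim[6],
                   size_t i, size_t j, size_t k,
                   size_t l, size_t m, size_t n)
{
    /* Horner-style evaluation: five multiplications and five additions
     * for every single element access. */
    return ((((i * dim[1] + j) * dim[2] + k) * dim[3] + l) * dim[4] + m)
           * dim[5] + n;
}
```

With such a helper, every access is written as data[flat_index6(dim, i, j, k, l, m, n)], and the arithmetic is repeated each time. With a pointer-based array, A[i][j][k][l][m][n] instead follows a chain of stored pointers.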
I soon learned that hand-writing code to create pointers, pointers to pointers, pointers to pointers to pointers, and so on is not easy to do correctly. (Indeed, in another program I found a mistake in hand-written code to create a 2-dimensional array.) So I decided to write a function that does the job. Originally I named the function 'matalloc', but since there are a great many matallocs around on the web, I changed the name to wv_matalloc.
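To illustrate the kind of code that is easy to get wrong, here is a minimal sketch of the 2-dimensional case: one block for the data plus a table of row pointers. This is not wv_matalloc and not its interface. The names alloc2d and free2d, the double element type, and the single contiguous data block are assumptions made for this example.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate a rows x cols array of double that can
 * be indexed as a[r][c]. Elements are zero-initialized.
 * Returns NULL on failure or if either extent is zero. */
double **alloc2d(size_t rows, size_t cols)
{
    double **a;
    double *data;
    size_t r;

    if (rows == 0 || cols == 0)
        return NULL;
    if (cols > SIZE_MAX / rows)          /* rows * cols would overflow */
        return NULL;

    a = malloc(rows * sizeof *a);        /* table of row pointers */
    if (a == NULL)
        return NULL;

    data = calloc(rows * cols, sizeof *data);   /* one contiguous block */
    if (data == NULL) {
        free(a);
        return NULL;
    }

    for (r = 0; r < rows; r++)
        a[r] = data + r * cols;          /* row r starts r*cols elements in */

    return a;
}

/* Release an array obtained from alloc2d. */
void free2d(double **a)
{
    if (a != NULL) {
        free(a[0]);                      /* the data block */
        free(a);                         /* the row-pointer table */
    }
}
```

Typical use would be double **a = alloc2d(3, 4); a[2][3] = 10; free2d(a);. Each extra dimension adds another level of pointer tables, and that is where mistakes creep in when the code is written by hand for every array.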
BTW: the program did indeed run several times faster with my modification.