# Research Group “Memory-hierarchy optimal matrix multiplication-programs”

### Open positions

### Short Project Description

In many applications that work on huge data sets, from areas like bioinformatics, data-mining, network analysis, optimization, and simulation, the computation time is a major concern. To shorten this time, we have to take into account the many aspects that determine the running time of a program on a modern computer. One of these aspects is the range of different type of memory, from fast but small to huge but slow. More precisely, there are registers on the CPU, various kinds of caches, main memory and disk, the levels of the memory hierarchy. The data transfer between these levels happens by means of so called I/O-operations that can be very slow, and should be minimized to achieve fast programs.

The core computation of many applications that require huge amounts of data can be modeled by sparse matrices, such that general purpose high performance software libraries for abstract operations can be used. One of the basic such operations is the multiplication of a huge sparse matrix with a vector. It has been recognized that this is one of the operations where modern computers operate significantly below their peak CPU-performance, indicating that indeed the data transfer in the memory hierarchy is the bottleneck.

In very recent work, we showed a lower bound on the number of data transfers that are necessary to compute the product of an arbitrary sparse matrix with a vector, and a sorting-based algorithm that is asymptotically optimal. Fortunately, in many applications the sparse matrices do have a structure that can be exploited to perform the multiplication with fewer I/O's.

The focus of this project is to (automatically) analyze the structure
of a huge sparse matrix with respect to the amount of data transfer
that is required to multiply with this matrix.
In other words, for a given sparse matrix `A` we are interested in a
program that transforms a vector `x` into the
product `Ax` with the fewest possible I/O-operations.