The Complexity of Matrix Multiplication
📅Tue Oct 13 2020
Matrix multiplication is something you may never really have considered much: it is how we calculate the composition of two linear transformations. The algorithm is pretty simple and doesn't seem inefficient in any real way. However, there is quite a bit more to this than meets the eye.
Definition
We will be considering the following linear maps

$$f : U \to V, \qquad g : V \to W$$

represented (after fixing bases) by matrices $B$ and $A$ respectively.
We want to find $g \circ f$; in particular, we actually want to compute the matrix $AB$ that represents $g \circ f$ from the matrices $A$ and $B$.
Starting point
The method for matrix multiplication we all know is as follows.
To compute each entry of $C = AB$ (for $n \times n$ matrices $A$ and $B$), we perform $n$ multiplications and $n - 1$ additions. We must do this for all $n^2$ entries of $C$. So we have $n^3$ multiplications and $n^2(n-1)$ additions.
This leads to an algorithm with computational complexity of $O(n^3)$ for multiplication of two $n \times n$ matrices.
There don't seem to be any redundant calculations, but improvements on this do in fact exist.
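The schoolbook method can be sketched in a few lines (a minimal Python sketch; names and conventions are mine, not from the post):

```python
def matmul_naive(A, B):
    """Multiply two n x n matrices given as lists of lists, in O(n^3)."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):           # each of the n^2 entries of C...
        for k in range(n):
            for j in range(n):   # ...costs n multiplications and n - 1 additions
                C[i][k] += A[i][j] * B[j][k]
    return C
```

The three nested loops are exactly the $n^2$ entries times $n$ multiply-adds counted above.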
Strassen's Algorithm
Strassen's Algorithm was the first sub-cubic complexity algorithm for matrix multiplication. We can write the multiplication of two $n \times n$ matrices as 8 multiplications and 4 additions of $\frac{n}{2} \times \frac{n}{2}$ sub-matrices. This block-wise algorithm is also $O(n^3)$; we can quite easily prove this, since the recurrence $T(n) = 8\,T(n/2) + O(n^2)$ solves to $O(n^3)$.
Strassen's Algorithm works by doing 7 sub-matrix multiplications instead of 8, in exchange for more additions.
The 7 sub-matrix multiplications are defined as follows.

$$\begin{aligned}
M_1 &= (A_{11} + A_{22})(B_{11} + B_{22}) \\
M_2 &= (A_{21} + A_{22})B_{11} \\
M_3 &= A_{11}(B_{12} - B_{22}) \\
M_4 &= A_{22}(B_{21} - B_{11}) \\
M_5 &= (A_{11} + A_{12})B_{22} \\
M_6 &= (A_{21} - A_{11})(B_{11} + B_{12}) \\
M_7 &= (A_{12} - A_{22})(B_{21} + B_{22})
\end{aligned}$$

Then we have

$$\begin{aligned}
C_{11} &= M_1 + M_4 - M_5 + M_7 \\
C_{12} &= M_3 + M_5 \\
C_{21} &= M_2 + M_4 \\
C_{22} &= M_1 - M_2 + M_3 + M_6
\end{aligned}$$
We can verify that this holds quite easily by matrix algebra, and then find that the complexity of the algorithm is $O(n^{\log_2 7}) \approx O(n^{2.807})$, since the recurrence is now $T(n) = 7\,T(n/2) + O(n^2)$.
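Here is a sketch of the recursion in Python (my own illustration, assuming the matrix dimension is a power of two; `leaf` is an arbitrary cutoff below which we fall back to ordinary multiplication):

```python
import numpy as np

def strassen(A, B, leaf=64):
    """Strassen's algorithm for square matrices whose size is a power of two.

    Performs 7 recursive sub-multiplications instead of 8, at the cost of
    extra block additions, giving O(n^log2(7)) ~ O(n^2.807) overall.
    """
    n = A.shape[0]
    if n <= leaf:                       # small blocks: ordinary multiplication
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    M1 = strassen(A11 + A22, B11 + B22, leaf)
    M2 = strassen(A21 + A22, B11, leaf)
    M3 = strassen(A11, B12 - B22, leaf)
    M4 = strassen(A22, B21 - B11, leaf)
    M5 = strassen(A11 + A12, B22, leaf)
    M6 = strassen(A21 - A11, B11 + B12, leaf)
    M7 = strassen(A12 - A22, B21 + B22, leaf)
    C11 = M1 + M4 - M5 + M7
    C12 = M3 + M5
    C21 = M2 + M4
    C22 = M1 - M2 + M3 + M6
    return np.block([[C11, C12], [C21, C22]])
```

In practice the cutoff matters: the extra additions mean Strassen's recursion only pays off for fairly large matrices.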
This is ... deeply unsatisfying, at least to me. The seemingly arbitrary matrix products don't really give any insight into what is occurring and why it is better than the naive approach.
A linear transformation is completely defined by its action on a basis; in this case the basis is of size $n^2$. Computing the action on each of the $n^2$ basis elements separately takes $O(n^3)$ operations in total. So we are somehow taking advantage of some inherent structure in matrix multiplication to be able to calculate this using a sub-cubic algorithm.
We wish to determine the smallest $\omega$ such that matrix multiplication is $O(n^{\omega + \epsilon})$ for every $\epsilon > 0$.
But what structure? Is there a similar sequence of sub-matrix multiplications that we can use to create an even better matrix multiplication algorithm?
Tensors and Multi-linear Maps
Matrices represent linear maps from one vector space to another. Matrix multiplication takes two linear maps and maps them to another one; let's call this mapping $\mathrm{MM}$.
More formally,

$$\mathrm{MM} : \mathrm{Hom}(U, V) \times \mathrm{Hom}(V, W) \to \mathrm{Hom}(U, W), \qquad \mathrm{MM}(f, g) = g \circ f$$
Each of these linear maps is itself an element of a vector space.
We have

$$\mathrm{MM}(f + \lambda f', g) = \mathrm{MM}(f, g) + \lambda\,\mathrm{MM}(f', g), \qquad \mathrm{MM}(f, g + \lambda g') = \mathrm{MM}(f, g) + \lambda\,\mathrm{MM}(f, g')$$

This means that our mapping is bilinear in its 2 inputs.
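As a quick sanity check, a few lines of NumPy (my own illustration, not from the post) confirm that matrix multiplication is linear in each argument:

```python
import numpy as np

rng = np.random.default_rng(42)
A, A2, B = (rng.standard_normal((3, 3)) for _ in range(3))
c = 2.5

# Linearity in the first argument: (A + c*A2) @ B == A @ B + c * (A2 @ B).
# The same check with the roles of the arguments swapped verifies
# linearity in the second argument.
lhs = (A + c * A2) @ B
rhs = A @ B + c * (A2 @ B)
print(np.allclose(lhs, rhs))  # True
```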
Note: From here on out we will pretty much be working with the case of multiplying two $n \times n$ matrices over the same field $F$. In the spirit of this we can write our tensor as $\langle n \rangle$ if all dimensions are equal, or $\langle n, m, p \rangle$ in general, where $F$ is our field and $n, m, p$ are the dimensions we are working over; we will call this the matrix multiplication tensor.
Tensors
A bit of an introduction to tensors for anyone who hasn't seen them before.
We define a tensor $T$ over vector spaces $V_1, \dots, V_k$ over a common field $F$ as follows:

$$T : V_1 \times V_2 \times \cdots \times V_k \to F$$

Note: more restrictive versions of this definition exist, where the vector spaces used are restricted to multiple copies of a single vector space and its dual.
$T$ is a multi-linear map: it is linear in each of its inputs.
By fixing a basis for each $V_i$ we get a multidimensional array of dimensions $d_1 \times d_2 \times \cdots \times d_k$ (where $d_i = \dim V_i$), with the entries of this array being elements of $F$:

$$T_{i_1 i_2 \cdots i_k} = T\big(e^{(1)}_{i_1}, e^{(2)}_{i_2}, \dots, e^{(k)}_{i_k}\big)$$

where $e^{(1)}, \dots, e^{(k)}$ are the corresponding bases for the vector spaces.
Using this form we can rewrite the matrix multiplication tensor as

$$\langle n \rangle = \sum_{i,j,k=1}^{n} e_{ij} \otimes e_{jk} \otimes e_{ki}$$

where the $e_{ij}$ are the standard basis matrices.
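To make this concrete, here is a sketch (my own, using an arbitrary row-major index convention) that builds the matrix multiplication tensor for $n = 2$ as a $4 \times 4 \times 4$ array and checks that contracting it with $\mathrm{vec}(A)$ and $\mathrm{vec}(B)$ recovers $\mathrm{vec}(AB)$:

```python
import numpy as np

n = 2
# T[p, q, r]: p indexes the entry (i, j) of A, q the entry (j, k) of B,
# and r the entry (i, k) of C = AB.  T is 1 exactly when the inner index
# of A matches the outer index of B and (i, k) matches the output slot.
T = np.zeros((n * n, n * n, n * n))
for i in range(n):
    for j in range(n):
        for k in range(n):
            T[i * n + j, j * n + k, i * n + k] = 1

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[5., 6.], [7., 8.]])
# Contracting the tensor with vec(A) and vec(B) recovers vec(A @ B).
C_vec = np.einsum('pqr,p,q->r', T, A.ravel(), B.ravel())
print(np.allclose(C_vec.reshape(n, n), A @ B))  # True
```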
We have now shown that we can talk about matrix multiplication as a multi-linear (bi-linear) map, and in turn a tensor, which is cool I guess but seems like quite a tangent.
A theorem approaches
The tensor representation seems a bit out of left field; however, by using it we can bring some powerful maths tools to bear. But before we can do this, we unfortunately need more definitions.
First we will need to define the rank of a tensor.
It is the minimum number of rank 1 tensors required to represent our tensor; for an order-3 tensor,

$$R(T) = \min\Big\{ r : T = \sum_{t=1}^{r} u_t \otimes v_t \otimes w_t \Big\}$$
The border rank $\underline{R}(T)$ is then the minimum rank $r$ such that there is a sequence of tensors, each of rank at most $r$, whose limit is the tensor in question.
Clearly we have $\underline{R}(T) \le R(T)$, as the constant sequence which just repeats the tensor with its rank decomposition gives a feasible border rank of $R(T)$.
We also have the simple result that $R(T) \le d_1 d_2 \cdots d_k$. We get this by creating a rank 1 tensor corresponding to each combination of basis vectors of the constituent vector spaces.
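A classic example (my own addition, not from the post) shows that border rank can be strictly smaller than rank: the tensor $x \otimes x \otimes y + x \otimes y \otimes x + y \otimes x \otimes x$ has rank 3, yet it is a limit of rank-2 tensors:

```python
import numpy as np

def rank1(u, v, w):
    """Outer product u ⊗ v ⊗ w as an order-3 array."""
    return np.einsum('i,j,k->ijk', u, v, w)

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])

# T has rank 3, but border rank 2:
T = rank1(x, x, y) + rank1(x, y, x) + rank1(y, x, x)

# T_eps = ((x + eps*y)^⊗3 - x^⊗3) / eps is a sum of only 2 rank-1 terms,
# yet it converges to T as eps -> 0.
eps = 1e-6
T_eps = (rank1(x + eps * y, x + eps * y, x + eps * y) - rank1(x, x, x)) / eps
print(np.max(np.abs(T_eps - T)))  # ~1e-6, shrinking with eps
```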
Restrictions
Rank and border rank of tensors are unfortunately not (yet) as well understood as the rank of matrices. However even without complete results, we can make some progress.
Useful facts
- The number of elementary multiplications required to compute a bilinear map (which matrix multiplication is) is tightly bounded by the rank of the corresponding tensor.
- The rank of a tensor product is sub-multiplicative: $R(T \otimes T') \le R(T) \cdot R(T')$.
The first of these is very useful: it shows that bounding the rank of the matrix multiplication tensor is pretty much exactly what we want to do to determine $\omega$.
The sub-multiplicative result implies that if we can get a good bound on the rank of $\langle n \rangle$ for a specific $n$, we can use this to bound $\omega$: since $\langle n \rangle^{\otimes s} = \langle n^s \rangle$, we get $R(\langle n^s \rangle) \le R(\langle n \rangle)^s$, and hence $\omega \le \log_n R(\langle n \rangle)$.
We can relate a specific property of this tensor to our search for efficient ways of multiplying two matrices.
Strassen's Theorem
$$\omega = \inf_{n} \log_n \underline{R}(\langle n \rangle)$$

where $\underline{R}$ is the border rank of the tensor.
It has been shown that $R(\langle 2 \rangle) = 7$; this fact and the sub-multiplicativity of tensor rank allow us to bound $\omega \le \log_2 7 \approx 2.807$. This is the same bound we get by using Strassen's Algorithm!
This is actually what Strassen's Algorithm is doing. We are decomposing the tensor $\langle 2 \rangle$ into 7 rank 1 tensors. The seemingly random matrix products represent this decomposition!
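We can check this numerically (a sketch with my own vectorisation convention): the seven coefficient vectors below encode Strassen's products as a rank-7 decomposition of $\langle 2 \rangle$:

```python
import numpy as np

# Row t of U, V, W holds the coefficients of Strassen's t-th product:
#   M_t = (U[t] . vec(A)) * (V[t] . vec(B)),   vec(C) = W.T @ M,
# with vec ordering [X11, X12, X21, X22] (this encoding is my own choice).
U = np.array([[1, 0, 0, 1], [0, 0, 1, 1], [1, 0, 0, 0], [0, 0, 0, 1],
              [1, 1, 0, 0], [-1, 0, 1, 0], [0, 1, 0, -1]])
V = np.array([[1, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, -1], [-1, 0, 1, 0],
              [0, 0, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]])
W = np.array([[1, 0, 0, 1], [0, 0, 1, -1], [0, 1, 0, 1], [1, 0, 1, 0],
              [-1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0]])

# Summing the 7 rank-1 tensors U[t] ⊗ V[t] ⊗ W[t] rebuilds the full tensor:
T = np.einsum('tp,tq,tr->pqr', U, V, W)

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
M = (U @ A.ravel()) * (V @ B.ravel())   # the 7 scalar products, not 8
C = (W.T @ M).reshape(2, 2)             # recombine into C = A @ B
C_from_T = np.einsum('pqr,p,q->r', T, A.ravel(), B.ravel()).reshape(2, 2)
print(np.array_equal(C, A @ B), np.array_equal(C_from_T, A @ B))  # True True
```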